23 – gcc and make
December 5, 2009 – 10:17 pm
Please note: This information no longer exists at the referenced locations. This is only a copy of what was available in 2003.
Basic Linux Training™
gcc and make
John Tilp
Compiling Source
GNU/Linux operating system distributions include an extensive set of programming tools and libraries collectively referred to as the ‘compiler tool chain’ and commonly referred to as simply ‘the compiler’. It is the compiler that transforms source files, files of programming statements that describe what a program should do, into instructions that the central processing unit (CPU) can execute. It is the compiler that turns the ‘open source’ of Linux into executable programs capable of running on a computer.
In some cases it is necessary for a programmer to piece together parts of a program in a low level assembly language that directly represents machine language, the native numerical instruction code of the CPU. But the vast majority of programs are composed using a high level programming language such as C, C++, or Java. The high level language is then compiled into files that can be run on the particular CPU architecture (e.g. x86, PowerPC, SPARC) of the target platform and under its operating system (e.g. Linux, FreeBSD, SunOS). A large portion of Linux is written in C, a widely used compiled high level language.
All high level languages describe the operations a CPU can perform in a human-readable, abstracted fashion. The compiler translates this abstract source code into assembly language, which in turn is translated into machine code. Each CPU architecture has its own machine language and, in turn, its own assembly language, but a high level language can, in many cases, be used to write a program that is portable across CPUs and platforms. It is this quality that makes distributing Linux on a variety of platforms feasible.
Not all programs are compiled into executable files. Some programs, also called scripts, are interpreted, a process where each program statement is translated and executed on the fly. The principle is still the same: the script, which is common across platforms, is processed by an interpreter built for the specific platform, and each statement is carried out as soon as it has been translated. In Linux, the shell is the most familiar example of an interpreter. Other scripting languages include Perl, Python, Ruby, AWK, and Tcl. Each language has its particular strengths and optimal applications.
The simplest, and usually most foolproof, way to install new software on your system is to use the distribution’s package manager. Each distribution has its own way of doing this, but all of them will ideally resolve, or at least make you aware of, dependencies and compatibility issues.
There are times, however, when no package is available or acceptable. You may want to create your own custom program by writing source code from scratch, or compile your own customized version from existing source, as when recompiling the kernel. In either case, you can use the GNU Compiler Collection (gcc) to transform the source into object modules, libraries, and executable programs.
gcc
Originally called the GNU C Compiler, gcc is now a multi-language, highly configurable, extremely flexible, state of the art compiler package. When you invoke gcc, the suffix of the files that are passed in, such as .c for C source, .h for headers (additional declarations), and .S for assembly files, is often enough to tell the compiler package which tools to use. For some hands-on experience, find a C tutorial, create a file with the text for a ‘hello world’ program, save it as hello.c, type gcc hello.c, and you’ll be on your way.
For the C language gcc integrates a front end that combines the included files such as the headers and begins the translation process. After a few intermediate stages, the compiler’s back end produces assembly instructions specifically for the target platform. An assembler turns those assembly instructions into machine code object modules, and finally a linker and archiver combine the results into an executable or library file.
glibc
C programs are broken up into logical chunks of code called functions. Functions that are frequently used can be predefined and compiled into libraries. Among the scores of libraries available on the system, the GNU C library (glibc) is an integral part of the C language and the tool chain, and gcc itself is linked against it. Linking resolves each call to a function by determining where that function can be found in a library. Libraries can be linked to a program statically or dynamically, which determines when they are loaded and how other programs can share them. Static libraries end with .a (archive); the .so suffix denotes dynamic (shared object) libraries.
The format of the output files produced by gcc is the ‘Executable and Linkable Format’ (ELF), even though gcc’s default output file name is a.out, a holdover from the original executable format of Linux. With the right kernel module and libraries you can still run files of that older format. On the gcc command line you can specify your own output file name with the -o option. If no errors were encountered, you can run the resulting executable binary file by typing ./a.out (you need ./ before the name of an executable if its directory is not in your environment’s PATH variable).
binutils and gdb
The binary utilities package binutils includes the assembler (as), linker (ld) and archiver (ar), along with other necessary tools used by gcc. NOTE: ld should not be confused with the file links of ln discussed in the lessons on commands and file systems. Another closely related package is gdb, a debugger that enables you to troubleshoot your program, right down to stepping through the code file one instruction at a time and examining how it has affected different parts of the machine. In the case of a core dump, programmers can use it to analyze the core file. With gdb you can also examine the assembly and even the hex code generated from compiling your source. Machine language numerical code, data, and memory addresses are usually displayed in hex (hexadecimal or base 16) notation instead of decimal (base 10) because it is a more convenient way to express numbers in the binary (base 2) environment of the computer.
make
The steps that were followed to build an executable from source code were fairly straightforward and, for the most part, taken care of by gcc. To recap, after writing the source code and invoking gcc with the desired options, barring any errors, gcc produces an executable file. But as a program grows in functionality, it also grows in complexity. Parts of the program may be combined into libraries that are in turn used by other parts. There are even more levels added when portability to additional architectures is supported. A project can soon incorporate dozens, even hundreds of files, involving a maze of inter-dependencies to compile correctly.
The make tool can be used to automate the steps in the build process, speeding up compilation and reducing the chances for errors. The programmer, or development team, can work incrementally on different sections of code. Rules are written in make’s input file describing what is needed to build each file and how to build it. When make is invoked it reads that file, by default named Makefile, and rebuilds only the changed sections and whatever depends on them. Rules for make can also remove the results of a previous build, as in make clean, or place the finished files in their proper locations, as with make install.
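Here is a minimal sketch of such a Makefile, with one rule per file and a clean rule. The file names are illustrative, and the Makefile is written with printf only so that each command line is guaranteed to begin with the tab character that make requires; in practice you would type it in an editor.

```shell
workdir=$(mktemp -d)
cd "$workdir"

cat > hello.c <<'EOF'
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}
EOF

# Each rule reads "target: prerequisites", followed by tab-indented commands.
{
    printf 'hello: hello.o\n'
    printf '\tgcc -o hello hello.o\n'
    printf '\n'
    printf 'hello.o: hello.c\n'
    printf '\tgcc -c hello.c\n'
    printf '\n'
    printf 'clean:\n'
    printf '\trm -f hello hello.o\n'
} > Makefile

# First run: nothing is built yet, so make runs both rules.
make
output=$(./hello)
echo "$output"

# Second run: no source has changed, so make has nothing to rebuild.
second_run=$(make)
echo "$second_run"

# make clean removes the results of the build.
make clean
```

If you edit hello.c and run make again, only the rules whose prerequisites are newer than their targets are rerun.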
configure
To accommodate the differences encountered in building on varied Linux versions and architectures, a way to automatically set up the proper Makefiles is necessary. This is where a configure script comes into play. Before make builds the source tree for a particular system, the configure script is run first (as ./configure, sometimes with option arguments) to set up the build environment. The configure script can also confirm that the tools in your tool chain will properly produce what is required to build the program.
For configuration in the X Window System, xmkmf runs imake on a template called Imakefile to produce the necessary Makefile. Sometimes the entire ./configure, make all, make install process is combined into one script aptly named INSTALL. The ‘Golden Rule’ is to read all the README files and look over the scripts before you begin. If the install phase of the build is going to put files into directories outside of your home directory, you may need additional permissions. You can always install versions of tools into your own space and make them your default versions by prepending their path to your PATH environment variable.
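Prepending a private directory to PATH can be sketched as follows. The directory and the tool name mytool are hypothetical; a real setup might use $HOME/bin and put the PATH line in your shell start-up file.

```shell
# Hypothetical private tool directory; a real one might be $HOME/bin.
tooldir=$(mktemp -d)/bin
mkdir -p "$tooldir"

# A stand-in 'tool' that only reports which copy is running.
cat > "$tooldir/mytool" <<'EOF'
#!/bin/sh
echo "mytool from private directory"
EOF
chmod +x "$tooldir/mytool"

# Prepending puts this directory ahead of the system directories, so the
# shell finds your copy first when two tools share a name.
PATH="$tooldir:$PATH"
export PATH

# command -v shows which file the shell will actually run.
found=$(command -v mytool)
echo "$found"
mytool
```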
IDEs
It is possible to go from an idea to an ELF with not much more than a shell to invoke gcc, but there are a few other tools that can help make programming more manageable. The emacs editor offers a number of integrated features to aid the programmer, and many editors now offer syntax coloring that can help you spot errors like misspelled keywords, an unclosed comment, or unbalanced parentheses and braces.
An integrated development environment (IDE) available in the X Window System KDE/Qt arena is KDevelop, a graphical IDE incorporating gcc and make. IDE projects for GTK/GNOME include Glade and Anjuta, and ddd is an X GUI for gdb. Tools such as rcs and cvs offer revision control and concurrent access to keep track of changes, with Cervisia as an X graphical interface for cvs. There are even Rapid Application Development (RAD) tools such as VDKBuilder and the commercial Borland Kylix, and other efforts at combining graphics with Perl and Python to create ‘click and drag’ programming environments.
Versions
You can determine the versions of your tools by using the argument --version. Don’t be surprised when you find that as and ld have a different version than gcc; remember that they come from two different packages. And if you choose to rebuild your kernel, be sure to check the compiler tool chain version requirements, as they may differ from the versions that came with the distribution. If you do need another version, you can build it using the current one. Just be sure to give it a new prefix when configuring so that it doesn’t install over your current one.
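For example, the following sketch collects the first line of each tool's version report. It assumes the GNU versions of gcc, as, ld, and make are installed; the variable names are illustrative.

```shell
# Each tool reports its own version; only the first line is kept here.
gcc_version=$(gcc --version | head -n 1)
as_version=$(as --version | head -n 1)
ld_version=$(ld --version | head -n 1)
make_version=$(make --version | head -n 1)

echo "gcc:  $gcc_version"
echo "as:   $as_version"
echo "ld:   $ld_version"
echo "make: $make_version"
```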
Assignments
Terms and Concepts:
Define and add these to your glossary:
- .c
- .h
- .o
- ./configure
- as
- cc
- diff
- g++
- gdb
- gcc
- ld
- ldd
- library
- nm
- patch
- readelf
- syntax
- strace
- objdump
- xxd
LINKS
http://gcc.gnu.org/
http://www.tldp.org/HOWTO/GCC-HOWTO/index.html
http://sources.redhat.com/binutils/
http://www.gnu.org/manual/make-3.79.1/
http://linuxassembly.org/resources.html
http://www.info.ucl.ac.be/ingidocs/tutoriels/cornell/gdb/
http://www.gnu.org/manual/diffutils-2.8/html_node/diff.html
http://dmoz.org/Computers/Programming/Languages/C/Tutorials/
Copyright © 1997-2003 Henry White Copyright © 2003 John Tilp . All Rights Reserved.
Reproduction or redistribution without prior written consent is strictly prohibited. Address comments and inquiries to info@basiclinux.net