Do you know exactly what happens when you use a C compiler like gcc?
I recently started training to change careers and become a developer. This training involves learning to code, of course, but also to communicate with other people and, in particular, to write technical documentation. With this in mind, I'm writing this article for a general audience: it is designed to be understandable even by a beginner.
That's why, although we'll be tackling specific cases and fairly advanced technical concepts, I'm not going to go into all the details and will take a few shortcuts. The aim is to get straight to the point and concentrate on what's really important to understand.
Socrates is quoted as saying:
“The beginning of wisdom is the definition of terms.”
So let's tackle a few definitions first, so that we understand what we're getting into:
GCC stands for GNU Compiler Collection. It's a set of compilers for various programming languages such as C, C++ and Fortran, to name but a few (in this article, we'll be focusing mainly on the C language). This definition, though useful, obliges us to look at two other definitions. The first is that of a compiler and the second is that of the C language.
The C language was created in 1972 by Dennis Ritchie at Bell Labs (Brian Kernighan later co-wrote the famous reference book on the language with him). Much of the UNIX operating system was rewritten in this language (no less!).
Back in the late 60s and early 70s, operating systems were written by sending instructions more or less directly to the computer's processor. These instructions were written in a language very close to what the chip understands: assembler (remember this, we'll go over it later).
But this language isn't very practical to use, firstly because its syntax is fairly abstract and far removed from human language; and secondly because each processor speaks its own language, and you have to adapt this assembler code as soon as you want to make it work on another machine.
Here's an example comparing C and assembly:
This is why languages like C were created. Not only to have a syntax that more closely resembles human language, but also to be able to port the code we write. In fact, the same code written in C can be converted for a machine A and for a machine B, even if they don't use the same processor. The program that does this conversion is what we call a compiler, which brings us neatly to our next definition!
A compiler is a program that translates source code (in this case C code) into a machine-executable language.
Now that we understand what a compiler is, we're going to take a closer look at a particular one: GCC (or gcc in the rest of the article, as the commands are often written in lower case).
You'd think that you'd give a file written in C as input (these are files with the .c extension) and it would output executable code.
But in reality, there's a lot going on in the background... Let me show you what's going on under the hood!
Here's everything gcc does in the background without us even noticing. Let's look at it step by step...
Let's imagine we want to compile a file called main.c. Here's the code:
#include <stdio.h>

int main(void)
{
    printf("Hello, World !\n");
    return (0);
}
This is what we call the source code, i.e. all the instructions needed for our program to execute and do what we want. Here, I want to display the words “Hello, World !” on the screen (why is it always Hello World?). To compile my file, I need to type the following command:
gcc main.c
(Of course, you'll need to adapt “main.c” to your file name. This applies every time you see “main.c” in the rest of this article.)
But what happens next?
Before we look at what happens to our main.c, we also need to know that we have files called headers.
These files list the functions we want to make available to other files. It's like an index that tells us at a glance which ingredients we'll need for our code, without having to go through the whole code to find out. They also make it possible to structure and share definitions between several source files, and to make large projects easier to manage by avoiding repeating the same things in several files. To keep things simple, I haven't written a header of my own for today's example (stdio.h, which we include, is one that comes with the standard library), but this type of file does exist.
Preprocessing is the very first step performed by the compiler.
It's a kind of preparation for the rest of the tasks. Typically, in the previous example, we see that we're using #include <stdio.h>. In other words, we want to use an existing function library. The preprocessor will then fetch the contents of the stdio.h header file and insert them in place of the #include line.
If you don't ask gcc to do anything in particular, these tasks are performed invisibly. However, if you want to see what happens at the end of this preprocessing stage, simply type the following command. It stops compilation after this task and prints the preprocessed code in the terminal (preprocessed C files conventionally use the .i extension; add -o main.i to save the output to such a file).
gcc -E main.c
Next comes the compiler. The compiler takes the source code (after it has been cleaned up by the preprocessor) and translates it into assembly language, which is a textual representation of what the future machine code will be. It also checks that there are no errors in the code (syntax errors, or misused functions). If any errors are detected, gcc reports them to us so we can correct them. Isn't that nice?
Once again, the generation and use of these .s files is hidden, and if you want to keep them and have a look, type:
gcc -S main.c
This brings us to the assembler stage. This takes the assembly files seen in step 2 and converts them into machine code (i.e. binary code) that the processor can understand. The files thus created are object files ending in .o.
As you'd expect, these files are generated and used invisibly, and if you want to keep them, use the command:
gcc -c main.c
If you've been following along, we mentioned assembly language at the beginning of this article. Well, before the invention of C, developers had to code directly in assembler, and this is exactly the type of file that the compiler step produces and that the assembler step takes as input in GCC. Nothing is left to chance!
Now we come to the final step. As shown in our diagram, the assembler creates one object file (.o) per source file. Although these files are in machine language, the assembler produces each of them independently of the others and is not able to link them together. That's where the linker comes in: it unites these different .o files into a single, complete executable file. It also resolves references (for example, when an object file refers to a function defined elsewhere, such as printf in our example, which lives in the C standard library). It acts as a link between our code and the libraries, connecting the right functions and variables to the right places in the program.
If all went well and GCC didn't give us any errors, the command below generated an a.out file (the name of the file generated by default), and we can run it with ./a.out to proudly display our “Hello, World !”.
gcc main.c
Although it's an old language, C is still widely used today. Why is that? Firstly, because it's the “father” of many modern languages. Languages such as Python, C++, Java, JavaScript and Perl have all been influenced by it, so knowing C allows you to acquire a certain logic and quickly pick up the habits of these more modern languages. Secondly, with C, there are very few steps and intermediaries between the code's instructions and their execution by the processor. This means you can keep a tight rein on memory and processor usage, as well as execution times, which is very useful for applications requiring high performance and for low-level system software.
Thanks for reading this article to the end, and don't hesitate to comment if you have any questions or remarks.
See you soon for new adventures!
Photo by British Library on Unsplash