C++ Development Tutorial 2: Compile Multiple Files (1) — Compiling Process Basics
In the last tutorial, we compiled a simple C++ program made of a single source file. However, most programs consist of multiple source files, and it’s good practice to split source code across files for better organization. From now on, we will handle multi-file compilation and the common pitfalls around it.
Specifically, this tutorial focuses on the compilation process, which can be divided into 3 stages: preprocessing, compiling, and linking.
In each stage, we only pick the part relevant to compiling multiple files. Let’s go through them one by one.
During preprocessing, the preprocessor handles file inclusion directives, which look like #include <header> or #include "header" in source code. It finds every such directive and copies the contents of the file named in angle brackets or double quotes to the place where the #include line appears in the source file. Angle brackets are used for system headers such as the C++ standard library (iostream, vector, queue, map, etc.). Double quotes are used for user-defined headers. Let’s check the following example:
We have 2 files here: main.cpp and vars.h. vars.h is included in main.cpp. After preprocessing, main.cpp becomes:
Notice it’s just a “copy and replace” process.
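As a minimal sketch of that substitution, assume vars.h holds a single constant and main.cpp uses it in one function. The names kBase and compute and the value 40 are illustrative assumptions, not taken from the original listing.

```cpp
// main.cpp after preprocessing (sketch).
// The line
//     #include "vars.h"
// has been replaced by the text of the header:

const int kBase = 40;    // came from the hypothetical vars.h

// ...and the rest of main.cpp follows unchanged:
int compute() {
    return kBase + 2;    // kBase is visible because its text was pasted above
}
```

The compiler never sees the #include line itself, only the pasted text.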
If you are interested, you can run only the preprocessing stage by passing -E to g++ (for example, g++ -E main.cpp). Developers usually don’t have to do this.
The output is a little more complex than plain replacement. It also contains markers that record the original file names and line numbers, so that compiler diagnostics and debuggers can trace back to the original source code.
To wrap up, the preprocessor replaces each #include line with the contents of the file it names.
In the second step, each source file, together with the headers it includes, is translated into an object file. An object file contains machine code in a relocatable format, which means it can later be combined with other object files. You can run “g++ -c” on a source file to perform preprocessing and compilation in one step.
“-c” here means “preprocessing + compilation, no linking”.
The compile stage contains several sub-stages and could fill a whole course. Here we focus on only one part: symbol declaration and definition. This is an area where developers often get confused and make mistakes when dealing with multiple source files.
Definition vs Declaration
A declaration must contain the name and type of an entity. A definition provides all the details needed to construct or use that entity. For example, a function definition contains the function body, and a class definition lists all of the class’s member functions and variables. A definition also serves as a declaration, so the two can be provided together.
Let’s look at some examples of declarations:
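The sketch below shows a few typical declarations, followed by matching definitions so that the snippet is complete. The names add, counter, and Widget are hypothetical and chosen only for illustration.

```cpp
// Declarations: they introduce a name and its type, nothing more.
int add(int a, int b);   // function declaration (no body)
extern int counter;      // variable declaration (no storage allocated here)
class Widget;            // class declaration (members not yet known)

// Definitions: they supply the remaining details.
int add(int a, int b) {  // function definition: includes the body
    return a + b;
}

int counter = 0;         // variable definition: storage and initial value

class Widget {           // class definition: lists the members
public:
    int size() const { return 3; }
};
```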
The rule we must know is:
C++ requires every symbol to be declared before it is used or referenced in source code.
Many programming languages have a similar rule, because it lets the compiler identify what a symbol is without searching every possible place. In C++, the declaration must appear either 1. in the same source file before its use, or 2. in another file that is included in the source file before the use. The second option shouldn’t be surprising if you read the preprocessing part.
The following 2 examples are legal:
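A minimal sketch of the first case, with hypothetical bodies for “bar” and “foo”:

```cpp
// bar is defined first, so it is already declared when foo uses it.
int bar(int x) {
    return x * 2;
}

int foo(int x) {
    return bar(x) + 1;   // OK: bar was declared (and defined) above
}
```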
The function “bar” is declared (and defined) before “foo”.
The function “bar” is declared (and defined) in util.h and is included before “foo”.
The next example is illegal because the declaration of “bar” appears after “foo” uses it.
However, this code works:
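The sketch below illustrates the pattern, assuming hypothetical bodies for “foo” and “bar”. The declaration sits on line 1 and the definition starts on line 7.

```cpp
int bar(int x);          // declaration only: name, parameter type, return type

int foo(int x) {
    return bar(x) + 1;   // compiles: bar has already been declared
}

int bar(int x) {         // definition, supplied after foo
    return x * 2;
}
```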
At line 1, we provide only the declaration of the function “bar”, which is enough for the “foo” function that follows. The definition comes later, at line 7. This is perfectly legal and does not contradict the rule above. At this point, you may have a question:
Why separate definition and declaration? Why not always provide definitions when declaring something to make things easy?
Consider the following use case: you have 2 functions, and each one calls the other. According to the C++ rule, the first function must appear before the second one, since the second one uses it. But the same is true in reverse: the second function must appear before the first one. How is this achievable? Through the separation of declaration and definition. Check the following example:
Here we have a (very inefficiently implemented) program to check whether an integer is odd or even. Unless the input argument is 0, the two functions recursively call each other. We put declarations of both functions at the top so that the compiler won’t complain. If you comment out the first 2 lines, an error is reported:
The compiler can’t find a declaration of is_odd before it’s used.
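A minimal sketch of such a pair of functions is shown here. The parameter type unsigned int is an assumption made so that the recursion always reaches 0; the first 2 lines are the forward declarations.

```cpp
bool is_odd(unsigned int n);    // forward declaration
bool is_even(unsigned int n);   // forward declaration

bool is_even(unsigned int n) {
    if (n == 0) {
        return true;            // 0 is even
    }
    return is_odd(n - 1);       // n is even exactly when n - 1 is odd
}

bool is_odd(unsigned int n) {
    if (n == 0) {
        return false;           // 0 is not odd
    }
    return is_even(n - 1);      // n is odd exactly when n - 1 is even
}
```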
A similar scenario happens when you have 2 classes and both reference each other:
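One way this could look is sketched below. The member names (set_owner, set_house, and the accessors) and the use of raw pointers are illustrative assumptions; the first 2 lines are the forward declarations.

```cpp
class House;   // forward declaration
class Owner;   // forward declaration

class House {
public:
    // A pointer to Owner is allowed here even though Owner is not yet defined.
    void set_owner(Owner* owner) { owner_ = owner; }
    Owner* owner() const { return owner_; }

private:
    Owner* owner_ = nullptr;
};

class Owner {
public:
    void set_house(House* house) { house_ = house; }
    House* house() const { return house_; }

private:
    House* house_ = nullptr;
};
```

A forward-declared class can be used through pointers or references only; its members cannot be accessed until the full definition has been seen.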
The code contains 2 classes, House and Owner. Each provides an API to set its owner or house. To make sure the compiler can parse the file correctly, we forward declare both classes in the top 2 lines, before they are used. If you delete the first 2 lines, you get an error:
This programming scheme is called “forward declaration”. It is used a lot in real code, so when you see it, you should be able to recognize it and not be confused.
Now you know the difference between definition and declaration, some common mistakes, and a practice for avoiding them: forward declaration. At this point, you should be able to make sure every single file compiles successfully and produces an object file.
Linking is the process of generating executables from multiple object files. During linking, all symbols must find their definition. Otherwise, the linker will complain and issue an error.
We can relate this to the previous stages. Preprocessing and compilation translate source files into object files, which may have missing parts such as the definitions of functions declared elsewhere. The linker joins the object files together to create an executable and fills in the missing parts. Let’s look at an example that compiles multiple source files and links them:
In the first 2 lines, we compile main.cpp and util.cpp into object files. The last line is the linking step: g++ automatically recognizes the format of its input files and performs a linking operation to generate an executable file.
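To make the “missing part” concrete, here is a rough sketch of what the two source files might contain, shown in one block so that it compiles as a single file. The function names and bodies are assumptions for illustration only.

```cpp
// ---- main.cpp (sketch) ----
int foo(int x);          // declaration only: main.o records foo as unresolved

int call_foo() {         // stands in for code in main.cpp that uses foo
    return foo(41);
}
// (a real main.cpp would also define main(), left out of this sketch)

// ---- util.cpp (sketch) ----
int foo(int x) {         // the single definition: the linker matches
    return x + 1;        // main.o's reference to foo with this code
}
```

With the two parts in separate files, each would compile on its own with g++ -c, and only the link step would connect the call in main.o to the definition in util.o.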
Like compilation, the linking stage also has its “common errors”. To avoid them, it’s time to introduce the “one definition rule”.
One Definition Rule
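This rule is commonly referred to as the ODR (one definition rule) in the C++ standard:
In any translation unit, a template, type, function, or object can have no more than one definition. In the entire program, an object or non-inline function can have no more than one definition.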
The first thing to notice is that the rule doesn’t mention declarations at all. Indeed, a symbol can be declared several times!
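A minimal sketch, with a hypothetical signature and body for “foo”:

```cpp
int foo(int x);      // first declaration
int foo(int x);      // second declaration with the same signature: allowed

int foo(int x) {     // exactly one definition
    return x * 2;
}
```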
The above code is legal C++, and you can clearly see that “foo” is declared twice. The same holds when the same declaration appears in different files. The previous linking example shows this: when main.cpp and util.cpp are preprocessed and compiled into main.o and util.o, both contain a declaration of “foo”.
You may have noticed that this is what allows multiple files to share headers and be linked together. Otherwise, 2 files could not include the same header.
However, there can’t be more than one definition of any symbol. You get a compile-stage error or a link-stage error depending on where you put the redundant definition.
Example: 2 definitions in the same file
Example: definitions in different files
In this example, “foo” is defined in 2 files. You get a link-time error.
The next time you meet a similar error message, check whether your program violates the ODR.
In this tutorial, we started compiling multiple source files using g++.
We learned that the C++ compiler works in 3 stages:
- Preprocessing copies included header files into source files.
- Compiling translates preprocessed source files into object files.
- Linking links all object files together to generate an executable.
We also looked at 2 common errors: 1. referencing a symbol without a declaration and 2. violating the “one definition rule”. Try to avoid them in your projects. In the next tutorial, we will look into more scenarios and more complicated pitfalls that often trip up C++ developers.