C++20 is underway, and it will bring a lot of big new features along with it. We have the ranges library, coroutines, concepts, and a new string formatting library. But perhaps the feature with the biggest potential impact on our code structure and architecture is that of modules.
C++ modules promise to lessen the need for header files, and they are part of the continuous move to reduce, and eventually remove, the preprocessor. In this blog post, we will explore what C++ modules are, discuss their potential advantages and drawbacks, and provide an example of how they are used.
A Bit of History
The C++ language was built on top of C, and as such, it inherited a lot of C language features, most notably the preprocessor and the concept of the header and source file split. Preprocessor directives are a set of instructions that are executed by the compiler on each translation unit before compilation starts. You can identify them because any line of code starting with the # character is one.
C++ modules are an attempt to reduce the need for one particular preprocessor directive, #include. #include allows us to split our source code into logical parts: in particular, an interface (usually located in an “.h” or “header” file) and an implementation (usually located in a “.cpp” or “source” file). The header and source file split provides a number of benefits, including:
- Building reusable libraries of code, which are a cornerstone of C and C++
- Separating the interface and its implementation
- Modularizing code, which potentially speeds up compilation time (when used correctly)
- Keeping our code organized into logical and reusable parts
So what are the drawbacks of using header files, and how do we retain the above advantages with modules? Let’s take a closer look.
What’s Wrong with #include?
Using the preprocessor to include a header file is basically a neat way of copying everything from one file and pasting it in another; this is a text-processing hack, to put it bluntly. This comes with some issues we as programmers need to be aware of, and it ultimately gives us more work to do.
For example, we need to make sure a header file isn’t included twice in the same translation unit, or we risk redefinition errors. We get around this by wrapping our headers with rather crude #ifndef FILE_ID guards (or the nonstandard, but widely supported and less error-prone, #pragma once). We also run the risk of defining a macro somewhere in our code before we include a header, which may inadvertently affect the header file we include. Or, we might redefine something that was previously defined inside an included header. This can often lead to some very nasty bugs and unintentional behavior. See the following example:
    // A preprocessor macro defined above an include directive.
    #define INNOCUOUS_DEFINE 1

    // It’s difficult to know whether or not `INNOCUOUS_DEFINE` is changing behavior
    // in the following header file or in another header nested inside.
    // Sometimes this is done on purpose, but it highlights poor architecture and can be misleading.
    #include "Mysterious.h"

    #ifdef INNOCUOUS_DEFINE
    // Do innocuous things.
    #endif
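To make the copy-and-paste behavior concrete, here is a minimal single-file sketch. It pastes the contents of a hypothetical header, Settings.h, into the file twice by hand, which is roughly what two #include directives would do. The names SETTINGS_H, USE_SMALL_BUFFERS, and bufferSize are invented for illustration and do not come from any real library.

```cpp
// A macro defined by the including file, before the "header" is pasted in.
#define USE_SMALL_BUFFERS

// ---- Pasted copy 1 of the hypothetical "Settings.h" ----
#ifndef SETTINGS_H
#define SETTINGS_H
#ifdef USE_SMALL_BUFFERS
// The includer's macro silently selects this branch.
constexpr int bufferSize() { return 64; }
#else
constexpr int bufferSize() { return 4096; }
#endif
#endif // SETTINGS_H
// ---- End of copy 1 ----

// ---- Pasted copy 2 (a second #include of the same header) ----
// SETTINGS_H is already defined, so the guard skips this whole copy.
#ifndef SETTINGS_H
#define SETTINGS_H
#ifdef USE_SMALL_BUFFERS
constexpr int bufferSize() { return 64; }
#else
constexpr int bufferSize() { return 4096; }
#endif
#endif // SETTINGS_H
// ---- End of copy 2 ----
```

Without the guard, the second copy would define bufferSize again in the same translation unit and the file would not compile. Removing the #define at the top switches bufferSize to the other branch without touching the header text at all, which is exactly the kind of action at a distance described above.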
Another issue with header files is the risk of creating long compilation times, especially for complex templated code such as STL containers. STL containers are often abundantly used, and the compiler has to do a lot of work parsing header files each time they are included. We will see in the next section how modules can help with this process.
What Are C++ Modules and How Might They Improve Our Code?
As discussed in one of our previous blog posts, which covered effective header file management, header files can be easily misused and overused. We need to be aware of techniques to make sure we don’t use them incorrectly, whereas the isolated and modular nature of modules helps the compiler enforce better architecture and provides semantics to only expose what is required.
Modules are an attempt to group code together into compiled units, or “modules,” that expose type names, functions, etc. for use in the code that imports them. When code is compiled, something called an abstract syntax tree (AST) is produced as a step in the process. An AST is a structured representation of all the functions, classes, templates, etc. in the code. For a module, the compiler can serialize this information and use it to create the final binary module.
This is no different than what happens with header files, but generating the AST is a rather slow process, and that cost is paid again every time a header file is included in another translation unit. With modules, the compiler only has to perform this step once, no matter how many times the module is imported. Precompiled headers (PCH) provide a similar benefit, but modules package everything into an easy-to-use and standardized language feature. More information on Clang’s PCH implementation can be found in the Clang documentation.
Anything inside a module that is not explicitly exposed with export will not be available on import. Modules are compiled in a sandboxed state and will not be altered, no matter what macros the importing translation unit defines. Additionally, macros defined inside the module are not exposed to the importing translation unit, so importing a module has no such side effects.
“But what about all my existing code structure and file management?” I hear you ask. Fear not! Modules can be used alongside header files, so there’s no risk of throwing away your current methodology or code structure in favor of modules; they can be gradually introduced one by one.
Modules are essentially a new way to split up code by replacing #include with import, while also allowing us to remove the split between interfaces and implementations, thus potentially halving our number of files. Now we can put everything in the same file and only expose to the outside world what we explicitly mark as exported. This can be beneficial in certain cases where the file split is only there to speed up compilation time and doesn’t actually help with the understandability of the code. It is worth noting that the interface/implementation file split is still possible with modules.
Using the import and export keywords, our code becomes much more expressive, giving the reader clear semantics about what is publicly usable and what is not. Multiple modules can also be packaged up together in one bundle, which ultimately makes the life of the end user easier because they only have to deal with one logically contained package.
How Do We Use Modules?
In this basic example, we define a named module using export module, along with a function that we want to expose to the user with export:
    // NumberCruncher.cppm -> We use a new file extension, `cppm`, to distinguish modules from other source code.

    // This line defines the module name and allows it to be imported.
    export module NumberCruncher;

    // Other modules can be imported from within a module.
    // Here we import an imaginary logging module written elsewhere that exports `logger::info`.
    import logger;

    // Any macros defined inside a module are not exposed to the importer.
    #define CRUNCH_FACTOR 3.14

    // Namespaces work as normal and will need to be used by the caller.
    namespace numbers {

    // Anything not explicitly exported is internal to the module and cannot be used by the code importing it.
    float applyCrunchFactor(float number) {
        return number * CRUNCH_FACTOR;
    }

    // We can choose which functions to export using the `export` keyword.
    export float crunch(float number) {
        // Calls an internal function.
        auto crunched = applyCrunchFactor(number);
        // Use our imaginary logger.
        logger::info("Crunched {} with result of {}", number, crunched);
        return crunched;
    }

    } // namespace numbers
And we can use it in our application:
    // main.cpp
    import NumberCruncher; // Imports our custom module.

    int main() {
        // Use it!
        auto value = numbers::crunch(42);
    }
If we want to keep an interface/implementation split, we could also split our implementation code block into another file and have a much simpler cppm interface file:
    // `NumberCruncher.cppm`.
    export module NumberCruncher;

    namespace numbers {

    // Crunch `number` using our magic crunch factor.
    export float crunch(float number);

    } // namespace numbers
Implementation:
    // `NumberCruncher.cpp`
    // Declare the module without exporting it so the compiler can get the information it needs.
    module NumberCruncher;

    // Logger isn’t used in the interface file, so no need to import it there.
    import logger;

    #define CRUNCH_FACTOR 3.14

    namespace numbers {

    // Internal helper. It is not declared in the interface file, so it has to be
    // defined inside the namespace rather than with a qualified `numbers::` name.
    float applyCrunchFactor(float number) {
        return number * CRUNCH_FACTOR;
    }

    // No need to use the `export` keyword in the implementation; that is only for the module interface file.
    float crunch(float number) {
        auto crunched = applyCrunchFactor(number);
        logger::info("Crunched {} with result of {}", number, crunched);
        return crunched;
    }

    } // namespace numbers
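To compile the above code with Clang, at the time of writing we need to turn on the modules feature with the -fmodules flag and specify which version of C++ we are using with -std=c++2a. (In more recent Clang releases, selecting -std=c++20 enables standard C++ modules by itself, and -fmodules refers to Clang’s separate header modules feature.) The module interface will need to be precompiled using --precompile. This produces a .pcm file, which is used to build the final executable. Details on the integration of Clang modules can be found in the Clang documentation.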
That’s it! In this example, we can see both how expressive modules are and the potential modules have for cleaning up and organizing our C++ code.
In large existing projects, such as our Core C++ library at PSPDFKit, modules are not so simple to introduce everywhere. For this reason, we recommend introducing them in a piecemeal manner: find some heavily used header files in your project, try converting them to modules, and see if there is a reduction in compile time or increased maintainability of code that makes the change worthwhile.
Are There Any Downsides?
There are a few notable concerns about modules, which are discussed in detail in the paper “Concerns about module toolability.”
The paper talks about the potential impact of modules on build systems and developer tools. Currently, with header files, a source file can be built in isolation from other parts of the system so long as all the header files are available. But because a module needs to be built before any code that imports it, the build order is constrained, which raises concerns about how well builds that rely on many modules can be parallelized. This could prove to be a difficult problem to solve on build systems that use multiple networked machines to compile a single project.
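To illustrate the ordering constraint, the following sketch groups modules into “waves” that could be compiled in parallel, given what each module imports. It is a simplified model and not how any particular build system works; the name buildWaves, the ImportGraph type, and the example module names are all made up for this post.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Maps each module name to the set of modules it imports.
// Every imported module must also appear as a key.
using ImportGraph = std::map<std::string, std::set<std::string>>;

// Groups modules into waves: everything in one wave only imports modules
// from earlier waves, so the members of a wave can be compiled in parallel.
// Throws std::runtime_error on a cyclic or unknown import.
std::vector<std::vector<std::string>> buildWaves(const ImportGraph& graph) {
    std::vector<std::vector<std::string>> waves;
    std::set<std::string> built;

    while (built.size() < graph.size()) {
        std::vector<std::string> wave;
        for (const auto& entry : graph) {
            const std::string& name = entry.first;
            if (built.count(name) != 0) {
                continue; // Already scheduled in an earlier wave.
            }
            bool ready = true;
            for (const auto& dependency : entry.second) {
                if (built.count(dependency) == 0) {
                    ready = false;
                    break;
                }
            }
            if (ready) {
                wave.push_back(name);
            }
        }
        if (wave.empty()) {
            throw std::runtime_error("cyclic or unknown module import");
        }
        // Only mark the wave as built after the scan, so that modules in the
        // same wave never depend on each other.
        built.insert(wave.begin(), wave.end());
        waves.push_back(wave);
    }
    return waves;
}
```

With a chain such as logger, then NumberCruncher, then main, every wave contains a single entry and nothing can be compiled in parallel. With plain header files, all three source files could be compiled at the same time.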
The paper also goes into detail about how modules could complicate extracting dependency graphs and, ultimately, indexing, refactoring, and performing static analysis on a project. These are issues many developers of large projects know all too well when working in certain IDEs.
There is also no official module support in the Standard Library, although Microsoft has added std.core to Visual Studio.
The Future
The standardization of modules brings big changes to an already complex language, and as such, modules will take time to become a fully integrated part of existing projects.
As producers of SDKs and developer tools here at PSPDFKit, we can already see a huge benefit in implementing modules in our codebase. There’s potential to simplify our architecture and speed up our development process, and being able to package and release our own self-contained modules to customers would be advantageous, ultimately helping them develop their code safely and efficiently.
Conclusion
These are exciting times in the world of C++, with big changes on the horizon in C++20. In this blog post, we have seen how C++ modules have the potential to reduce the use of the preprocessor, which can ultimately lead to a better code structure, faster build times, and more expressive and less error-prone code.
We have also seen that there are some potential downsides for developer tooling if the feature is rushed and not implemented correctly, and how compilers such as Clang and MSVC are working on making this feature usable.