Jump to content

Draft:Modules (C++)

From Wikipedia, the free encyclopedia

Modules in C++ are an implementation of modular programming added in C++20 as a modern alternative to precompiled headers.[1] A module in C++ comprises a single translation unit. Like header files and implementation files, a module can contain declarations and definitions, but differ from precompiled headers in that they do not require the preprocessor directive #include, but rather are accessed using the word import. A module must be declared using the word module to indicate that the translation unit is a module. A module, once compiled, is stored as a .pcm (precompiled module) file which acts very similar to a .pch (precompiled header) file.

Main uses

[edit]

Modules provide the benefits of precompiled headers in that they compile much faster than traditional headers which are #included and are processed much faster during the linking phase,[2] but also greatly reduce boilerplate code, allowing code to be implemented in a single file, rather than being separated across an header file and source implementation file which was typical prior to the introduction of modules (however, this separation of "interface file" and "implementation file" is still possible with modules, but less common due to the increased boilerplate). Furthermore, modules eliminate the necessity to use #include guards or #pragma once, as modules do not directly modify the source code, unlike #includes, which during the preprocessing step must include source code from the specified header. Thus, importing a module is not handled by the preprocessor, but is rather handled during the compilation phase. Modules, unlike headers, do not have to be processed multiple times during compilation.[2] However, similar to headers, any change in a module necessitates the recompilation of not only the module itself but also all its dependencies — and the dependencies of those dependencies, et cetera. This being said, much like what was the case with headers, C++ modules do not permit circular dependencies, and a circular/cyclic dependency will fail to compile.

C++ modules most commonly have the extension .cppm (primarily common within Clang and GCC toolchains), though some alternative extensions include .ixx and .mxx (more common in Microsoft/MSVC toolchains).[3] All symbols within a module that the programmer wishes to be accessible outside of the module must be marked export.

Modules do not allow for granular imports of specific namespaces, classes, or symbols within a module, unlike Java or Rust which do allow for the aforementioned.[a] Importing a module imports all symbols marked with export, making it akin to a wildcard import in Java or Rust. Importing links the file and makes all exported symbols accessible to the importing translation unit, and thus if a module is never imported, it will never be linked.

Modules may not export or leak macros, and because of this the order of modules does not matter (however convention is typically to begin with standard library imports, then all project imports, then external dependency imports in alphabetical order).[2] If a module must re-export an imported module, it can do so using export import, meaning that the module is first imported and then exported out of the importing module, transitively importing modules which the imported module imports itself.[1] Also, because using statements will not be included into importing files (unless explicitly marked export), it is much less likely that using a using statement to bring symbols into the global namespace will cause name clashes within module translation units. C++ modules do not enforce any notion of namespaces, but it is not uncommon to see projects manually associate modules to namespaces.

Though standard C has not included modules into the standard, dialects of C allow for modules, for example Clang supports modules for the C language,[4] though the syntax and semantics of Clang C modules differ from C++ modules significantly.

Currently, only GCC, Clang, and MSVC offer support for C++ modules.[5] It is the same set of compilers that also currently support standard library modules.[6]

Standard library modules

[edit]

Since C++23, the C++ standard library has been exported as a module as well, though as of currently it must be imported in its entirety (using import std;). The C++ standards offer two standard library modules:

Name Description
std Exports all declarations in namespace std and global storage allocation and deallocation functions that are provided by the importable C++ library headers including C library facilities (although declared in standard namespace).
std.compat Exports the same declarations as the named module std, and additionally exports functions in global namespace in C library facilities. It thus contains "compat" in the name, meaning compatibility with C.

However, this may change in the future, with proposals to separate the standard library into more modules such as std.core, std.math, and std.io.[7][8] The module names std and std.* are reserved by the C++ standard,[9] however most compilers allow a flag to override this.[10]

Example

[edit]

A simple example of using C++ modules is as follows:

MyClass.cppm

export module myproject.MyClass;

import std;

using String = std::string;

export namespace myproject {

class MyClass {
private:
    int x;
    String name;
public:
    MyClass(int x, const String& name):
        x{x}, name{name} {}

    inline int getX() const noexcept {
        return x;
    }

    inline void setX(int newX) noexcept {
        x = newX;
    };

    inline String getName() const noexcept {
        return name;
    }

    inline void setName(const String& newName) noexcept {
        name = newName;
    }
};

}

Main.cpp

import std;

import myproject.MyClass;

using myproject::MyClass;

int main(int argc, char* argv[]) {
    MyClass me(10, "MyName");
    me.setX(15);

    std::println("Hello, {0}! {0} contains value {1}.", me.getName(), me.getX());
}

Anatomy

[edit]

Module partitions and hierarchy

[edit]

Modules may have partitions, which separate the implementation of the module across several files. Module partitions are declared using the syntax A:B, meaning the module A has the partition B. Module partitions cannot individually be imported outside of the module that owns the partition itself, meaning that anyone who desires to access code that is part of a module partition must import the entire module that owns the partition.[1]

To link the module partition B back to the owning module A, write import :B; inside the file containing the declaration of module A or any other module partition of A (say A:C). These import statements may themselves be exported by the owning module, even if the partition itself cannot be imported directly – thus, to import code belonging to partition B that is re-exported by A, one simply has to write import A;.

Other than partitions, C++ modules do not have a hierarchical system, but typically use a hierarchical naming convention, like Java's packages[b]. In other words, C++ does not have "submodules", meaning the . symbol which may be included in a module name bears no syntactic meaning and is used only to suggest the association of a module.[1] Meanwhile, only alphanumeric characters plus the period can be used in module names, and so the character * cannot be used in a module name (which otherwise would have been used to denote a wildcard import, like in Java). In C++, the name of a module is not tied to the name of its file or the module's location, unlike Java in which the name of a file must match the name of the public class it declares if any, and the package it belongs to must match the path it is located in. For example, the modules A and A.B in theory are disjoint modules and need not necessarily have any relation, however such a naming scheme is often employed to suggest that the module A.B is somehow related or otherwise associated with the module A.

The naming scheme of a C++ module is inherently hierarchical, and the C++ standard recommends re-exporting "sub-modules" belonging to the same public API (i.e. module alpha.beta.gamma should be re-exported by alpha.beta, etc.), even though dots in module names do not enforce any hierarchy. The C++ standard recommends lower-case ASCII module names (without hyphens or underscores), even though there is technically no restriction in such names. Also, because modules cannot be re-aliased or renamed (short of re-exporting all symbols in another module), the C++ standards recommends organisations/projects to prefix module names with organisation/project names for both clarity and to prevent naming clashes (i.e. google.abseil instead of abseil). Also, unlike Java, whose packages may typically include a TLD to avoid namespace clashes, C++ modules do not have this convention. Thus, it may be more common to see example.myfunctionality.MyModule than com.example.myfunctionality.MyModule, though both names are valid. Similar to Java, some organisations of code will split modules into exporting one class/struct/namespace each, and name the final word in PascalCase to match the name of the exported class, while modules re-exporting multiple "sub-modules" may be in all-lowercase.

Module purview and global module fragment

[edit]

Everything above the line export module myproject.MyClass; in the file MyClass.cppm is referred to as what is "outside the module purview", meaning what is outside of the scope of the module. Typically, if headers must be included, all #includes are placed outside the module purview between a line containing only the statement module; and the declaration of export module, like so:

module; // Optional, marks the beginning of the global module fragment (mandatory if an include directive is invoked above the export module declaration)

#include <print>

#include "MyHeader.h"

export module myproject.MyModule; // Mandatory, marks the beginning of the module preamble

module: private; // Optional, marks the beginning of the private module fragment

The file containing main() may declare a module, but typically it does not (as it is unusual to export main() as it is typically only used as an entry point to the program, and thus the file is usually a .cpp file and not a .cppm file). A program is ill-formed if it exports main() (causes undefined behaviour), but will not necessarily be rejected by the compiler.

All code which does not belong to any module belongs to the so-called "unnamed module" (also known as the global module fragment), and thus cannot be imported by any module.

Private module fragment

[edit]

A module may declare a "private module fragment" by writing module: private;, in which all declarations or definitions after the line are visible only from within the file, and thus inaccessible from importers of the module. Any module unit that contains a private module fragment must be the only module unit of its module.

Header units

[edit]

Headers may also be imported using import, even if they are not declared as modules – these are called "header units", and they are designed to allow existing codebases to migrate from headers to modules more gradually.[11] The syntax is similar to including a header, with the difference being that #include is replaced with import and a semicolon is placed at the end of the statement. Header units automatically export all symbols, and differ from proper modules in that they allow the emittance of macros, meaning all who import the header unit will obtain its contained macros. This offers minimal breakage between migration to modules. The semantics of searching for the file depending on whether quotation marks or angle brackets are used apply here as well. For instance, one may write import <string>; to import the <string> header, or import "MyHeader.h"; to import the file "MyHeader.h" as a header unit. Most build systems, such as CMake, do not support this feature yet.

See also

[edit]

Notes

[edit]
  1. ^ Importing (import in Java and use in Rust) in Java and Rust differs from C++. In the former, an import simply aliases the type or de-qualifies a namespace (similar to using in C++) as a convenience feature, because Java loads .class files dynamically as necessary and Rust automatically links all modules/crates, thus making all types available simply by fully quantifying all namespaces. However, in C++ modules are not automatically all linked, and thus they must be manually "imported" to be made accessible, as strictly speaking import links the file at compilation. This is further due to the fact that C++ does not define namespaces directly by modules. Thus, it is probably more appropriate to compare import in C++ to import on Python, which tells the interpreter to load the contents of a module into their own namespace.
  2. ^ It is more appropriate to compare packages in Java and modules in C++, rather than modules in Java and modules in C++. Modules in C++ and Java differ in meaning. In Java, a module (which is handled by the Java Platform Module System) is used to group several packages together, while in C++ a module is a translation unit, strictly speaking.

References

[edit]
  1. ^ a b c d cppreference.com (2025). "Modules (since C++20)". Retrieved 2025-02-20.
  2. ^ a b c "Compare header units, modules, and precompiled headers". Microsoft. 12 February 2022.
  3. ^ "Overview of modules in C++". Microsoft. 24 April 2023.
  4. ^ "Modules". clang.llvm.org.
  5. ^ "Compiler support for C++20". cppreference.com.
  6. ^ "Compiler support for C++23". cppreference.com.
  7. ^ C++ Standards Committee. (2018). P0581R1 - Modules for C++. Retrieved from https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0581r1.pdf
  8. ^ C++ Standards Committee. (2021). P2412R0 - Further refinements to the C++ Modules Design. Retrieved from https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2412r0.pdf
  9. ^ cppreference.com (2025). "C++ Standard Library". Retrieved 2025-02-20.
  10. ^ "Standard C++ modules".
  11. ^ "Walkthrough: Build and import header units in Microsoft Visual C++". Microsoft. 12 April 2022.