cxx-libmanR1
libman, A Dependency Manager ➔ Build System Bridge

A Collection of Interesting Ideas,

Author:
Colby Pike
Project:
ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21

Abstract

An exploration on a solution to giving build systems a way to consume packages presented to it by a dependency manager. This system is called libman, short for Library Manifest.

1. Introduction and Overview

1.1. Problem Description and Scope

C++ is late to the game in terms of a unified and cross-platform solution for easily distributing native libraries.

There are “native” / “system” package managers and package formats, such as dpkg/apt, yum/rpm, pacman, and MSI that address a core problem of managing the global state of a given machine, and these packaging formats often deal with “native” compiled binaries. These solutions are not cross-platform, and even if they were, none of them are appropriate for the problem at hand. They are not focused on development and compilation within the C++ ecosystem. While other languages and ecosystems have tools such as pip, npm, cargo, mix, and even cpan, C++ has gone a long time with no widely-accepted solution.

Why is this?

There have been many attempts, and there are presently several competing alternatives for solving the package and dependency management problem. Running in parallel, and solving a (somewhat) orthogonal problem, are several competing build systems. CMake, Meson, Waf, b2, and countless others have been pining for the love and attention of the C++ community for years. All of them wildly incompatible in terms of their package consumption formats.

This situation presents a unique problem. With lack of a “reference implementation” of C++, and no singular and completely universal build tool and format, we have an N:M problem of N package and dependency managers attempting to work with M build systems.

This trouble can be broken down in two directions:

  1. How do I, the build system, inform a package creation and distribution tool how my project should be built and collected into a distributable unit?

  2. How do I, the dependency manager, inform the build system how it might consume the packages I’ve provided to it?

This paper and the libman system described will cover (2). Investigation into the inverse (yet equally important) problem (1) will not be discussed in detail, but warrants further discussion.

Note: This document will use the abbreviated term PDM to refer to "package and dependency manager" tools.

1.2. Usage Requirements

The concept of usage requirements originated from Boost’s b2 build system, and has been slowly bleeding into general acceptance via CMake. After years of experience with CMake, and as it has been developing and maturing its realization of usage requirements and the concept of the “usage interface,” it is clear that it is a fruitful path forward. As such, libman is explicitly designed around this concept.

What are “usage requirements” (also known as the “link interface” or “usage interface”)?

When we have a “target” (A thing that one wishes to build), we can say that it “uses” a library. When we “use” a library, we need to inherit certain attributes thereof which have a direct effect on the way that the final target will be built. These include, but are not limited to:

libman defines a platform-agnostic and build-system-agnostic format for describing these “usage requirements”, including how one should import dependencies transitively. Any build system which can represent the above concepts can import libman files. Any PDM which can represent the above concepts can generate libman files.

Therefore, any libman-capable build system can be used with any libman-capable package and dependency manager.

1.3. Goals and Non-Goals

The following are the explicit goals of libman and this document:

  1. Define a series of file formats which tell a build system how a library is to be "used"

  2. Define the semantics of how a build system should interact and perform name-based package and dependency lookup in a deterministic fashion with no dependence on "ambient" environment state.

  3. Define the requirements from a PDM for generating a correct and coherent set of libman files.

Perhaps just as important as the goals are the non-goals. In particular, libman does not seek to do any of the following:

  1. Define the semantics of ABI and version compatibility between libraries

  2. Facilitate dependency resolution beyond trivial name-based path lookup

  3. Define a distribution or packaging method for pre-compiled binary packages

  4. Define or aide package retrieval and extraction

  5. Define or aide source-package building

1.4. The File Format

libman specifies three classes of files:

See the respective sections on The Manifest Syntax, and the specifics on Index Files, Package Files, and Library Files.

2. The File Format

2.1. Base Syntax

All libman files are encoded in an extremely simple key-value plaintext format, which is easy to read and write for both human and machine alike. Files are encoded using UTF-8.

The syntax of the file is very simple:

# This is a comment
Some-Key: Some-Value

Keys and values in the file each appear on a different line, with the key and value being separated by a : (colon followed by a space character). Only a single space character after the colon is required. Trailing or leading whitespace from the keys and values is ignored. If a colon is followed by an immediate line-ending, end-of-file, or the right-hand of the key-value separator is only whitespace, the value for the associated key is an empty string.

The key and value together form a field.

Note: A colon is allowed to be a character in a key (but cannot be the final character).

Note: As a general rule, libman uses the hyphen - as a word separator in keys, with each word being capitalized. This matches the form of headers from HTTP and SMTP.

Unlike HTTP, libman keys are case-sensitive!

A field with a certain key might appear multiple times in the file. The semantics thereof depend on the semantics of the field and file. In general, it is meant to represent "appending" to the list of the corresponding key.

Each file in libman defines a set of acceptable fields. The appearance of unspecified fields is not allowed, and should be met with a user-visible warning (but not an error). There is an exception for keys beginning with X-, which are reserved for tool-specific extensions. The presence of an unrecognized key beginning with X- is not required to produce a warning.

Lines in which the first non-whitespace character is a # should be ignored.

“Trailing comments” are not supported. A # appearing in a key or value must be considered a part of that key or value.

Empty or only-whitespace lines are ignored.

A line-ending is not required at the end of the file.

Note: Readers are expected to accept a single line feed \n as a valid line-ending. Because trailing whitespace is stripped, a CR-LF \r\n is incidentally a valid line-ending and should result in an identical parse.

2.2. Index Files

Index files specify the names of available packages and the path to a Package File that can be used to consume them.

The index file should use the .lmi extension.

2.2.1. Fields

2.2.1.1. Type

The Type field must be specified exactly once, and should have the literal value Index.

Type: Index
2.2.1.2. Package

The Package field appears any number of times in a file, and specifies a package name and a path to a Package File on disk. If a relative path, the path resolves relative to the directory containing the index file.

The name and path are separated by a semicolon ;. Extraneous whitespace is stripped

Package: Boost; /path/to/boost.lmp
Package: Qt; /path-to/qt.lmp
Package: POCO; /path-to-poco.lmp
Package: SomethingElse; relative-path/anything.lmp

The appearance of two Package fields with the same package name is not allowed and consumers should produce an error upon encountering it.

2.2.2. Example

A simple Index file with a few packages
# This is an index file
Type: Index
# Some Packages
Package: Boost; /path/to/boost.lmp
Package: Qt; /path-to/qt.lmp
Package: POCO; /path-to-poco.lmp
Package: SomethingElse; relative-path/anything.lmp

2.3. Package Files

Package files are found via Index Files, and they specify some number of Library Files to import.

Package files should use the .lmp extension.

2.3.1. Fields

2.3.1.1. Type

The Type field must be specified exactly once, and should have the literal value Package.

Type: Package
2.3.1.2. Name

The Name field in a package file should be the name of the package, and should match the name of the package present in the index that points to the file defining the package. If Name is not present or not equal to the name provided in the index, consumers are not required to generate a warning. It’s purpose is for the querying of individual package files and for human consumption.

Name: Boost
2.3.1.3. Namespace

The Namespace field in a package file must appear exactly once. It is not required to correspond to any C++ namespace, and is purely for the namespaces of the import information for consuming tools. For example, CMake may prepend the Namespace and two colons :: to the name of imported targets generated from the libman manifests.

Namespace: Qt5

Note: The Namespace is not required to be unique between packages. Multiple packages may declare themselves to share a Namespace, such as modularized Boost packages.

2.3.1.4. Requires

The Requires field may appear multiple times, each time specifying the name of a package which is required in order to use the requiring package.

When a consumer encounters a Requires field, they should use the index file to find the package specified by the given name. If no such package is listed in the index, the consumer should generate an error.

Requires: Boost.Filesystem
Requires: Boost.Coroutine2
Requires: fmtlib

Note: The presence of Requires does not create any usage requirements on the libraries of the package. It is up to the individual libraries of the requiring package to explicitly call out their usage of libraries from other packages via their § 2.4.1.6 Uses field. This field is purely to ensure that the definitions from the other package are imported before the library files are processed.

2.3.1.5. Library

The Library field specifies the path to a library file. Each appearance of the Library field specifies another library which should be considered as part of the package.

Library: filesystem.lml
Library: system.lml
Library: coroutine2.lml

If a relative path, the file path should be resolved relative to the directory of the package file.

Note: The filename of a Library field is not significant other than in locating the library file to import.

2.3.2. Example

A Qt5 example:
# A merged Qt5
Type: Package
Name: Qt5
Namespace: Qt5

# Some things we might require
Requires: OpenSSL
Requires: Xcb

# Qt libraries
Library: Core.lml
Library: Widgets.lml
Library: Gui.lml
Library: Network.lml
Library: Quick.lml
# ... (Qt has many libraries)

2.4. Library Files

Library files are found via Package Files, and each one specifies exactly one "library" with a set of usage requirements.

Library files should use the .lml extension.

2.4.1. Fields

2.4.1.1. Type

The Type field must be specified exactly once, and should have the literal value Library.

Type: Library
2.4.1.2. Name

The Name field must appear exactly once. Consumers should qualify this name with the containing package’s Namespace field to form the identifier for the library.

Name: Boost
2.4.1.3. Path

For libraries which provide a linkable, the Path field specifies the path to a file which should be linked into executable binaries.

This field may be omitted for libraries which do not have a linkable (e.g. “header-only” libraries).

Path: lib/libboost_system-mt-d.a
2.4.1.4. Include-Path

Specifies a directory path in which the library’s headers can be found. Targets which use this library should have the named directory appended to their header search path. (e.g. using the -I or -isystem flag in GCC).

This field may appear any number of times. Each appearance will specify an additional search directory.

Relative paths should resolve relative to the directory containing the library file.

Include-Path: include/
Include-Path: src/
2.4.1.5. Preprocessor-Define

Sets a preprocessor definition that is required to use the library.

Note: This should not be seen as an endorsement of this design!

Should be either a legal C identifier, or a C identifier and substitution value separated with an =. (The syntax used by MSVC and GNU-style compiler command lines).

Preprocessor-Define: SOME_LIBRARY
Preprocessor-Define: SOME_LIBRARY_VERSION=339
2.4.1.6. Uses

Specify a transitive requirement for using the library. This must be of the format <namespace>/<library>, where <namespace> is the string used in the Namespace field of the package which defines <library>, and <library> is the Name field of the library which we intend to use transitively.

Uses: Boost/coroutine2
Uses: Boost/system

Build systems should use the Uses field to apply transitive imported library target usage requirements. “Using” targets should transitively “use” the libraries named by this field.

2.4.1.7. Special-Uses

Specifies Special Requirements for the library.

2.4.2. Example

A Catch2 base library, only declaring a directory that needs to be included. It has no Path attribute, and therefore acts as a "header-only" library.
Type: Library
Name: Catch2
Include-Path: include/

A library that builds upon the main Catch header-only library to provide a pre-compiled main() function, a common use-case with Catch

Type: Library
# The name is "main"
Name: main
# The static library file to link to
Path: lib/catch_main.a
# We build upon the Catch2/Catch2 sibling library.
Uses: Catch2/Catch2
A more concrete example of what a few Boost library files might look like
# The base "headers" library for Boost
Type: Library
Name: boost
Include-Path: include/
# Boost.System
Type: Library
Name: system
Uses: Boost/boost
Path: lib/libboost_system.a
# Boost.Asio
Type: Library
Name: asio
# Note: Does not depend on Boost/boost nor Boost/context directly. It inherits
# those transitively.
Uses: Boost/system
Uses: Boost/coroutine
# Boost.Beast
Type: Library
Name: beast
Uses: Boost/asio

3. Usage and Semantics

Although the libman files can be created and consume by human eye and hand, a typical use case will see the libman files generated by a PDM and consumed by a build system.

3.1. The Index

The purpose of the Index is to define name-based package lookup for a build system.

A PDM should generate an index where each package within the index has a uniform ABI. That is: An executable binary should be able to incorporate all compiled code from every library from every package within and index and produce no ODR nor ABI violations. A package may only appear once in an index.

Note: To service the case of build systems which support building multiple "build types" simultaneously, a PDM and build system may coordinate multiple indices, with one for each "build type" that the build system wishes to consume.

3.1.1. The libman Tree

Given a single index file, one can generate a single libman "tree" with the index at the root, packages at the next level, and libraries at the bottom level.

<index>
    <package-foo>
        <library-foo-1>
        <library-foo-2>
        <library-foo-3>
        <library-foo-4>
    <package-bar>
        <library-bar-1>
    <package-baz>
    <package-qux>
        <library-qux-1>
        <library-qux-2>
        <library-qux-3>

3.1.2. Uniqueness of Packages and Libraries

Each package must be unique in a tree. Each library will be unique given its qualified name of the form <namespace>/<library> (Where <namespace> is declared by the § 2.3.1.3 Namespace field of the package from which it was referred). The library § 2.4.1.2 Name field might not be unique. Disambiguating similarly named libraries is the purpose of the package’s Namespace, as it is unlikely (and unsupported) for a single package to declare more than one library with the name Name.

Note: Although libman uses the qualified form <namespace>/<library>, other tools may use their own format for the qualification. For example, CMake might use <namespace>::<library> to refer to the imported target or Scons may use <namespace>__<library>. It is up to the individual tool to select, implement, and document the appropriate qualification format for their users.

3.1.3. Index Location

When a build system wishes to use an index file, it should offer the user a way to specify the location explicitly. If no location is provided by a user, it should prefer the following:

In the above, <config> is a string specifying a "build type" for the build system. This is intended to facilitate build systems which are "multi-conf" aware.

3.2. Packages

Packages are defined in libman to be a collection of some number of libraries. They contain a § 2.3.1.3 Namespace field to qualify their libraries, and may declare the reliance on the presence of other libraries using § 2.3.1.4 Requires.

Note: The Requires field is not for dependency managers: It’s for build systems to know what other packages need to be imported when importing a package. Indeed, all of the information in libman is for build systems to consume, not dependency managers.

3.2.1. Where Does Namespace Come From?

In short: It comes from the upstream developer.

The Namespace should originate from the package itself, and be specified by the maintainer, not something generated by the dependency management system, nor by a third-party packager.

Placing this responsibility on the upstream developer ensures that all package maintainers end up with the same Namespace in their libman files, ensuring that the § 2.4.1.6 Uses field from libraries of other packages are able to successfully resolve.

Note: In the case that the package’s upstream developer cannot be contacted or does not voice an opinion, the appropriate Namespace should be chosen by the package maintainer carefully to create minimal confusion for package users. Package maintainers for different PDMs are encouraged to collaborate and consolidate on a single Namespace.

3.2.2. The Requires Graph

A Requires field of a package may only specify packages which are defined in the current libman tree (generated from the current index). Build systems must resolve the Requires recursively. Build systems must process the packages named by the Requires field before processing the package which namespace the requirement. The result will be a directed acyclic graph of the package dependencies.

If the Requires field names a package not contained in the current tree, build systems must generate an error. A well-formed index and libman tree should never encounter this issue, and the onus is on PDMs to generate a conforming index file. Regular user action should never create a situation where a Requires field is unsatisfied by the index from which the requiring package was found.

Requires may not form a cyclic dependency graph.

3.3. Libraries

Libraries are the main consumable for development package managers. In C++ we define a library as a set of interconnected translation units and/or #include-able code that provides some pre-packaged functionality that we wish to incorporate into our own project.

Consuming a library requires (1) being capable of using the preprocessor #include directive to incorporate the headers from the library and (2) being able to resolve entities of external linkage which are defined within the headers for that library. (Some libraries may have no entities with external linkage.)

3.3.1. Canonical #include Directives

The characters within the <> of #include <...> are of incredible importance. libman encourages libraries to define a single "canonical" #include directive for their files. A user must not have to guess which include directive is correct. To support this, libraries may declare the directory in which their "canonical include directives" may be resolved via the § 2.4.1.4 Include-Path field.

3.3.2. Recommendation: Avoid Header Mixing

Headers for libraries should avoid intermixing with the headers of other libraries, even of other libraries within the same package.

Upon declaring their intent to "use" a library, a user should be able to #include the headers of that library using the "canonical include directives" for that library.

If a user does not declare their usage of a library (either directly or indirectly from transitive § 2.4.1.6 Uses), they should not be able to #include headers from that library.

Mixing headers between libraries in a single Include-Path allows the user to make use of an entity of external linkage from a library without declaring their "usage" of that library, and therefore causes those entities to fail resolution at the link stage because the build system is unaware of their intent to use that library.

"Using" a library should cause the headers to be visible, but will also enforce that the external linkage entities are resolved.

Note: While the admonition to "avoid header mixing" is partially aimed at library developers, this admonition can apply equally to dependency managers who have the duty of placing the headers files in the filesystem at install time.

Note: Yes, this is a break from the FHS’s /usr/include and /usr/local/include directories. These have been very convenient in the past, but have proven very problematic for the case of unprivileged user development.

3.3.3. Transitive Usage with Uses

The Uses field is meant to represent transitive requirements. Libraries which build upon other libraries should declare this fact via Uses.

The syntax of a Uses entry is <namespace>/<library>, where <namespace> is the § 2.3.1.3 Namespace field of the package which owns the library, and <library> is simply the § 2.4.1.2 Name field from the library.

Build systems should translate the Uses field to an appropriate transitive dependency in the build system’s own representation. The exact spelling and implementation of this dependency is not specified here, but must meet the requirement of transitivity: If A uses B, and foo uses A directly, then foo should behave as if it uses B. foo is said to use B indirectly.

3.4. Special Requirements

Special Requirements are § 1.2 Usage Requirements that do not correspond to a library or package provided by the PDM. The semantics of a Special Requirement are platform-specific, but their intended semantics are outlined here. Special requirements may be namespace with <namespace>/<name>, but libman reserves all unqualified names. Platforms and build systems may define additional Special Requirements using qualified names.

3.4.1. Threading

Enables threading support. Some platforms require compile and/or link options to enable support for threading in the compiled binary. For example, GCC requires -pthread as a compile and link option for std::thread and several other threading primitives to operate correctly.

3.4.2. Filesystem

Enables support for C++17’s filesystem library. Some platforms require an additional support library to be linked in order to make use of the facilities of std::filesystem.

3.4.3. DynamicLinker

Enables support for runtime dynamic linking, for example using dlopen().

3.4.4. PosixRealtime

Enable support for POSIX realtime extensions. For example, required for shared memory functions on some platforms.

3.4.5. Math

Enable support for <math.h>. Some platforms provide the definitions of the math functions in a separate library that is not linked by default.

3.4.6. Sockets

Enable support for socket programming. For example, Windows requires linking in the Winsock libraries in order to make use of the Windows socket APIs.