A Critique of pyproject.toml

I’ve recently made the transition from setup.cfg to pyproject.toml across the suite of projects I maintain. It was mostly automatable, thanks in large part to the ini2toml project.

After making the transition, I have a fuller understanding of the benefits and detriments of the format, whose project metadata was first proposed in PEP 621. Even before the transition, I’d already identified several concerns with a declarative format in a single file.

This article outlines the concerns and challenges present with the design.

Bias toward static definition

The PEP specifically advocates statically-defined metadata as a primary motivator:

Encourage users to specify core metadata statically for speed, ease of specification, unambiguity, and deterministic consumption by build back-ends

This motivation is likely based on an assumption that a project’s metadata is inherently stable and statically declared, and that a developer has sufficient bandwidth to author and maintain the metadata. While these assumptions may hold when managing a handful of projects, they start to fall apart when operating at larger scale.

Consider the suite of projects I maintain. The burden of maintaining these projects and their metadata becomes a real impediment. The skeleton project attempts to address this need by storing the static metadata in a separate repo and composing (merging) that repo with the essential code of each project.

The need for tools like Cookie Cutter or the skeleton demonstrates that the concerns present in the metadata go beyond those of a statically-defined standalone project.

In contrast, enterprise organizations avoid most of this statically-defined metadata, demonstrating an approach that limits friction and scales better.

Resistance to composition

The fact that pyproject.toml is a single file in the root of the repository and stored in version control means that it can’t be easily composed to address multiple concerns.

For example, PEP 517 describes how build backends are specified, PEP 621 describes how metadata is indicated, and other projects use pyproject.toml for other purposes such as affecting test behavior.

Some projects have even gone so far as to require configuration of separate concerns (linting, metadata) to be in a single file or to be specified redundantly in multiple files.

Ambiguity of order

Since the only method of composition is to have a single, monolithic file, it also means that the order of contents is ambiguous. A file with [tool.setuptools_scm] followed by [tool.pytest-enabler.black] is functionally identical to one with those sections reversed. The same is true for the fields within a section. As a result, there is ample opportunity for these files to have subtle and inconsequential variance, which leads to excessive merge conflicts when using line-based version control operations and to arbitrary ordering based on the whims of a particular developer.

Unbounded scope

And while the arbitrary ordering of metadata is a small concern, as more and more tools support and even recommend use of pyproject.toml for configuration, this one file becomes increasingly complex and buries essential signals.

Consider, for example, the towncrier project config. In addition to specifying its build-system and project* attributes, it also declares configuration for hatch, towncrier, black, isort, ruff, mypy, and coverage. It’s difficult to see at a high level what tools are used by the project. If this project used tox instead of nox, the tox config might well have also been folded into the pyproject.toml, making it difficult to determine what test orchestrator is available.

This unbounded scope increases the complexity of this file and compounds problems like the ambiguity of order.

If the Python metadata supported a model like Unix’s .d, it would at least allow for better composition, separation of concerns, and reduced ambiguity.

Overlap with other metadata specs

While PEP 621 describes how to declare the metadata, it’s still the responsibility of a backend to read that metadata and then publish it following other specifications. It feels a little bit like xkcd 927 (Standards).

Downstream tools overly reliant on its presence

Due to its standardization and proliferation, tools are beginning to expect metadata to be present in a pyproject.toml, even when such metadata may be dynamic or pyproject.toml may not be desirable at all. Standards already exist (i.e. PEP 517) describing how a tool can load metadata for a project (built or unbuilt).

As a result, these downstream builders and tools expect to be able to load metadata from pyproject.toml even when it’s not a reliable source, and the only reliable way to get that metadata is from what the backend generates.

Inelegant syntax

At the risk of nitpicking, the TOML syntax is inelegant, especially when compared to something like setup.cfg. Consider the tempora migration as an example.

Each and every field now requires quotes around the value, and fields with multiple values (like Classifiers) require separating commas (or trailing commas for nicer diffs).

The config went from 999 characters in setup.cfg to 1095 characters in pyproject.toml. And while the bytes are cheap, the visual and mechanical aesthetics suffer. Moreover, the syntax allows for even more ambiguity now that quoting is a concern. Either of the following dependencies is valid and functionally identical:

	"importlib_resources; python_version < '3.12'",

	'importlib_resources; python_version < "3.12"',

Historically, I’d used the more common double-quotes for the specifier (i.e. python_version < "3.12"), and ini2toml will retain that value and wrap it in single quotes, but I’ve seen other users switch to using python_version < '3.12' in order to maintain consistent double quotes around each of the dependencies.

I miss the setup.cfg syntax for its simplicity and elegance, where the developer didn’t have to “hand encode” the configuration.

Overly aggressive normalization

For the name field in particular, the spec states:

Tools SHOULD normalize this name, as specified by PEP 503, as soon as it is read for internal consistency.

This recommendation implies that a backend like Setuptools or Flit should normalize the name immediately after reading it from the pyproject.toml such that the project’s preferred name would never be exposed. In other words, the project named “CherryPy” would have its name stored as “cherrypy” in the metadata, contradicting the stated and defended stance that “CherryPy” is a valid display name for a project. And especially considering that some project names are meaningfully mangled when normalized (e.g. zope.interface becomes zope-interface), this recommendation represents scope creep for the name normalization.
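For reference, the normalization that PEP 503 specifies is a one-line regular expression substitution. The sketch below applies it to the two names mentioned above; the normalize function name is my own choice.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name per PEP 503: collapse runs of -, _, . and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("CherryPy"))  # cherrypy
print(normalize("zope.interface"))  # zope-interface
```

Both the capitalization and the dot carry meaning that the normalized form discards, which is fine for comparison but lossy for display.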

In practice, backends are not enforcing this recommendation, so perhaps it should just be removed.

File is largely unnecessary

The existence of a static config runs against motivations such as those in this vision and those demonstrated by coherent.build, where there is literally zero static config in the source repository.

Written on May 9, 2024