Prompted by Amber Brown’s presentation at the Python Language Summit
last month, Christian Heimes has followed up on his own earlier
work on slimming down the Python standard
library, and created a proper Python Enhancement Proposal, PEP
594, for removing obviously obsolete
and unmaintained detritus from the standard library.
PEP 594 is great news for Python, and in particular for the maintainers of its
standard library, who can now address a reduced surface area. A brief trip
through the PEP’s rogues gallery of modules to deprecate or remove is
illuminating. The Python standard library contains plenty of useful modules,
but it also hides a veritable necropolis of code, a towering monument to
obsolescence, threatening to topple over on its maintainers at any point.
However, I believe the PEP may be approaching the problem from the wrong
direction. Currently, the standard library is maintained in tandem with, and
by the maintainers of, the CPython python runtime. Large portions of it are
simply included in the hope that it might be useful to somebody. In the
aforementioned PEP, you can see this logic at work in the defense of the colorsys
module: why not remove it? “The module is useful to convert CSS colors between
coordinate systems. [It] does not impose maintenance overhead on core
development.”
There was a time when Internet access was scarce, and maybe it was helpful to
pre-load Python with lots of stuff, pre-packaged with the Python binaries, so
that it was all already at hand when you first started learning.
Today, however, the modules you need to convert colors between coordinate
systems are only a
pip install away. The bigger core interpreter is just
more to download before you can get started.
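For context, this is the sort of thing colorsys does; everything below is the module’s real, documented API:

```python
import colorsys

# colorsys converts between color coordinate systems, using floats in 0.0-1.0.
r, g, b = 0.2, 0.4, 0.4  # a teal color
h, s, v = colorsys.rgb_to_hsv(r, g, b)
# h == 0.5 (the cyan side of the color wheel), s ≈ 0.5, v == 0.4

# The conversion round-trips back to (approximately) the same RGB values.
back = colorsys.hsv_to_rgb(h, s, v)
```

Handy, certainly; but nothing about it requires living in the interpreter’s own repository.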
Why Didn’t You Review My PR?
So let’s examine that claim: does a tiny module like colorsys really impose
no “maintenance overhead on core development”?
The core maintainers have enough going on just trying to maintain the huge and
ancient C codebase that is CPython itself. As Mariatta put it in her North
Bay Python keynote,
the most common question that core developers get is “Why haven’t you looked at
my PR?” And the answer? It’s easier to not look at PRs when you don’t care
about them. This from a talk about what it means to be a core developer!
One might ask whether Twisted has the same problem. Twisted is a big
collection of loosely-connected modules too; a sort of standard library for
networking. Are clients and servers for SSH, IMAP, HTTP, TLS, et al. all a
bit much to try to cram into one package?
I’m compelled to reply: yes. Twisted is monolithic because it dates back to
a similar historical period as CPython, where installing stuff was really
complicated. So I am both sympathetic and empathetic towards CPython’s plight.
At some point, each sub-project within Twisted should ideally become a separate
project with its own repository, CI, website, and of course its own more
focused maintainers. We’ve been slowly splitting out projects already, where
we can find a natural boundary. Some things that started in Twisted, like
incremental, have been split out; others
are in the process of getting that treatment as well. Other projects absorbed
into the org continue to live separately, like
treq. As we
figure out how to reduce the overhead of setting up and maintaining the CI and
release infrastructure for each of them, we’ll do more of this.
But is our monolithic nature the most pressing problem, or even a serious
problem, for the project? Let’s quantify it.
As of this writing, Twisted has 5 outstanding un-reviewed pull requests in our
review queue. The median time a ticket spends in
review is roughly four and a half days. The oldest ticket in our queue
dates from April 22, which means it’s been less than 2 months since our oldest
un-reviewed PR was submitted.
It’s always a struggle to find enough maintainers and enough time to respond to
pull requests. Subjectively, it does sometimes feel like “Why won’t you review
my pull request?” is a question we do still get all too often. We aren’t
always doing this well, but all in all, we’re managing; the queue hovers
between 0 at its lowest and 25 or so during a bad month.
By comparison to those numbers, how is core CPython doing?
Looking at CPython’s keyword-based review queue,
we can see that there are 429 tickets currently awaiting review. The oldest
PR awaiting review hasn’t been touched since February 2, 2018, which was almost
500 days ago.
How many are interpreter issues and how many are stdlib issues? Clearly review
latency is a problem, but would removing the stdlib even help?
For a quick and highly unscientific estimate, I scanned the first (oldest) page
of PRs in the query above. By my subjective assessment, on this page of 25
PRs, 14 were about the standard library, 10 were about the core language or
interpreter code; one was a minor documentation issue that didn’t really apply
to either. If I can hazard a very rough estimate based on this proportion,
somewhere around half of the unreviewed PRs might be in standard library code.
So the first reason the CPython core team needs to stop maintaining the
standard library is that they literally don’t have the capacity to maintain
the standard library. Or to put it differently: they aren’t maintaining it,
and what remains is to admit that and start splitting it out.
It’s true that none of the open PRs on CPython are in
colorsys. It does
not, in fact, impose maintenance overhead on core development. Core
development imposes maintenance overhead on it. If I wanted to update the
colorsys module to be more modern - perhaps to have a
Color object rather
than a collection of free functions, perhaps to support integer color models -
I’d likely have to wait 500 days, or more, for a review.
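To make that hypothetical concrete, such a modernization might look something like the sketch below. To be clear, this Color class is entirely my own invention for illustration, not anything actually proposed for the stdlib; only the colorsys calls inside it are real:

```python
import colorsys
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    """Hypothetical object-oriented wrapper over colorsys's free functions."""
    red: float
    green: float
    blue: float

    @classmethod
    def from_ints(cls, r: int, g: int, b: int) -> "Color":
        # Integer color model support: accept 0-255 channel values.
        return cls(r / 255.0, g / 255.0, b / 255.0)

    def as_hsv(self) -> tuple:
        # Delegate to the existing, real colorsys function.
        return colorsys.rgb_to_hsv(self.red, self.green, self.blue)


c = Color.from_ints(51, 102, 102)
h, s, v = c.as_hsv()
```

A trivial change, but under the current process even a change this small would sit in the same 500-day queue as everything else.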
As a result, code in the standard library is harder to change, which means its
users are less motivated to contribute to it. CPython’s unusually infrequent
releases also slow down the development of library code and decrease the
usefulness of feedback from users. It’s no accident that almost all of the
modules in the standard library have actively maintained alternatives outside
of it: it’s not a failure on the part of the stdlib’s maintainers. The whole
process is set up to produce stagnation in all but the most frequently used
parts of the stdlib, and that’s exactly what it does.
New Environments, New Requirements
Perhaps even more important is that bundling CPython together with the
definition of the standard library privileges CPython itself, and the use-cases
that it supports, above every other implementation of the language.
As a recent keynote reminded us, in
order to keep succeeding and expanding, Python needs to grow into new areas:
particularly web frontends, but also mobile clients, embedded systems, and
more.
These environments require one or both of:
- a completely different runtime, such as Brython, or
- a modified, stripped down version of the standard library, which elides most
of it.
In all of these cases, determining which modules have been removed from the
standard library is a sticking point. They have to be discovered by a process
of trial and error; notably, a process completely different from the standard
process for determining dependencies within a Python application. There’s no
install_requires declaration you can put in your
setup.py that indicates
that your library uses a stdlib module that your target Python runtime might
leave out due to space constraints.
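The closest thing available today really is trial and error: probing for each module at runtime. A minimal sketch of that process, using only the real importlib API:

```python
import importlib.util


def stdlib_module_available(name: str) -> bool:
    # There is no declarative way to require a stdlib module, so the best an
    # application can do is probe for it at runtime and degrade gracefully.
    return importlib.util.find_spec(name) is not None


# On a full CPython these are all present; on a stripped-down runtime,
# any of them might be missing.
for mod in ["colorsys", "ssl", "sqlite3"]:
    available = stdlib_module_available(mod)
```

Note that this is a check an application must hand-roll per module, completely outside the dependency metadata that tools like pip already understand.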
You can have this problem even if all you ever use is the standard
Python on your Linux installation. Even server- and desktop-class Linux
distributions have the same need for a more minimal core Python package, and so
they already chop up the standard library somewhat arbitrarily. This can break
the expectations of many Python codebases, and result in bugs where even
pip install won’t work.
Take It All Out
How about the suggestion that we should do only a little a day? Although it
sounds convincing, don’t be fooled. The reason you never seem to finish is
precisely because you tidy a little at a time. [...] The ultimate secret of
success is this: If you tidy up in one shot, rather than little by little,
you can dramatically change your mind-set.
— Kondō, Marie.
“The Life-Changing Magic of Tidying Up”
While incremental slimming of the standard library is a step in the right
direction, incremental change can only get us so far. As Marie Kondō says,
when you really want to tidy up, the first step is to take everything out so
that you can really see everything, and put back only what you need.
It’s time to thank those modules which do not spark joy and send them on their
way.
We need a “kernel” version of Python that contains only the most absolutely
minimal library, so that all implementations can agree on a core baseline that
gives you a “python”, and applications, even those that want to run on web
browsers or microcontrollers, can simply state their additional requirements in
a requirements.txt.
Now, there are some business environments where adding things to your
requirements.txt is a fraught, bureaucratic process, and in those places, a
large standard library might seem appealing. But “standard library” is a purely
arbitrary boundary that the procurement processes in such places have drawn,
and an equally arbitrary line may be easily drawn around a binary distribution.
So it may indeed be useful for some CPython binary distributions — perhaps even
the official ones — to still ship with a broader selection of modules from
PyPI. Even for the average user, in order to use it for development, at the
very least, you’d need enough stdlib stuff that
pip can bootstrap itself, to
install the other modules you need!
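That bootstrap machinery already exists today: the stdlib’s ensurepip module carries a vendored wheel of pip. Everything here is ensurepip’s real, documented API:

```python
import ensurepip

# ensurepip bundles a pip wheel inside the standard library itself; version()
# reports which pip would be bootstrapped, without installing anything.
bundled_pip_version = ensurepip.version()
```

Running `python -m ensurepip --upgrade` performs the actual bootstrap, after which pip can install everything else; that is roughly the minimum a “kernel” distribution would need to keep.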
It’s already the case, today, that
pip is distributed with Python, but
isn’t maintained in the CPython repository. What the default Python binary
installer ships with is already a separate question from what is developed in
the CPython repo, or what ships in the individual source tarball for the
interpreter.
In order to use Linux, you need bootable media with a huge array of additional
programs. That doesn’t mean the Linux kernel itself is in one giant
repository, where the hundreds of applications you need for a functioning Linux
server are all maintained by one team. The Linux kernel project is immensely
valuable, but functioning operating systems which use it are built from the
combination of the Linux kernel and a wide variety of separately maintained
libraries and programs.
The “batteries included” philosophy was a great fit for the time when it was
created: a booster rocket
to sneak Python into the imagination of the programming public. As the open
source and Python packaging ecosystems have matured, however, this strategy has
not aged well, and like any booster, we must let it fall back to earth, lest it
drag us back down with it.
New Python runtimes, new deployment targets, and new developer audiences all
present tremendous opportunities for the Python community to soar ever higher.
But to do it, we need a newer, leaner, unburdened “kernel” Python. We need to
dump the whole standard library out on the floor, adding back only the smallest
bits that we need, so that we can tell what is truly necessary and what’s just
nice to have.
I hope I’ve convinced at least a few of you that we need a kernel Python.
Now: who wants to write the PEP?
Thanks to Jean-Paul Calderone, Donald Stufft, Alex Gaynor, Amber Brown, Ian
Cordasco, Jonathan Lange, Augie Fackler, Hynek Schlawack, Pete Fein, Mark
Williams, Tom Most, Jeremy Thurgood, and Aaron Gallagher for feedback and
corrections on earlier drafts of this post. Any errors, of course, remain my
own.