August 15, 2025

Glyph Lefkowitz

The Futzing Fraction

The most optimistic vision of generative AI¹ is that it will relieve us of the tedious, repetitive elements of knowledge work so that we can get to work on the really interesting problems that such tedium stands in the way of. Even if you fully believe in this vision, it’s hard to deny that today, some tedium is associated with the process of using generative AI itself.

Generative AI also isn’t free, and so, as responsible consumers, we need to ask: is it worth it? What’s the ROI of genAI, and how can we tell? In this post, I’d like to explore a logical framework for evaluating genAI expenditures, to determine if your organization is getting its money’s worth.

Perpetually Proffering Permuted Prompts

I think most LLM users would agree with me that a typical workflow with an LLM rarely involves prompting it only one time and getting a perfectly useful answer that solves the whole problem.

Generative AI best practices, even from the most optimistic vendors all suggest that you should continuously evaluate everything. ChatGPT, which is really the only genAI product with significantly scaled adoption, still says at the bottom of every interaction:

ChatGPT can make mistakes. Check important info.

If we have to “check important info” on every interaction, it stands to reason that even if we think it’s useful, some of those checks will find an error. Again, if we think it’s useful, presumably the next thing to do is to perturb our prompt somehow, and issue it again, in the hopes that the next invocation will, by dint of either:

better luck this time with the stochastic aspect of the inference process,
enhanced application of our skill to engineer a better prompt based on the deficiencies of the current inference, or
better performance of the model by populating additional context in subsequent chained prompts.

Unfortunately, given the relative lack of reliable methods to re-generate the prompt and receive a better answer², checking the output and re-prompting the model can feel like just kinda futzing around with it. You try, you get a wrong answer, you try a few more times, eventually you get the right answer that you wanted in the first place. It’s a somewhat unsatisfying process, but if you get the right answer eventually, it does feel like progress, and you didn’t need to use up another human’s time.

In fact, the hottest buzzword of the last hype cycle is “agentic”. While I have my own feelings about this particular word³, its current practical definition is “a generative AI system which automates the process of re-prompting itself, by having a deterministic program evaluate its outputs for correctness”.

A better term for an “agentic” system would be a “self-futzing system”.

However, the ability to automate some level of checking and re-prompting does not mean that you can fully delegate tasks to an agentic tool, either. It is, plainly put, not safe. If you leave the AI on its own, you will get terrible results that will at best make for a funny story⁴⁵ and at worst might end up causing serious damage⁶⁷.

Taken together, this all means that for any consequential task that you want to accomplish with genAI, you need an expert human in the loop. The human must be capable of independently doing the job that the genAI system is being asked to accomplish.

When the genAI guesses correctly and produces usable output, some of the human’s time will be saved. When the genAI guesses wrong and produces hallucinatory gibberish or even “correct” output that nevertheless fails to account for some unstated but necessary property such as security or scale, some of the human’s time will be wasted evaluating it and re-trying it.

Income from Investment in Inference

Let’s evaluate an abstract, hypothetical genAI system that can automate some work for our organization. To avoid implicating any specific vendor, let’s call the system “Mallory”.

Is Mallory worth the money? How can we know?

Logically, there are only two outcomes that might result from using Mallory to do our work.

We prompt Mallory to do some work; we check its work, it is correct, and some time is saved.
We prompt Mallory to do some work; we check its work, it fails, and we futz around with the result; this time is wasted.

As a logical framework, this makes sense, but ROI is an arithmetical concept, not a logical one. So let’s translate this into some terms.

In order to evaluate Mallory, let’s define the Futzing Fraction, “ FF ”, in terms of the following variables:

H: the average amount of time a Human worker would take to do a task, unaided by Mallory
I: the amount of time that Mallory takes to run one Inference⁸
C: the amount of time that a human has to spend Checking Mallory’s output for each inference
P: the Probability that Mallory will produce a correct inference for each prompt
W: the average amount of time that it takes for a human to Write one prompt for Mallory
E: since we are normalizing everything to time, rather than money, we do also have to account for the dollar of Mallory as as a product, so we will include the Equivalent amount of human time we could purchase for the marginal cost of one⁹ inference.

As in last week’s example of simple ROI arithmetic, we will put our costs in the numerator, and our benefits in the denominator.

FF = W+I+C+E P H

The idea here is that for each prompt, the minimum amount of time-equivalent cost possible is W+I+C+E. The user must, at least once, write a prompt, wait for inference to run, then check the output; and, of course, pay any costs to Mallory’s vendor.

If the probability of a correct answer is P=13, then they will do this entire process 3 times¹⁰, so we put P in the denominator. Finally, we divide everything by H, because we are trying to determine if we are actually saving any time or money, versus just letting our existing human, who has to be driving this process anyway, do the whole thing.

If the Futzing Fraction evaluates to a number greater than 1, as previously discussed, you are a bozo; you’re spending more time futzing with Mallory than getting value out of it.

Figuring out the Fraction is Frustrating

In order to even evaluate the value of the Futzing Fraction though, you have to have a sound method to even get a vague sense of all the terms.

If you are a business leader, a lot of this is relatively easy to measure. You vaguely know what H is, because you know what your payroll costs, and similarly, you can figure out E with some pretty trivial arithmetic based on Mallory’s pricing table. There are endless YouTube channels, spec sheets and benchmarks to give you I. W is probably going to be so small compared to H that it hardly merits consideration¹¹.

But, are you measuring C? If your employees are not checking the outputs of the AI, you’re on a path to catastrophe that no ROI calculation can capture, so it had better be greater than zero.

Are you measuring P? How often does the AI get it right on the first try?

Challenges to Computing Checking Costs

In the fraction defined above, the term C is going to be large. Larger than you think.

Measuring P and C with a high degree of precision is probably going to be very hard; possibly unreasonably so, or too expensive¹² to bother with in practice. So you will undoubtedly need to work with estimates and proxy metrics. But you have to be aware that this is a problem domain where your normal method of estimating is going to be extremely vulnerable to inherent cognitive bias, and find ways to measure.

Margins, Money, and Metacognition

First let’s discuss cognitive and metacognitive bias.

My favorite cognitive bias is the availability heuristic and a close second is its cousin salience bias. Humans are empirically predisposed towards noticing and remembering things that are more striking, and to overestimate their frequency.

If you are estimating the variables above based on the vibe that you’re getting from the experience of using an LLM, you may be overestimating its utility.

Consider a slot machine.

If you put a dollar in to a slot machine, and you lose that dollar, this is an unremarkable event. Expected, even. It doesn’t seem interesting. You can repeat this over and over again, a thousand times, and each time it will seem equally unremarkable. If you do it a thousand times, you will probably get gradually more anxious as your sense of your dwindling bank account becomes slowly more salient, but losing one more dollar still seems unremarkable.

If you put a dollar in a slot machine and it gives you a thousand dollars, that will probably seem pretty cool. Interesting. Memorable. You might tell a story about this happening, but you definitely wouldn’t really remember any particular time you lost one dollar.

Luckily, when you arrive at a casino with slot machines, you probably know well enough to set a hard budget in the form of some amount of physical currency you will have available to you. The odds are against you, you’ll probably lose it all, but any responsible gambler will have an immediate, physical representation of their balance in front of them, so when they have lost it all, they can see that their hands are empty, and can try to resist the “just one more pull” temptation, after hitting that limit.

Now, consider Mallory.

If you put ten minutes into writing a prompt, and Mallory gives a completely off-the-rails, useless answer, and you lose ten minutes, well, that’s just what using a computer is like sometimes. Mallory malfunctioned, or hallucinated, but it does that sometimes, everybody knows that. You only wasted ten minutes. It’s fine. Not a big deal. Let’s try it a few more times. Just ten more minutes. It’ll probably work this time.

If you put ten minutes into writing a prompt, and it completes a task that would have otherwise taken you 4 hours, that feels amazing. Like the computer is magic! An absolute endorphin rush.

Very memorable. When it happens, it feels like P=1.

But... did you have a time budget before you started? Did you have a specified N such that “I will give up on Mallory as soon as I have spent N minutes attempting to solve this problem with it”? When the jackpot finally pays out that 4 hours, did you notice that you put 6 hours worth of 10-minute prompt coins into it in?

If you are attempting to use the same sort of heuristic intuition that probably works pretty well for other business leadership decisions, Mallory’s slot-machine chat-prompt user interface is practically designed to subvert those sensibilities. Most business activities do not have nearly such an emotionally variable, intermittent reward schedule. They’re not going to trick you with this sort of cognitive illusion.

Thus far we have been talking about cognitive bias, but there is a metacognitive bias at play too: while Dunning-Kruger, everybody’s favorite metacognitive bias does have some problems with it, the main underlying metacognitive bias is that we tend to believe our own thoughts and perceptions, and it requires active effort to distance ourselves from them, even if we know they might be wrong.

This means you must assume any intuitive estimate of C is going to be biased low; similarly P is going to be biased high. You will forget the time you spent checking, and you will underestimate the number of times you had to re-check.

To avoid this, you will need to decide on a Ulysses pact to provide some inputs to a calculation for these factors that you will not be able to able to fudge if they seem wrong to you.

Problematically Plausible Presentation

Another nasty little cognitive-bias landmine for you to watch out for is the authority bias, for two reasons:

People will tend to see Mallory as an unbiased, external authority, and thereby see it as more of an authority than a similarly-situated human¹³.
Being an LLM, Mallory will be overconfident in its answers¹⁴.

The nature of LLM training is also such that commonly co-occurring tokens in the training corpus produce higher likelihood of co-occurring in the output; they’re just going to be closer together in the vector-space of the weights; that’s, like, what training a model is, establishing those relationships.

If you’ve ever used an heuristic to informally evaluate someone’s credibility by listening for industry-specific shibboleths or ways of describing a particular issue, that skill is now useless. Having ingested every industry’s expert literature, commonly-occurring phrases will always be present in Mallory’s output. Mallory will usually sound like an expert, but then make mistakes at random.¹⁵.

While you might intuitively estimate C by thinking “well, if I asked a person, how could I check that they were correct, and how long would that take?” that estimate will be extremely optimistic, because the heuristic techniques you would use to quickly evaluate incorrect information from other humans will fail with Mallory. You need to go all the way back to primary sources and actually fully verify the output every time, or you will likely fall into one of these traps.

Mallory Mangling Mentorship

So far, I’ve been describing the effect Mallory will have in the context of an individual attempting to get some work done. If we are considering organization-wide adoption of Mallory, however, we must also consider the impact on team dynamics. There are a number of possible potential side effects that one might consider when looking at, but here I will focus on just one that I have observed.

I have a cohort of friends in the software industry, most of whom are individual contributors. I’m a programmer who likes programming, so are most of my friends, and we are also (sigh), charitably, pretty solidly middle-aged at this point, so we tend to have a lot of experience.

As such, we are often the folks that the team — or, in my case, the community — goes to when less-experienced folks need answers.

On its own, this is actually pretty great. Answering questions from more junior folks is one of the best parts of a software development job. It’s an opportunity to be helpful, mostly just by knowing a thing we already knew. And it’s an opportunity to help someone else improve their own agency by giving them knowledge that they can use in the future.

However, generative AI throws a bit of a wrench into the mix.

Let’s imagine a scenario where we have 2 developers: Alice, a staff engineer who has a good understanding of the system being built, and Bob, a relatively junior engineer who is still onboarding.

The traditional interaction between Alice and Bob, when Bob has a question, goes like this:

Bob gets confused about something in the system being developed, because Bob’s understanding of the system is incorrect.
Bob formulates a question based on this confusion.
Bob asks Alice that question.
Alice knows the system, so she gives an answer which accurately reflects the state of the system to Bob.
Bob’s understanding of the system improves, and thus he will have fewer and better-informed questions going forward.

You can imagine how repeating this simple 5-step process will eventually transform Bob into a senior developer, and then he can start answering questions on his own. Making sufficient time for regularly iterating this loop is the heart of any good mentorship process.

Now, though, with Mallory in the mix, the process now has a new decision point, changing it from a linear sequence to a flow chart.

We begin the same way, with steps 1 and 2. Bob’s confused, Bob formulates a question, but then:

Bob asks Mallory that question.

Here, our path then diverges into a “happy” path, a “meh” path, and a “sad” path.

The “happy” path proceeds like so:

Mallory happens to formulate a correct answer.
Bob’s understanding of the system improves, and thus he will have fewer and better-informed questions going forward.

Great. Problem solved. We just saved some of Alice’s time. But as we learned earlier,

Mallory can make mistakes. When that happens, we will need to check important info. So let’s get checking:

Mallory happens to formulate an incorrect answer.
Bob investigates this answer.
Bob realizes that this answer is incorrect because it is inconsistent with some of his prior, correct knowledge of the system, or his investigation.
Bob asks Alice the same question; GOTO traditional interaction step 4.

On this path, Bob spent a while futzing around with Mallory, to no particular benefit. This wastes some of Bob’s time, but then again, Bob could have ended up on the happy path, so perhaps it was worth the risk; at least Bob wasn’t wasting any of Alice’s much more valuable time in the process.¹⁶

Notice that beginning at the start of step 4, we must begin allocating all of Bob’s time to C, so C already starts getting a bit bigger than if it were just Bob checking Mallory’s output specifically on tasks that Bob is doing.

That brings us to the “sad” path.

Mallory happens to formulate an incorrect answer.
Bob investigates this answer.
Bob does not realize that this answer is incorrect because he is unable to recognize any inconsistencies with his existing, incomplete knowledge of the system.
Bob integrates Mallory’s incorrect information of the system into his mental model.
Bob proceeds to make a larger and larger mess of his work, based on an incorrect mental model.
Eventually, Bob asks Alice a new, worse question, based on this incorrect understanding.
Sadly we cannot return to the happy path at this point, because now Alice must unravel the complex series of confusing misunderstandings that Mallory has unfortunately conveyed to Bob at this point. In the really sad case, Bob actually doesn’t believe Alice for a while, because Mallory seems unbiased¹⁷, and Alice has to waste even more time convincing Bob before she can simply explain to him.

Now, we have wasted some of Bob’s time, and some of Alice’s time. Everything from step 5-10 is C, and as soon as Alice gets involved, we are now adding to C at double real-time. If more team members are pulled in to the investigation, you are now multiplying C by the number of investigators, potentially running at triple or quadruple real time.

But That’s Not All

Here I’ve presented a brief selection reasons why C will be both large, and larger than you expect. To review:

Gambling-style mechanics of the user interface will interfere with your own self-monitoring and developing a good estimate.
You can’t use human heuristics for quickly spotting bad answers.
Wrong answers given to junior people who can’t evaluate them will waste more time from your more senior employees.

But this is a small selection of ways that Mallory’s output can cost you money and time. It’s harder to simplistically model second-order effects like this, but there’s also a broad range of possibilities for ways that, rather than simply checking and catching errors, an error slips through and starts doing damage. Or ways in which the output isn’t exactly wrong, but still sub-optimal in ways which can be difficult to notice in the short term.

For example, you might successfully vibe-code your way to launch a series of applications, successfully “checking” the output along the way, but then discover that the resulting code is unmaintainable garbage that prevents future feature delivery, and needs to be re-written¹⁸. But this kind of intellectual debt isn’t even specific to technical debt while coding; it can even affect such apparently genAI-amenable fields as LinkedIn content marketing¹⁹.

Problems with the Prediction of P

C isn’t the only challenging term though. P, is just as, if not more important, and just as hard to measure.

LLM marketing materials love to phrase their accuracy in terms of a percentage. Accuracy claims for LLMs in general tend to hover around 70%²⁰. But these scores vary per field, and when you aggregate them across multiple topic areas, they start to trend down. This is exactly why “agentic” approaches for more immediately-verifiable LLM outputs (with checks like “did the code work”) got popular in the first place: you need to try more than once.

Independently measured claims about accuracy tend to be quite a bit lower²¹. The field of AI benchmarks is exploding, but it probably goes without saying that LLM vendors game those benchmarks²², because of course every incentive would encourage them to do that. Regardless of what their arbitrary scoring on some benchmark might say, all that matters to your business is whether it is accurate for the problems you are solving, for the way that you use it. Which is not necessarily going to correspond to any benchmark. You will need to measure it for yourself.

With that goal in mind, our formulation of P must be a somewhat harsher standard than “accuracy”. It’s not merely “was the factual information contained in any generated output accurate”, but, “is the output good enough that some given real knowledge-work task is done and the human does not need to issue another prompt”?

Surprisingly Small Space for Slip-Ups

The problem with reporting these things as percentages at all, however, is that our actual definition for P is 1attempts, where attempts for any given attempt, at least, must be an integer greater than or equal to 1.

Taken in aggregate, if we succeed on the first prompt more often than not, we could end up with a P>12, but combined with the previous observation that you almost always have to prompt it more than once, the practical reality is that P will start at 50% and go down from there.

If we plug in some numbers, trying to be as extremely optimistic as we can, and say that we have a uniform stream of tasks, every one of which can be addressed by Mallory, every one of which:

we can measure perfectly, with no overhead
would take a human 45 minutes
takes Mallory only a single minute to generate a response
Mallory will require only 1 re-prompt, so “good enough” half the time
takes a human only 5 minutes to write a prompt for
takes a human only 5 minutes to check the result of
has a per-prompt cost of the equivalent of a single second of a human’s time

Thought experiments are a dicey basis for reasoning in the face of disagreements, so I have tried to formulate something here that is absolutely, comically, over-the-top stacked in favor of the AI optimist here.

Would that be a profitable? It sure seems like it, given that we are trading off 45 minutes of human time for 1 minute of Mallory-time and 10 minutes of human time. If we ask Python:

>>> def FF(H, I, C, P, W, E):
...     return (W + I + C + E) / (P * H)
... FF(H=45.0, I=1.0, C=5.0, P=1/2, W=5.0, E=0.01)
...
0.48933333333333334

We get a futzing fraction of about 0.4896. Not bad! Sounds like, at least under these conditions, it would indeed be cost-effective to deploy Mallory. But… realistically, do you reliably get useful, done-with-the-task quality output on the second prompt? Let’s bump up the denominator on P just a little bit there, and see how we fare:

>>> FF(H=45.0, I=1.0, C=5.0, P=1/3, W=5.0, E=0.01)
0.734

Oof. Still cost-effective at 0.734, but not quite as good. Where do we cap out, exactly?

>>> from itertools import count
... for A in count(start=4):
...     print(A, result := FF(H=45.0, I=1.0, C=5.0, P=1 / A, W=5.0, E=1/60.))
...     if result > 1:
...         break
...
4 0.9792592592592594
5 1.224074074074074
>>>

With this little test, we can see that at our next iteration we are already at 0.9792, and by 5 tries per prompt, even in this absolute fever-dream of an over-optimistic scenario, with a futzing fraction of 1.2240, Mallory is now a net detriment to our bottom line.

Harm to the Humans

We are treating H as functionally constant so far, an average around some hypothetical Gaussian distribution, but the distribution itself can also change over time.

Formally speaking, an increase to H would be good for our fraction. Maybe it would even be a good thing; it could mean we’re taking on harder and harder tasks due to the superpowers that Mallory has given us.

But an observed increase to H would probably not be good. An increase could also mean your humans are getting worse at solving problems, because using Mallory has atrophied their skills²³ and sabotaged learning opportunities²⁴²⁵. It could also go up because your senior, experienced people now hate their jobs²⁶.

For some more vulnerable folks, Mallory might just take a shortcut to all these complex interactions and drive them completely insane²⁷ directly. Employees experiencing an intense psychotic episode are famously less productive than those who are not.

This could all be very bad, if our futzing fraction eventually does head north of 1 and you need to reconsider introducing human-only workflows, without Mallory.

Abridging the Artificial Arithmetic (Alliteratively)

To reiterate, I have proposed this fraction:

FF = W+I+C+E P H

which shows us positive ROI when FF is less than 1, and negative ROI when it is more than 1.

This model is heavily simplified. A comprehensive measurement program that tests the efficacy of any technology, let alone one as complex and rapidly changing as LLMs, is more complex than could be captured in a single blog post.

Real-world work might be insufficiently uniform to fit into a closed-form solution like this. Perhaps an iterated simulation with variables based on the range of values seem from your team’s metrics would give better results.

However, in this post, I want to illustrate that if you are going to try to evaluate an LLM-based tool, you need to at least include some representation of each of these terms somewhere. They are all fundamental to the way the technology works, and if you’re not measuring them somehow, then you are flying blind into the genAI storm.

I also hope to show that a lot of existing assumptions about how benefits might be demonstrated, for example with user surveys about general impressions, or by evaluating artificial benchmark scores, are deeply flawed.

Even making what I consider to be wildly, unrealistically optimistic assumptions about these measurements, I hope I’ve shown:

in the numerator, C might be a lot higher than you expect,
in the denominator, P might be a lot lower than you expect,
repeated use of an LLM might make H go up, but despite the fact that it's in the denominator, that will ultimately be quite bad for your business.

Personally, I don’t have all that many concerns about E and I. E is still seeing significant loss-leader pricing, and I might not be coming down as fast as vendors would like us to believe, if the other numbers work out I don’t think they make a huge difference. However, there might still be surprises lurking in there, and if you want to rationally evaluate the effectiveness of a model, you need to be able to measure them and incorporate them as well.

In particular, I really want to stress the importance of the influence of LLMs on your team dynamic, as that can cause massive, hidden increases to C. LLMs present opportunities for junior employees to generate an endless stream of chaff that will simultaneously:

wreck your performance review process by making them look much more productive than they are,
increase stress and load on senior employees who need to clean up unforeseen messes created by their LLM output,
and ruin their own opportunities for career development by skipping over learning opportunities.

If you’ve already deployed LLM tooling without measuring these things and without updating your performance management processes to account for the strange distortions that these tools make possible, your Futzing Fraction may be much, much greater than 1, creating hidden costs and technical debt that your organization will not notice until a lot of damage has already been done.

If you got all the way here, particularly if you’re someone who is enthusiastic about these technologies, thank you for reading. I appreciate your attention and I am hopeful that if we can start paying attention to these details, perhaps we can all stop futzing around so much with this stuff and get back to doing real work.

Acknowledgments

I do not share this optimism, but I want to try very hard in this particular piece to take it as a given that genAI is in fact helpful. ↩
If we could have a better prompt on demand via some repeatable and automatable process, surely we would have used a prompt that got the answer we wanted in the first place. ↩
The software idea of a “user agent” straightforwardly comes from the legal principle of an agent, which has deep roots in common law, jurisprudence, philosophy, and math. When we think of an agent (some software) acting on behalf of a principal (a human user), this historical baggage imputes some important ethical obligations to the developer of the agent software. genAI vendors have been as eager as any software vendor to dodge responsibility for faithfully representing the user’s interests even as there are some indications that at least some courts are not persuaded by this dodge, at least by the consumers of genAI attempting to pass on the responsibility all the way to end users. Perhaps it goes without saying, but I’ll say it anyway: I don’t like this newer interpretation of “agent”. ↩
“Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents”, Axel Backlund, Lukas Petersson, Feb 20, 2025 ↩
“random thing are happening, maxed out usage on api keys”, @leojr94 on Twitter, Mar 17, 2025 ↩
“New study sheds light on ChatGPT’s alarming interactions with teens” ↩
“Lawyers submitted bogus case law created by ChatGPT. A judge fined them $5,000”, by Larry Neumeister for the Associated Press, June 22, 2023 ↩
During which a human will be busy-waiting on an answer. ↩
Given the fluctuating pricing of these products, and fixed subscription overhead, this will obviously need to be amortized; including all the additional terms to actually convert this from your inputs is left as an exercise for the reader. ↩
I feel like I should emphasize explicitly here that everything is an average over repeated interactions. For example, you might observe that a particular LLM has a low probability of outputting acceptable work on the first prompt, but higher probability on subsequent prompts in the same context, such that it usually takes 4 prompts. For the purposes of this extremely simple closed-form model, we’d still consider that a P of 25%, even though a more sophisticated model, or a monte carlo simulation that sets progressive bounds on the probability, might produce more accurate values. ↩
No it isn’t, actually, but for the sake of argument let’s grant that it is. ↩
It’s worth noting that all this expensive measuring itself must be included in C until you have a solid grounding for all your metrics, but let’s optimistically leave all of that out for the sake of simplicity. ↩
“AI Company Poll Finds 45% of Workers Trust the Tech More Than Their Peers”, by Suzanne Blake for Newsweek, Aug 13, 2025 ↩
AI Chatbots Remain Overconfident — Even When They’re Wrong by Jason Bittel for the Dietrich College of Humanities and Social Sciences at Carnegie Mellon University, July 22, 2025 ↩
AI Mistakes Are Very Different From Human Mistakes by Bruce Schneier and Nathan E. Sanders for IEEE Spectrum, Jan 13, 2025 ↩
Foreshadowing is a narrative device in which a storyteller gives an advance hint of an upcoming event later in the story. ↩
“People are worried about the misuse of AI, but they trust it more than humans” ↩
“Why I stopped using AI (as a Senior Software Engineer)”, theSeniorDev YouTube channel, Jun 17, 2025 ↩
“I was an AI evangelist. Now I’m an AI vegan. Here’s why.”, Joe McKay for the greatchatlinkedin YouTube channel, Aug 8, 2025 ↩
“What LLM is The Most Accurate?” ↩
“Study Finds That 52 Percent Of ChatGPT Answers to Programming Questions are Wrong”, by Sharon Adarlo for Futurism, May 23, 2024 ↩
“Off the Mark: The Pitfalls of Metrics Gaming in AI Progress Races”, by Tabrez Syed on BoxCars AI, Dec 14, 2023 ↩
“I tried coding with AI, I became lazy and stupid”, by Thomasorus, Aug 8, 2025 ↩
“How AI Changes Student Thinking: The Hidden Cognitive Risks” by Timothy Cook for Psychology Today, May 10, 2025 ↩
“Increased AI use linked to eroding critical thinking skills” by Justin Jackson for Phys.org, Jan 13, 2025 ↩
“AI could end my job — Just not the way I expected” by Manuel Artero Anguita on dev.to, Jan 27, 2025 ↩
“The Emerging Problem of “AI Psychosis”” by Gary Drevitch for Psychology Today, July 21, 2025. ↩

by Glyph at August 15, 2025 07:51 AM

August 09, 2025

Glyph Lefkowitz

R0ML’s Ratio

My father, also known as “R0ML” once described a methodology for evaluating volume purchases that I think needs to be more popular.

If you are a hardcore fan, you might know that he has already described this concept publicly in a talk at OSCON in 2005, among other places, but it has never found its way to the public Internet, so I’m giving it a home here, and in the process, appropriating some of his words.¹

Let’s say you’re running a circus. The circus has many clowns. Ten thousand clowns, to be precise. They require bright red clown noses. Therefore, you must acquire a significant volume of clown noses. An enterprise licensing agreement for clown noses, if you will.

If the nose plays, it can really make the act. In order to make sure you’re getting quality noses, you go with a quality vendor. You select a vendor who can supply noses for $100 each, at retail.

Do you want to buy retail? Ten thousand clowns, ten thousand noses, one hundred dollars: that’s a million bucks worth of noses, so it’s worth your while to get a good deal.

As a conscientious executive, you go to the golf course with your favorite clown accessories vendor and negotiate yourself a 50% discount, with a commitment to buy all ten thousand noses.

Is this a good deal? Should you take it?

To determine this, we will use an analytical tool called R0ML’s Ratio (RR).

The ratio has 2 terms:

the Full Undiscounted Retail List Price of Units Used (FURLPoUU), which can of course be computed by the individual retail list price of a single unit (in our case, $100) multiplied by the number of units used
the Total Price of the Entire Enterprise Volume Licensing Agreement (TPotEEVLA), which in our case is $500,000.

It is expressed as:

RR = TPotEEVLA FURLPoUU

Crucially, you must be able to compute the number of units used in order to complete this ratio. If, as expected, every single clown wears their nose at least once during the period of the license agreement, then our Units Used is 10,000, our FURLPoUU is $1,000,000 and our TPotEEVLA is $500,000, which makes our RR 0.5.

Congratulations. If R0ML’s Ratio is less than 1, it’s a good deal. Proceed.

But… maybe the nose doesn’t play. Not every clown’s costume is an exact clone of the traditional, stereotypical image of a clown. Many are avant-garde. Perhaps this plentiful proboscis pledge was premature. Here, I must quote the originator of this theoretical framework directly:

What if the wheeze doesn’t please?

What if the schnozz gives some pause?

In other words: what if some clowns don’t wear their noses?

If we were to do this deal, and then ask around afterwards to find out that only 200 of our 10,000 clowns were to use their noses, then FURLPoUU comes out to 200 * $100, for a total of $20,000. In that scenario, RR is 25, which you may observe is substantially greater than 1.

If you do a deal where R0ML’s ratio is greater than 1, then you are the bozo.

I apologize if I have belabored this point. As R0ML expressed in the email we exchanged about this many years ago,

I do not mind if you blog about it — and I don't mind getting the credit — although one would think it would be obvious.

And yeah, one would think this would be obvious? But I have belabored it because many discounted enterprise volume purchasing agreements still fail the R0ML’s Ratio Bozo Test.²

In the case of clown noses, if you pay the discounted price, at least you get to keep the nose; maybe lightly-used clown noses have some resale value. But in software licensing or SaaS deals, once you’ve purchased the “discounted” software or service, once you have provisioned the “seats”, the money is gone, and if your employees don’t use it, then no value for your organization will ever result.

Measuring number of units used is very important. Without this number, you have no idea if you are a bozo or not.

It is often better to give your individual employees a corporate card and allow them to make arbitrary individual purchases of software licenses and SaaS tools, with minimal expense-reporting overhead; this will always keep R0ML’s Ratio at 1.0, and thus, you will never be a bozo.

It is always better to do that the first time you are purchasing a new software tool, because the first time making such a purchase you (almost by definition) have no information about “units used” yet. You have no idea — you cannot have any idea — if you are a bozo or not.

If you don’t know who the bozo is, it’s probably you.

Acknowledgments

Thank you for reading, and especially thank you to my patrons who are supporting my writing on this blog. Of course, extra thanks to dad for, like, having this idea and doing most of the work here beyond my transcription. If you like my dad’s ideas and you’d like to post more of them, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!

One of my other favorite posts on this blog was just stealing another one of his ideas, so hopefully this one will be good too. ↩
This concept was first developed in 2001, but it has some implications for extremely recent developments in the software industry; but that’s a post for another day. ↩

by Glyph at August 09, 2025 04:41 AM

August 08, 2025

Glyph Lefkowitz

The Best Line Length

What’s a good maximum line length for your coding standard?

This is, of course, a trick question. By posing it as a question, I have created the misleading impression that it is a question, but Black has selected the correct number for you; it’s 88 which is obviously very lucky.

Thanks for reading my blog.

OK, OK. Clearly, there’s more to it than that. This is an age-old debate on the level of “tabs versus spaces”. So contentious, in fact, that even the famously opinionated Black does in fact let you change it.

Ancient History

One argument that certain silly people¹ like to make is “why are we wrapping at 80 characters like we are using 80 character teletypes, it’s the 2020s! I have an ultrawide monitor!”. The implication here is that the width of 80-character terminals is an antiquated relic, based entirely around the hardware limitations of a bygone era, and modern displays can put tons of stuff on one line, so why not use that capability?

This feels intuitively true, given the huge disparity between ancient times and now: on my own display, I can comfortably fit about 350 characters on a line. What a shame, to have so much room for so many characters in each line, and to waste it all on blank space!

But... is that true?

I stretched out my editor window all the way to measure that ‘350’ number, but I did not continue editing at that window width. In order to have a more comfortable editing experience, I switched back into writeroom mode, a mode which emulates a considerably more writerly application, which limits each line length to 92 characters, regardless of frame width.

You’ve probably noticed this too. Almost all sites that display prose of any kind limit their width, even on very wide screens.

As silly as that tiny little ribbon of text running down the middle of your monitor might look with a full-screened stereotypical news site or blog, if you full-screen a site that doesn’t set that width-limit, although it makes sense that you can now use all that space up, it will look extremely, almost unreadably bad.

Blogging software does not set a column width limit on your text because of some 80-character-wide accident of history in the form of a hardware terminal.

Similarly, if you really try to use that screen real estate to its fullest for coding, and start editing 200-300 character lines, you’ll quickly notice it starts to feel just a bit weird and confusing. It gets surprisingly easy to lose your place. Rhetorically the “80 characters is just because of dinosaur technology! Use all those ultrawide pixels!” talking point is quite popular, but practically people usually just want a few more characters worth of breathing room, maxing out at 100 characters, far narrower than even the most svelte widescreen.

So maybe those 80 character terminals are holding us back a little bit, but... wait a second. Why were the terminals 80 characters wide in the first place?

Ancienter History

As this lovely Software Engineering Stack Exchange post summarizes, terminals were probably 80 characters because teletypes were 80 characters, and teletypes were probably 80 characters because punch cards were 80 characters, and punch cards were probably 80 characters because that’s just about how many typewritten characters fit onto one line of a US-Letter piece of paper.

Even before typewriters, consider the average newspaper: why do we call a regularly-occurring featured article in a newspaper a “column”? Because broadsheet papers were too wide to have only a single column; they would always be broken into multiple! Far more aggressive than 80 characters, columns in newspapers typically have 30 characters per line.

The first newspaper printing machines were custom designed and could have used whatever width they wanted, so why standardize on something so narrow?³

Science!

There has been a surprising amount of scientific research around this issue, but in brief, there’s a reason here rooted in human physiology: when you read a block of text, you are not consciously moving your eyes from word to word like you’re dragging a mouse cursor, repositioning continuously. Human eyes reading text move in quick bursts of rotation called “saccades”. In order to quickly and accurately move from one line of text to another, the start of the next line needs to be clearly visible in the reader’s peripheral vision in order for them to accurately target it. This limits the angle of rotation that the reader can perform in a single saccade, and, thus, the length of a line that they can comfortably read without hunting around for the start of the next line every time they get to the end.

So, 80 (or 88) characters isn’t too unreasonable for a limit. It’s longer than 30 characters, that’s for sure!

But, surely that’s not all, or this wouldn’t be so contentious in the first place?

Caveats

The screen is wide, though.

The ultrawide aficionados do have a point, even if it’s not really the simple one about “old terminals” they originally thought. Our modern wide-screen displays are criminally underutilized, particularly for text. Even adding in the big chunky file, class, and method tree browser over on the left and the source code preview on the right, a brief survey of a Google Image search for “vs code” shows a lot of editors open with huge, blank areas on the right side of the window.

Big screens are super useful as they allow us to leverage our spatial memories to keep more relevant code around and simply glance around as we think, rather than navigate interactively. But it only works if you remember to do it.

Newspapers allowed us to read a ton of information in one sitting with minimum shuffling by packing in as much as 6 columns of text. You could read a column to the bottom of the page, back to the top, and down again, several times.

Similarly, books fill both of their opposed pages with text at the same time, doubling the amount of stuff you can read at once before needing to turn the page.

You may notice that reading text in a book, even in an ebook app, is more comfortable than reading a ton of text by scrolling around in a web browser. That’s because our eyes are built for saccades, and repeatedly tracking the continuous smooth motion of the page as it scrolls to a stop, then re-targeting the new fixed location to start saccading around from, is literally more physically strenuous on your eye’s muscles!

There’s a reason that the codex was a big technological innovation over the scroll. This is a regression!

Today, the right thing to do here is to make use of horizontally split panes in your text editor or IDE, and just make a bit of conscious effort to set up the appropriate code on screen for the problem you’re working on. However, this is a potential area for different IDEs to really differentiate themselves, and build multi-column continuous-code-reading layouts that allow for buffers to wrap and be navigable newspaper-style.

Similar, modern CSS has shockingly good support for multi-column layouts, and it’s a shame that true multi-column, page-turning layouts are so rare. If I ever figure out a way to deploy this here that isn’t horribly clunky and fighting modern platform conventions like “scrolling horizontally is substantially more annoying and inconsistent than scrolling vertically” maybe I will experiment with such a layout on this blog one day. Until then… just make the browser window narrower so other useful stuff can be in the other parts of the screen, I guess.

Code Isn’t Prose

But, I digress. While I think that columnar layouts for reading prose are an interesting thing more people should experiment with, code isn’t prose.

The metric used for ideal line width, which you may have noticed if you clicked through some of those Wikipedia links earlier, is not “character cells in your editor window”, it is characters per line, or “CPL”.

With an optimal CPL somewhere between 45 and 95, a code-line-width of somewhere around 90 might actually be the best idea, because whitespace uses up your line-width budget. In a typical object-oriented Python program², most of your code ends up indented by at least 8 spaces: 4 for the class scope, 4 for the method scope. Most likely a lot of it is 12, because any interesting code will have at least one conditional or loop. So, by the time you’re done wasting all that horizontal space, a max line length of 90 actually looks more like a maximum of 78... right about that sweet spot from the US-Letter page in the typewriter that we started with.

What about soft-wrap?

In principle, source code is structured information, whose presentation could be fully decoupled from its serialized representation. Everyone could configure their preferred line width appropriate to their custom preferences and the specific physiological characteristics of their eyes, and the code could be formatted according to the language it was expressed in, and “hard wrapping” could be a silly antiquated thing.

The problem with this argument is the same as the argument against “but tabs are semantic indentation”, to wit: nope, no it isn’t. What “in principle” means in the previous paragraph is actually “in a fantasy world which we do not inhabit”. I’d love it if editors treated code this way and we had a rich history and tradition of structured manipulations rather than typing in strings of symbols to construct source code textually. But that is not the world we live in. Hard wrapping is unfortunately necessary to integrate with diff tools.

So what’s the optimal line width?

The exact, specific number here is still ultimately a matter of personal preference.

Hopefully, understanding the long history, science, and underlying physical constraints can lead you to select a contextually appropriate value for your own purposes that will balance ease of reading, integration with the relevant tools in your ecosystem, diff size, presentation in the editors and IDEs that your contributors tend to use, reasonable display in web contexts, on presentation slides, and so on.

But — and this is important — counterpoint:

No it isn’t, you don’t need to select an optimal width, because it’s already been selected for you. It is 88.

Acknowledgments

Thank you for reading, and especially thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!

I love the fact that this message is, itself, hard-wrapped to 77 characters. ↩
Let’s be honest; we’re all object-oriented python programmers here, aren’t we? ↩
Unsurprisingly, there are also financial reasons. More, narrower columns meant it was easier to fix typesetting errors and to insert more advertisements as necessary. But readability really did have a lot to do with it, too; scientists were looking at ease of reading as far back as the 1800s. ↩

by Glyph at August 08, 2025 05:37 AM

June 05, 2025

Glyph Lefkowitz

I Think I’m Done Thinking About genAI For Now

The Problem

Like many other self-styled thinky programmer guys, I like to imagine myself as a sort of Holmesian genius, making trenchant observations, collecting them, and then synergizing them into brilliant deductions with the keen application of my powerful mind.

However, several years ago, I had an epiphany in my self-concept. I finally understood that, to the extent that I am usefully clever, it is less in a Holmesian idiom, and more, shall we say, Monkesque.

For those unfamiliar with either of the respective franchises:

Holmes is a towering intellect honed by years of training, who catalogues intentional, systematic observations and deduces logical, factual conclusions from those observations.
Monk, on the other hand, while also a reasonably intelligent guy, is highly neurotic, wracked by unresolved trauma and profound grief. As both a consulting job and a coping mechanism, he makes a habit of erratically wandering into crime scenes, and, driven by a carefully managed jenga tower of mental illnesses, leverages his dual inabilities to solve crimes. First, he is unable to filter out apparently inconsequential details, building up a mental rat’s nest of trivia about the problem; second, he is unable to let go of any minor incongruity, obsessively ruminating on the collection of facts until they all make sense in a consistent timeline.

Perhaps surprisingly, this tendency serves both this fictional wretch of a detective, and myself, reasonably well. I find annoying incongruities in abstractions and I fidget and fiddle with them until I end up building something that a lot of people like, or perhaps something that a smaller number of people get really excited about. At worst, at least I eventually understand what’s going on. This is a self-soothing activity but it turns out that, managed properly, it can very effectively soothe others as well.

All that brings us to today’s topic, which is an incongruity I cannot smooth out or fit into a logical framework to make sense. I am, somewhat reluctantly, a genAI skeptic. However, I am, even more reluctantly, exposed to genAI Discourse every damn minute of every damn day. It is relentless, inescapable, and exhausting.

This preamble about personality should hopefully help you, dear reader, to understand how I usually address problematical ideas by thinking and thinking and fidgeting with them until I manage to write some words — or perhaps a new open source package — that logically orders the ideas around it in a way which allows my brain to calm down and let it go, and how that process is important to me.

In this particular instance, however, genAI has defeated me. I cannot make it make sense, but I need to stop thinking about it anyway. It is too much and I need to give up.

My goal with this post is not to convince anyone of anything in particular — and we’ll get to why that is a bit later — but rather:

to set out my current understanding in one place, including all the various negative feelings which are still bothering me, so I can stop repeating it elsewhere,
to explain why I cannot build a case that I think should be particularly convincing to anyone else, particularly to someone who actively disagrees with me,
in so doing, to illustrate why I think the discourse is so fractious and unresolvable, and finally
to give myself, and hopefully by proxy to give others in the same situation, permission to just peace out of this nightmare quagmire corner of the noosphere.

But first, just because I can’t prove that my interlocutors are Wrong On The Internet, doesn’t mean I won’t explain why I feel like they are wrong.

The Anti-Antis

Most recently, at time of writing, there have been a spate of “the genAI discourse is bad” articles, almost exclusively written from the perspective of, not boosters exactly, but pragmatically minded (albeit concerned) genAI users, wishing for the skeptics to be more pointed and accurate in our critiques. This is anti-anti-genAI content.

I am not going to link to any of these, because, as part of their self-fulfilling prophecy about the “genAI discourse”, they’re also all bad.

Mostly, however, they had very little worthwhile to respond to because they were straw-manning their erstwhile interlocutors. They are all getting annoyed at “bad genAI criticism” while failing to engage with — and often failing to even mention — most of the actual substance of any serious genAI criticism. At least, any of the criticism that I’ve personally read.

I understand wanting to avoid a callout or Gish-gallop culture and just express your own ideas. So, I understand that they didn’t link directly to particular sources or go point-by-point on anyone else’s writing. Obviously I get it, since that’s exactly what this post is doing too.

But if you’re going to talk about how bad the genAI conversation is, without even mentioning huge categories of problem like “climate impact” or “disinformation”¹ even once, I honestly don’t know what conversation you’re even talking about. This is peak “make up a guy to get mad at” behavior, which is especially confusing in this circumstance, because there’s an absolutely huge crowd of actual people that you could already be mad at.

The people writing these pieces have historically seemed very thoughtful to me. Some of them I know personally. It is worrying to me that their critical thinking skills appear to have substantially degraded specifically after spending a bunch of time intensely using this technology which I believe has a scary risk of degrading one’s critical thinking skills. Correlation is not causation or whatever, and sure, from a rhetorical perspective this is “post hoc ergo propter hoc” and maybe a little “ad hominem” for good measure, but correlation can still be concerning.

Yet, I cannot effectively respond to these folks, because they are making a practical argument that I cannot, despite my best efforts, find compelling evidence to refute categorically. My experiences of genAI are all extremely bad, but that is barely even anecdata. Their experiences are neutral-to-positive. Little scientific data exists. How to resolve this?²

The Aesthetics

As I begin to state my own position, let me lead with this: my factual analysis of genAI is hopelessly negatively biased. I find the vast majority of the aesthetic properties of genAI to be intensely unpleasant.

I have been trying very hard to correct for this bias, to try to pay attention to the facts and to have a clear-eyed view of these systems’ capabilities. But the feelings are visceral, and the effort to compensate is tiring. It is, in fact, the desire to stop making this particular kind of effort that has me writing up this piece and trying to take an intentional break from the subject, despite its intense relevance.

When I say its “aesthetic qualities” are unpleasant, I don’t just mean the aesthetic elements of output of genAIs themselves. The aesthetic quality of genAI writing, visual design, animation and so on, while mostly atrocious, is also highly variable. There are cherry-picked examples which look… fine. Maybe even good. For years now, there have been, famously, literally award-winning aesthetic outputs of genAI³.

While I am ideologically predisposed to see any “good” genAI art as accruing the benefits of either a survivorship bias from thousands of terrible outputs or simple plagiarism rather than its own inherent quality, I cannot deny that in many cases it is “good”.

However, I am not just talking about the product, but the process; the aesthetic experience of interfacing with the genAI system itself, rather than the aesthetic experience of the outputs of that system.

I am not a visual artist and I am not really a writer⁴, particularly not a writer of fiction or anything else whose experience is primarily aesthetic. So I will speak directly to the experience of software development.

I have seen very few successful examples of using genAI to produce whole, working systems. There are no shortage of highly public miserable failures, particularly from the vendors of these systems themselves, where the outputs are confused, self-contradictory, full of subtle errors and generally unusable. While few studies exist, it sure looks like this is an automated way of producing a Net Negative Productivity Programmer, throwing out chaff to slow down the rest of the team.⁵

Juxtapose this with my aforementioned psychological motivations, to wit, I want to have everything in the computer be orderly and make sense, I’m sure most of you would have no trouble imagining that sitting through this sort of practice would make me extremely unhappy.

Despite this plethora of negative experiences, executives are aggressively mandating the use of AI⁶. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.⁷

Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.

But, at least in that scenario, the thing ultimately doesn’t work, so there’s a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down.

I am inclined to believe that, in fact, it doesn’t work well enough to be used this way, and that we are going to see a big crash. But that is not the most aesthetically distressing thing. The most distressing thing is that maybe it does work; if not well enough to actually do the work, at least ambiguously enough to fool the executives long-term.

This project, in particular, stood out to me as an example. Its author, a self-professed “AI skeptic” who “thought LLMs were glorified Markov chain generators that didn’t actually understand code and couldn’t produce anything novel”, did a green-field project to test this hypothesis.

Now, this particular project is not totally inconsistent with a world in which LLMs cannot produce anything novel. One could imagine that, out in the world of open source, perhaps there is enough “OAuth provider written in TypeScript” blended up into the slurry of “borrowed⁸” training data that the minor constraint of “make it work on Cloudflare Workers” is a small tweak⁹. It is not fully dispositive of the question of the viability of “genAI coding”.

But it is a data point related to that question, and thus it did make me contend with what might happen if it were actually a fully demonstrative example. I reviewed the commit history, as the author suggested. For the sake of argument, I tried to ask myself if I would like working this way. Just for clarity on this question, I wanted to suspend judgement about everything else; assuming:

the model could be created with ethically, legally, voluntarily sourced training data
its usage involved consent from labor rather than authoritarian mandates
sensible levels of energy expenditure, with minimal CO2 impact
it is substantially more efficient to work this way than to just write the code yourself

and so on, and so on… would I like to use this magic robot that could mostly just emit working code for me? Would I use it if it were free, in all senses of the word?

No. I absolutely would not.

I found the experience of reading this commit history and imagining myself using such a tool — without exaggeration — nauseating.

Unlike many programmers, I love code review. I find that it is one of the best parts of the process of programming. I can help people learn, and develop their skills, and learn from them, and appreciate the decisions they made, develop an impression of a fellow programmer’s style. It’s a great way to build a mutual theory of mind.

Of course, it can still be really annoying; people make mistakes, often can’t see things I find obvious, and in particular when you’re reviewing a lot of code from a lot of different people, you often end up having to repeat explanations of the same mistakes. So I can see why many programmers, particularly those more introverted than I am, hate it.

But, ultimately, when I review their code and work hard to provide clear and actionable feedback, people learn and grow and it’s worth that investment in inconvenience.

The process of coding with an “agentic” LLM appears to be the process of carefully distilling all the worst parts of code review, and removing and discarding all of its benefits.

The lazy, dumb, lying robot asshole keeps making the same mistakes over and over again, never improving, never genuinely reacting, always obsequiously pretending to take your feedback on board.

Even when it “does” actually “understand” and manages to load your instructions into its context window, 200K tokens later it will slide cleanly out of its memory and you will have to say it again.

All the while, it is attempting to trick you. It gets most things right, but it consistently makes mistakes in the places that you are least likely to notice. In places where a person wouldn’t make a mistake. Your brain keeps trying to develop a theory of mind to predict its behavior but there’s no mind there, so it always behaves infuriatingly randomly.

I don’t think I am the only one who feels this way.

The Affordances

Whatever our environments afford, we tend to do more of. Whatever they resist, we tend to do less of. So in a world where we were all writing all of our code and emails and blog posts and texts to each other with LLMs, what do they afford that existing tools do not?

As a weirdo who enjoys code review, I also enjoy process engineering. The central question of almost all process engineering is to continuously ask: how shall we shape our tools, to better shape ourselves?

LLMs are an affordance for producing more text, faster. How is that going to shape us?

Again arguing in the alternative here, assuming the text is free from errors and hallucinations and whatever, it’s all correct and fit for purpose, that means it reduces the pain of circumstances where you have to repeat yourself. Less pain! Sounds great; I don’t like pain.

Every codebase has places where you need boilerplate. Every organization has defects in its information architecture that require repetition of certain information rather than a link back to the authoritative source of truth. Often, these problems persist for a very long time, because it is difficult to overcome the institutional inertia required to make real progress rather than going along with the status quo. But this is often where the highest-value projects can be found. Where there’s muck, there’s brass.

The process-engineering function of an LLM, therefore, is to prevent fundamental problems from ever getting fixed, to reward the rapid-fire overwhelm of infrastructure teams with an immediate, catastrophic cascade of legacy code that is now much harder to delete than it is to write.

There is a scene in Game of Thrones where Khal Drogo kills himself. He does so by replacing a stinging, burning, therapeutic antiseptic wound dressing with some cool, soothing mud. The mud felt nice, addressed the immediate pain, removed the discomfort of the antiseptic, and immediately gave him a lethal infection.

The pleasing feeling of immediate progress when one prompts an LLM to solve some problem feels like cool mud on my brain.

The Economics

We are in the middle of a mania around this technology. As I have written about before, I believe the mania will end. There will then be a crash, and a “winter”. But, as I may not have stressed sufficiently, this crash will be the biggest of its kind — so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero. Meanwhile, that cost keeps going up, and up, and up.

Others have reported on this in detail¹⁰, and I will not reiterate that all here, but in addition to being a looming and scary industry-wide (if we are lucky; more likely it’s probably “world-wide”) economic threat, it is also going to drive some panicked behavior from management.

Panicky behavior from management stressed that their idea is not panning out is, famously, the cause of much human misery. I expect that even in the “good” scenario, where some profit is ultimately achieved, will still involve mass layoffs rocking the industry, panicked re-hiring, destruction of large amounts of wealth.

It feels bad to think about this.

The Energy Usage

For a long time I believed that the energy impact was overstated. I am even on record, about a year ago, saying I didn’t think the energy usage was a big deal. I think I was wrong about that.

It initially seemed like it was letting regular old data centers off the hook. But recently I have learned that, while the numbers are incomplete because the vendors aren’t sharing information, they’re also extremely bad.¹¹

I think there’s probably a version of this technology that isn’t a climate emergency nightmare, but that’s not the version that the general public has access to today.

The Educational Impact

LLMs are making academic cheating incredibly rampant.¹²

Not only is it so common as to be nearly universal, it’s also extremely harmful to learning.¹³

For learning, genAI is a forklift at the gym.

To some extent, LLMs are simply revealing a structural rot within education and academia that has been building for decades if not centuries. But it was within those inefficiencies and the inconveniences of the academic experience that real learning was, against all odds, still happening in schools.

LLMs produce a frictionless, streamlined process where students can effortlessly glide through the entire credential, learning nothing. Once again, they dull the pain without regard to its cause.

This is not good.

The Invasion of Privacy

This is obviously only a problem with the big cloud models, but then, the big cloud models are the only ones that people actually use. If you are having conversations about anything private with ChatGPT, you are sending all of that private information directly to Sam Altman, to do with as he wishes.

Even if you don’t think he is a particularly bad guy, maybe he won’t even create the privacy nightmare on purpose. Maybe he will be forced to do so as a result of some bizarre kafkaesque accident.¹⁴

Imagine the scenario, for example, where a woman is tracking her cycle and uploading the logs to ChatGPT so she can chat with it about a health concern. Except, surprise, you don’t have to imagine, you can just search for it, as I have personally, organically, seen three separate women on YouTube, at least one of whom lives in Texas, not only do this on camera but recommend doing this to their audiences.

Citation links withheld on this particular claim for hopefully obvious reasons.

I assure you that I am neither particularly interested in menstrual products nor genAI content, and if I am seeing this more than once, it is probably a distressingly large trend.

The Stealing

The training data for LLMs is stolen. I don’t mean like “pirated” in the sense where someone illicitly shares a copy they obtained legitimately; I mean their scrapers are ignoring both norms¹⁵ and laws¹⁶ to obtain copies under false pretenses, destroying other people’s infrastructure¹⁷ in the process.

The Fatigue

I have provided references to numerous articles outlining rhetorical and sometimes data-driven cases for the existence of certain properties and consequences of genAI tools. But I can’t prove any of these properties, either at a point in time or as a durable ongoing problem.

The LLMs themselves are simply too large to model with the usual kind of heuristics one would use to think about software. I’d sooner be able to predict the physics of dice in a casino than a 2 trillion parameter neural network. They resist scientific understanding, not just because of their size and complexity, but because unlike a natural phenomenon (which could of course be considerably larger and more complex) they resist experimentation.

The first form of genAI resistance to experiment is that every discussion is a motte-and-bailey. If I use a free model and get a bad result I’m told it’s because I should have used the paid model. If I get a bad result with ChatGPT I should have used Claude. If I get a bad result with a chatbot I need to start using an agentic tool. If an agentic tool deletes my hard drive by putting os.system(“rm -rf ~/”) into sitecustomize.py then I guess I should have built my own MCP integration with a completely novel heretofore never even considered security sandbox or something?

What configuration, exactly, would let me make a categorical claim about these things? What specific methodological approach should I stick to, to get reliably adequate prompts?

For the record though, if the idea of the free models is that they are going to be provocative demonstrations of the impressive capabilities of the commercial models, and the results are consistently dogshit, I am finding it increasingly hard to care how much better the paid ones are supposed to be, especially since the “better”-ness cannot really be quantified in any meaningful way.

The motte-and-bailey doesn’t stop there though. It’s a war on all fronts. Concerned about energy usage? That’s OK, you can use a local model. Concerned about infringement? That’s okay, somewhere, somebody, maybe, has figured out how to train models consensually¹⁸. Worried about the politics of enriching the richest monsters in the world? Don’t worry, you can always download an “open source” model from Hugging Face. It doesn’t matter that many of these properties are mutually exclusive and attempting to fix one breaks two others; there’s always an answer, the field is so abuzz with so many people trying to pull in so many directions at once that it is legitimately difficult to understand what’s going on.

Even here though, I can see that characterizing everything this way is unfair to a hypothetical sort of person. If there is someone working at one of these thousands of AI companies that have been springing up like toadstools after a rain, and they really are solving one of these extremely difficult problems, how can I handwave that away? We need people working on problems, that’s like, the whole point of having an economy. And I really don’t like shitting on other people’s earnest efforts, so I try not to dismiss whole fields. Given how AI has gotten into everything, in a way that e.g. cryptocurrency never did, painting with that broad a brush inevitably ends up tarring a bunch of stuff that isn’t even really AI at all.

The second form of genAI resistance to experiment is the inherent obfuscation of productization. The models themselves are already complicated enough, but the products that are built around the models are evolving extremely rapidly. ChatGPT is not just a “model”, and with the rapid¹⁹ deployment of Model Context Protocol tools, the edges of all these things will blur even further. Every LLM is now just an enormous unbounded soup of arbitrary software doing arbitrary whatever. How could I possibly get my arms around that to understand it?

The Challenge

I have woefully little experience with these tools.

I’ve tried them out a little bit, and almost every single time the result has been a disaster that has not made me curious to push further. Yet, I keep hearing from all over the industry that I should.

To some extent, I feel like the motte-and-bailey characterization above is fair; if the technology itself can really do real software development, it ought to be able to do it in multiple modalities, and there’s nothing anyone can articulate to me about GPT-4o which puts it in a fundamentally different class than GPT-3.5.

But, also, I consistently hear that the subjective experience of using the premium versions of the tools is actually good, and the free ones are actually bad.

I keep struggling to find ways to try them “the right way”, the way that people I know and otherwise respect claim to be using them, but I haven’t managed to do so in any meaningful way yet.

I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I’d like to use a local model. But there is obviously not a nicely composed way to use local models like this.

Since there are apparently zero models with ethically-sourced training data, and litigation is ongoing²⁰ to determine the legal relationships of training data and outputs, even if I can be comfortable with some level of plagiarism on a project, I don’t feel that I can introduce the existential legal risk into other people’s infrastructure, so I would need to make a new project.

Others have differing opinions of course, including some within my dependency chain, which does worry me, but I still don’t feel like I can freely contribute further to the problem; it’s going to be bad enough to unwind any impact upstream. Even just for my own sake, I don’t want to make it worse.

This especially presents a problem because I have way too much stuff going on already. A new project is not practical.

Finally, even if I did manage to satisfy all of my quirky²¹ constraints, would this experiment really be worth anything? The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn’t really be addressing my opponents’ views.

I’m stuck.

The Surrender

I am writing this piece to make my peace with giving up on this topic, at least for a while. While I do idly hope that some folks might find bits of it convincing, and perhaps find ways to be more mindful with their own usage of genAI tools, and consider the harm they may be causing, that’s not actually the goal. And that is not the goal because it is just so much goddamn work to prove.

Here, I must return to my philosophical hobbyhorse of sprachspiel. In this case, specifically to use it as an analytical tool, not just to understand what I am trying to say, but what the purpose for my speech is.

The concept of sprachspiel is most frequently deployed to describe the goal of the language game being played, but in game theory, that’s only half the story. Speech — particularly rigorously justified speech — has a cost, as well as a benefit. I can make shit up pretty easily, but if I want to do anything remotely like scientific or academic rigor, that cost can be astronomical. In the case of developing an abstract understanding of LLMs, the cost is just too high.

So what is my goal, then? To be king Canute, standing astride the shore of “tech”, whatever that is, commanding the LLM tide not to rise? This is a multi-trillion dollar juggernaut.

Even the rump, loser, also-ran fragment of it has the power to literally suffocate us in our homes²² if they so choose, completely insulated from any consequence. If the power curve starts there, imagine what the winners in this industry are going to be capable of, irrespective of the technology they’re building - just with the resources they have to hand. Am I going to write a blog post that can rival their propaganda apparatus? Doubtful.

Instead, I will just have to concede that maybe I’m wrong. I don’t have the skill, or the knowledge, or the energy, to demonstrate with any level of rigor that LLMs are generally, in fact, hot garbage. Intellectually, I will have to acknowledge that maybe the boosters are right. Maybe it’ll be OK.

Maybe the carbon emissions aren’t so bad. Maybe everybody is keeping them secret in ways that they don’t for other types of datacenter for perfectly legitimate reasons. Maybe the tools really can write novel and correct code, and with a little more tweaking, it won’t be so difficult to get them to do it. Maybe by the time they become a mandatory condition of access to developer tools, they won’t be miserable.

Sure, I even sincerely agree, intellectual property really has been a pretty bad idea from the beginning. Maybe it’s OK that we’ve made an exception to those rules. The rules were stupid anyway, so what does it matter if we let a few billionaires break them? Really, everybody should be able to break them (although of course, regular people can’t, because we can’t afford the lawyers to fight off the MPAA and RIAA, but that’s a problem with the legal system, not tech).

I come not to praise “AI skepticism”, but to bury it.

Maybe it really is all going to be fine. Perhaps I am simply catastrophizing; I have been known to do that from time to time. I can even sort of believe it, in my head. Still, even after writing all this out, I can’t quite manage to believe it in the pit of my stomach.

Unfortunately, that feeling is not something that you, or I, can argue with.

Acknowledgments

Thank you to my patrons. Normally, I would say, “who are supporting my writing on this blog”, but in the case of this piece, I feel more like I should apologize to them for this than to thank them; these thoughts have been preventing me from thinking more productive, useful things that I actually have relevant skill and expertise in; this felt more like a creative blockage that I just needed to expel than a deliberately written article. If you like what you’ve read here and you’d like to read more of it, well, too bad; I am sincerely determined to stop writing about this topic. But, if you’d like to read more stuff like other things I have written, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!

And yes, disinformation is still an issue even if you’re “just” using it for coding. Even sidestepping the practical matter that technology is inherently political, validation and propagation of poor technique is a form of disinformation. ↩
I can’t resolve it, that’s the whole tragedy here, but I guess we have to pretend I will to maintain narrative momentum here. ↩
The story in Creative Bloq, or the NYT, if you must ↩
although it’s not for lack of trying, Jesus, look at the word count on this ↩
These are sometimes referred to as “10x” programmers, because they make everyone around them 10x slower. ↩
Douglas B. Laney at Forbes, Viral Shopify CEO Manifesto Says AI Now Mandatory For All Employees ↩
The National CIO Review, AI Mandates, Minimal Use: Closing the Workplace Readiness Gap ↩
Matt O’Brien at the AP, Reddit sues AI company Anthropic for allegedly ‘scraping’ user comments to train chatbot Claude ↩
Using the usual tricks to find plagiarism like searching for literal transcriptions of snippets of training data did not pull up anything when I tried, but then, that’s not how LLMs work these days, is it? If it didn’t obfuscate the plagiarism it wouldn’t be a very good plagiarism-obfuscator. ↩
David Gerard at Pivot to AI, “Microsoft and AI: spending billions to make millions”, Edward Zitron at Where’s Your Ed At, “The Era Of The Business Idiot”, both sobering reads ↩
James O’Donnell and Casey Crownhart at the MIT Technology Review, We did the math on AI’s energy footprint. Here’s the story you haven’t heard. ↩
Lucas Ropek at Gizmodo, AI Cheating Is So Out of Hand In America’s Schools That the Blue Books Are Coming Back ↩
James D. Walsh at the New York Magazine Intelligencer, Everyone Is Cheating Their Way Through College ↩
Ashley Belanger at Ars Technica, OpenAI slams court order to save all ChatGPT logs, including deleted chats ↩
Ashley Belanger at Ars Technica, AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt ↩
Blake Brittain at Reuters, Judge in Meta case warns AI could ‘obliterate’ market for original works ↩
Xkeeper, TCRF has been getting DDoSed ↩
Kate Knibbs at Wired, Here’s Proof You Can Train an AI Model Without Slurping Copyrighted Content ↩
and, I should note, extremely irresponsible ↩
Porter Anderson at Publishing Perspectives, Meta AI Lawsuit: US Publishers File Amicus Brief ↩
It feels bizarre to characterize what feel like baseline ethical concerns this way, but the fact remains that within the “genAI community”, this places me into a tiny and obscure minority. ↩
Ariel Wittenberg for Politico, ‘How come I can’t breathe?’: Musk’s data company draws a backlash in Memphis ↩

by Glyph at June 05, 2025 05:22 AM

April 17, 2025

Glyph Lefkowitz

Stop Writing `init` Methods

The History

Before dataclasses were added to Python in version 3.7 — in June of 2018 — the __init__ special method had an important use. If you had a class representing a data structure — for example a 2DCoordinate, with x and y attributes — you would want to be able to construct it as 2DCoordinate(x=1, y=2), which would require you to add an __init__ method with x and y parameters.

The other options available at the time all had pretty bad problems:

You could remove 2DCoordinate from your public API and instead expose a make_2d_coordinate function and make it non-importable, but then how would you document your return or parameter types?
You could document the x and y attributes and make the user assign each one themselves, but then 2DCoordinate() would return an invalid object.
You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all 2DCoordinate objects to be not just mutable, but mutated at every call site.
You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, typing.Protocol didn’t even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.

Also, an __init__ method that does nothing but assign a few attributes doesn’t have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.

However, by accepting “define a custom __init__” as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.

Wherever there is arbitrary code, there are arbitrary problems.

The Problems

Let’s consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a FileReader.

Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.

Let’s assume a world where we have the following functions, in an imaginary fileio module:

open(path: str) -> int
read(fileno: int, length: int)
close(fileno: int)

Our hypothetical fileio.open returns an integer representing a file descriptor¹, fileio.read allows us to read length bytes from an open file descriptor, and fileio.close closes that file descriptor, invalidating it for future use.

With the habit that we have built from writing thousands of __init__ methods, we might want to write our FileReader class like this:

class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = fileio.open(path)
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

For our initial use-case, this is fine. Client code creates a FileReader by doing something like FileReader("./config.json"), which always creates a FileReader that maintains its file descriptor int internally as private state. This is as it should be; we don’t want user code to see or mess with _fd, as that might violate FileReader’s invariants. All the necessary work to construct a valid FileReader — i.e. the call to open — is always taken care of for you by FileReader.__init__.

However, additional requirements will creep in, and as they do, FileReader.__init__ becomes increasingly awkward.

Initially we only care about fileio.open, but later, we may have to deal with a library that has its own reasons for managing the call to fileio.open by itself, and wants to give us an int that we use as our _fd, we now have to resort to weird workarounds like:

def reader_from_fd(fd: int) -> FileReader:
    fr = object.__new__(FileReader)
    fr._fd = fd
    return fr

Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd’s type signature, which takes a plain int, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int.

Testing is much more of a hassle, because we have to patch in our own copy of fileio.open any time we want an instance of a FileReader in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReader s for testing purposes.

All of this also assumes a fileio.open that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you’ve ever found yourself wanting to type async def __init__(self): ... then you have seen this limitation in practice.

Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are “often” associated with it, then by definition they are not “always” associated with it, and all the cases where they aren’t associated become unweildy and potentially broken.

Defining an __init__ is an anti-pattern, and we need a replacement for it.

The Solutions

I believe this tripartite assemblage of design techniques will address the problems raised above:

using dataclass to define attributes,
replacing behavior that previously would have previously been in __init__ with a new classmethod that does the same thing, and
using precise types to describe what a valid instance looks like.

Using `dataclass` attributes to create an `init` for you

To begin, let’s refactor FileReader into a dataclass. This does get us an __init__ method, but it won’t be one an arbitrary one we define ourselves; it will get the useful constraint enforced on it that it will just assign attributes.

@dataclass
class FileReader:
    _fd: int
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

Except... oops. In fixing the problems that we created with our custom __init__ that calls fileio.open, we have re-introduced several problems that it solved:

We have removed all the convenience of FileReader("path"). Now the user needs to import the low-level fileio.open again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a FileReader in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
There’s no enforcement of the validity of _fd as a file descriptor; it’s just some integer, which the user could easily pass an incorrect instance of, with no error.

In isolation, dataclass by itself can’t solve all our problems, so let’s add in the second technique.

Using `classmethod` factories to create objects

We don’t want to require any additional imports, or require users to go looking at any other modules — or indeed anything other than FileReader itself — to figure out how to create a FileReader for its intended usage.

Luckily we have a tool that can easily address all of these concerns at once: @classmethod. Let’s define a FileReader.open class method:

from typing import Self
@dataclass
class FileReader:
    _fd: int
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(fileio.open(path))

Now, your callers can replace FileReader("path") with FileReader.open("path"), and get all the same benefits.

Additionally, if we needed to await fileio.open(...), and thus we needed its signature to be @classmethod async def open, we are freed from the constraint of __init__ as a special method. There is nothing that would prevent a @classmethod from being async, or indeed, from having any other modification to its return value, such as returning a tuple of related values rather than just the object being constructed.

Using `NewType` to address object validity

Next, let’s address the slightly trickier issue of enforcing object validity.

Our type signature calls this thing an int, and indeed, that is unfortunately what the lower-level fileio.open gives us, and that’s beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType:

from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)

There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let’s just insist to Mypy that we have versions of fileio.open, fileio.read, and fileio.write which actually already take FileDescriptor integers rather than regular ones.

from typing import Callable
_open: Callable[[str], FileDescriptor] = fileio.open  # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] = fileio.read
_close: Callable[[FileDescriptor], None] = fileio.close

We do of course have to slightly adjust FileReader, too, but the changes are very small. Putting it all together, we get:

from typing import Self
@dataclass
class FileReader:
    _fd: FileDescriptor
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(_open(path))
    def read(self, length: int) -> bytes:
        return _read(self._fd, length)
    def close(self) -> None:
        _close(self._fd)

Note that the main technique here is not necessarily using NewType specifically, but rather aligning an instance’s property of “has all attributes set” as closely as possible with an instance’s property of “fully valid instance of its class”; NewType is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int, str or bytes.

In Summary - The New Best Practice

From now on, when you’re defining a new Python class:

Make it a dataclass².
Use its default __init__ method³.
Add @classmethods to provide your users convenient and discoverable ways to build your objects.
Require that all dependencies be satisfied by attributes, so you always start with a valid object.
Use typing.NewType to enforce any constraints on primitive data types (like int and str) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on.

If you define all your classes this way, you will get all the benefits of a custom __init__ method:

All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid.
Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself.

Along with some nice new benefits:

You will be future-proofed against new requirements for different ways that users may need to construct your object.
If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like def __init__(self, maybe_a_filename: int | str | None = None):
Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects.

Before dataclasses, it was always a bit weird that such a basic feature of the Python language — giving data to a data structure to make it valid — required overriding a method with 4 underscores in its name. __init__ stuck out like a sore thumb. Other such methods like __add__ or even __repr__ were inherently customizing esoteric attributes of classes.

For many years now, that historical language wart has been resolved. @dataclass, @classmethod, and NewType give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “but what is a ‘class’, really?”.

If you aren’t already familiar, a “file descriptor” is an integer which has meaning only within your program; you tell the operating system to open a file, it says “I have opened file 7 for you”, and then whenever you refer to “7” it is that file, until you close(7). ↩
Or an attrs class, if you’re nasty. ↩
Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a “range” object where start must always be less than end. There are always exceptions to these types of rules. Still, it’s pretty much never a good idea to do any I/O in __init__, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a __post_init__ rather than writing a literal __init__. ↩

by Glyph at April 17, 2025 10:35 PM

April 01, 2025

Glyph Lefkowitz

A Bigger Database

A Database File

When I was 10 years old, and going through a fairly difficult time, I was lucky enough to come into the possession of a piece of software called Claris FileMaker Pro™.

FileMaker allowed its users to construct arbitrary databases, and to associate their tables with a customized visual presentation. FileMaker also had a rudimentary scripting language, which would allow users to imbue these databases with behavior.

As a mentally ill pre-teen, lacking a sense of control over anything or anyone in my own life, including myself, I began building a personalized database to catalogue the various objects and people in my immediate vicinity. If one were inclined to be generous, one might assess this behavior and say I was systematically taxonomizing the objects in my life and recording schematized information about them.

As I saw it at the time, if I collected the information, I could always use it later, to answer questions that I might have. If I didn’t collect it, then what if I needed it? Surely I would regret it! Thus I developed a categorical imperative to spend as much of my time as possible collecting and entering data about everything that I could reasonably arrange into a common schema.

Having thus summoned this specter of regret for all lost data-entry opportunities, it was hard to dismiss. We might label it “Claris’s Basilisk”, for obvious reasons.

Therefore, a less-generous (or more clinically-minded) observer might have replaced the word “systematically” with “obsessively” in the assessment above.

I also began writing what scripts were within my marginal programming abilities at the time, just because I could: things like computing the sum of every street number of every person in my address book. Why was this useful? Wrong question: the right question is “was it possible” to which my answer was “yes”.

If I was obliged to collect all the information which I could observe — in case it later became interesting — I was similarly obliged to write and run every program I could. It might, after all, emit some other interesting information.

I was an avid reader of science fiction as well.

I had this vague sense that computers could kind of think. This resulted in a chain of reasoning that went something like this:

human brains are kinda like computers,
the software running in the human brain is very complex,
I could only write simple computer programs, but,
when you really think about it, a “complex” program is just a collection of simpler programs

Therefore: if I just kept collecting data, collecting smaller programs that could solve specific problems, and connecting them all together in one big file, eventually the database as a whole would become self-aware and could solve whatever problem I wanted. I just needed to be patient; to “keep grinding” as the kids would put it today.

I still feel like this is an understandable way to think — if you are a highly depressed and anxious 10-year-old in 1990.

Anyway.

35 Years Later

OpenAI is a company that produces transformer architecture machine learning generative AI models; their current generation was trained on about 10 trillion words, obtained in a variety of different ways from a large variety of different, unrelated sources.

A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to “X™, The Everything App™,” and described the trajectory of his career of the last decade at OpenAI as, and I quote, a “grind for a decade trying to help make super-intelligence to cure cancer or whatever” (emphasis mine).

I really, really don’t want to become a full-time AI skeptic, and I am not an expert here, but I feel like I can identify a logically flawed premise when I see one.

This is not a system-design strategy. It is a trauma response.

You can’t cure cancer “or whatever”. If you want to build a computer system that does some thing, you actually need to hire experts in that thing, and have them work to both design and validate that the system is fit for the purpose of that thing.

Aside: But... are they, though?

I am not an oncologist; I do not particularly want to be writing about the specifics here, but, if I am going to make a claim like “you can’t cure cancer this way” I need to back it up.

My first argument — and possibly my strongest — is that cancer is not cured.

QED.

But I guess, to Sam’s credit, there is at least one other company partnering with OpenAI to do things that are specifically related to cancer. However, that company is still in a self-described “initial phase” and it’s not entirely clear that it is going to work out very well.

Almost everything I can find about it online was from a PR push in the middle of last year, so it all reads like a press release. I can’t easily find any independently-verified information.

A lot of AI hype is like this. A promising demo is delivered; claims are made that surely if the technology can solve this small part of the problem now, within 5 years surely it will be able to solve everything else as well!

But even the light-on-content puff-pieces tend to hedge quite a lot. For example, as the Wall Street Journal quoted one of the users initially testing it (emphasis mine):

The most promising use of AI in healthcare right now is automating “mundane” tasks like paperwork and physician note-taking, he said. The tendency for AI models to “hallucinate” and contain bias presents serious risks for using AI to replace doctors. Both Color’s Laraki and OpenAI’s Lightcap are adamant that doctors be involved in any clinical decisions.

I would probably not personally characterize “‘mundane’ tasks like paperwork and … note-taking” as “curing cancer”. Maybe an oncologist could use some code I developed too; even if it helped them, I wouldn’t be stealing valor from them on the curing-cancer part of their job.

Even fully giving it the benefit of the doubt that it works great, and improves patient outcomes significantly, this is medical back-office software. It is not super-intelligence.

It would not even matter if it were “super-intelligence”, whatever that means, because “intelligence” is not how you do medical care or medical research. It’s called “lab work” not “lab think”.

To put a fine point on it: biomedical research fundamentally cannot be done entirely by reading papers or processing existing information. It cannot even be done by testing drugs in computer simulations.

Biological systems are enormously complex, and medical research on new therapies inherently requires careful, repeated empirical testing to validate the correspondence of existing research with reality. Not “an experiment”, but a series of coordinated experiments that all test the same theoretical model. The data (which, in an LLM context, is “training data”) might just be wrong; it may not reflect reality, and the only way to tell is to continuously verify it against reality.

Previous observations can be tainted by methodological errors, by data fraud, and by operational mistakes by practitioners. If there were a way to do verifiable development of new disease therapies without the extremely expensive ladder going from cell cultures to animal models to human trials, we would already be doing it, and “AI” would just be an improvement to efficiency of that process. But there is no way to do that and nothing about the technologies involved in LLMs is going to change that fact.

Knowing Things

The practice of science — indeed any practice of the collection of meaningful information — must be done by intentionally and carefully selecting inclusion criteria, methodically and repeatedly curating our data, building a model that operates according to rules we understand and can verify, and verifying the data itself with repeated tests against nature. We cannot just hoover up whatever information happens to be conveniently available with no human intervention and hope it resolves to a correct model of reality by accident. We need to look where the keys are, not where the light is.

Piling up more and more information in a haphazard and increasingly precarious pile will not allow us to climb to the top of that pile, all the way to heaven, so that we can attack and dethrone God.

Eventually, we’ll just run out of disk space, and then lose the database file when the family gets a new computer anyway.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Itamar Turner-Trauring and Thomas Grainger for pre-publication feedback on this article; any errors of course remain my own.

by Glyph at April 01, 2025 12:47 AM

January 15, 2025

Glyph Lefkowitz

Small PINPal Update

Today on stream, I updated PINPal to fix the memorization algorithm.

If you haven’t heard of PINPal before, it is a vault password memorization tool. For more detail on what that means, you can check it out the README, and why not give it a ⭐ while you’re at it.

As I started writing up an update post I realized that I wanted to contextualize it a bit more, because it’s a tool I really wish were more popular. It solves one of those small security problems that you can mostly ignore, right up until the point where it’s a huge problem and it’s too late to do anything about it.

In brief, PINPal helps you memorize new secure passcodes for things you actually have to remember and can’t simply put into your password manager, like the password to your password manager, your PC user account login, your email account¹, or the PIN code to your phone or debit card.

Too often, even if you’re properly using a good password manager for your passwords, you’ll be protecting it with a password optimized for memorability, which is to say, one that isn’t random and thus isn’t secure. But I have also seen folks veer too far in the other direction, trying to make a really secure password that they then forget right after switching to a password manager. Forgetting your vault password can also be a really big deal, making you do password resets across every app you’ve loaded into it so far, so having an opportunity to practice it periodically is important.

PINPal uses spaced repetition to ensure that you remember the codes it generates.

While periodic forced password resets are a bad idea, if (and only if!) you can actually remember the new password, it is a good idea to get rid of old passwords eventually — like, let’s say, when you get a new computer or phone. Doing so reduces the risk that a password stored somewhere on a very old hard drive or darkweb data dump is still floating around out there, forever haunting your current security posture. If you do a reset every 2 years or so, you know you’ve never got more than 2 years of history to worry about.

PINPal is also particularly secure in the way it incrementally generates your password; the computer you install it on only ever stores the entire password in memory when you type it in. It stores even the partial fragments that you are in the process of memorizing using the secure keyring module, avoiding plain-text whenever possible.

I’ve been using PINPal to generate and memorize new codes for a while, just in case², and the change I made today was because encountered a recurring problem. The problem was, I’d forget a token after it had been hidden, and there was never any going back. The moment that a token was hidden from the user, it was removed from storage, so you could never get a reminder. While I’ve successfully memorized about 10 different passwords with it so far, I’ve had to delete 3 or 4.

So, in the updated algorithm, the visual presentation now hides tokens in the prompt several memorizations before they’re removed. Previously, if the password you were generating was ‘hello world’, you’d see hello world 5 times or so, times, then •••• world; if you ever got it wrong past that point, too bad, start over. Now, you’ll see hello world, then °°°° world, then after you have gotten the prompt right without seeing the token a few times, you’ll see •••• world after the backend has locked it in and it’s properly erased from your computer.

If you get the prompt wrong, breaking your streak reveals the recently-hidden token until you get it right again. I also did a new release on that same livestream, so if this update sounds like it might make the memorization process more appealing, check it out via pip install pinpal today.

Right now this tool is still only extremely for a specific type of nerd — it’s command-line only, and you probably need to hand-customize your shell prompt to invoke it periodically. But I’m working on making it more accessible to a broader audience. It’s open source, of course, so you can feel free to contribute your own code!

Acknowledgments

Your email account password can be stored in your password manager, of course, but given that email is the root-of-trust reset factor for so many things, being able to remember that password is very helpful in certain situations. ↩
Funny story: at one point, Apple had an outage which made it briefly appear as if a lot of people needed to reset their iCloud passwords, myself included. Because I’d been testing PINPal a bunch, I actually had several highly secure random passwords already memorized. It was a strange feeling to just respond to the scary password reset prompt with a new, highly secure password and just continue on with my day secure in the knowledge I wouldn't forget it. ↩

by Glyph at January 15, 2025 12:54 AM

January 10, 2025

Glyph Lefkowitz

The “Active Enum” Pattern

Have you ever written some Python code that looks like this?

from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

def behavior(number: SomeNumber) -> int:
    match number:
        case SomeNumber.one:
            print("one!")
            return 1
        case SomeNumber.two:
            print("two!")
            return 2
        case SomeNumber.three:
            print("three!")
            return 3

That is to say, have you written code that:

defined an enum with several members
associated custom behavior, or custom values, with each member of that enum,
needed one or more match / case statements (or, if you’ve been programming in Python for more than a few weeks, probably a big if/elif/elif/else tree) to do that association?

In this post, I’d like to submit that this is an antipattern; let’s call it the “passive enum” antipattern.

For those of you having a generally positive experience organizing your discrete values with enums, it may seem odd to call this an “antipattern”, so let me first make something clear: the path to a passive enum is going in the correct direction.

Typically - particularly in legacy code that predates Python 3.4 - one begins with a value that is a bare int constant, or maybe a str with some associated values sitting beside in a few global dicts.

Starting from there, collecting all of your values into an enum at all is a great first step. Having an explicit listing of all valid values and verifying against them is great.

But, it is a mistake to stop there. There are problems with passive enums, too:

The behavior can be defined somewhere far away from the data, making it difficult to:
1. maintain an inventory of everywhere it’s used,
2. update all the consumers of the data when the list of enum values changes, and
3. learn about the different usages as a consumer of the API
Logic may be defined procedurally (via if/elif or match) or declaratively (via e.g. a dict whose keys are your enum and whose values are the required associated value).
1. If it’s defined procedurally, it can be difficult to build tools to interrogate it, because you need to parse the AST of your Python program. So it can be difficult to build interactive tools that look at the associated data without just calling the relevant functions.
2. If it’s defined declaratively, it can be difficult for existing tools that do know how to interrogate ASTs (mypy, flake8, Pyright, ruff, et. al.) to make meaningful assertions about it. Does your linter know how to check that a dict whose keys should be every value of your enum is complete?

To refactor this, I would propose a further step towards organizing one’s enum-oriented code: the active enum.

An active enum is one which contains all the logic associated with the first-party provider of the enum itself.

You may recognize this as a more generalized restatement of the object-oriented lens on the principle of “separation of concerns”. The responsibilities of a class ought to be implemented as methods on that class, so that you can send messages to that class via method calls, and it’s up to the class internally to implement things. Enums are no different.

More specifically, you might notice it as a riff on the Active Nothing pattern described in this excellent talk by Sandi Metz, and, yeah, it’s the same thing.

The first refactoring that we can make is, thus, to mechanically move the method from an external function living anywhere, to a method on SomeNumber . At least like this, we present an API to consumers externally that shows that SomeNumber has a behavior method that can be invoked.

from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

    def behavior(self) -> int:
        match self:
            case SomeNumber.one:
                print("one!")
                return 1
            case SomeNumber.two:
                print("two!")
                return 2
            case SomeNumber.three:
                print("three!")
                return 3

However, this still leaves us with a match statement that repeats all the values that we just defined, with no particular guarantee of completeness. To continue the refactoring, what we can do is change the value of the enum itself into a simple dataclass to structurally, by definition, contain all the fields we need:

from dataclasses import dataclass
from enum import Enum
from typing import Callable

@dataclass(frozen=True)
class NumberValue:
    result: int
    effect: Callable[[], None]

class SomeNumber(Enum):
    one = NumberValue(1, lambda: print("one!"))
    two = NumberValue(2, lambda: print("two!"))
    three = NumberValue(3, lambda: print("three!"))

    def behavior(self) -> int:
        self.value.effect()
        return self.value.result

Here, we give SomeNumber members a value of NumberValue, a dataclass that requires a result: int and an effect: Callable to be constructed. Mypy will properly notice that if x is a SomeNumber, that x will have the type NumberValue and we will get proper type checking on its result (a static value) and effect (some associated behaviors)¹.

Note that the implementation of behavior method - still conveniently discoverable for callers, and with its signature unchanged - is now vastly simpler.

But what about...

Lookups?

You may be noticing that I have hand-waved over something important to many enum users, which is to say, by-value lookup. enum.auto will have generated int values for one, two, and three already, and by transforming those into NumberValue instances, I can no longer do SomeNumber(1).

For the simple, string-enum case, one where you might do class MyEnum: value = “value” so that you can do name lookups via MyEnum("value"), there’s a simple solution: use square brackets instead of round ones. In this case, with no matching strings in sight, SomeNumber["one"] still works.

But, if we want to do integer lookups with our dataclass version here, there’s a simple one-liner that will get them back for you; and, moreover, will let you do lookups on whatever attribute you want:

by_result = {each.value.result: each for each in SomeNumber}

`enum.Flag`?

You can do this with Flag more or less unchanged, but in the same way that you can’t expect all your list[T] behaviors to be defined on T, the lack of a 1-to-1 correspondence between Flag instances and their values makes it more complex and out of scope for this pattern specifically.

3rd-party usage?

Sometimes an enum is defined in library L and used in application A, where L provides the data and A provides the behavior. If this is the case, then some amount of version shear is unavoidable; this is a situation where the data and behavior have different vendors, and this means that other means of abstraction are required to keep them in sync. Object-oriented modeling methods are for consolidating the responsibility for maintenance within a single vendor’s scope of responsibility. Once you’re not responsible for the entire model, you can’t do the modeling over all of it, and that is perfectly normal and to be expected.

The goal of the Active Enum pattern is to avoid creating the additional complexity of that shear when it does not serve a purpose, not to ban it entirely.

A Case Study

I was inspired to make this post by a recent refactoring I did from a more obscure and magical² version of this pattern into the version that I am presenting here, but if I am going to call passive enums an “antipattern” I feel like it behooves me to point at an example outside of my own solo work.

So, for a more realistic example, let’s consider a package that all Python developers will recognize from their day-to-day work, python-hearthstone, the Python library for parsing the data files associated with Blizzard’s popular computerized collectible card game Hearthstone.

As I’m sure you already know, there are a lot of enums in this library, but for one small case study, let’s look a few of the methods in hearthstone.enums.GameType.

GameType has already taken the “step 1” in the direction of an active enum, as I described above: as_bnet is an instancemethod on GameType itself, making it at least easy to see by looking at the class definition what operations it supports. However, in the implementation of that method (among many others) we can see the worst of both worlds:

class GameType(IntEnum):
    def as_bnet(self, format: FormatType = FormatType.FT_STANDARD):
        if self == GameType.GT_RANKED:
            if format == FormatType.FT_WILD:
                return BnetGameType.BGT_RANKED_WILD
            elif format == FormatType.FT_STANDARD:
                return BnetGameType.BGT_RANKED_STANDARD
            # ...
            else:
                raise ValueError()
        # ...
        return {
            GameType.GT_UNKNOWN: BnetGameType.BGT_UNKNOWN,
            # ...
            GameType.GT_BATTLEGROUNDS_DUO_FRIENDLY: BnetGameType.BGT_BATTLEGROUNDS_DUO_FRIENDLY,
        }[self]

We have procedural code mixed with a data lookup table; raise ValueError mixed together with value returns. Overall, it looks like this might be hard to maintain this going forward, or to see what’s going on without a comprehensive understanding of the game being modeled. Of course for most python programmers that understanding can be assumed, but, still.

If GameType were refactored in the manner above³, you’d be able to look at the member definition for GT_RANKED and see a mapping of FormatType to BnetGameType, or GT_BATTLEGROUNDS_DUO_FRIENDLY to see an unconditional value of BGT_BATTLEGROUNDS_DUO_FRIENDLY. Given that this enum has 40 elements, with several renamed or removed, it seems reasonable to expect that more will be added and removed as the game is developed.

Conclusion

If you have large enums that change over time, consider placing the responsibility for the behavior of the values alongside the values directly, and any logic for processing the values as methods of the enum. This will allow you to quickly validate that you have full coverage of any data that is required among all the different members of the enum, and it will allow API clients a convenient surface to discover the capabilities associated with that enum.

Acknowledgments

You can get even fancier than this, defining a typing.Protocol as your enum’s value, but it’s best to keep things simple and use a very simple dataclass container if you can. ↩
derogatory ↩
I did not submit such a refactoring as a PR before writing this post because I don’t have full context for this library and I do not want to harass the maintainers or burden them with extra changes just to make a rhetorical point. If you do want to try that yourself, please file a bug first and clearly explain how you think it would benefit their project’s maintainability, and make sure that such a PR would be welcome. ↩

by Glyph at January 10, 2025 10:37 PM

December 16, 2024

Glyph Lefkowitz

DANGIT

Over the last decade, it has become a common experience to be using a social media app, and to perceive that app as saying something specific to you. This manifests in statements like “Twitter thinks Rudy Giuliani has lost his mind”, “Facebook is up in arms about DEI”, “Instagram is going crazy for this new water bottle”, “BlueSky loves this bigoted substack”, or “Mastodon can’t stop talking about Linux”. Sometimes this will even be expressed with “the Internet” as a metonym for the speaker’s preferred social media: “the Internet thinks that Kate Middleton is missing”.

However, even the smallest of these networks comprises literal millions of human beings, speaking dozens of different languages, many of whom never interact with each other at all. The hot takes that you see from a certain excitable sub-community, on your particular timeline or “for you” page, are not necessarily representative of “the Internet” — at this point, a group that represents a significant majority of the entire human population.

If I may coin a phrase, I will refer to these as “Diffuse, Amorphous, Nebulous, Generalized Internet Takes”, or DANGITs, which handily evokes the frustrating feeling of arguing against them.

A DANGIT is not really a new “internet” phenomenon: it is a specific expression of the availability heuristic.

If we look at our device and see a bunch of comments in our inbox, particularly if those comments have high salience via being recent, emotive, and repeated, we will naturally think that this is what The Internet thinks. However, just because we will naturally think this does not mean that we will accurately think it.

It is worth keeping this concept in mind when participating in public discourse because it leads to a specific type of communication breakdown. If you are arguing with a DANGIT, you will feel like you are arguing with someone with incredibly inconsistent, hypocritical, and sometimes even totally self-contradictory views. But to be self-contradictory, one needs to have a self. And if you are arguing with 9 different people from 3 different ideological factions, all making completely different points and not even taking time to agree on the facts beforehand, of course it’s going to sound like cacophonous nonsense. You’re arguing with the cacophony, it’s just presented to you in a way that deceives you into thinking that it’s one group.

There are subtle variations on this breakdown; for example, it can also make people’s taste seem incoherent. If it seems like one week the Interior Designer internet loves stark Scandinavian minimalism, and the next week baroque Rococo styles are making a comeback, it might seem like The Internet has no coherent sense of taste, and these things don’t go together. That’s because it doesn’t! Why would you expect it to?

Most likely, you are simply seeing some posts from minimalists, and then, separately, some posts from Rococo aficionados. Any particular person’s feed may be dedicated to a specific, internally coherent viewpoint, aesthetic, or ideology, but if you dump them all into a blender to separate them from their context, of course they will look jumbled together.

This is what social media does. It is context collapse as a service. Even if you eliminate engagement-maximizing algorithms and view everything perfectly chronologically, even if you have the world’s best trust & safety team making sure that there is nothing harmful and no disinformation, social media — like email — inherently remains that context-collapsing blender. There’s no way for it not to be; if two people you follow, who do not follow and are not aware of each other, are both posting unrelated things at the same time, you’re going to see them at around the same time.

Do not argue with a DANGIT. Discussions are the internet are famously Pyrrhic battles to begin with, but if you argue with a DANGIT it’s not that you will achieve a Pyrrhic victory, you cannot possibly achieve any victory, because you are shadowboxing an imagined consensus where none exits.

You can’t win against something that isn’t there.

Acknowledgments

by Glyph at December 16, 2024 10:58 PM

November 11, 2024

Glyph Lefkowitz

It’s Time For Democrats To Get More Annoying

Kamala Harris lost. Here we are. So it goes.

Are you sad? Are you scared?

I am very sad. I am very scared.

But, like everyone else in this position, most of all, I want to know what to do next.

A Mission For Progress

I believe that we should set up a missionary organization for progressive and liberal values.

In 2017, Kayla Chadwick wrote the now-classic article, “I Don’t Know How To Explain To You That You Should Care About Other People”. It resonated with millions of people, myself included. It expresses an exasperation with a populace that seems ignorant of economics, history, politics, and indeed unable to read the news. It is understandable to be frustrated with people who are exercising their electoral power callously and irresponsibly.

But I think in 2024, we need to reckon with the fact that we do, in fact, need to explain to a large swathe of the population that they should care about other people.

We had better figure out how to explain it soon.

Shared Values — A Basis for Hope

The first question that arises when we start considering outreach to the conservative-leaning or undecided independent population is, “are these people available to be convinced?”.

To that, I must answer an unqualified “yes”.

I know that some of you are already objecting. For those of us with an understanding of history and the mechanics of bigotry in the United States, it might initially seem like the answer is “no”.

As the Nazis came to power in the 1920s, they were campaigning openly on a platform of antisemitic violence. Everyone knew what the debate was. It was hard to claim that you didn’t, in spite of some breathtakingly cowardly contemporaneous journalism, they weren’t fooling anyone.

It feels ridiculous to say this, but Hitler did not have support among Jews.

Yet, after campaigning on a platform of defaming immigrants, and Mexican immigrants specifically for a decade, a large part of what drove his victory is that Trump enjoyed a shockingly huge surge of support among the Hispanic population. Even some undocumented migrants — the ones most likely to be herded into concentration camps starting in January — are supporting him.

I believe that this is possible because, in order to maintain support of the multi-ethnic working-class coalition that Trump has built, the Republicans must maintain plausible deniability. They have to say “we are not racist”, “we are not xenophobic”. Incredibly, his supporters even say “I don’t hate trans people” with startling regularity.

Most voters must continue to believe that hateful policies with devastating impacts are actually race-neutral, and are simply going to get rid of “bad” people. Even the ones motivated by racial resentment are mostly motivated by factually incorrect beliefs about racialized minorities receiving special treatment and resources which they are not in fact receiving.

They are victims of a disinformation machine. One that has rendered reality incomprehensible.

If you listen to conservative messaging, you can hear them referencing this all the time. Remember when JD Vance made that comment about Democrats calling Diet Mountain Dew racist?

Many publications wrote about this joke “bombing”¹, but the kernel of truth within it is this: understanding structural bigotry in the United States is difficult. When we progressives talk about it, people who don’t understand it think that our explanations sound ridiculous and incoherent.

There’s a reason that the real version of critical race theory is a graduate-level philosophy-of-law course, and not a couple of catch phrases.

If, without context, someone says that “municipal zoning laws are racist”, this makes about as much sense as “Diet Mountain Dew is racist” to someone who doesn’t already know what “redlining” is.

Conservatives prey upon this confusion to their benefit. But they prey on this because they must do so. They must do so because, despite everything, hate is not actually popular among the American electorate. Even now, they have to be deceived into it.

The good news is that all we need to do is stop the deception.

Politics Matter

If I have sold you on the idea that a substantial plurality of voters are available to be persuaded, the next question is: can we persuade them? Do we, as progressives, have the resources and means to do so? We did lose, after all, and it might seem like nothing we did had much of an impact.

Let’s analyze that assumption.

Across the country, Trump’s margins increased. However, in the swing states, where Harris spent money on campaigning, his margins increased less than elsewhere. At time of writing, we project that the safe-state margin shift will be 3.55% towards trump, and the swing-state margin shift will be 1.69%.

This margin was, sadly, too small for a victory, but it does show that the work mattered. Perhaps given more time, or more resources, it would have mattered just a little bit more, and that would have been decisive.

This is to say, in the places where campaign dollars were spent, even against the similar spending of the Trump campaign, we pushed the margin of support 1.86% higher within 107 days. So yes: campaigning matters. Which parts and how much are not straightforward, but it definitely matters.

This is a bit of a nonsensical comparison for a whole host of reasons², but just for a ballpark figure, if we kept this pressure up continuously during the next 4 years, we could increase support for a democratic candidate by 25%.

We Can Teach, Not Sell

Political junkies tend to overestimate the knowledge of the average voter. Even when we are trying to compensate for it, we tend to vastly overestimate how much the average voter knows about politics and policy. I suspect that you, dear reader, are a political junkie even if you don’t think of yourself as one.

To give you a sense of what I mean, across the country, on Election day and the day after, there was a huge spike in interest for the Google query, “did Joe Biden drop out”.

Consistently over the last decade, democratic policies are more popular than their opponents. Even deep red states, such as Kansas, often vote for policies supported by democrats and opposed by Republicans.

This confusion about policy is not organic; it is not voters’ fault. It is because Republicans constantly lie.

All this ignorance might seem discouraging, but it presents an opportunity: people will not sign up to be persuaded, but people do like being informed. Rather than proselytizing via a hard sales pitch, it should be possible to offer to explain how policy connects to elections. And this is made so much the easier if so many of these folks already generally like our policies.

The Challenge Is Enormous

I’ve listed some reasons for optimism, but that does not mean that this will be easy.

Republicans have a tremendously powerful, decentralized media apparatus that reinforces their culture-war messaging all the time.

After some of the post-election analysis, “The Left Needs Its Own Joe Rogan” is on track to become a cliché within the week.³ While I am deeply sympathetic to that argument, the right-wing media’s success is not organic; it is funded by petrochemical billionaires.

We cannot compete via billionaire financing, and as such, we have to have a way to introduce voters to progressive and liberal media. Which means more voters need social connections to liberals and progressives.

Good Works

The democratic presidential campaign alone spent a billion and a half dollars. And, as shown above, this can be persuasive, but it’s just the persuasion itself.

Better than spending all this money on telling people what good stuff we would do for them if we were in power, we could just show them, by doing good stuff. We should live our values, not just endlessly reiterate them.

A billion dollars is a significant amount of power in its own right.

For historical precedent, consider the Black Panthers’ Free Breakfast For Children program. This program absolutely scared the shit out of the conservative power structure, to the point that Nixon’s FBI literally raided them for giving out free food to children.

Religious missionaries, who are famously annoying, often offset their annoying-ness by doing charitable work in the communities they are trying to reach. A lot of the country that we need to reach are religious people, and nominally both Christians and leftists share a concern for helping those in need, so we should find some cultural common ground there.

We can leverage that overlap in values by partnering with churches. This immediately makes such work culturally legible to many who we most need to reach.

Jobs Jobs Jobs

When I raised this idea with Philip James, he had been mulling over similar ideas for a long time, but with a slightly different tack: free career skills workshops from folks who are obviously “non-traditional” with respect to the average rural voter’s cultural expectations. Recruit trans folks, black folks, women, and non-white immigrants from our tech networks.

Run the trainings over remote video conferencing to make volunteering more accessible. Run those workshops through churches as a distribution network.

There is good evidence that this sort of prolonged contact and direct exposure to outgroups, to help people see others as human beings, very effective politically.

However, job skills training is by no means the only benefit we could bring. There are lots of other services we could offer remotely, particularly with the skills that we in the tech community could offer. I offer this as an initial suggestion; if you have more ideas I’d love to hear them. I think the best ideas are ones where folks can opt in, things that feel like bettering oneself rather than receiving charity; nobody likes getting handouts, particularly from the outgroup, but getting help to improve your own skills feels more participatory.

I do think that free breakfast for children, specifically, might be something to start with because people are far more willing to accept gifts to benefit others (particularly their children, or the elderly!) rather than themselves.

Take Credit

Doing good works in the community isn’t enough. We need to do visible good works. Attributable good works.

We don’t want to be assholes about it, but we do want to make sure that these benefits are clearly labeled. We do not want to attach an obligation to any charitable project, but we do want to attach something to indicate where it came from.

I don’t know what that “something” should be. The most important thing is that whatever “something” is appeals to set of partially-overlapping cultures that I am not really a part of — Midwestern, rural, southern, exurban, working class, “red state” — and thus, I would want to hear from people from those cultures about what works best.

But it’s got to be something.

Maybe it’s a little sticker, “brought to you by progressives and liberals. we care about you!”. Maybe it’s a subtle piece of consistent branding or graphic design, like a stylized blue stripe. Maybe we need to avoid the word “democrats”, or even “progressive” or “liberal”, and need some independent brand for such a thing, that is clearly tenuously connected but not directly; like the Coalition of Liberal and Leftist Helpful Neighbors or something.

Famously, when Trump sent everybody a check from the government, he put his name on it. Joe Biden did the same thing, and Democrats seem to think it’s a good thing that he didn’t take credit because it “wasn’t about advancing politics”, even though this obviously backfired. Republicans constantly take credit for the benefits of Democratic policies, which is one reason why voters don’t know they’re democratic policies.

Our broad left-liberal coalition is attempting to improve people’s material conditions. Part of that is, and must be, advancing a political agenda. It’s no good if we provide job trainings and free lunches to a community if that community is just going to be reduced to ruin by economically catastrophic tariffs and mass deportations.

We cannot do this work just for the credit, but getting credit is important.

Let’s You And Me — Yes YOU — Get Started

I think this is a good idea, but I am not the right person to lead it.

For one thing, building this type of organization requires a lot of organizational and leadership skills that are not really my forte. Even the idea of filing the paperwork for a new 501(c)3 right now sounds like rolling Sisyphus’s rock up the hill to me.

For another, we need folks who are connected to this culture, in ways that I am not. I would be happy to be involved — I do have some relevant technical skills to help with infrastructure, and I could always participate in some of the job-training stuff, and I can definitely donate a bit of money to a nonprofit, but I don’t think I can be in charge.

You can definitely help too, and we will need a wide variety of skills to begin with, and it will definitely need money. Maybe you can help me figure out who should be in charge.

This project will be weaker without your support. Thus: I need to hear from you.

You can email me, or, if you’d prefer a more secure channel, feel free to reach out over Signal, where my introduction code is glyph.99 . Please start the message with “good works:” so I can easily identify conversations about this.

If I receive any interest at all, I plan to organize some form of meeting within the next 30 days to figure out concrete next steps.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more things like it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! My aspirations for this support are more in the directions of software development than activism, but needs must, when the devil drives. Thanks especially to Philip James for both refining the idea and helping to edit this post, and to Marley Myrianthopoulos for assistance with the data analysis.

Personally I think that the perception of it “bombing” had to do with the microphones during his speech not picking up much in the way of crowd noise. It sounded to me like there were plenty of claps and laughs at the time. But even if it didn’t land with most of the audience, it definitely resonated for some of them. ↩
A brief, non-exhaustive list of the most obvious ones:
- This is a huge amount of money raised during a crisis with an historic level of enthusiasm among democrats. There’s no way to sustain that kind of momentum.
- There are almost certainly diminishing returns at some point; people harbor conservative (and, specifically, bigoted) beliefs to different degrees, and the first million people will be much easier to convince than the second million, etc.
- Support share is not fungible; different communities will look different, and some will be saturated much more quickly than others. There is no reason to expect the rate over time to be consistent, nor the rate over geography.
↩
I mostly agree with this take, and in the interest of being the change I want to see in the world, let me just share a brief list of some progressive and liberal sources of media that you might want to have a look at and start paying attention to:
- If Books Could Kill
- Some More News
- Behind The Bastards
- Crooked Media, the publishers of Pod Save America, but you should check out everything they have on offer
- Bryan Tyler Cohen
- Hasan Piker
- PhilosophyTube
- Hbomberguy
- FD Signifier
- Citation Needed
- Platformer
Please note that not all of these are to my taste and not all of them may be to yours. They are all at different places along the left-liberal coalition spectrum, but find some sources that you enjoy and trust, and build from there. ↩

by Glyph at November 11, 2024 04:01 AM

November 03, 2024

Glyph Lefkowitz

The Federation Deathmatch

It’s the weekend, and I have some Thoughts about federated social media. So, buckle up, I guess, it’s time to start some fights.

Recently there has been some discourse about Bluesky’s latest fundraising round. I’ve been participating in conversations about this on Mastodon, and I think I might sometimes come across as a Mastodon partisan, but my feelings are complex and I really don’t want to be boosting the ActivityPub Fediverse without qualification.

So here are some qualifications.

Bluesky Is Evil

To the extent that I am an ActivityPub partisan in the discourse between ActivityPub and ATProtocol, it is because I do not believe that Bluesky is a meaningfully decentralized social network. It is a social network, run by a company, which has a public API with some elements that might, one day, make it possible for it to be decentralized. But today, it is not, either practically or theoretically.

The Bluesky developers are putting in a ton of effort to maybe make it decentralized, hypothetically, someday. A lot of people think they will succeed. But ActivityPub (and, of course, Mastodon specifically) are already, today, meaningfully decentralized, as you can see on FediDB, there are instances with hundreds of thousands of people on them, before we even get to esoterica like the integrations Threads, Wordpress, Flipboard, and Ghost are doing.

The inciting incident for this post — that a lot of people are also angry about Bluesky raising millions of dollars from Evil Guys Doing Evil Stuff Capital — is indeed a serious concern. It lights the fuse that burns towards their eventual, inevitable incredible journey. ATProtocol is just an API, and that API will get shut off one day, whenever their funders get bored of the pretense of their network being “decentralized”.

At time of writing, it is also interesting that 3 of the 4 times that the CEO of Bluesky has even skeeted the word “blockchain” is to say “no blockchain”, to reassure users that the scam magnet of “Blockchain” is not actually near their product or protocol, which is a much harder position to maintain when your lead investor is “Blockchain Capital”.

I think these are all valid criticisms of Bluesky. But I also think that the actual engineers working on the product are aware of these issues, and are making a significant effort to address them or mitigate them in any way they can. All that work can still be easily incinerated by a slow quarter in terms of user growth numbers or a missed revenue forecast when the VCs are getting impatient, but it’s not nothing, it is a life’s work.

Really, who among us could not have our life’s ambitions trivially destroyed in an afternoon, simply because a billionaire decided that they should be? If you feel like you are safe from this, I have some bad news about how money works. So we are all doing our best in an imperfect system and maybe Bluesky is on to something here. That’s eminently possible. They’re certainly putting forth an earnest effort.

Mastodon Is Stupid

Meanwhile, not nearly as much has been made recently of Mastodon refusing funding from a variety of sources, when all indications are that funding is low, and plummeting, far below the level required to actually sustain the site, and they haven’t done a financial transparency report for over a year, and that report was already nearly a year late.

Mastodon and the fediverse are not nearly in a position to claim moral superiority over Bluesky. Sure, taking blockchain VC money might seem like a rookie mistake, but going out of business because you are spurning every possible source of funding is not that wise either.

Some might think that, sure, Mastodon the company might die but at least the Fediverse as a whole will keep going strong, right? Lots of people run their own instances! I even find elements of this argument convincing, and I think there is probably some truth to it. But to really believe this argument as claimed, that it’s a fait accompli that the fediverse will survive in some form, that all those self-run servers will be a robust network that will self-repair, requires believing some obviously false stuff. It is frankly unprofitable to run a Fediverse instance. Realistically, if you want to operate a mastodon server for yourself, it is going to cost at least $100/year once you include stuff like having a domain name, and managing the infrastructure costs is a complex problem that keeps getting harder to manage as the software itself gets slower.

Cory Doctorow has recently argued that this is all worth it, because at least on Mastodon, you’re in control, not at the whims of centralized website operators like Bluesky. In his words,

On Mastodon (and other services based on Activitypub), you can easily leave one server and go to another, and everyone you follow and everyone who follows you will move over to the new server. If the person who runs your server turns out to be imperfect in a way that you can’t endure, you can find another server, spend five minutes moving your account over, and you’re back up and running on the new server

He concludes:

Any system where users can leave without pain is a system whose owners have high switching costs and whose users have none

(Emphasis mine).

This is a beautiful vision. It is, however, an incorrect assessment of the state of the Fediverse as it stands today. It’s not true in two important ways:

First, if you look at any account of a user’s fediverse account migration, like this one from Steve Bate or this one from the Ente project or this one from Erin Kissane, you will see that it is “painful for the foreseeable future” or “wasn’t as seamless as advertised”, and that “the best time to […] migrate instances […] is never”. This language does not presage a pleasant experience, as Doctorow puts it, “without pain”.

Second, migration is an active process that requires engagement from the instance that hosts you. If you have been blocked or banned, or had your account terminated, you are just out of luck. You do not have control over your data or agency over your online identity unless you’ve shelled out the relatively exorbitant amount of money to actually operate your own instance.

In short, ActivityPub is no panacea. A federated system is not really a “decentralized” system, as much as it is a bunch of smaller centralized systems that all talk to each other. You still need to know, and care, about your social and financial relationship to the operators of your instance. There is probably no getting away from this, like, just generally on the Internet, no matter how much peer-to-peer software we deploy, but there certainly isn’t in the incomplete mess that is ActivityPub.

JOIN, or DIE.

Neither Mastodon (or ActivityPub) nor Bluesky (or ATProtocol) has a comprehensive solution to the problem of decentralized social media. These companies, and these protocols, are both deeply flawed and if everything keeps bumping along as it is, I believe both are likely to fail. At different times, on different timelines, and for different reasons, but fail nonetheless.

However, these networks are both small and growing, and we are not yet in the phase of enshittification where margins are shrinking and audiences are captured and the screws must be tightened to juice revenue. There are stil possibilities. Mastodon is crowdfunded and what they lack in resources they make up for in flexibility and scrappiness. Bluesky has money and while there will eventually be a need to monetize somehow, they have plenty of runway to come up with that answer, and a lot of sophisticated protocol work has been done. Not enough to make a complete circut and allow users true, practical decentralization, but it’s not nothing, either.

Mastodon and Bluesky are both organizations with humans in them, and piles of data that is roughly schema-compatible even if the nuances and details are different. I know that there is a compatible model becuse thanks to both platforms being relatively open, there is a functioning ActivityPub/ATProtocol bridge in the form of Brid.gy Fed. You can use it today, and I highly recommend that you do so, so that “choice of protocol” does not fully define your audience. If you’re on bluesky, follow this account, and if you’re on Mastodon or elsewhere on the Fediverse, search for and follow @bsky.brid.gy@bsky.brid.gy.

The reality that fans of decentralized, independent social media must confront is that we are a tiny audicence right now. Whichever site we are looking at, we are talking about a few million monthly active users at best, in a world where even the pathetic husk of Twitter still has hundreds of millions and Facebook has billions. Interneceine fights are not going to get us anywhere. We need to build bridges and links and connect our networks as densely as possible. If I’m being honest, Bridgy Fed looks like a pretty janky solution, but it’s something, and we need to start doing something soon, so we do not collectively become a permanent minority that mass markets can safely ignore.

As users, we need to set an example, so that the developers of the respective platforms get their shit together and work together directly so that workarounds like Bridgy are not required. Frankly, this is mostly on the ActivityPub and Mastodon devs, as far as I can tell. Unfortunately, not a lot of this seems to be public, or at least I haven’t witnessed a lot of it directly, but I have heard repeatedly that the ActivityPub developers are prickly, and this is one high-profile public example where an ActivityPub partisan is incredibly, pointlessly hostile and borderline harrassing towards someone — Mike Masnick, a long-time staunch advocate for open protocols and open patents, someone with a Mastodon account, and thus as good a prospective ally as the ActivityPub fediverse might reasonably find — explaining some of the relative benefits of Bluesky.

Most of us are technology nerds in one way or another. In that way we can look at signifiers like “ActivityPub” and “ATProtocol”, and feel like these are hard boundaries around different all-encompassing structures for the future, and thus tribes we must join and support.

A better way to look at this, however, is to see social entities like Mastodon gGmbH and Bluesky PBC — or, more to the point, Fosstodon, SFBA Social, Hachyderm (and maybe, one day, even an instance which isn’t fully just for software development nerds), as groups that deploy these protocols to access some data that they publish, just as they might publish their website over HTTP or their newsletters over SMTP. There are technical challenges involved in bridging between mutually unintelligible domain models, but that is, like, network software’s whole deal. Most software is just some kind of translation from one format or context to another. The best possible future for the fediverse is the one where users care as much about the distinction between ATProtocol and ActivityPub as they do about the distinction between POP3 and IMAP.

To both developers and users of these systems, I say: get it together. Be nice to each other. Because the rest of the social media ecosystem is sure as shit not going to be nice to us if we ever see even a hint of success and start to actually cut into their user base.

Acknowledgments

by Glyph at November 03, 2024 09:49 PM

September 24, 2024

Hynek Schlawack

Production-ready Python Docker Containers with uv

Starting with 0.3.0, Astral’s uv brought many great features, including support for cross-platform lock files uv.lock. Together with subsequent fixes, it has become Python’s finest workflow tool for my (non-scientific) use cases. Here’s how I build production-ready containers, as fast as possible.

by Hynek Schlawack (hs@ox.cx) at September 24, 2024 12:00 AM

September 23, 2024

Hynek Schlawack

Python Project-Local Virtualenv Management Redux

One of my first TIL entries was about how you can imitate Node’s node_modules semantics in Python on UNIX-like operating systems. A lot has happened since then (to the better!) and it’s time for an update. direnv still rocks, though.

by Hynek Schlawack (hs@ox.cx) at September 23, 2024 12:00 AM

September 11, 2024

Glyph Lefkowitz

Python macOS Framework Builds

When you build Python, you can pass various options to ./configure that change aspects of how it is built. There is documentation for all of these options, and they are things like --prefix to tell the build where to install itself, --without-pymalloc if you have some esoteric need for everything to go through a custom memory allocator, or --with-pydebug.

One of these options only matters on macOS, and its effects are generally poorly understood. The official documentation just says “Create a Python.framework rather than a traditional Unix install.” But… do you need a Python.framework? If you’re used to running Python on Linux, then a “traditional Unix install” might sound pretty good; more consistent with what you are used to.

If you use a non-Framework build, most stuff seems to work, so why should anyone care? I have mentioned it as a detail in my previous post about Python on macOS, but even I didn’t really explain why you’d want it, just that it was generally desirable.

The traditional answer to this question is that you need a Framework build “if you want to use a GUI”, but this is demonstrably not true. At first it might not seem so, since the go-to Python GUI test is “run IDLE”; many non-Framework builds also omit Tkinter because they don’t ship a Tk dependency, so IDLE won’t start. But other GUI libraries work fine. For example, uv tool install runsnakerun / runsnake will happily pop open a GUI window, Framework build or not. So it bears some explaining

Wait, what is a “Framework” anyway?

Let’s back up and review an important detail of the mac platform.

On macOS, GUI applications are not just an executable file, they are organized into a bundle, which is a directory with a particular layout, that includes metadata, that launches an executable. A thing that, on Linux, might live in a combination of /bin/foo for its executable and /share/foo/ for its associated data files, is instead on macOS bundled together into Foo.app, and those components live in specified locations within that directory.

A framework is also a bundle, but one that contains a library. Since they are directories, Applications can contain their own Frameworks and Frameworks can contain helper Applications. If /Applications is roughly equivalent to the Unix /bin, then /Library/Frameworks is roughly equivalent to the Unix /lib.

App bundles are contained in a directory with a .app suffix, and frameworks are a directory with a .framework suffix.

So what do you need a Framework for in Python?

The truth about Framework builds is that there is not really one specific thing that you can point to that works or doesn’t work, where you “need” or “don’t need” a Framework build. I was not able to quickly construct an example that trivially fails in a non-framework context for this post, but I didn’t try that many different things, and there are a lot of different things that might fail.

The biggest issue is not actually the Python.framework itself. The metadata on the framework is not used for much outside of a build or linker context. However, Python’s Framework builds also ship with a stub application bundle, which places your Python process into a normal application(-ish) execution context all the time, which allows for various platform APIs like [NSBundle mainBundle] to behave in the normal, predictable ways that all of the numerous, various frameworks included on Apple platforms expect.

Various Apple platform features might want to ask a process questions like “what is your unique bundle identifier?” or “what entitlements are you authorized to access” and even beginning to answer those questions requires information stored in the application’s bundle.

Python does not ship with a wrapper around the core macOS “cocoa” API itself, but we can use pyobjc to interrogate this. After installing pyobjc-framework-cocoa, I can do this

>>> import AppKit
>>> AppKit.NSBundle.mainBundle()

On a non-Framework build, it might look like this:

1	`NSBundle </Users/glyph/example/.venv/bin> (loaded)`

But on a Framework build (even in a venv in a similar location), it might look like this:

NSBundle </Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app> (loaded)

This is why, at various points in the past, GUI access required a framework build, since connections to the window server would just be rejected for Unix-style executables. But that was an annoying restriction, so it was removed at some point, or at least, the behavior was changed. As far as I can tell, this change was not documented. But other things like user notifications or geolocation might need to identity an application for preferences or permissions purposes, respectively. Even something as basic as “what is your app icon” for what to show in alert dialogs is information contained in the bundle. So if you use a library that wants to make use of any of these features, it might work, or it might behave oddly, or it might silently fail in an undocumented way.

This might seem like undocumented, unnecessary cruft, but it is that way because it’s just basic stuff the platform expects to be there for a lot of different features of the platform.

`/etc/` builds

Still, this might seem like a strangely vague description of this feature, so it might be helpful to examine it by a metaphor to something you are more familiar with. If you’re familiar with more Unix style application development, consider a junior developer — let’s call him Jim — asking you if they should use an “/etc build” or not as a basis for their Docker containers.

What is an “/etc build”? Well, base images like ubuntu come with a bunch of files in /etc, and Jim just doesn’t see the point of any of them, so he likes to delete everything in /etc just to make things simpler. It seems to work so far. More experienced Unix engineers that he has asked react negatively and make a face when he tells them this, and seem to think that things will break. But their app seems to work fine, and none of these engineers can demonstrate some simple function breaking, so what’s the problem?

Off the top of your head, can you list all the features that all the files that /etc is needed for? Why not? Jim thinks it’s weird that all this stuff is undocumented, and it must just be unnecessary cruft.

If Jim were to come back to you later with a problem like “it seems like hostname resolution doesn’t work sometimes” or “ls says all my files are owned by 1001 rather than the user name I specified in my Dockerfile” you’d probably say “please, put /etc back, I don’t know exactly what file you need but lots of things just expect it to be there”.

This is what a framework vs. a non-Framework build is like. A Framework build just includes all the pieces of the build that the macOS platform expects to be there. What pieces do what features need? It depends. It changes over time. And the stub that Python’s Framework builds include may not be sufficient for some more esoteric stuff anyway. For example, if you want to use a feature that needs a bundle that has been signed with custom entitlements to access something specific, like the virtualization API, you might need to build your own app bundle. To extend our analogy with Jim, the fact that /etc exists and has the default files in it won’t always be sufficient; sometimes you have to add more files to /etc, with quite specific contents, for some features to work properly. But “don’t get rid of /etc (or your application bundle)” is pretty good advice.

Do you ever want a non-Framework build?

macOS does have a Unix subsystem, and many Unix-y things work, for Unix-y tasks. If you are developing a web application that mostly runs on Linux anyway and never care about using any features that touch the macOS-specific parts of your mac, then you probably don’t have to care all that much about Framework builds. You’re not going to be surprised one day by non-framework builds suddenly being unable to use some basic Unix facility like sockets or files. As long as you are aware of these limitations, it’s fine to install non-Framework builds. I have a dozen or so Pythons on my computer at any given time, and many of them are not Framework builds.

Framework builds do have some small drawbacks. They tend to be larger, they can be a bit more annoying to relocate, they typically want to live in a location like /Library or ~/Library. You can move Python.framework into an application bundle according to certain rules, as any bundling tool for macOS will have to do, but it might not work in random filesystem locations. This may make managing really large number of Python versions more annoying.

Most of all, the main reason to use a non-Framework build is if you are building a tool that manages a fleet of Python installations to perform some automation that needs to know about Python installs, and you want to write one simple tool that does stuff on Linux and on macOS. If you know you don’t need any platform-specific features, don’t want to spend the (not insignificant!) effort to cover those edge cases, and you get a lot of value from that level of consistency (for example, a teaching environment or interdisciplinary development team with a lot of platform diversity) then a non-framework build might be a better option.

Why do I care?

Personally, I think it’s important for Framework builds to be the default for most users, because I think that as much stuff should work out of the box as possible. Any user who sees a neat library that lets them get control of some chunk of data stored on their mac - map data, health data, game center high scores, whatever it is - should be empowered to call into those APIs and deal with that data for themselves.

Apple already makes it hard enough with their thicket of code-signing and notarization requirements for distributing software, aggressive privacy restrictions which prevents API access to some of this data in the first place, all these weird Unix-but-not-Unix filesystem layout idioms, sandboxing that restricts access to various features, and the use of esoteric abstractions like mach ports for communications behind the scenes. We don't need to make it even harder by making the way that you install your Python be a surprise gotcha variable that determines whether or not you can use an API like “show me a user notification when my data analysis is done” or “don’t do a power-hungry data analysis when I’m on battery power”, especially if it kinda-sorta works most of the time, but only fails on certain patch-releases of certain versions of the operating system, becuase an implementation detail of a proprietary framework changed in the meanwhile to require an application bundle where it didn’t before, or vice versa.

More generally, I think that we should care about empowering users with local computation and platform access on all platforms, Linux and Windows included. This just happens to be one particular quirk of how native platform integration works on macOS specifically.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. For this one, thanks especially to long-time patron Hynek who requested it specifically. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how can we set up our Mac developers’ laptops with Python”.

by Glyph at September 11, 2024 07:43 PM

September 03, 2024

Hynek Schlawack

How to Ditch Codecov for Python Projects

Codecov’s unreliability breaking CI on my open source projects has been a constant source of frustration for me for years. I have found a way to enforce coverage over a whole GitHub Actions build matrix that doesn’t rely on third-party services.

by Hynek Schlawack (hs@ox.cx) at September 03, 2024 12:00 AM

September 02, 2024

Hynek Schlawack

Why I Still Use Python Virtual Environments in Docker

Whenever I publish something about my Python Docker workflows, I invariably get challenged about whether it makes sense to use virtual environments in Docker containers. As always, it’s a trade-off, and I err on the side of standards and predictability.

by Hynek Schlawack (hs@ox.cx) at September 02, 2024 11:00 AM

August 16, 2024

Glyph Lefkowitz

On The Defense Of Heroes

If a high-status member of a community that you participate in is accused of misbehavior, you may want to defend them. You may even write a long essay in their defense.

In that essay, it may seem only natural to begin with a lengthy enumeration of the accused’s positive personal qualities. To extol the quality of their career and their contributions to your community. To talk about how nice they are. To be a character witness in the court of public opinion.

If you do this, you are not defending them. You are proving the point. This is exactly how missing stairs come to exist. People don’t get away with bad behavior if they don’t have high status and a good reputation already.

Sometimes, someone with antisocial inclinations seeks out status, in order to facilitate their bad behavior. Sometimes, a good, but, flawed person does a lot of really good work and thereby accidentally ends up with more status than they were expecting to have, and they don’t know how to handle it. In either case, bad behavior may ensue.

If you truly believe that your fave is being accused or punished unjustly, focus on the facts. What, specifically, has been alleged? How are these allegations substantiated? What verifiable evidence exists to the contrary? If you feel that someone is falsely accusing them to ruin their reputation, is there evidence to support your claim that the accusation is false? Ask yourself the question: what information do you have, that is leading to your correct analysis of the situation, that the people making the accusations do not have, which might be leading them into error?

But, also, maybe just… don’t?

The urge to defend someone like this is much more likely to come from a sense of personal grievance than justice. Consider: does it feel like you are being attacked, when your fave has been attacked? Is there a tightness in your chest, heat rising on your cheeks? Do you feel suddenly defensive?

Do you think that defensiveness is likely to lead to you making good, rational decisions about what steps to take next?

Let your heroes face accountability. If they are really worth your admiration, they might accept responsibility and make amends. Or they might fight the accusations with their own real evidence — evidence that you, someone peripheral to their situation, are unlikely to have — and prove the accusations wrong.

They might not want your defense. Even if they feel like they do want it in the moment — they are human too, after all, and facing accountability does not feel good to us humans — is the intensified feeling that they can’t let down their supporters who believe in them likely to make them feel less defensive and panicked?

In either case, your character defense is unlikely to serve them. At best it helps them stay on an ego trip, at worst it muddies the waters and might confuse the collection of facts that would, if considered dispassionately, properly exonerate them.

Do you think that I am pretending to speak in generalities but really talking about one specific recent event?

Wrong!

Just in this last week, I have read 2 different blog posts about 2 completely different people in completely unrelated communities and both of their authors need to read this. But each of those were already of a type, one that I’ve read dozens of instances of in the past.

It is a very human impulse to perceive a threat to someone we think well of, and to try to defend against that threat. But the consequences of someone’s own actions are not a threat you can defend them from.

by Glyph at August 16, 2024 07:53 PM

July 04, 2024

Glyph Lefkowitz

Against Innovation Tokens

Updated 2024-07-04: After some discussion, added an epilogue going into more detail about the value of the distinction between the two types of tokens.

In 2015, Dan McKinley laid out a model for software teams selecting technologies. He proposed that each team have a limited supply of “innovation tokens”, and, when selecting a technology, they can choose boring ones for free but “innovative” ones cost a token. This implies that we all know which technologies are innovative, and we assume that they are inherently costly, so we want to restrict their supply.

That model has become popular to the point that it is now part of the vernacular. In many discussions, it is accepted as received wisdom, or even common sense.

In this post I aim to show you that despite being superficially helpful, this model is wrong, and in fact, may be counterproductive. I believe it is an attractive nuisance in computer programming discourse.

In fairness to Mr. McKinley, the model he described in this post is:

nearly a decade old at this point, and
much more nuanced in its description of the problem with “innovation” than the subsequent memetic mutation of the concept.

While I will be referencing McKinley’s post, and I do take some issue with it, I am reacting more strongly to the life of its own that this idea has taken on once it escaped its original context. There are a zillion worse posts rehashing this concept, on blogs and LinkedIn, but I won’t be linking to them because the goal is not to call anybody out.

To some extent I am re-raising McKinley’s own caveats and reinforcing them. So I may be arguing with a strawman, but it’s a strawman I have seen deployed with some regularity over the years.

To reduce it to its core, this strawman is “don’t use new or interesting technology, and if you have to, only use a little bit”.

Within the broader culture of programmers, an “innovation token” has become a shorthand to smear any technology perceived — almost always based on vibes, not data — as risky, and the adoption of novel approaches as pretentious and unserious. Speaking of programmer culture though, I do have to acknowledge there is also a pervasive tendency for us to get distracted by novelty and waste time on puzzles rather than problem-solving, so I understand where the reactionary attitude represented by the concept of an innovation token comes from.

But it is reactionary.

At its worst, it borders on anti-intellectualism. I have heard it used on more than one occasion as a thought-terminating cliche to discard a potentially promising new tool. But before I get into that, let me try to give a sympathetic summary of the idea, because the model is not entirely bad.

It has been popular for a long time because it does work okay as an heuristic.

The real problem that McKinley is describing is operational overhead. When programmers make a technology selection, we are often considering how difficult it will make the programming. Innovative technology selections are, by definition, less mature.

That lack of maturity — particularly in the open source world — often means that the project is in a part of its lifecycle where it is concerned with development affordances more than operational ones. Therefore, the stereotypical innovative project, even one which might legitimately be a big improvement to development velocity, will create more operational overhead. That operational overhead creates a hidden cost for the operations team later on.

This is a point I emphatically agree with. When selecting a technology, you should consider its ease of operation more than its ease of development. If your team is successful, they will be operating and maintaining it far longer than they are initially integrating and deploying it.

Furthermore, some operational overhead is inevitable. You will need to hire people to mitigate it. More popular, more mature projects will have a bigger talent pool to hire from, so your training costs will be lower, and those training costs are part of your operational cost too.

Rationing innovation tokens therefore can work as a reasonable heuristic, or proxy metric, for avoiding a mess of complex operational problems associated with dependencies that are expensive to operate and hard to hire for.

There are some minor issues I want to point out before getting to the overarching one.

“has a lot of operational overhead” is a stereotype of a new technology, not an inherent property. If you want to reject a technology on the basis of being too high-overhead, at least look into its actual overhead a little bit. Sometimes, especially in 2024 as opposed to 2015, the point of a new, shiny piece of tech is to address operational issues that the more boring, older one had.
“hard to learn” is also a stereotype; if “newer” meant “harder” then we would all be using troff rather than Google Docs. Actually ask if the innovativeness is making things harder or easier; don’t assume.
You are going to have to train people on your stack no matter what. If a technology is adding a lot of value, it’s absolutely worth hiring for general ability and making a plan to teach people about it. You are going to have to do this with the core technology of your product anyway.

As I said, though, these are minor issues. The big problem with modeling operational overhead as an “innovation token” is that an even bigger concern than selecting an innovative tool is selecting too many tools.

The impulse to select more tools and make your operational environment more complex can be made worse by trying to avoid innovative tools. The important thing is not “less innovation”, but more consistency. To illustrate this, let’s do a simple thought experiment.

Let’s say you’re going to make a web app. There’s a tool in Haskell that you really like for a critical part of your app’s problem domain. You don’t want to spend more than one innovation token though, and everything in Haskell is inherently innovative, so you write a little service that just does that one part and you write the rest of your app in Ruby, calling into that service whenever you need to use that thing. This will appropriately restrict your “innovation token” expenditure.

Does doing this actually reduce your operational overhead, though?

First, you will have to find a team that likes both Ruby and Haskell and sees no problem using both. If you are not familiar with the cultural proclivities of these languages, suffice it to say that this is unlikely. Hiring for Haskell programmers is hard because there are fewer of them than Ruby programmers, but hiring for polyglot Haskell/Ruby programmers who are happy to do either is going to be really hard.

Since you will need to find different people to write in the different languages, even in the best case scenario, you will have two teams: the Haskell team and the Ruby team. Even if you are incredibly disciplined about inter-service responsibilities, there will be some areas where duplication of code is necessary across those services. Disagreements will arise and every one of these disagreements will be a source of social friction and software defects.

Then, you need to set up separate CI pipelines for each language, separate deployment systems, and of course, separate databases. Right away you are effectively doubling your workload.

In the worse, and unfortunately more likely scenario, there will be enormous infighting between these two teams. Operational incidents will be more difficult to manage because rather than learning the Haskell tools for operational visibility and disseminating that institutional knowledge amongst your team, you will be half-learning the lessons from two separate ecosystems and attempting to integrate them. Every on-call engineer will be frantically trying to learn a language ecosystem they don’t use regularly, or you will double the size of your on-call rotation. The Ruby team may start to resent the Haskell team for getting to exclusively work on the fun parts of the problem while they are doing things that look more like rote grunt work.

A better way to think about the problem of managing operational overhead is, rather than “innovation tokens”, consider “boundary tokens”.

That is to say, rather than evaluating the general sense of weird vibes from your architecture, consider the consistency of that architecture. If you’re using Haskell, use Haskell. You should be all-in on Haskell web frameworks, Haskell ORMs, Haskell OAuth integrations, and so on.¹ To cross the boundary out of Haskell, you need to spend a boundary token, and you shouldn’t have many of those.

I submit that the increased operational overhead that you might experience with an all-Haskell tool selection will be dwarfed by the savings that you get by having a team that is aligned with each other, that can communicate easily, and that can share programs with each other without needing to first strategize about a channel for the two pieces of work to establish bidirectional communication. The ability to simply call a function when you need to call it is very powerful, and extremely underrated.

Consistency ought to apply at each layer of the stack; it is perhaps most obvious with programming languages, but it is true of web frameworks, test frameworks, cryptographic libraries, you name it. Make a choice and stick with it, because every deviation from that choice carries a significant cost. Moreover this cost is a hidden cost, in the same way that the operational downsides of an “innovative” tool that hasn’t seen much production use might be hidden.

Discarding a more standard tool in favor of a tool more consistent with your architecture extends even to fairly uncontroversial, ubiquitous tools. For example, one of my favorite architectural patterns is to forego the use of the venerable — and very boring – Cron, the UNIX task-scheduler. Instead of Cron, it can make a lot of sense to have hand-written bespoke code for scheduling tasks within the application. Within the “innovation tokens” model, this is a very silly waste of a token!

Just use Cron! Everybody knows how to use Cron!

Except… does everybody know how to use Cron? Here are some questions to consider, if you’re about to roll out a big dependency on Cron:

How do you write a unit test for a scheduling rule with Cron?
Can you even remember how to write a cron rule that runs at the times you want?
How do you inject secrets and configuration variables into the distinct and somewhat idiosyncratic runtime execution environment of Cron?
How do you know that you did that variable-injection properly until the job actually runs, possibly in the middle of the night?
How do you deploy your monitoring and error-logging frameworks to observe your scripts run under Cron?

Granted, this architectural choice is less controversial than it once was. Cron used to be ambiently available on whatever servers you happened to be running. As container-based deployments have increased in popularity, this sense that Cron is just kinda around has gone away, and if you need to run a container that just runs Cron, much of the jankiness of its deployment is a lot more immediately visible.

There is friction at the boundary between things. That friction is a cost, but sometimes it’s a cost worth paying.

If there’s a really good library in Haskell and a really good library in Ruby and you really do want to use them both, maybe it makes sense to actually have multiple services. As your team gets larger and more mature, the need to bring in more tools, and the ability to handle the associated overhead, will only increase over time. But the place that the cost comes in the most is at the boundary between tools, not in the operational deficiencies of any one particular tool.

Even in a bog-standard web application with the most boring, least innovative tech stack imaginable (PHP, MySQL, HTML, CSS, JavaScript), many of the annoying points of friction are where different, inconsistent technologies make contact. If you are a programmer working on the web yourself, consider your own impression of the level of controversy of these technologies:

CSS frameworks that attempt to bridge the gap between CSS and JavaScript, like Bootstrap or Tailwind.
ORMs that attempt to bridge the gap between SQL and your backend language of choice
RPC systems that attempt to connect disparate services together using simple abstractions.

Consider that there are far more complex technical tools in terms of required skills to implement them, like computer vision or physics simulation, tools which are also pretty widely used, which consistently generate lower levels of controversy. People do have strong feelings about these things as well, of course, and it’s hard to find things to link to that show “this isn’t controversial”, but, like, search your feelings, you know it to be true.

You can see the benefits of the boundary token approach in programming language design. Many of the most influential and best-loved programming languages had an impact not by bundling together lots of tools, but by making everything into one thing:

LISP: everything is a list
Smalltalk: everything is an object
ML: everything is an algebraic data type
Forth: everything is a stack

There is a tremendous power in thinking about everything as a single kind of thing, because then you don’t have to juggle lots of different ideas about different kinds of things; you can just think about your problem.

When people complain about programming languages, they’re often complaining about how many different kinds of thing they have to remember in order to use it.

If you keep your boundary-token budget small, and allow your developers to accomplish as much as possible while staying within a solution space delineated by a single, clean cognitive boundary, I promise you can innovate as much as you want and your operational costs will remain manageable.

Epilogue

In subsequent Mastodon discussion of this post on with Matt Campbell and Meejah, I realized that I may not have made it entirely clear why I feel the distinction between “boundary” and “innovation” tokens is important. I do say above that the “innovation token” model can be a useful heuristic, so why bother with a new, but slightly different heuristic? Especially since most experienced engineers - indeed, McKinley himself - would budget “innovation” quite similarly to “boundaries”, and might even consider the use of more “innovative” Haskell tools in my hypothetical scenario to not even be an expenditure of innovation tokens at all.

To answer that, I need to highlight the purpose of having heuristics like this in the first place. These are vague, nebulous guidelines, not hard and fast rules. I cannot give you a token calculator to plug your technical decisions into. The purpose of either token heuristic is to facilitate discussions among a team.

With a team of skilled and experienced engineers, the distinction is meaningless. Senior and staff engineers (at least, the ones who deserve their level) will intuit the goals behind “innovation tokens” and inherently consider things like operational overhead anyway. In practice, a high-performing, well-aligned team discussing innovation tokens and one discussing boundary tokens will look functionally indistinguishable.

The distinction starts to be important when you have management pressures, nervous executives, inexperienced engineers, a fresh team without existing consensus about core technology choices, and so on. That is to say, most teams that exist in the messy, perpetually in medias res world of the software industry.

If you are just getting started on a project and you have a bunch of competent but disagreeable engineers, the words “innovation” and “boundaries” function very differently.

If you ask, “is this an innovation” about a particular technical tool, you are asking your interlocutor to pull in a bunch of their skills and experience to subjectively evaluate the relative industry-wide, or maybe company-wide, or maybe team-wide² newness of the thing being discussed. The discussion of whether it counts as boring or innovative is immediately fraught with a ton of subjective, difficult-to-quantify information about costs of hiring, difficulty of learning, and your impression of the feelings of hundreds or thousands of people outside of your team. And, yes, ultimately you do need to have an estimate of all that stuff, but starting your “is it OK to use this” conversation by simultaneously arguing about all those subjective judgments is setting yourself up for failure.

Instead, if you ask “does this introduce a boundary between two different technologies with different conceptual models”, while that is not a perfectly objective question, it is much easier for your team to answer, with much crisper intermediary factual questions. What are the two technologies? What are the models? How much do they differ? You can just hash out the answers to each one within the team directly, rather than needing to sift through the last few years of Stack Overflow developer surveys to determine relative adoption or popularity of technologies in the world at large.

Restricting your supply of either boundary or innovation tokens is a good idea, but achieving unanimity within your team about what your boundaries are is always going to be easier than deciding what your innovations are.

Acknowledgments

I gave a talk about this once, a very long time ago, where Haskell was Python. ↩
It’s not clear, that’s a big part of the problem. ↩

by Glyph at July 04, 2024 07:54 PM

June 20, 2024

Moshe Zadka

The Prehistory of Kubernetes: From Cubic Equations to Cloud Orchestration

The story of Kubernetes, the leading container orchestration platform, is a tale of mathematical innovation, wartime necessity, and the open-source revolution. It begins, perhaps unexpectedly, with the work of Omar Khayyam, the 11th-century Persian polymath known for his contributions to mathematics, astronomy, and poetry.

Khayyam's work on solving cubic equations laid the foundation for the development of algebraic geometry, which in turn led to the invention of Cartesian coordinates by René Descartes in the 17th century. This "algebrization of geometry" allowed for the mathematical description of physical phenomena, such as planetary motion, and paved the way for Isaac Newton's development of calculus.

However, Newton's calculus, while groundbreaking, lacked a rigorous mathematical foundation. It took the work of 19th-century mathematicians like Augustin-Louis Cauchy and Karl Weierstrass to establish the epsilon-delta definition of limits and place calculus on a solid footing. This development also opened up new questions about infinity, leading to Georg Cantor's work on set theory and the discovery of paradoxes, such as Russell's paradox by Bertrand Russell, that threatened the foundations of mathematics.

The quest to resolve these paradoxes and establish a secure foundation for mathematics led to Kurt Gödel's incompleteness theorems, published in 1931. Gödel's first incompleteness theorem showed that in any consistent axiomatic system that includes arithmetic, there are statements that can neither be proved nor disproved within the system. The second incompleteness theorem demonstrated that such a system cannot prove its own consistency.

Crucially, Gödel's theorems relied on the concept of computability, which he used to construct a formal system representing arithmetic. However, Gödel's definition of computability was not entirely convincing, as it relied on the intuitive notion of a "finite procedure." This left open the possibility that a non-computable axiomatization of number theory, capturing "all that is true about the natural numbers," could exist and potentially sidestep the incompleteness theorems.

It was Alan Turing who took up the challenge of formalizing the concept of computability. In his groundbreaking 1936 paper "On Computable Numbers, with an Application to the Entscheidungsproblem," Turing introduced the Turing machine, a simple yet powerful mathematical model of computation. Turing's work not only provided a more rigorous foundation for Gödel's ideas but also proved that certain problems, such as the halting problem, are undecidable by Turing machines.

Turing's formalization of computability had far-reaching implications beyond the foundations of mathematics. It laid the groundwork for the development of modern computer science and played a crucial role in the birth of the digital age. Turing's work took on new urgency with the outbreak of World War II, as the need to break the German Enigma machine led to the development of early computing machines based on the principles he had established.

After the war, the individuals who had worked on these machines helped to establish the first computing companies, leading to the industrialization of computing and the development of programming languages and operating systems. One notable example is Alan Turing himself, who joined the National Physical Laboratory (NPL) in London, where he worked on the design of the Automatic Computing Engine (ACE), one of the first stored-program computers.

Another key figure was John von Neumann, a mathematician and physicist who made significant contributions to the design of the EDVAC (Electronic Discrete Variable Automatic Computer), an early stored-program computer. Von Neumann's work on the EDVAC and his subsequent report, "First Draft of a Report on the EDVAC," laid the foundation for the von Neumann architecture, which became the standard design for modern computers.

In the United Kingdom, Maurice Wilkes, who had worked on radar systems during the war, led the development of the EDSAC (Electronic Delay Storage Automatic Calculator) at the University of Cambridge. The EDSAC, which became operational in 1949, was the first practical stored-program computer and inspired the development of similar machines in the United States and elsewhere.

In the United States, J. Presper Eckert and John Mauchly, who had worked on the ENIAC (Electronic Numerical Integrator and Computer) during the war, founded the Eckert-Mauchly Computer Corporation in 1946. The company developed the UNIVAC (Universal Automatic Computer), which became the first commercially available general-purpose computer in the United States.

The UNIVAC was followed by the Multiprocessing Automatic Computer (Multivac), developed by IBM in the late 1950s. The Multivac introduced several innovative features, such as multiprogramming and memory protection, which allowed multiple users to share the same machine and provided a degree of isolation between their programs. These features would later inspire both positive and negative lessons for the creators of Unix, the influential operating system developed at Bell Labs in the 1970s.

Due to antitrust pressures, Bell Labs made Unix available to universities, where it became a standard teaching tool. This decision led to the development of Minix, a simplified Unix-like system, and eventually to the creation of Linux by Linus Torvalds.

As Linux grew in popularity, thanks to its open-source nature and ability to run on cheap hardware, it caught the attention of Google, which was looking for an operating system to power its "cloud-native" approach to computing. Google's engineers contributed key features to the Linux kernel, such as cgroups and namespaces, which laid the groundwork for the development of containerization technologies like Docker.

Google, recognizing the potential of containers and the need for a robust orchestration platform, developed Kubernetes as an open-source system based on its experience with Borg and other orchestration tools. By establishing Kubernetes as the standard for container orchestration, Google aimed to reduce the barrier to entry for users looking to switch between cloud providers, challenging the dominance of Amazon Web Services.

Today, Kubernetes has become the de facto standard for managing containerized applications. As in previous improvements, this led to a new open problem: generating and deploying manifests. Tools for generating manifests range from general templating solutions like Bash variable substitution, Sed, and Jinja, through full fledged programming languages, like using Jsonnet and Python, all the way to using dedicated tools like Kustomize and Helm. Meanwhile, deploying the manifests to Kubernetes can be done through continuous integration platforms running "helm upgrade" or "kubectl apply" or using dedicated platforms like ArgoCD or FluxCD. ArgoCD or Flagger also support gradual roll-outs.

From cubic equations to cloud orchestration, the story of Kubernetes is a reminder that the path of progress is rarely straightforward, but rather a winding journey through the realms of mathematics, computer science, and human ingenuity.

by Moshe Zadka at June 20, 2024 04:30 AM

May 22, 2024

Glyph Lefkowitz

A Grand Unified Theory of the AI Hype Cycle

The Cycle

The history of AI goes in cycles, each of which looks at least a little bit like this:

Scientists do some basic research and develop a promising novel mechanism, N. One important detail is that N has a specific name; it may or may not be carried out under the general umbrella of “AI research” but it is not itself “AI”. N always has a few properties, but the most common and salient one is that it initially tends to require about 3x the specifications of the average computer available to the market at the time; i.e., it requires three times as much RAM, CPU, and secondary storage as is shipped in the average computer.
Research and development efforts begin to get funded on the hypothetical potential of N. Because N is so resource intensive, this funding is used to purchase more computing capacity (RAM, CPU, storage) for the researchers, which leads to immediate results, as the technology was previously resource constrained.
Initial successes in the refinement of N hint at truly revolutionary possibilities for its deployment. These revolutionary possibilities include a dimension of cognition that has not previously been machine-automated.
Leaders in the field of this new development — specifically leaders, like lab administrators, corporate executives, and so on, as opposed to practitioners like engineers and scientists — recognize the sales potential of referring to this newly-“thinking” machine as “Artificial Intelligence”, often speculating about science-fictional levels of societal upheaval (specifically in a period of 5-20 years), now that the “hard problem” of machine cognition has been solved by N.
Other technology leaders, in related fields, also recognize the sales potential and begin adopting elements of the novel mechanism to combine with their own areas of interest, also referring to their projects as “AI” in order to access the pool of cash that has become available to that label. In the course of doing so, they incorporate N in increasingly unreasonable ways.
The scope of “AI” balloons to include pretty much all of computing technology. Some things that do not even include N start getting labeled this way.
There’s a massive economic boom within the field of “AI”, where “the field of AI” means any software development that is plausibly adjacent to N in any pitch deck or grant proposal.
Roughly 3 years pass, while those who control the flow of money gradually become skeptical of the overblown claims that recede into the indeterminate future, where N precipitates a robot apocalypse somewhere between 5 and 20 years away. Crucially, because of the aforementioned resource-intensiveness, the gold owners skepticism grows slowly over this period, because their own personal computers or the ones they have access to do not have the requisite resources to actually run the technology in question and it is challenging for them to observe its performance directly. Public critics begin to appear.
Competent practitioners — not leaders — who have been successfully using N in research or industry quietly stop calling their tools “AI”, or at least stop emphasizing the “artificial intelligence” aspect of them, and start getting funding under other auspices. Whatever N does that isn’t “thinking” starts getting applied more seriously as its limitations are better understood. Users begin using more specific terms to describe the things they want, rather than calling everything “AI”.
Thanks to the relentless march of Moore’s law, the specs of the average computer improve. The CPU, RAM, and disk resources required to actually run the software locally come down in price, and everyone upgrades to a new computer that can actually run the new stuff.
The investors and grant funders update their personal computers, and they start personally running the software they’ve been investing in. Products with long development cycles are finally released to customers as well, but they are disappointing. The investors quietly get mad. They’re not going to publicly trash their own investments, but they stop loudly boosting them and they stop writing checks. They pivot to biotech for a while.
The field of “AI” becomes increasingly desperate, as it becomes the label applied to uses of N which are not productive, since the productive uses are marketed under their application rather than their mechanism. Funders lose their patience, the polarity of the “AI” money magnet rapidly reverses. Here, the AI winter is finally upon us.
The remaining AI researchers who still have funding via mechanisms less vulnerable to hype, who are genuinely thinking about automating aspects of cognition rather than simply N, quietly move on to the next impediment to a truly thinking machine, and in the course of doing so, they discover a new novel mechanism, M. Go to step 1, with M as the new N, and our current N as a thing that is now “not AI”, called by its own, more precise name.

The History

A non-exhaustive list of previous values of N have been:

Neural networks and symbolic reasoning in the 1950s.
Theorem provers in the 1960s.
Expert systems in the 1980s.
Fuzzy logic and hidden Markov models in the 1990s.
Deep learning in the 2010s.

Each of these cycles has been larger and lasted longer than the last, and I want to be clear: each cycle has produced genuinely useful technology. It’s just that each follows the progress of a sigmoid curve that everyone mistakes for an exponential one. There is an initial burst of rapid improvement, followed by gradual improvement, followed by a plateau. Initial promises imply or even state outright “if we pour more {compute, RAM, training data, money} into this, we’ll get improvements forever!” The reality is always that these strategies inevitably have a limit, usually one that does not take too long to find.

Where Are We Now?

So where are we in the current hype cycle?

Here’s a Computerphile video which explains some recent research into LLM performance. I’d highly encourage you to have a look at the paper itself, particularly Figure 2, “Log-linear relationships between concept frequency and CLIP zero-shot performance”.
Here’s a series of posts by Simon Willison explaining the trajectory of the practicality of actually-useful LLMs on personal devices. He hasn’t written much about it recently because it is now fairly pedestrian for an AI-using software developer to have a bunch of local models, and although we haven’t quite broken through the price floor of the gear-acquisition-syndrome prosumer market in terms of the requirements of doing so, we are getting close.
The Rabbit R1 and Humane AI Pin were both released; were they disappointments to their customers and investors? I think we all know how that went at this point.
I hear Karius just raised a series C, and they’re an “emerging unicorn”.
It does appear that we are all still resolutely calling these things “AI” for now, though, much as I wish, as a semasiology enthusiast, that we would be more precise.

Some Qualifications

History does not repeat itself, but it does rhyme. This hype cycle is unlike any that have come before in various ways. There is more money involved now. It’s much more commercial; I had to phrase things above in very general ways because many previous hype waves have been based on research funding, some really being exclusively a phenomenon at one department in DARPA, and not, like, the entire economy.

I cannot tell you when the current mania will end and this bubble will burst. If I could, you’d be reading this in my $100,000 per month subscribers-only trading strategy newsletter and not a public blog. What I can tell you is that computers cannot think, and that the problems of the current instantation of the nebulously defined field of “AI” will not all be solved within “5 to 20 years”.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. Special thanks also to Ben Chatterton for a brief pre-publication review; any errors remain my own. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “what are we doing that history will condemn us for”. Or, you know, Python programming.

by Glyph at May 22, 2024 04:58 PM

May 15, 2024

Glyph Lefkowitz

How To PyCon

These tips are not the “right” way to do PyCon, but they are suggestions based on how I try to do PyCon. Consider them reminders to myself, an experienced long-time attendee, which you are welcome to overhear.

See Some Talks

The hallway track is awesome. But the best version of the hallway track is not just bumping into people and chatting; it’s the version where you’ve all recently seen the same thing, and thereby have a shared context of something to react to. If you aren’t going to talks, you aren’t going to get a good hallway track.. Therefore: choose talks that interest you, attend them and pay close attention, then find people to talk to about them.

Given that you will want to see some of the talks, make sure that you have the schedule downloaded and available offline on your mobile device, or printed out on a piece of paper.

Make a list of the talks you think you want to see, but have that schedule with you in case you want to call an audible in the middle of the conference, switching to a different talk you didn’t notice based on some of those “hallway track” conversations.

Participate In Open Spaces

The name “hallway track” itself is antiquated, in a way which is relevant and important to modern conferences. It used to be that conferences were exclusively oriented around their scheduled talks; it was called the “hallway” track because the way to access it was to linger in the hallways, outside the official structure of the conference, and just talk to people.

But however, at PyCon and many other conferences, this unofficial track is now much more of an integrated, official part of the program. In particular, open spaces are not only a more official hallway track, they are considerably better than the historical “hallway” experience, because these ad-hoc gatherings can be convened with a prepared topic and potentially a loose structure to facilitate productive discussion.

With open spaces, sessions can have an agenda and so conversations are easier to start. Rooms are provided, which is more useful than you might think; literally hanging out in a hallway is actually surprisingly disruptive to speakers and attendees at talks; us nerds tend to get pretty loud and can be quite audible even through a slightly-cracked door, so avail yourself of these rooms and don’t be a disruptive jerk outside somebody’s talk.

Consult the open space board, and put up your own proposed sessions. Post them as early as you can, to maximize the chance that they will get noticed. Post them on social media, using the conference's official hashtag, and ask any interested folks you bump into help boost it.¹

Remember that open spaces are not talks. If you want to give a mini-lecture on a topic and you can find interested folks you could do that, but the format lends itself to more peer-to-peer, roundtable-style interactions. Among other things, this means that, unlike proposing a talk, where you should be an expert on the topic that you are proposing, you can suggest open spaces where you are curious — but ignorant — about something, in the hopes that some experts will show up and you can listen to their discussion.

Be prepared for this to fail; there’s a lot going on and it’s always possible that nobody will notice your session. Again, maximize your chances by posting it as early as you can and promoting it, but be prepared to just have a free 30 minutes to check your email. Sometimes that’s just how it goes. The corollary here is not to always balance attending others’ spaces with proposing your own. After all if someone else proposed it you know at least one other person is gonna be there.

Take Care of Your Body

Conferences can be surprisingly high-intensity physical activities. It’s not a marathon, but you will be walking quickly from one end of a large convention center to another, potentially somewhat anxiously.

Hydrate, hydrate, hydrate. Bring a water bottle, and have it with you at all times. It might be helpful to set repeating timers on your phone to drink water, since it can be easy to forget in the middle of engaging conversations. If you take advantage of the hallway track as much as you should, you will talk more than you expect; talking expels water from your body. All that aforementioned walking might make you sweat a bit more than you realize.

Hydrate.

More generally, pay attention to what you are eating and drinking. Conference food isn’t always the best, and in a new city you might be tempted to load up on big meals and junk food. You should enjoy yourself and experience the local cuisine, but do it intentionally. While you enjoy the local fare, do so in whatever moderation works best for you. Similarly for boozy night-time socializing. Nothing stings quite as much as missing a morning of talks because you’ve got a hangover or a migraine.

This is worth emphasizing because in the enthusiasm of an exciting conference experience, it’s easy to lose track and overdo it.

Meet Both New And Old Friends: Plan Your Socializing

A lot of the advice above is mostly for first-time or new-ish conferencegoers, but this one might be more useful for the old heads. As we build up a long-time clique of conference friends, it’s easy to get a bit insular and lose out on one of the bits of magic of such an event: meeting new folks and hearing new perspectives.

While open spaces can address this a little bit, there's one additional thing I've started doing in the last few years: dinners are for old friends, but lunches are for new ones. At least half of the days I'm there, I try to go to a new table with new folks that I haven't seen before, and strike up a conversation. I even have a little canned icebreaker prompt, which I would suggest to others as well, because it’s worked pretty nicely in past years: “what is one fun thing you have done with Python recently?”².

Given that I have a pretty big crowd of old friends at these things, I actually tend to avoid old friends at lunch, since it’s so easy to get into multi-hour conversations, and meeting new folks in a big group can be intimidating. Lunches are the time I carve out to try and meet new folks.

I’ll See You There

I hope some of these tips were helpful, and I am looking forward to seeing some of you at PyCon US 2024!

In PyCon2024's case, #PyConUS on Mastodon is probably the way to go. Note, also, that it is #PyConUS and not #pyconus, which is much less legible for users of screen-readers. ↩
Obviously that is specific to this conference. At the O’Reilly Software Architecture conference, my prompt was “What is software architecture?” which had some really fascinating answers. ↩

by Glyph at May 15, 2024 09:12 AM

May 07, 2024

Glyph Lefkowitz

Hope

Humans are pattern-matching machines. As a species, it is our superpower. To summarize the core of my own epistemic philosophy, here is a brief list of the activities in the core main-loop of a human being:

stuff happens to us
we look for patterns in the stuff
we weave those patterns into narratives
we turn the narratives into models of the world
we predict what will happen based on those models
we do stuff based on those predictions
based on the stuff we did, more stuff happens to us; return to step 1

While this ability lets humans do lots of great stuff, like math and physics and agriculture and so on, we can just as easily build bad stories and bad models. We can easily trick ourselves into thinking that our predictive abilities are more powerful than they are.

The existence of magic-seeming levels of prediction in fields like chemistry and physics and statistics, in addition to the practical usefulness of rough estimates and heuristics in daily life, itself easily creates a misleading pattern. “I see all these patterns and make all these predictions and I’m right a lot of the time, so if I just kind of wing it and predict some more stuff, I’ll also be right about that stuff.”

This leaves us very vulnerable to things like mean world syndrome. Mean world syndrome itself is specifically about danger, but I believe it is a manifestation of an even broader phenomenon which I would term “the apophenia of despair”.

Confirmation bias is an inherent part of human cognition, but the internet has turbocharged it. Humans have immediate access to more information than we ever had in the past. In order to cope with that information, we have also built ways to filter that information. Even disregarding things like algorithmic engagement maximization and social media filter bubbles, the simple fact that when you search for things, you are a lot more likely to find the thing that you’re searching for than to find arguments refuting it, can provide a very strong sense that you’re right about whatever you’re researching.

All of this is to say: if you decide that something in the world is getting worse, you can very easily convince yourself that it is getting much, much worse, very rapidly. Especially because there are things which are, unambiguously, getting worse.

However, Pollyanna-ism is just the same phenomenon in reverse and I don’t want to engage in that. The ice sheets really are melting, globally, fascism really is on the rise. I am not here to deny reality or to cherry pick a bunch of statistics to lull people into complacency.

I believe that while dwelling on a negative reality is bad, I also believe that in the face of constant denial, it is sometimes necessary to simply emphasize those realities, however unpleasant they may be. Distinguishing between unhelpful rumination on negativity and repetition of an unfortunate but important truth to correct popular perception is subjective and subtle, but the difference is nevertheless important.

As our ability to acquire information about things getting worse has grown, our ability to affect those things has not. Knowledge is not power; power is power, and most of us don’t have a lot of it, so we need to be strategic in the way that we deploy our limited political capital and personal energy.

Overexposure to negative news can cause symptoms of depression; depressed people have reduced executive function and find it harder to do stuff. One of the most effective interventions against this general feeling of malaise? Hope.. Not “hope” in the sense of wishing. As this article in the American Psychological Association’s “Monitor on Psychology” puts it:

“We often use the word ‘hope’ in place of wishing, like you hope it rains today or you hope someone’s well,” said Chan Hellman, PhD, a professor of psychology and founding director of the Hope Research Center at the University of Oklahoma. “But wishing is passive toward a goal, and hope is about taking action toward it.”

Here, finally, I can get around to my point.

If you have an audience, and you have some negative thoughts about some social trend, talking about it in a way which is vague and non-actionable is potentially quite harmful. If you are doing this, you are engaged in the political project of robbing a large number of people of hope. You are saying that the best should have less conviction, while the worst will surely remain just as full of passionate intensity.

I do not mean to say that it is unacceptable to ever publicly share thoughts of sadness, or even hopelessness. If everyone in public is forced to always put on a plastic smile and pretend that everything is going to be okay if we have grit and determination, then we have an Instagram culture of fake highlight reels where anyone having their own struggles with hopelessness will just feel even worse in their isolation. I certainly posted my way through my fair share of pretty bleak mental health issues during the worst of the pandemic.

But we should recognize that while sadness is a feeling, hopelessness is a problem, a bad reaction to that feeling, one that needs to be addressed if we are going to collectively dig ourselves out of the problem that creates the sadness in the first place. We may not be able to conjure hope all the time, but we should always be trying.

When we try to address these feelings, as I said earlier, Pollyanna-ism doesn’t help. The antidote to hopelessness is not optimism, but curiosity. If you have a strong thought like “people these days just don’t care about other people¹”, yelling “YES THEY DO” at yourself (or worse, your audience) is unlikely to make much of a change, and certainly not likely to be convincing to an audience. Instead, you could ask yourself some questions, and use them for a jumping-off point for some research:

Why do I think this — is the problem in my perception, or in the world?
If there is a problem in my perception, is this a common misperception? If it’s common, what is leading to it being common? If it’s unique to me, what sort of work do I need to do to correct it?
If the problem is real, what are its causes? Is there anything that I, or my audience, could do to address those causes?

The answers to these questions also inform step 6 of the process I outlined above: the doing stuff part of the process.

At some level, all communication is persuasive communication. Everything you say that another person might hear, everything you say that a person might hear, is part of a sprachspiel where you are attempting to achieve something. There is always an implied call to action; even “do nothing, accept the status quo” is itself an action. My call to action right now is to ask you to never make your call to action “you should feel bad, and you should feel bad about feeling bad”. When you communicate in public, your words have power.

Use that power for good.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Cassandra Granade, who provided some editorial feedback on this post; any errors, of course, remain my own.

I should also note that vague sentiments of this form, “things used to be better, now they’re getting worse”, are at their core a reactionary yearning for a prelapsarian past, which is both not a good look and also often wrong in a very common way. Complaining about how “people” are getting worse is a very short journey away from complaining about kids these days, which has a long and embarrassing history of being comprehensively incorrect in every era. ↩

by Glyph at May 07, 2024 10:26 PM

March 30, 2024

Glyph Lefkowitz

Software Needs To Be More Expensive

The Cost of Coffee

One of the ideas that James Hoffmann — probably the most influential… influencer in the coffee industry — works hard to popularize is that “coffee needs to be more expensive”.

The coffee industry is famously exploitative. Despite relatively thin margins for independent café owners¹, there are no shortage of horrific stories about labor exploitation and even slavery in the coffee supply chain.

To summarize a point that Mr. Hoffman has made over a quite long series of videos and interviews², some of this can be fixed by regulatory efforts. Enforcement of supply chain policies both by manufacturers and governments can help spot and avoid this type of exploitation. Some of it can be fixed by discernment on the part of consumers. You can try to buy fair-trade coffee, avoid brands that you know have problematic supply-chain histories.

Ultimately, though, even if there is perfect, universal, zero-cost enforcement of supply chain integrity… consumers still have to be willing to, you know, pay more for the coffee. It costs more to pay wages than to have slaves.

The Price of Software

The problem with the coffee supply chain deserves your attention in its own right. I don’t mean to claim that the problems of open source maintainers are as severe as those of literal child slaves. But the principle is the same.

Every tech company uses huge amounts of open source software, which they get for free.

I do not want to argue that this is straightforwardly exploitation. There is a complex bargain here for the open source maintainers: if you create open source software, you can get a job more easily. If you create open source infrastructure, you can make choices about the architecture of your projects which are more long-term sustainable from a technology perspective, but would be harder to justify on a shorter-term commercial development schedule. You can collaborate with a wider group across the industry. You can build your personal brand.

But, in light of the recent xz Utils / SSH backdoor scandal, it is clear that while the bargain may not be entirely one-sided, it is not symmetrical, and significant bad consequences may result, both for the maintainers themselves and for society.

To fix this problem, open source software needs to get more expensive.

A big part of the appeal of open source is its implicit permission structure, which derives both from its zero up-front cost and its zero marginal cost.

The zero up-front cost means that you can just get it to try it out. In many companies, individual software developers do not have the authority to write a purchase order, or even a corporate credit card for small expenses.

If you are a software engineer and you need a new development tool or a new library that you want to purchase for work, it can be a maze of bureaucratic confusion in order to get that approved. It might be possible, but you are likely to get strange looks, and someone, probably your manager, is quite likely to say “isn’t there a free option for this?” At worst, it might just be impossible.

This makes sense. Dealing with purchase orders and reimbursement requests is annoying, and it only feels worth the overhead if you’re dealing with a large enough block of functionality that it is worth it for an entire team, or better yet an org, to adopt. This means that most of the purchasing is done by management types or “architects”, who are empowered to make decisions for larger groups.

When individual engineers need to solve a problem, they look at open source libraries and tools specifically because it’s quick and easy to incorporate them in a pull request, where a proprietary solution might be tedious and expensive.

That’s assuming that a proprietary solution to your problem even exists. In the infrastructure sector of the software economy, free options from your operating system provider (Apple, Microsoft, maybe Amazon if you’re in the cloud) and open source developers, small commercial options have been marginalized or outright destroyed by zero-cost options, for this reason.

If the zero up-front cost is a paperwork-reduction benefit, then the zero marginal cost is almost a requirement. One of the perennial complaints of open source maintainers is that companies take our stuff, build it into a product, and then make a zillion dollars and give us nothing. It seems fair that they’d give us some kind of royalty, right? Some tiny fraction of that windfall? But once you realize that individual developers don’t have the authority to put $50 on a corporate card to buy a tool, they super don’t have the authority to make a technical decision that encumbers the intellectual property of their entire core product to give some fraction of the company’s revenue away to a third party. Structurally, there’s no way that this will ever happen.

Despite these impediments, keeping those dependencies maintained does cost money.

Some Solutions Already Exist

There are various official channels developing to help support the maintenance of critical infrastructure. If you work at a big company, you should probably have a corporate Tidelift subscription. Maybe ask your employer about that.

But, as they will readily admit there are a LOT of projects that even Tidelift cannot cover, with no official commercial support, and no practical way to offer it in the short term. Individual maintainers, like yours truly, trying to figure out how to maintain their projects, either by making a living from them or incorporating them into our jobs somehow. People with a Ko-Fi or a Patreon, or maybe just an Amazon wish-list to let you say “thanks” for occasional maintenance work.

Most importantly, there’s no path for them to transition to actually making a living from their maintenance work. For most maintainers, Tidelift pays a sub-hobbyist amount of money, and even setting it up (and GitHub Sponsors, etc) is a huge hassle. So even making the transition from “no income” to “a little bit of side-hustle income” may be prohibitively bureaucratic.

Let’s take myself as an example. If you’re a developer who is nominally profiting from my infrastructure work in your own career, there is a very strong chance that you are also a contributor to the open source commons, and perhaps you’ve even contributed more to that commons than I have, contributed more to my own career success than I have to yours. I can ask you to pay me³, but really you shouldn’t be paying me, your employer should.

What To Do Now: Make It Easy To Just Pay Money

So if we just need to give open source maintainers more money, and it’s really the employers who ought to be giving it, then what can we do?

Let’s not make it complicated. Employers should just give maintainers money. Let’s call it the “JGMM” benefit.

Specifically, every employer of software engineers should immediately institute the following benefits program: each software engineer should have a monthly discretionary budget of $50 to distribute to whatever open source dependency developers they want, in whatever way they see fit. Venmo, Patreon, PayPal, Kickstarter, GitHub Sponsors, whatever, it doesn’t matter. Put it on a corp card, put the project name on the line item, and forget about it. It’s only for open source maintenance, but it’s a small enough amount that you don’t need intense levels of approval-gating process. You can do it on the honor system.

This preserves zero up-front cost. To start using a dependency, you still just use it⁴. It also preserves zero marginal cost: your developers choose which dependencies to support based on perceived need and popularity. It’s a fixed overhead which doesn’t scale with revenue or with profit, just headcount.

Because the whole point here is to match the easy, implicit, no-process, no-controls way in which dependencies can be added in most companies. It should be easier to pay these small tips than it is to use the software in the first place.

This sub-1% overhead to your staffing costs will massively de-risk the open source projects you use. By leaving the discretion up to your engineers, you will end up supporting those projects which are really struggling and which your executives won’t even hear about until they end up on the news. Some of it will go to projects that you don’t use, things that your engineers find fascinating and want to use one day but don’t yet depend upon, but that’s fine too. Consider it an extremely cheap, efficient R&D expense.

A lot of the options for developers to support open source infrastructure are already tax-deductible, so if they contribute to something like one of the PSF’s fiscal sponsorees, it’s probably even more tax-advantaged than a regular business expense.

I also strongly suspect that if you’re one of the first employers to do this, you can get a round of really positive PR out of the tech press, and attract engineers, so, the race is on. I don’t really count as the “tech press” but nevertheless drop me a line to let me know if your company ends up doing this so I can shout you out.

Acknowledgments

I don’t have time to get into the margins for Starbucks and friends, their relationship with labor, economies of scale, etc. ↩
While this is a theme that pervades much of his work, the only place I can easily find where he says it in so many words is on a podcast that sometimes also promotes right-wing weirdos and pseudo-scientific quacks spreading misinformation about autism and ADHD. So, I obviously don’t want to link to them; you’ll have to take my word for it. ↩
and I will, since as I just recently wrote about, I need to make sure that people are at least aware of the option ↩
Pending whatever legal approval program you have in place to vet the license. You do have a nice streamlined legal approvals process, right? You’re not just putting WTFPL software into production, are you? ↩

by Glyph at March 30, 2024 11:00 PM

March 29, 2024

Glyph Lefkowitz

The Hat

This year I will be going to two conferences: PyCon 2024, and North Bay Python 2024.

At PyCon, I will be promoting my open source work and my writing on this blog. As I’m not giving a talk this year, I am looking forward to organizing and participating in some open spaces about topics like software architecture, open source sustainability, framework interoperability, and event-driven programming.

At North Bay Python, though, I will either be:

doing a lot more of that, or
looking for new full-time employment, pausing the patreon, and winding down this experiment.

If you’d like to see me doing the former, now would be a great time to sign up to my Patreon to support the continuation of my independent open source work and writing.

The Bad News

It has been nearly a year since I first mentioned that I have a Patreon on this blog. That year has been a busy one, with consulting work and personal stuff consuming more than enough time that I have never been full time on independent coding & blogging. As such, I’ve focused more on small infrastructure projects and less on user-facing apps than I’d like, but I have spent the plurality of my time on it.

For that to continue, let alone increase, this work needs to—at the very least—pay for health insurance. At my current consulting rate, a conservative estimate based on some time tracking is that I am currently doing this work at something like a 98.5% discount. I do love doing it! I would be happy to continue doing it at a significant discount! But “significant” and “catastrophic” are different thresholds.

As I have said previously, this is an appeal to support my independent work; not to support me. I will be fine; what you will be voting for with your wallet is not my well-being but a choice about where I spend my time.

Hiding The Hat

When people ask me what I do these days, I sometimes struggle to explain. It is confusing. I might say I have a Patreon, I do a combination of independent work and consulting, or if I’m feeling particularly sarcastic I might say I’m an ✨influencer✨. But recently I saw the very inspiring Matt Ricardo describing the way he thinks about his own Patreon, and I realized what I am actually trying to do, which is software development busking.

Previously, when I’ve been thinking about making this “okay, it’s been a year of Patreon, let’s get serious now” post, I’ve been thinking about adding more reward products to my Patreon, trying to give people better value for their money before asking for anything more, trying to finish more projects to make a better sales pitch, maybe making merch available for sale, and so on. So aside from irregular weekly posts on Friday and acknowledgments sections at the bottom of blog posts, I’ve avoided mentioning this while I think about adding more private rewards.

But busking is a public performance, and if you want to support my work then it is that public aspect that you presumably want to support. And thus, an important part of the busking process is to actually pass the hat at the end. The people who don’t chip in still get to see the performance, but everyone else need to know that they can contribute if they liked it.¹

I’m going to try to stop hiding the metaphorical hat. I still don’t want to overdo it, but I will trust that you’ll tell me if these reminders get annoying. For my part today, in addition to this post, I’m opening up a new $10 tier on Patreon for people who want to provide a higher level of support, and officially acknowledging the rewards that I already provide.

What’s The Deal?

So, what would you be supporting?

What You Give (The Public Part)

I have tended to focus on my software, and there has been a lot of it! You’d be supporting me writing libraries and applications and build infrastructure to help others do the same with Python, as well as maintaining existing libraries (like the Twisted ecosystem libraries) sometimes. If I can get enough support together to more than bare support for myself, I’d really like to be able to do things like pay people to others to help with aspects of applications that I would struggle to complete myself, like accessibility or security audits.
I also do quite a bit of writing though, about software and about other things. To make it easier to see what sort of writing I’m talking about, I’ve collected just the stuff that I’ve written during the period where I have had some patrons, under the supported tag.
Per my earlier sarcastic comment about being an “influencer”, I also do quite a bit of posting on Mastodon about software and the tech industry.

What You Get (Just For Patrons)

As I said above, I will be keeping member benefits somewhat minimal.

I will add you to SponCom so that your name will be embedded in commit messages like this one on a cadence appropriate to your support level.
You will get access to my private Patreon posts where I discuss what I’ve been working on. As one of my existing patrons put it:

I figure I’m getting pretty good return on it, getting not only targeted project tracking, but also a peek inside your head when it comes to Sores Business Development. Maybe some of that stuff will rub off on me :)
This is a somewhat vague and noncommittal benefit, but it might be the best one: if you are interested in the various different projects that I am doing, you can help me prioritize! I have a lot of things going on. What would you prefer that I focus on? You can make suggestions in the comments of Patreon posts, which I pay a lot more attention to than other feedback channels.
In the near future² I am also planning to start doing some “office hours” type live-streaming, where I will take audience questions and discuss software design topics, or maybe do some live development to showcase my process and the tools I use. When I figure out the mechanics of this, I plan to add some rewards to the existing tiers to select topics or problems for me to work on there.

If that sounds like a good deal to you, please sign up now. If you’re already supporting me, sharing this and giving a brief testimonial of something good I’ve done would be really helpful. Github is not an algorithmic platform like YouTube, despite my occasional jokey “remember to like and subscribe”, nobody is getting recommended DBXS, or Fritter, or Twisted, or Automat, or this blog unless someone goes out and shares it.

A year into this, after what feels like endlessly repeating this sales pitch to the point of obnoxiousness, I still routinely interact with people who do not realize that I have a Patreon at all. ↩
Not quite comfortable putting this on the official patreon itemized inventory of rewards yet, but I do plan to add it once I’ve managed to stream for a couple of weeks in a row. ↩

by Glyph at March 29, 2024 11:56 PM

DBXS 0.1.0

New Release

Yesterday I published a new release of DBXS for you all. It’s still ZeroVer, but it has graduated from double-ZeroVer as this is the first nonzero minor version.

More to the point though, the meaning of that version increment this version introduces some critical features that I think most people would need to give it a spin on a hobby project.

What’s New

It has support for MySQL and PostgreSQL using native asyncio drivers, which means you don’t need to take a Twisted dependency in production.
While Twisted is still used for some of the testing internals, Deferred is no longer exposed anywhere in the public API, either; your tests can happily pretend that they’re doing asyncio, as long as they can run against SQLite.
There is a new repository convenience function that automatically wires together multiple accessors and transaction discipline. Have a look at the docstring for a sense of how to use it.
Several papercuts, like confusing error messages when messing up query result handling, and lack of proper handling of default arguments in access protocols, are now addressed.

It’s A Good Time To Contribute!

If you’ve been looking for an open source project to try your hand at contributing to, DBXS might be a great opportunity, for a few reasons:

The team is quite small (just me, right now!), so it’s easy to get involved.
It’s quite generally useful, so there’s a potential for an audience, but right now it doesn’t really have any production users; there’s still time to change things without a lot of ceremony.
Unlike many other small starter projects, it’s got a test suite with 100% coverage, so you can contribute with confidence that you’re not breaking anything.
There’s not that much code (a bit over 2 thousand SLOC), so it’s not hard to get your head around.
There are a few obvious next steps for improvement, which I’ve filed as issues if you want to pick one up.

Share and enjoy, and please let me know if you do something fun with it.

Acknowledgments

by Glyph at March 29, 2024 10:19 PM

March 19, 2024

Hynek Schlawack

You Can Build Portable Binaries of Python Applications

Contrary to popular belief, it’s possible to ship portable executables of Python applications without sending your users to Python packaging hell.

by Hynek Schlawack (hs@ox.cx) at March 19, 2024 12:00 AM

February 05, 2024

Glyph Lefkowitz

Let Me Tell You A Secret

I do consulting¹ on software architecture, network protocol development, python software infrastructure, streamlined cloud deployment, and open source strategy, among other nerdy things. I enjoy solving challenging, complex technical problems or contributing to the open source commons. On the best jobs, I get to do both.

Today I would like to share with you a secret of the software technology consulting trade.

I should note that this secret is not specific to me. I have several colleagues who have also done software consulting and have reflected versions of this experience back to me.²

We’ll get to the secret itself in a moment, but first, some background.

Companies do not go looking for consulting when things are going great. This is particularly true when looking for high-level consulting on things like system architecture or strategy. Almost by definition, there’s a problem that I have been brought in to solve. Ideally, that problem is a technical challenge.

In the software industry, your team probably already has some software professionals with a variety of technical skills, and thus they know what to do with technical challenges. Which means that, as often as not, the problem is to do with people rather than technology, even it appears otherwise.

When you hire a staff-level professional like myself to address your software team’s general problems, that consultant will need to gather some information. If I am that consultant and I start to suspect that the purported technology problem that you’ve got is in fact a people problem, here is the secret technique that I am going to use:

I am going to go get a pen and a pad of paper, then schedule a 90-minute meeting with the most senior IC³ engineer that you have on your team. I will bring that pen and paper to the meeting. I will then ask one question:

What is fucked up about this place?

I will then write down their response in as much detail as I can manage. If I have begun to suspect that this meeting is necessary, 90 minutes is typically not enough time, and I will struggle to keep up. Even so, I will usually manage to capture the highlights.

One week later, I will schedule a meeting with executive leadership, and during that meeting, I will read back a very lightly edited⁴ version of the transcript of the previous meeting. This is then routinely praised as a keen strategic insight.

I should pause here to explicitly note that — obviously, I hope — this is not an oblique reference to any current or even recent client; if I’d had this meeting recently it would be pretty awkward to answer that “so, I read your blog…” email.⁵ But talking about clients in this way, no matter how obfuscated and vague the description, is always a bit professionally risky. So why risk it?

The thing is, I’m not a people manager. While I can do this kind of work, and I do not begrudge doing it if it is the thing that needs doing, I find it stressful and unfulfilling. I am a technology guy, not a people person. This is generally true of people who elect to go into technology consulting; we know where the management track is, and we didn’t pick it.

If you are going to hire me for my highly specialized technical expertise, I want you to get the maximum value out of it. I know my value; my rates are not low, and I do not want clients to come away with the sense that I only had a couple of “obvious” meetings.

So the intended audience for this piece is potential clients, leaders of teams (or organizations, or companies) who have a general technology problem and are wondering if they need a consultant with my skill-set to help them fix it. Before you decide that your issue is the need to implement a complex distributed system consensus algorithm, check if that is really what’s at issue. Talk to your ICs, and — taking care to make sure they understand that you want honest feedback and that they are safe to offer it — ask them what problems your organization has.

During this meeting it is important to only listen. Especially if you’re at a small company and you are regularly involved in the day-to-day operations, you might feel immediately defensive. Sit with that feeling, and process it later. Don’t unload your emotional state on an employee you have power over.⁶

“Only listening” doesn’t exclusively mean “don’t push back”. You also shouldn’t be committing to fixing anything. While the information you are gathering in these meetings is extremely valuable, and you should probably act on more of it than you will initially want to, your ICs won’t have the full picture. They really may not understand why certain priorities are set the way they are. You’ll need to take that as feedback for improving internal comms rather than “fixing” the perceived problem, and you certainly don’t want to make empty promises.

If you have these conversations directly, you can get something from it that no consultant can offer you: credibility. If you can actively listen, the conversation alone can improve morale. People like having their concerns heard. If, better still, you manage to make meaningful changes to address the concerns you’ve heard about, you can inspire true respect.

As a consultant, I’m going to be seen as some random guy wasting their time with a meeting. Even if you make the changes I recommend, it won’t resonate the same way as someone remembering that they personally told you what was wrong, and you took it seriously and fixed it.

Once you know what the problems are with your organization, and you’ve got solid technical understanding that you really do need that event-driven distributed systems consensus algorithm implemented using Twisted, I’m absolutely your guy. Feel free to get in touch.

While I immensely value my patrons support and am eternally grateful for their support, at — as of this writing — less than $100 per month it doesn’t exactly pay the SF bay area cost-of-living bill. ↩
When I reached out for feedback on a draft of this essay, every other consultant I showed it to said that something similar had happened to them within the last month, all with different clients in different sectors of the industry. I really cannot stress how common it is. ↩
“individual contributor”, if this bit of jargon isn’t universal in your corner of the world; i.e.: “not a manager”. ↩
Mostly, I need to remove a bunch of profanity, but sometimes I will also need to have another interview, usually with a more junior person on the team to confirm that I’m not relaying only a single person’s perspective. It is pretty rare that the top-of-mind problems are specific to one individual, though. ↩
To the extent that this is about anything saliently recent, I am perhaps grumbling about how tech CEOs aren’t taking morale problems generated by the constant drumbeat of layoffs seriously enough. ↩
I am not always in the role of a consultant. At various points in my career, I have also been a leader needing to sit in this particular chair, and believe me, I know it sucks. This would not be a common problem if there weren’t a common reason that leaders tend to avoid this kind of meeting. ↩

by Glyph at February 05, 2024 10:36 PM

January 25, 2024

Glyph Lefkowitz

The Macintosh

A 4k ultrawide classic MacOS desktop screenshot featuring various Finder windows and MPW Workshop

Today is the 40th anniversary of the announcement of the Macintosh. Others have articulated compelling emotional narratives that easily eclipse my own similar childhood memories of the Macintosh family of computers. So instead, I will ask a question:

What is the Macintosh?

As this is the anniversary of the beginning, that is where I will begin. The original Macintosh, the classic MacOS, the original “System Software” are a shining example of “fake it till you make it”. The original mac operating system was fake.

Don’t get me wrong, it was an impressive technical achievement to fake something like this, but what Steve Jobs did was to see a demo of a Smalltalk-76 system, an object-oriented programming environment with 1-to-1 correspondences between graphical objects on screen and runtime-introspectable data structures, a self-hosting high level programming language, memory safety, message passing, garbage collection, and many other advanced facilities that would not be popularized for decades, and make a fake version of it which ran on hardware that consumers could actually afford, by throwing out most of what made the programming environment interesting and replacing it with a much more memory-efficient illusion implemented in 68000 assembler and Pascal.

The machine’s RAM didn’t have room for a kernel. Whatever application was running was in control of the whole system. No protected memory, no preemptive multitasking. It was a house of cards that was destined to collapse. And collapse it did, both in the short term and the long. In the short term, the system was buggy and unstable, and application crashes resulted in system halts and reboots.

In the longer term, the company based on the Macintosh effectively went out of business and was reverse-acquired by NeXT, but they kept the better-known branding of the older company. The old operating system was gradually disposed of, quickly replaced at its core with a significantly more mature generation of operating system technology based on BSD UNIX and Mach. With the removal of Carbon compatibility 4 years ago, the last vestigial traces of it were removed. But even as early as 2004 the Mac was no longer really the Macintosh.

What NeXT had built was much closer to the Smalltalk system that Jobs was originally attempting to emulate. Its programming language, “Objective C” explicitly called back to Smalltalk’s message-passing, right down to the syntax. Objects on the screen now did correspond to “objects” you could send messages to. The development environment understood this too; that was a major selling point.

The NeXSTEP operating system and Objective C runtime did not have garbage collection, but it provided a similar developer experience by providing reference-counting throughout its object model. The original vision was finally achieved, for real, and that’s what we have on our desks and in our backpacks today (and in our pockets, in the form of the iPhone, which is in some sense a tiny next-generation NeXT computer itself).

The one detail I will relate from my own childhood is this: my first computer was not a Mac. My first computer, as a child, was an Amiga. When I was 5, I had a computer with 4096 colors, real multitasking, 3D graphics, and a paint program that could draw hard real-time animations with palette tricks. Then the writing was on the wall for Commodore and I got a computer which had 256 colors, a bunch of old software that was still black and white, an operating system that would freeze if you held down the mouse button on the menu bar and couldn’t even play animations smoothly. Many will relay their first encounter with the Mac as a kind of magic, but mine was a feeling of loss and disappointment. Unlike almost everyone at the time, I knew what a computer really could be, and despite many pleasant and formative experiences with the Macintosh in the meanwhile, it would be a decade before I saw a real one again.

But this is not to deride the faking. The faking was necessary. Xerox was not going to put an Alto running Smalltalk on anyone’s desk. People have always grumbled that Apple products are expensive, but in 2024 dollars, one of these Xerox computers cost roughly $55,000.

The Amiga was, in its own way, a similar sort of fake. It managed its own miracles by putting performance-critical functions into dedicated hardware which rapidly became obsolete as software technology evolved much more rapidly.

Jobs is celebrated as a genius of product design, and he certainly wasn’t bad at it, but I had the rare privilege of seeing the homework he was cribbing from in that subject, and in my estimation he was a B student at best. Where he got an A was bringing a vision to life by creating an organization, both inside and outside of his companies.

If you want a culture-defining technological artifact, everybody in the culture has to be able to get their hands on one. This doesn’t just mean that the builder has to be able to build it. The buyer also has to be able to afford it, obviously. Developers have to be able to develop for it. The buyer has to actually want it; the much-derided “marketing” is a necessary part of the process of making a product what it is. Everyone needs to be able to move together in the direction of the same technological future.

This is why it was so fitting that Tim Cook was made Jobs's successor. The supply chain was the hard part.

The crowning, final achievement of Jobs’s career was the fact that not only did he fake it — the fakes were flying fast and thick at that time in history, even if they mostly weren’t as good — it was that he faked it and then he built the real version and then he bridged the transitions to get to the real thing.

I began here by saying that the Mac isn’t really the Mac, and speaking in terms of a point in time analysis that is true. Its technology today has practically nothing in common with its technology in 1984. This is not merely an artifact of the length of time here: the technology at the core of various UNIXes in 1984 bears a lot of resemblance of UNIX-like operating systems today¹. But looking across its whole history from 1984 to 2024, there is undeniably a continuity to the conceptual “Macintosh”.

Not just as a user, but as a developer moving through time rather than looking at just a few points: the “Macintosh”, such as it is, has transitioned from the Motorola 68000 to the PowerPC to Intel 32-bit to Intel 64-bit to ARM. From obscurely proprietary to enthusiastically embracing open source and then, sadly, much of the way back again. It moved from black and white to color, from desktop to laptop, from Carbon to Cocoa, from Display PostScript to Display PDF, all the while preserving instantly recognizable iconic features like the apple menu and the cursor pointer, while providing developers documentation and SDKs and training sessions that helped them transition their apps through multiple near-complete rewrites as a result of all of these changes.

To paraphrase Abigail Thorne’s first video about Identity, identity is what survives. The Macintosh is an interesting case study in the survival of the idea of a platform, as distinct from the platform itself. It is the Computer of Theseus, a thought experiment successfully brought to life and sustained over time.

If there is a personal lesson to be learned here, I’d say it’s that one’s own efforts need not be perfect. In fact, a significantly flawed vision that you can achieve right now is often much, much better than a perfect version that might take just a little bit longer, if you don’t have the resources to actually sustain going that much longer². You have to be bad at things before you can be good at them. Real artists, as Jobs famously put it, ship.

So my contribution to the 40th anniversary reflections is to say: the Macintosh is dead. Long live the Mac.

Acknowledgments

including, ironically, the modern macOS. ↩
And that is why I am posting this right now, rather than proofreading it further. ↩

by Glyph at January 25, 2024 06:31 AM

Unsigned Commits

I am going to tell you why I don’t think you should sign your Git commits, even though doing so with SSH keys is now easier than ever. But first, to contextualize my objection, I have a brief hypothetical for you, and then a bit of history from the evolution of security on the web.

paper reading “Sign Here:” with a pen poised over it

It seems like these days, everybody’s signing all different kinds of papers.

Bank forms, permission slips, power of attorney; it seems like if you want to securely validate a document, you’ve gotta sign it.

So I have invented a machine that automatically signs every document on your desk, just in case it needs your signature. Signing is good for security, so you should probably get one, and turn it on, just in case something needs your signature on it.

We also want to make sure that verifying your signature is easy, so we will have them all notarized and duplicates stored permanently and publicly for future reference.

No? Not interested?

Hopefully, that sounded like a silly idea to you.

Most adults in modern civilization have learned that signing your name to a document has an effect. It is not merely decorative; the words in the document being signed have some specific meaning and can be enforced against you.

In some ways the metaphor of “signing” in cryptography is bad. One does not “sign” things with “keys” in real life. But here, it is spot on: a cryptographic signature can have an effect.

It should be an input to some software, one that is acted upon. Software does a thing differently depending on the presence or absence of a signature. If it doesn’t, the signature probably shouldn’t be there.

Consider the most venerable example of encryption and signing that we all deal with every day: HTTPS. Many years ago, browsers would happily display unencrypted web pages. The browser would also encrypt the connection, if the server operator had paid for an expensive certificate and correctly configured their server. If that operator messed up the encryption, it would pop up a helpful dialog box that would tell the user “This website did something wrong that you cannot possibly understand. Would you like to ignore this and keep working?” with buttons that said “Yes” and “No”.

Of course, these are not the precise words that were written. The words, as written, said things about “information you exchange” and “security certificate” and “certifying authorities” but “Yes” and “No” were the words that most users read. Predictably, most users just clicked “Yes”.

In the usual case, where users ignored these warnings, it meant that no user ever got meaningful security from HTTPS. It was a component of the web stack that did nothing but funnel money into the pockets of certificate authorities and occasionally present annoying interruptions to users.

In the case where the user carefully read and honored these warnings in the spirit they were intended, adding any sort of transport security to your website was a potential liability. If you got everything perfectly correct, nothing happened except the browser would display a picture of a small green purse. If you made any small mistake, it would scare users off and thereby directly harm your business. You would only want to do it if you were doing something that put a big enough target on your site that you became unusually interesting to attackers, or were required to do so by some contractual obligation like credit card companies.

Keep in mind that the second case here is the best case.

In 2016, the browser makers noticed this problem and started taking some pretty aggressive steps towards actually enforcing the security that HTTPS was supposed to provide, by fixing the user interface to do the right thing. If your site didn’t have security, it would be shown as “Not Secure”, a subtle warning that would gradually escalate in intensity as time went on, correctly incentivizing site operators to adopt transport security certificates. On the user interface side, certificate errors would be significantly harder to disregard, making it so that users who didn’t understand what they were seeing would actually be stopped from doing the dangerous thing.

Nothing fundamental¹ changed about the technical aspects of the cryptographic primitives or constructions being used by HTTPS in this time period, but socially, the meaning of an HTTP server signing and encrypting its requests changed a lot.

Now, let’s consider signing Git commits.

You may have heard that in some abstract sense you “should” be signing your commits. GitHub puts a little green “verified” badge next to commits that are signed, which is neat, I guess. They provide “security”. 1Password provides a nice UI for setting it up. If you’re not a 1Password user, GitHub itself recommends you put in just a few lines of configuration to do it with either a GPG, SSH, or even an S/MIME key.

But while GitHub’s documentation quite lucidly tells you how to sign your commits, its explanation of why is somewhat less clear. Their purse is the word “Verified”; it’s still green. If you enable “vigilant mode”, you can make the blank “no verification status” option say “Unverified”, but not much else changes.

This is like the old-style HTTPS verification “Yes”/“No” dialog, except that there is not even an interruption to your workflow. They might put the “Unverified” status on there, but they’ve gone ahead and clicked “Yes” for you.

It is tempting to think that the “HTTPS” metaphor will map neatly onto Git commit signatures. It was bad when the web wasn’t using HTTPS, and the next step in that process was for Let’s Encrypt to come along and for the browsers to fix their implementations. Getting your certificates properly set up in the meanwhile and becoming familiar with the tools for properly doing HTTPS was unambiguously a good thing for an engineer to do. I did, and I’m quite glad I did so!

However, there is a significant difference: signing and encrypting an HTTPS request is ephemeral; signing a Git commit is functionally permanent.

This ephemeral nature meant that errors in the early HTTPS landscape were easily fixable. Earlier I mentioned that there was a time where you might not want to set up HTTPS on your production web servers, because any small screw-up would break your site and thereby your business. But if you were really skilled and you could see the future coming, you could set up monitoring, avoid these mistakes, and rapidly recover. These mistakes didn’t need to badly break your site.

We can extend the analogy to HTTPS, but we have to take a detour into one of the more unpleasant mistakes in HTTPS’s history: HTTP Public Key Pinning, or “HPKP”. The idea with HPKP was that you could publish a record in an HTTP header where your site commits² to using certain certificate authorities for a period of time, where that period of time could be “forever”. Attackers gonna attack, and attack they did. Even without getting attacked, a site could easily commit “HPKP Suicide” where they would pin the wrong certificate authority with a long timeline, and their site was effectively gone for every browser that had ever seen those pins. As a result, after a few years, HPKP was completely removed from all browsers.

Git commit signing is even worse. With HPKP, you could easily make terrible mistakes with permanent consequences even though you knew the exact meaning of the data you were putting into the system at the time you were doing it. With signed commits, you are saying something permanently, but you don’t really know what it is that you’re saying.

Today, what is the benefit of signing a Git commit? GitHub might present it as “Verified”. It’s worth noting that only GitHub will do this, since they are the root of trust for this signing scheme. So, by signing commits and registering your keys with GitHub, you are, at best, helping to lock in GitHub as a permanent piece of infrastructure that is even harder to dislodge because they are not only where your code is stored, but also the arbiters of whether or not it is trustworthy.

In the future, what is the possible security benefit? If we all collectively decide we want Git to be more secure, then we will need to meaningfully treat signed commits differently from unsigned ones.

There’s a long tail of unsigned commits several billion entries long. And those are in the permanent record as much as the signed ones are, so future tooling will have to be able to deal with them. If, as stewards of Git, we wish to move towards a more secure Git, as the stewards of the web moved towards a more secure web, we do not have the option that the web did. In the browser, the meaning of a plain-text HTTP or incorrectly-signed HTTPS site changed, in order to encourage the site’s operator to change the site to be HTTPS.

In contrast, the meaning of an unsigned commit cannot change, because there are zillions of unsigned commits lying around in critical infrastructure and we need them to remain there. Commits cannot meaningfully be changed to become signed retroactively. Unlike an online website, they are part of a historical record, not an operating program. So we cannot establish the difference in treatment by changing how unsigned commits are treated.

That means that tooling maintainers will need to provide some difference in behavior that provides some incentive. With HTTPS where the binary choice was clear: don’t present sites with incorrect, potentially compromised configurations to users. The question was just how to achieve that. With Git commits, the difference in treatment of a “trusted” commit is far less clear.

If you will forgive me a slight straw-man here, one possible naive interpretation is that a “trusted” signed commit is that it’s OK to run in CI. Conveniently, it’s not simply “trusted” in a general sense. If you signed it, it’s trusted to be from you, specifically. Surely it’s fine if we bill the CI costs for validating the PR that includes that signed commit to your GitHub account?

Now, someone can piggy-back off a 1-line typo fix that you made on top of an unsigned commit to some large repo, making you implicitly responsible for transitively signing all unsigned parent commits, even though you haven’t looked at any of the code.

Remember, also, that the only central authority that is practically trustable at this point is your GitHub account. That means that if you are using a third-party CI system, even if you’re using a third-party Git host, you can only run “trusted” code if GitHub is online and responding to requests for its “get me the trusted signing keys for this user” API. This also adds a lot of value to a GitHub credential breach, strongly motivating attackers to sneakily attach their own keys to your account so that their commits in unrelated repos can be “Verified” by you.

Let’s review the pros and cons of turning on commit signing now, before you know what it is going to be used for:

Pro	Con
Green “Verified” badge	Unknown, possibly unlimited future liability for the consequences of running code in a commit you signed
	Further implicitly cementing GitHub as a centralized trust authority in the open source world
	Introducing unknown reliability problems into infrastructure that relies on commit signatures
	Temporary breach of your GitHub credentials now lead to potentially permanent consequences if someone can smuggle a new trusted key in there
	New kinds of ongoing process overhead as commit-signing keys become new permanent load-bearing infrastructure, like “what do I do with expired keys”, “how often should I rotate these”, and so on

I feel like the “Con” column is coming out ahead.

That probably seemed like increasingly unhinged hyperbole, and it was.

In reality, the consequences are unlikely to be nearly so dramatic. The status quo has a very high amount of inertia, and probably the “Verified” badge will remain the only visible difference, except for a few repo-specific esoteric workflows, like pushing trust verification into offline or sandboxed build systems. I do still think that there is some potential for nefariousness around the “unknown and unlimited” dimension of any future plans that might rely on verifying signed commits, but any flaws are likely to be subtle attack chains and not anything flashy and obvious.

But I think that one of the biggest problems in information security is a lack of threat modeling. We encrypt things, we sign things, we institute rotation policies and elaborate useless rules for passwords, because we are looking for a “best practice” that is going to save us from having to think about what our actual security problems are.

I think the actual harm of signing git commits is to perpetuate an engineering culture of unquestioningly cargo-culting sophisticated and complex tools like cryptographic signatures into new contexts where they have no use.

Just from a baseline utilitarian philosophical perspective, for a given action A, all else being equal, it’s always better not to do A, because taking an action always has some non-zero opportunity cost even if it is just the time taken to do it. Epsilon cost and zero benefit is still a net harm. This is even more true in the context of a complex system. Any action taken in response to a rule in a system is going to interact with all the other rules in that system. You have to pay complexity-rent on every new rule. So an apparently-useless embellishment like signing commits can have potentially far-reaching consequences in the future.

Git commit signing itself is not particularly consequential. I have probably spent more time writing this blog post than the sum total of all the time wasted by all programmers configuring their git clients to add useless signatures; even the relatively modest readership of this blog will likely transfer more data reading this post than all those signatures will take to transmit to the various git clients that will read them. If I just convince you not to sign your commits, I don’t think I’m coming out ahead in the felicific calculus here.

What I am actually trying to point out here is that it is useful to carefully consider how to avoid adding junk complexity to your systems. One area where junk tends to leak in to designs and to cultures particularly easily is in intimidating subjects like trust and safety, where it is easy to get anxious and convince ourselves that piling on more stuff is safer than leaving things simple.

If I can help you avoid adding even a little bit of unnecessary complexity, I think it will have been well worth the cost of the writing, and the reading.

Acknowledgments

Yes yes I know about heartbleed and Bleichenbacher attacks and adoption of forward-secret ciphers and CRIME and BREACH and none of that is relevant here, okay? Jeez. ↩
Do you see what I did there. ↩

by Glyph at January 25, 2024 12:29 AM

January 23, 2024

Glyph Lefkowitz

Your Text Editor (Probably) Isn’t Malware Any More

In 2015, I wrote one of my more popular blog posts, “Your Text Editor Is Malware”, about the sorry state of security in text editors in general, but particularly in Emacs and Vim.

It’s nearly been a decade now, so I thought I’d take a moment to survey the world of editor plugins and see where we are today. Mostly, this is to allay fears, since (in today’s landscape) that post is unreasonably alarmist and inaccurate, but people are still reading it.

Problem	Is It Fixed?
`vim.org` is not available via `https`	Yep! `http://www.vim.org/` redirects to `https://www.vim.org/` now.
Emacs's HTTP client doesn't verify certificates by default	Mostly! The documentation is incorrect and there are some UI problems¹, but it doesn’t blindly connect insecurely.
ELPA and MELPA supply plaintext-HTTP package sources	Kinda. MELPA correctly responds to HTTP only with redirects to HTTPS, and ELPA at least offers HTTPS and uses HTTPS URLs exclusively in the default configuration.
You have to ship your own trust roots for Emacs.	Fixed! The default installation of Emacs on every platform I tried (including Windows) seems to be providing trust roots.
MELPA offers to install code off of a wiki.	Yes. Wiki packages were disabled entirely in 2018.

The big takeaway here is that the main issue of there being no security whatsoever on Emacs and Vim package installation and update has been fully corrected.

Where To Go Next?

Since I believe that post was fairly influential, in particular in getting MELPA to tighten up its security, let me take another big swing at a call to action here.

More modern editors have made greater strides towards security. VSCode, for example, has enabled the Chromium sandbox and added some level of process separation. Emacs has not done much here yet, but over the years it has consistently surprised me with its ability to catch up to its more modern competitors, so I hope it will surprise me here as well.

Even for VSCode, though, this sandbox still seems pretty permissive — plugins still seem to execute with the full trust of the editor itself — but it's a big step in the right direction. This is a much bigger task than just turning on HTTPS, but I really hope that editors start taking the threat of rogue editor packages seriously before attackers do, and finding ways to sandbox and limit the potential damage from third-party plugins, maybe taking a cue from other tools.

Acknowledgments

the documention still says “gnutls-verify-error” defaults to nil and that means no certificate verification, and maybe it does do that if you are using raw TLS connections, but in practice, url-retrieve-synchronously does appear to present an interactive warning before proceeding if the certificate is invalid or expired. It still has yet to catch up with web browsers from 2016, in that it just asks you “do you want to do this horribly dangerous thing? y/n” but that is a million times better than proceeding without user interaction. ↩

by Glyph at January 23, 2024 02:05 AM

January 22, 2024

Glyph Lefkowitz

Okay, I’m A Centrist I Guess

Today I saw a short YouTube video about “cozy games” and started writing a comment, then realized that this was somehow prompting me to write the most succinct summary of my own personal views on politics and economics that I have ever managed. So, here goes.

Apparently all I needed to trim down 50,000 words on my annoyance at how the term “capitalism” is frustratingly both a nexus for useful critque and also reductive thought-terminating clichés was to realize that Animal Crossing: New Horizons is closer to my views on political economy than anything Adam Smith or Karl Marx ever wrote.

Cozy games illustrate that the core mechanics of capitalism are fun and motivating, in a laboratory environment. It’s fun to gather resources, to improve one’s skills, to engage in mutually beneficial exchanges, to collect things, to decorate. It’s tremendously motivating. Even merely pretending to do those things can captivate huge amounts of our time and attention.

In real life, people need to be motivated to do stuff. Not because of some moral deficiency, but because in a large complex civilization it’s hard to tell what needs doing. By the time it’s widely visible to a population-level democratic consensus of non-experts that there is an unmet need — for example, trash piling up on the street everywhere indicating a need for garbage collection — that doesn’t mean “time to pick up some trash”, it means “the sanitation system has collapsed, you’re probably going to get cholera”. We need a system that can identify utility signals more granularly and quickly, towards the edges of the social graph. To allow person A to earn “value credits” of some kind for doing work that others find valuable, then trade those in to person B for labor which they find valuable, even if it is not clearly obvious to anyone else why person A wants that thing. Hence: money.

So, a market can provide an incentive structure that productively steers people towards needs, by aggregating small price signals in a distributed way, via the communication technology of “money”. Authoritarian communist states are famously bad at this, overproducing “necessary” goods in ways that can hold their own with the worst excesses of capitalists, while under-producing “luxury” goods that are politically seen as frivolous.

This is the kernel of truth around which the hardcore capitalist bootstrap grindset ideologues build their fabulist cinematic universe of cruelty. Markets are motivating, they reason, therefore we must worship the market as a god and obey its every whim. Markets can optimize some targets, therefore we must allow markets to optimize every target. Markets efficiently allocate resources, and people need resources to live, therefore anyone unable to secure resources in a market is undeserving of life. Thus we begin at “market economies provide some beneficial efficiencies” and after just a bit of hand-waving over some inconvenient details, we get to “thus, we must make the poor into a blood-sacrifice to Moloch, otherwise nobody will ever work, and we will all die, drowning in our own laziness”. “The cruelty is the point” is a convenient phrase, but among those with this worldview, the prosperity is the point; they just think the cruelty is the only engine that can possibly drive it.

Cozy games are therefore a centrist¹ critique of capitalism. They present a world with the prosperity, but without the cruelty. More importantly though, by virtue of the fact that people actually play them in large numbers, they demonstrate that the cruelty is actually unnecessary.

You don’t need to play a cozy game. Tom Nook is not going to evict you from your real-life house if you don’t give him enough bells when it’s time to make rent. In fact, quite the opposite: you have to take time away from your real-life responsibilities and work, in order to make time for such a game. That is how motivating it is to engage with a market system in the abstract, with almost exclusively positive reinforcement.

What cozy games are showing us is that a world with tons of “free stuff” — universal basic income, universal health care, free education, free housing — will not result in a breakdown of our society because “no one wants to work”. People love to work.

If we can turn the market into a cozy game, with low stakes and a generous safety net, more people will engage with it, not fewer. People are not lazy; laziness does not exist. The motivation that people need from a market economy is not a constant looming threat of homelessness, starvation and death for themselves and their children, but a fun opportunity to get a five-star island rating.

Acknowledgments

Okay, I guess “far left” on the current US political compass, but in a just world socdems would be centrists. ↩

by Glyph at January 22, 2024 05:41 PM

December 21, 2023

Hynek Schlawack

Don’t Start Pull Requests from Your Main Branch

When contributing to other users’ repositories, always start a new branch in your fork.

by Hynek Schlawack (hs@ox.cx) at December 21, 2023 05:00 AM

December 08, 2023

Glyph Lefkowitz

Annotated At Runtime

PEP 0593 added the ability to add arbitrary user-defined metadata to type annotations in Python.

At type-check time, such annotations are… inert. They don’t do anything. Annotated[int, X] just means int to the type-checker, regardless of the value of X. So the entire purpose of Annotated is to provide a run-time API to consume metadata, which integrates with the type checker syntactically, but does not otherwise disturb it.

Yet, the documentation for this central purpose seems, while not exactly absent, oddly incomplete.

The PEP itself simply says:

A tool or library encountering an Annotated type can scan through the annotations to determine if they are of interest (e.g., using isinstance()).

But it’s not clear where “the annotations” are, given that the PEP’s entire “consuming annotations” section does not even mention the __metadata__ attribute where the annotation’s arguments go, which was only even added to CPython’s documentation. Its list of examples just show the repr() of the relevant type.

There’s also a bit of an open question of what, exactly, we are supposed to isinstance()-ing here. If we want to find arguments to Annotated, presumably we need to be able to detect if an annotation is an Annotated. But isinstance(Annotated[int, "hello"], Annotated) is both False at runtime, and also a type-checking error, that looks like this:

Argument 2 to "isinstance" has incompatible type "<typing special form>"; expected "_ClassInfo"

The actual type of these objects, typing._AnnotatedAlias, does not seem to have a publicly available or documented alias, so that seems like the wrong route too.

Now, it certainly works to escape-hatch your way out of all of this with an Any, build some version-specific special-case hacks to dig around in the relevant namespaces, access __metadata__ and call it a day. But this solution is … unsatisfying.

What are you looking for?

Upon encountering these quirks, it is understandable to want to simply ask the question “is this annotation that I’m looking at an Annotated?” and to be frustrated that it seems so obscure to straightforwardly get an answer to that question without disabling all type-checking in your meta-programming code.

However, I think that this is a slight misframing of the problem. Code that is inspecting parameters for an annotation is going to do something with that annotation, which means that it must necessarily be looking for a specific set of annotations. Therefore the thing we want to pass to isinstance is not some obscure part of the annotations’ internals, but the actual interesting annotation type from your framework or application.

When consuming an Annotated parameter, there are 3 things you probably want to know:

What was the parameter itself? (type: The type you passed in.)
What was the name of the annotated object (i.e.: the parameter name, the attribute name) being passed the parameter? (type: str)
What was the actual type being annotated? (type: type)

And the things that we have are the type of the Annotated we’re querying for, and the object with annotations we are interrogating. So that gives us this function signature:

def annotated_by(
    annotated: object,
    kind: type[T],
) -> Iterable[tuple[str, T, type]]:
    ...

To extract this information, all we need are get_args and get_type_hints; no need for __metadata__ or get_origin or any other metaprogramming. Here’s a recipe:

def annotated_by(
    annotated: object,
    kind: type[T],
) -> Iterable[tuple[str, T, type]]:
    for k, v in get_type_hints(annotated, include_extras=True).items():
        all_args = get_args(v)
        if not all_args:
            continue
        actual, *rest = all_args
        for arg in rest:
            if isinstance(arg, kind):
                yield k, arg, actual

It might seem a little odd to be blindly assuming that get_args(...)[0] will always be the relevant type, when that is not true of unions or generics. Note, however, that we are only yielding results when we have found the instance type in the argument list; our arbitrary user-defined instance isn’t valid as a type annotation argument in any other context. It can’t be part of a Union or a Generic, so we can rely on it to be an Annotated argument, and from there, we can make that assumption about the format of get_args(...).

This can give us back the annotations that we’re looking for in a handy format that’s easy to consume. Here’s a quick example of how you might use it:

@dataclass
class AnAnnotation:
    name: str

def a_function(
    a: str,
    b: Annotated[int, AnAnnotation("b")],
    c: Annotated[float, AnAnnotation("c")],
) -> None:
    ...

print(list(annotated_by(a_function, AnAnnotation)))

# [('b', AnAnnotation(name='b'), <class 'int'>),
#  ('c', AnAnnotation(name='c'), <class 'float'>)]

Acknowledgments

by Glyph at December 08, 2023 01:56 AM

December 06, 2023

Glyph Lefkowitz

Safer, Not Later

Facebook — and by extension, most of Silicon Valley — rightly gets a lot of shit for its old motto, “Move Fast and Break Things”.

As a general principle for living your life, it is obviously terrible advice, and it leads to a lot of the horrific outcomes of Facebook’s business.

I don’t want to be an apologist for Facebook. I also do not want to excuse the worldview that leads to those kinds of outcomes. However, I do want to try to help laypeople understand how software engineers—particularly those situated at the point in history where this motto became popular—actually meant by it. I would like more people in the general public to understand why, to engineers, it was supposed to mean roughly the same thing as Facebook’s newer, goofier-sounding “Move fast with stable infrastructure”.

Move Slow

In the bad old days, circa 2005, two worlds within the software industry were colliding.

The old world was the world of integrated hardware/software companies, like IBM and Apple, and shrink-wrapped software companies like Microsoft and WordPerfect. The new world was software-as-a-service companies like Google, and, yes, Facebook.

In the old world, you delivered software in a physical, shrink-wrapped box, on a yearly release cycle. If you were really aggressive you might ship updates as often as quarterly, but faster than that and your physical shipping infrastructure would not be able to keep pace with new versions. As such, development could proceed in long phases based on those schedules.

In practice what this meant was that in the old world, when development began on a new version, programmers would go absolutely wild adding incredibly buggy, experimental code to see what sorts of things might be possible in a new version, then slowly transition to less coding and more testing, eventually settling into a testing and bug-fixing mode in the last few months before the release.

This is where the idea of “alpha” (development testing) and “beta” (user testing) versions came from. Software in that initial surge of unstable development was extremely likely to malfunction or even crash. Everyone understood that. How could it be otherwise? In an alpha test, the engineers hadn’t even started bug-fixing yet!

In the new world, the idea of a 6-month-long “beta test” was incoherent. If your software was a website, you shipped it to users every time they hit “refresh”. The software was running 24/7, on hardware that you controlled. You could be adding features at every minute of every day. And, now that this was possible, you needed to be adding those features, or your users would get bored and leave for your competitors, who would do it.

But this came along with a new attitude towards quality and reliability. If you needed to ship a feature within 24 hours, you couldn’t write a buggy version that crashed all the time, see how your carefully-selected group of users used it, collect crash reports, fix all the bugs, have a feature-freeze and do nothing but fix bugs for a few months. You needed to be able to ship a stable version of your software on Monday and then have another stable version on Tuesday.

To support this novel sort of development workflow, the industry developed new technologies. I am tempted to tell you about them all. Unit testing, continuous integration servers, error telemetry, system monitoring dashboards, feature flags... this is where a lot of my personal expertise lies. I was very much on the front lines of the “new world” in this conflict, trying to move companies to shorter and shorter development cycles, and move away from the legacy worldview of Big Release Day engineering.

Old habits die hard, though. Most engineers at this point were trained in a world where they had months of continuous quality assurance processes after writing their first rough draft. Such engineers feel understandably nervous about being required to ship their probably-buggy code to paying customers every day. So they would try to slow things down.

Of course, when one is deploying all the time, all other things being equal, it’s easy to ship a show-stopping bug to customers. Organizations would do this, and they’d get burned. And when they’d get burned, they would introduce Processes to slow things down. Some of these would look like:

Let’s keep a special version of our code set aside for testing, and then we’ll test that for a few weeks before sending it to users.
The heads of every department need to sign-off on every deployed version, so everyone needs to spend a day writing up an explanation of their changes.
QA should sign off too, so let’s have an extensive sign-off process where each individual tester does a fills out a sign-off form.

Then there’s my favorite version of this pattern, where management decides that deploys are inherently dangerous, and everyone should probably just stop doing them. It typically proceeds in stages:

Let’s have a deploy freeze, and not deploy on Fridays; don’t want to mess up the weekend debugging an outage.
Actually, let’s extend that freeze for all of December, we don’t want to mess up the holiday shopping season.
Actually why not have the freeze extend into the end of November? Don’t want to mess with Thanksgiving and the Black Friday weekend.
Some of our customers are in India, and Diwali’s also a big deal. Why not extend the freeze from the end of October?
But, come to think of it, we do a fair amount of seasonal sales for Halloween too. How about no deployments from October 10 onward?
You know what, sometimes people like to use our shop for Valentine’s day too. Let’s just never deploy again.

This same anti-pattern can repeat itself with an endlessly proliferating list of “environments”, whose main role ends up being to ensure that no code ever makes it to actual users.

… and break things anyway

As you may have begun to suspect, there are a few problems with this style of software development.

Even back in the bad old days of the 90s when you had to ship disks in boxes, this methodology contained within itself the seeds of its own destruction. As Joel Spolsky memorably put it, Microsoft discovered that this idea that you could introduce a ton of bugs and then just fix them later came along with some massive disadvantages:

The very first version of Microsoft Word for Windows was considered a “death march” project. It took forever. It kept slipping. The whole team was working ridiculous hours, the project was delayed again, and again, and again, and the stress was incredible. [...] The story goes that one programmer, who had to write the code to calculate the height of a line of text, simply wrote “return 12;” and waited for the bug report to come in [...]. The schedule was merely a checklist of features waiting to be turned into bugs. In the post-mortem, this was referred to as “infinite defects methodology”.

Which lead them to what is perhaps the most ironclad law of software engineering:

In general, the longer you wait before fixing a bug, the costlier (in time and money) it is to fix.

A corollary to this is that the longer you wait to discover a bug, the costlier it is to fix.

Some bugs can be found by code review. So you should do code review. Some bugs can be found by automated tests. So you should do automated testing. Some bugs will be found by monitoring dashboards, so you should have monitoring dashboards.

So why not move fast?

But here is where Facebook’s old motto comes in to play. All of those principles above are true, but here are two more things that are true:

No matter how much code review, automated testing, and monitoring you have some bugs can only be found by users interacting with your software.
No bugs can be found merely by slowing down and putting the deploy off another day.

Once you have made the process of releasing software to users sufficiently safe that the potential damage of any given deployment can be reliably limited, it is always best to release your changes to users as quickly as possible.

More importantly, as an engineer, you will naturally have an inherent fear of breaking things. If you make no changes, you cannot be blamed for whatever goes wrong. Particularly if you grew up in the Old World, there is an ever-present temptation to slow down, to avoid shipping, to hold back your changes, just in case.

You will want to move slow, to avoid breaking things. Better to do nothing, to be useless, than to do harm.

For all its faults as an organization, Facebook did, and does, have some excellent infrastructure to avoid breaking their software systems in response to features being deployed to production. In that sense, they’d already done the work to avoid the “harm” of an individual engineer’s changes. If future work needed to be performed to increase safety, then that work should be done by the infrastructure team to make things safer, not by every other engineer slowing down.

The problem is that slowing down is not actually value neutral. To quote myself here:

If you can’t ship a feature, you can’t fix a bug.

When you slow down just for the sake of slowing down, you create more problems.

The first problem that you create is smashing together far too many changes at once.

You’ve got a development team. Every engineer on that team is adding features at some rate. You want them to be doing that work. Necessarily, they’re all integrating them into the codebase to be deployed whenever the next deployment happens.

If a problem occurs with one of those changes, and you want to quickly know which change caused that problem, ideally you want to compare two versions of the software with the smallest number of changes possible between them. Ideally, every individual change would be released on its own, so you can see differences in behavior between versions which contain one change each, not a gigantic avalanche of changes where any one of hundred different features might be the culprit.

If you slow down for the sake of slowing down, you also create a process that cannot respond to failures of the existing code.

I’ve been writing thus far as if a system in a steady state is inherently fine, and each change carries the possibility of benefit but also the risk of failure. This is not always true. Changes don’t just occur in your software. They can happen in the world as well, and your software needs to be able to respond to them.

Back to that holiday shopping season example from earlier: if your deploy freeze prevents all deployments during the holiday season to prevent breakages, what happens when your small but growing e-commerce site encounters a catastrophic bug that has always been there, but only occurs when you have more than 10,000 concurrent users. The breakage is coming from new, never before seen levels of traffic. The breakage is coming from your success, not your code. You’d better be able to ship a fix for that bug real fast, because your only other option to a fast turn-around bug-fix is shutting down the site entirely.

And if you see this failure for the first time on Black Friday, that is not the moment where you want to suddenly develop a new process for deploying on Friday. The only way to ensure that shipping that fix is easy is to ensure that shipping any fix is easy. That it’s a thing your whole team does quickly, all the time.

The motto “Move Fast And Break Things” caught on with a lot of the rest of Silicon Valley because we are all familiar with this toxic, paralyzing fear.

After we have the safety mechanisms in place to make changes as safe as they can be, we just need to push through it, and accept that things might break, but that’s OK.

Some Important Words are Missing

The motto has an implicit preamble, “Once you have done the work to make broken things safe enough, then you should move fast and break things”.

When you are in a conflict about whether to “go fast” or “go slow”, the motto is not supposed to be telling you that the answer is an unqualified “GOTTA GO FAST”. Rather, it is an exhortation to take a beat and to go through a process of interrogating your motivation for slowing down. There are three possible things that a person saying “slow down” could mean about making a change:

It is broken in a way you already understand. If this is the problem, then you should not make the change, because you know it’s not ready. If you already know it’s broken, then the change simply isn’t done. Finish the work, and ship it to users when it’s finished.
It is risky in a way that you don’t have a way to defend against. As far as you know, the change works, but there’s a risk embedded in it that you don’t have any safety tools to deal with. If this is the issue, then what you should do is pause working on this change, and build the safety first.
It is making you nervous in a way you can’t articulate. If you can’t describe an known defect as in point 1, and you can’t outline an improved safety control as in step 2, then this is the time to let go, accept that you might break something, and move fast.

The implied context for “move fast and break things” is only in that third condition. If you’ve already built all the infrastructure that you can think of to build, and you’ve already fixed all the bugs in the change that you need to fix, any further delay will not serve you, do not have any further delays.

Unfortunately, as you probably already know,

This motto did a lot of good in its appropriate context, at its appropriate time. It’s still a useful heuristic for engineers, if the appropriate context is generally understood within the conversation where it is used.

However, it has clearly been taken to mean a lot of significantly more damaging things.

Purely from an engineering perspective, it has been reasonably successful. It’s less and less common to see people in the industry pushing back against tight deployment cycles. It’s also less common to see the basic safety mechanisms (version control, continuous integration, unit testing) get ignored. And many ex-Facebook engineers have used this motto very clearly under the understanding I’ve described here.

Even in the narrow domain of software engineering it is misused. I’ve seen it used to argue a project didn’t need tests; that a deploy could be forced through a safety process; that users did not need to be informed of a change that could potentially impact them personally.

Outside that domain, it’s far worse. It’s generally understood to mean that no safety mechanisms are required at all, that any change a software company wants to make is inherently justified because it’s OK to “move fast”. You can see this interpretation in the way that it has leaked out of Facebook’s engineering culture and suffused its entire management strategy, blundering through market after market and issue after issue, making catastrophic mistakes, making a perfunctory apology and moving on to the next massive harm.

In the decade since it has been retired as Facebook’s official motto, it has been used to defend some truly horrific abuses within the tech industry. You only need to visit the orange website to see it still being used this way.

Even at its best, “move fast and break things” is an engineering heuristic, it is not an ethical principle. Even within the context I’ve described, it’s only okay to move fast and break things. It is never okay to move fast and harm people.

So, while I do think that it is broadly misunderstood by the public, it’s still not a thing I’d ever say again. Instead, I propose this:

Make it safer, don’t make it later.

Acknowledgments

by Glyph at December 06, 2023 08:01 PM

November 20, 2023

Jp Calderone

First Cabal code contribution

First Cabal code contribution 2023-11-20

First Cabal code contribution

introduction

A few weeks ago Shae Erisson and I took a look at Cabal to try to find a small contribution we could make. And we managed it! Here’s the story.

First, an introduction. Cabal is a system for building and packaging Haskell libraries and programs. A few years ago I started learning Haskell in earnest. After a little time spent with stack, a similar tool, I’ve more recently switched to using Cabal and have gotten a great deal of value from it.

I did contribute a trivial documentation improvement earlier this year but before the experience related here I hadn’t looked at the code.

Shae and I had a hunch that we might find some good beginner tasks in the Cabal issue tracker so we started there.

I had previously noticed that the Cabal project uses a label for supposedly “easy” issues. Shae also pointed out a group of labels related to “new comers”. We found what appeared to be a tiered system with issues group by the anticipated size of the change. For example, there is a label for “one line change” issues and another for “small feature” and a couple more in between. We looked at the “one line change” issues first but the correct one line change was not obvious to us. These issues have a lot of discussion on them and we didn’t want to try to figure out how to resolve the open questions. We decided to look at a more substantial issue and hope that there would be less uncertainty around them.

The candidate we found, Use ProjectFlags in CmdClean, was labeled with “one file change” and looked promising. It didn’t have a description (presumably the reporter felt the summary made the task sufficiently obvious) but there was a promising hint from another contributor and a link to a very similar change made in another part of the codebase.

dev env

While we were looking around, Shae was getting a development environment set up. The Cabal project has a nice contributor guideline linked from the end of their README that we found without too much trouble. Since Shae and I both use Nix, Shae took the Nix-oriented route and got a development environment using:

nix develop github:yvan-sraka/cabal.nix

(I’ll replicate the disclaimer from the contributor that this is “supported solely by the community”.)

This took around 30 minutes to download and build (and Shae has pretty fast hardware). A nice improvement here might be to set up a Nix build cache for this development shell for everyone who wants to do Nix-based development on Cabal to share. I expect this would take the first-run setup time from 30 minutes to just one or two at most (depending mostly on download bandwidth available).

By the time we felt like we had a handle on what code changes to make and had found some of the pieces in a checkout of the project the development environment was (almost) ready. The change itself was quite straightforward.

the task

Our objective was to remove some implementation duplication in support of the --project-dir and --project-file options. Several cabal commands accept these options and some of them re-implement them where they could instead be sharing code.

We took the existing data type representing the options for the cabal clean command:

data CleanFlags = CleanFlags
  { cleanSaveConfig :: Flag Bool
  , cleanVerbosity :: Flag Verbosity
  , cleanDistDir :: Flag FilePath
  , cleanProjectDir :: Flag FilePath
  , cleanProjectFile :: Flag FilePath
  }
  deriving (Eq)

And removed the final two fields which exactly overlap with the fields defined by ProjectFlags. Then we changed the type of cleanCommand. It was:

cleanCommand :: CommandUI CleanFlags

And now it is:

cleanCommand :: CommandUI (ProjectFlags, CleanFlags)

Finally, we updated nearby code to reflect the type change. liftOptionL made this straightforward since it let us lift the existing operations on CleanFlags or ProjectFlags into the tuple.

One minor wrinkle we encountered is that ProjectFlags isn’t a perfect fit for the clean command. Specifically, the former includes support for requesting that a cabal.project file be ignored but this option doesn’t make sense for the clean command. This was easy to handle because someone had also already noticed and written a helper to remove the extra option where it is not wanted. However, this might suggest that the factoring of the types could be further improved so that it’s easier to add only the appropriate options.

At this point we thought we were done. Cabal CI proved we were mistaken. The details of the failure were a little tricky to track down (as is often the case for many projects, alas). Eventually we dug the following error out of the failing job’s logs:

Error: cabal: option `--project-dir' is ambiguous; could be one of:
  --project-dir=DIR Set the path of the project directory
  --project-dir=DIR Set the path of the project directory

Then we dug the test command out of the GitHub Actions configuration to try to reproduce the failure locally. Unfortunately, when we ran the command we got far more test failures than CI had. We decided to build cabal and test the case manually. This went smoothly enough and we were quickly able to find the problem and manually confirm the fix (which was simply to remove the field for --project-dir in a place we missed). GitHub Actions rewarded us for this change with a green checkmark.

the rest

At this point we waited for a code review on the PR. We quickly received two and addressed the minor point made (we had used NamedFieldPuns, of which I am a fan, but the module already used RecordWildCards and it’s a bit silly to use both of these so close together).

Afterwards, the PR was approved and we added the “squash+merge me” label to trigger some automation to merge the changes (which it did after a short delay to give other contributors a chance to chime in if they wished).

In the end the change was +41 / -42 lines. For a refactoring that reduced duplication this isn’t quite as much of a reduction as I was hoping for. The explanation ends up being that importing the necessary types and helper functions almost exactly balanced out the deleted duplicate fields. Considering the very small scope of this refactoring this ends up not being very surprising.

Thanks to Shae for working on this with me, to the Cabal maintainers who reviewed our PR, to all the Cabal contributors who made such a great tool, and to the “Cabal and Hackage” room on matrix.org for answering random questions.

by Jean-Paul Calderone at November 20, 2023 12:00 AM

November 19, 2023

Jp Calderone

Good News Everyone

Good News Everyone 2023-11-19

Good News Everyone

I’m turning my blog back on! I know you’re so excited.

I’m going to try to gather up my older writing and host it here. Also, I promise to try to consider making this a little prettier.

Meanwhile, check out my new post about contributing to Cabal!

by Jean-Paul Calderone at November 19, 2023 12:00 AM

November 09, 2023

Glyph Lefkowitz

iOS Mail To Omnifocus Task

One of my longest-running frustrations with iOS is that the default mail app does not have a “share” action, making it impossible to do the one thing that a mail client needs to be able to do for me, which is to selectively turn messages into tasks. This deficiency has multiple components which makes it difficult to work around:

There is no UI to “share” a message from within the mail app, so you can’t share it to OmniFocus or a shortcut.
There is no way to query for a mail message in Shortcuts and then get some kind of “message” object, so there’s no way you can start in Shortcuts and manually invoke something.
There’s no such thing as MailKit on iOS, so third-party apps can’t query your mail database either.

To work around this, I’ve long subscribed to the “AirMail” app, which has a “message to omnifocus” action but is otherwise kind of a buggy mess.

But today, I read that you can set up an iPad in a split-screen view and drag messages from the built-in Mail app’s message list view into the OmniFocus inbox, and I extrapolated, and discovered how to get Mail-message-to-Omnifocus-task work on an iPhone.

I’m thrilled that this functionality exists, but it is a bit of a masterclass in how to get a terrible UX out of a series of decisions that were probably locally reasonable. So, without any further ado, here’s how you do it:

Open up mail.app, and find the message you want to share in the message list. Here, you have two choices:
1. With one finger, press and hold on a message until you feel a haptic. When you feel the haptic, immediately move your finger a little bit, before you see the preview come up. This lets you operate directly from the message list, but is very fiddly.
2. Tap the message to open the detail view, then press and hold on the sent date in the top right.
Continue holding down with your first finger. With a second finger, swipe up from the bottom to enter the multitasking view, or to go back to your home screen. While holding your first finger in place, either switch to or launch OmniFocus.
With your second finger, navigate to the Inbox, drag your first finger to the bottom of the list, and release it. Voila! You should have a task with a brief summary and a link back to the message.
Swipe up from the bottom to switch back to Mail, then archive the message.

by Glyph at November 09, 2023 05:48 AM

October 06, 2023

Hynek Schlawack

How to Stop macOS from Stealing Focus after Switching Apps Across Desktops

For years, I’ve had a macOS bug that occurred randomly and that I could only fix by rebooting or logging out. The Internet is full of similar problems, but I never found a solution to mine, and my pleas for help on Twitter went unanswered. Now, I have found a workaround myself.

by Hynek Schlawack (hs@ox.cx) at October 06, 2023 12:00 AM

August 29, 2023

Glyph Lefkowitz

Get Your Mac Python From Python.org

One of the most unfortunate things about learning Python is that there are so many different ways to get it installed, and you need to choose one before you even begin. The differences can also be subtle and require technical depth to truly understand, which you don’t have yet.¹ Even experts can be missing information about which one to use and why.

There are perhaps more of these on macOS than on any other platform, and that’s the platform I primarily use these days. If you’re using macOS, I’d like to make it simple for you.

The One You Probably Want: Python.org

My recommendation is to use an official build from python.org.

I recommed the official installer for most uses, and if you were just looking for a choice about which one to use, you can stop reading now. Thanks for your time, and have fun with Python.

If you want to get into the nerdy nuances, read on.

For starters, the official builds are compiled in such a way that they will run on a wide range of macs, both new and old. They are universal2 binaries, unlike some other builds, which means you can distribute them as part of a mac application.

The main advantage that the Python.org build has, though, is very subtle, and not any concrete technical detail. It’s a social, structural issue: the Python.org builds are produced by the people who make CPython, who are more likely to know about the nuances of what options it can be built with, and who are more likely to adopt their own improvements as they are released. Third party builders who are focused on a more niche use-case may not realize that there are build options or environment requirements that could make their Pythons better.

I’m being a bit vague deliberately here, because at any particular moment in time, this may not be an advantage at all. Third party integrators generally catch up to changes, and eventually achieve parity. But for a specific upcoming example, PEP 703 will have extensive build-time implications, and I would trust the python.org team to be keeping pace with all those subtle details immediately as releases happen.

(And Auto-Update It)

The one downside of the official build is that you have to return to the website to check for security updates. Unlike other options described below, there’s no built-in auto-updater for security patches. If you follow the normal process, you still have to click around in a GUI installer to update it once you’ve clicked around on the website to get the file.

I have written a micro-tool to address this and you can pip install mopup and then periodically run mopup and it will install any security updates for your current version of Python, with no interaction besides entering your admin password.

(And Always Use Virtual Environments)

Once you have installed Python from python.org, never pip install anything globally into that Python, even using the --user flag. Always, always use a virtual environment of some kind. In fact, I recommend configuring it so that it is not even possible to do so, by putting this in your ~/.pip/pip.conf:

[global]
require-virtualenv = true

This will avoid damaging your Python installation by polluting it with libraries that you install and then forget about. Any time you need to do something new, you should make a fresh virtual environment, and then you don’t have to worry about library conflicts between different projects that you may work on.

If you need to install tools written in Python, don’t manage those environments directly, install the tools with pipx. By using pipx, you allow each tool to maintain its own set dependencies, which means you don’t need to worry about whether two tools you use have conflicting version requirements, or whether the tools conflict with your own code.²

The Others

There are, of course, several other ways to install Python, which you probably don’t want to use.

The One For Running Other People’s Code, Not Yours: Homebrew

In general, Homebrew Python is not for you.

The purpose of Homebrew’s python is to support applications packaged within Homebrew, which have all been tested against the versions of python libraries also packaged within Homebrew. It may upgrade without warning on just about any brew operation, and you can’t downgrade it without breaking other parts of your install.

Specifically for creating redistributable binaries, Homebrew python is typically compiled only for your specific architecture, and thus will not create binaries that can be used on Intel macs if you have an Apple Silicon machine, or will run slower on Apple Silicon machines if you have an Intel mac. Also, if there are prebuilt wheels which don’t yet exist for Apple Silicon, you cannot easily arch -x86_64 python ... and just install them; you have to install a whole second copy of Homebrew in a different location, which is a headache.

In other words, homebrew is an alternative to pipx, not to Python. For that purpose, it’s fine.

The One For When You Need 20 Different Pythons For Debugging: pyenv

Like Homebrew, pyenv will default to building a single-architecture binary. Even worse, it will not build a Framework build of Python, which means several things related to being a mac app just won’t work properly. Remember those build-time esoterica that the core team is on top of but third parties may not be? “Should I use a Framework build” is an enduring piece of said esoterica.

The purpose of pyenv is to provide a large matrix of different, precise legacy versions of python for library authors to test compatibility against those older Pythons. If you need to do that, particularly if you work on different projects where you may need to install some random super-old version of Python that you would not normally use to test something on, then pyenv is great. But if you only need one version of Python, it’s not a great way to get it.

The Other One That’s Exactly Like pyenv: asdf-python

The issues are exactly the same as with pyenv, as the tool is a straightforward alternative for the exact same purpose. It’s a bit less focused on Python than pyenv, which has pros and cons; it has broader community support, but it’s less specifically tuned for Python. But a comparative exploration of their differences is beyond the scope of this post.

The Built-In One That Isn’t Really Built-In: `/usr/bin/python3`

There is a binary in /usr/bin/python3 which might seem like an appealing option — it comes from Apple, after all! — but it is provided as a developer tool, for running things like build scripts. It isn’t for building applications with.

That binary is not a “system python”; the thing in the operating system itself is only a shim, which will determine if you have development tools, and shell out to a tool that will download the development tools for you if you don’t. There is unfortunately a lot of folk wisdom among older Python programmers who remember a time when apple did actually package an antedeluvian version of the interpreter that seemed to be supported forever, and might suggest it for things intended to be self-contained or have minimal bundled dependencies, but this is exactly the reason that Apple stopped shipping that.

If you use this option, it means that your Python might come from the Xcode Command Line Tools, or the Xcode application, depending on the state of xcode-select in your current environment and the order in which you installed them.

Upgrading Xcode via the app store or a developer.apple.com manual download — or its command-line tools, which are installed separately, and updated via the “settings” application in a completely different workflow — therefore also upgrades your version of Python without an easy way to downgrade, unless you manage multiple Xcode installs. Which, at 12G per install, is probably not an appealing option.³

The One With The Data And The Science: Conda

As someone with a limited understanding of data science and scientific computing, I’m not really qualified to go into the detailed pros and cons here, but luckily, Itamar Turner-Trauring is, and he did.

My one coda to his detailed exploration here is that while there are good reasons to want to use Anaconda — particularly if you are managing a data-science workload across multiple platforms and you want a consistent, holistic development experience across a large team supporting heterogenous platforms — some people will tell you that you need Conda to get you your libraries if you want to do data science or numerical work with Python at all, because Conda is how you install those libraries, and otherwise things just won’t work.

This is a historical artifact that is no longer true. Over the last decade, Python Wheels have been comprehensively adopted across the Python community, and almost every popular library with an extension module ships pre-built binaries to multiple platforms. There may be some libraries that only have prebuilt binaries for conda, but they are sufficiently specialized that I don’t know what they are.

The One for Being Consistent With Your Cloud Hosting

Another way to run Python on macOS is to not run it on macOS, but to get another computer inside your computer that isn’t running macOS, and instead run Python inside that, usually using Docker.⁴

There are good reasons to want to use a containerized configuration for development, but they start to drift away from the point of this post and into more complicated stuff about how to get your Python into the cloud.

So rather than saying “use Python.org native Python instead of Docker”, I am specifically not covering Docker as a replacement for a native mac Python here because in a lot of cases, it can’t be one. Many tools require native mac facilities like displaying GUIs or scripting applications, or want to be able to take a path name to a file without elaborate pre-work to allow the program to access it.

Summary

If you didn’t want to read all of that, here’s the summary.

If you use a mac:

Get your Python interpreter from python.org.
Update it with mopup so you don’t fall behind on security updates.
Always use venvs for specific projects, never pip install anything directly.
Use pipx to manage your Python applications so you don’t have to worry about dependency conflicts.
Don’t worry if Homebrew also installs a python executable, but don’t use it for your own stuff.
You might need a different Python interpreter if you have any specialized requirements, but you’ll probably know if you do.

Acknowledgements

If somebody sent you this article because you’re trying to get into Python and you got stuck on this point, let me first reassure you that all the information about this really is highly complex and confusing; if you’re feeling overwhelmed, that’s normal. But the good news is that you can really ignore most of it. Just read the next little bit. ↩
Some tools need to be installed in the same environment as the code they’re operating on, so you may want to have multiple installs of, for example, Mypy, PuDB, or sphinx. But for things that just do something useful but don’t need to load your code — such as this small selection of examples from my own collection: certbot, pgcli, asciinema, gister, speedtest-cli — pipx means you won’t have to debug wonky dependency interactions. ↩
The command-line tools are a lot smaller, but cannot have multiple versions installed at once, and are updated through a different mechanism. There are odd little details like the fact that the default bundle identifier for the framework differs, being either org.python.python or com.apple.python3. They’re generally different in a bunch of small subtle ways that don’t really matter in 95% of cases until they suddenly matter a lot in that last 5%. ↩
Or minikube, or podman, or colima or whatever I guess, there’s way too many of these containerization Pokémon running around for me to keep track of them all these days. ↩

by Glyph at August 29, 2023 08:17 PM

July 20, 2023

Glyph Lefkowitz

Bilithification

Several years ago at O’Reilly’s Software Architecture conference, within a comprehensive talk on refactoring “Technical Debt: A Masterclass”, r0ml¹ presented a concept that I think should be highlighted.

If you have access to O’Reilly Safari, I think the video is available there, or you can get the slides here. It’s well worth watching in its own right. The talk contains a lot of hard-won wisdom from a decades-long career, but in slides 75-87, he articulates a concept that I believe resolves the perennial pendulum-swing between microservices and monoliths that we see in the Software as a Service world.

I will refer to this concept as “the bilithification strategy”.

Background

Personally, I have long been a microservice skeptic. I would generally articulate this skepticism in terms of “YAGNI”.

Here’s the way I would advise people asking about microservices before encountering this concept:

Microservices are often adopted by small teams due to their advertised benefits. Advocates from very large organizations—ones that have been very successful with microservices—frequently give talks claiming that microservices are more modular, more scalable, and more fault-tolerant than their monolithic progenitors. But these teams rarely appreciate the costs, particularly the costs for smaller orgs. Specifically, there is a fixed operational marginal cost to each new service, and a fairly large fixed operational overhead to the infrastructure for an organization deploying microservices in at all.

With a large enough team, the operational cost is easy to absorb. As the overhead is fixed, it trends towards zero as your total team size and system complexity trend towards infinity. Also, in very large teams, the enforced isolation of components in separate services reduces complexity. It does so specifically intentionally causing the software architecture to mirror the organizational structure of the team that deploys it. This — at the cost of increased operational overhead and decreased efficiency — allows independent parts of the organization to make progress independently, without blocking on each other. Therefore, in smaller teams, as you’re building, you should bias towards building a monolith until the complexity costs of the monolith become apparent. Then you should build the infrastructure to switch to microservices.

I still stand by all of this. However, it’s incomplete.

What does it mean to “switch to microservices”?

The biggest thing that this advice leaves out is a clear understanding of the “micro” in “microservice”. In this framing, I’m implicitly understanding “micro” services to be services that are too small — or at least, too small for your team. But if they do work for large organizations, then at some point, you need to have them. This leaves open several questions:

What size is the right size for a service?
When should you split your monolith up into smaller services?
Wait, how do you even measure “size” of a service? Lines of code? Gigabytes of memory? Number of team members?

In a specific situation I could probably look at these questions for that situation, and make suggestions as to the appropriate course of action, but that’s based largely on vibes. There’s just a lot of drawing on complex experiences, not a repeatable pattern that a team could apply on their own.

We can be clear that you should always start with a monolith. But what should you do when that’s no longer working? How do you even tell when it’s no longer working?

Bilithification

Every codebase begins as a monolith. That is a single (mono) rock (lith). Here’s what it looks like.

a circle with the word “monolith” on it

Let’s say that the monolith, and the attendant team, is getting big enough that we’re beginning to consider microservices. We might now ask, “what is the appropriate number of services to split the monolith into?” and that could provoke endless debate even among a team with total consensus that it might need to be split into some number of services.

Rather than beginning with the premise that there is a correct number, we may observe instead that splitting the service into N services where N is more than one may be accomplished splitting the service in half N-1 times.

So let’s bi (two) lithify (rock) this monolith, and take it from 1 to 2 rocks.

The task of splitting the service into two parts ought to be a manageable amount of work — two is a definitively finite number, as composed to the infinite point-cloud of “microservices”. Thus, we should search, first, for a single logical seam along which we might cleave the monolith.

a circle with the word “monolith” on it and a line through it

In many cases—as in the specific case that r0ml gave—the easiest way to articulate a boundary between two parts of a piece of software is to conceptualize a “frontend” and a “backend”. In the absence of any other clear boundary, the question “does this functionality belong in the front end or the back end” can serve as a simple razor for separating the service.

Remember: the reason we’re splitting this software up is because we are also splitting the team up. You need to think about this in terms of people as well as in terms of functionality. What division between halves would most reduce the number of lines of communication, to reduce the quadratic increase in required communication relationships that comes along with the linear increase in team size? Can you identify two groups who need to talk amongst themselves, but do not need to talk with all of each other?²

two circles with the word “hemilith” on them and a double-headed arrow between them

Once you’ve achieved this separation, we no longer have a single rock, we have two half-rocks: hemiliths to borrow from the same Greek root that gave us “monolith”.

But we are not finished, of course. Two may not be the correct number of services to end up with. Now, we ask: can we split the frontend into a frontend and backend? Can we split the backend? If so, then we now have four rocks in place of our original one:

four circles with the word “tetartolith” on them and double-headed arrows connecting them all

You might think that this would be a “tetralith” for “four”, but as they are of a set, they are more properly a tetartolith.

Repeat As Necessary

At some point, you’ll hit a point where you’re looking at a service and asking “what are the two pieces I could split this service into?”, and the answer will be “none, it makes sense as a single piece”. At that point, you will know that you’ve achieved services of the correct size.

One thing about this insight that may disappoint some engineers is the realization that service-oriented architecture is much more an engineering management tool than it is an engineering tool. It’s fun to think that “microservices” will let you play around with weird technologies and niche programming languages consequence-free at your day job because those can all be “separate services”, but that was always a fantasy. Integrating multiple technologies is expensive, and introducing more moving parts always introduces more failure points.

Advanced Techniques: A Multi-Stack Microservice Environment

You’ll note that splitting a service heavily implies that the resulting services will still all be in the same programming language and the same tech stack as before. If you’re interested in deploying multiple stacks (languages, frameworks, libraries), you can proceed to that outcome via bilithification, but it is a multi-step process.

First, you have to complete the strategy that I’ve already outlined above. You need to get to a service that is sufficiently granular that it is atomic; you don’t want to split it up any further.

Let’s call that service “X”.

Second, you identify the additional complexity that would be introduced by using a different tech stack. It’s important to be realistic here! New technology always seems fun, and if you’re investigating this, you’re probably predisposed to think it would be an improvement. So identify your costs first and make sure you have them enumerated before you move on to the benefits.

Third, identify the concrete benefits to X’s problem domain that the new tech stack would provide.

Finally, do a cost-benefit analysis where you make sure that the costs from step 2 are clearly exceeded by the benefits from step three. If you can’t readily identify that in advance – sometimes experimentation is required — then you need to treat this as an experiment, rather than as a strategic direction, until you’ve had a chance to answer whatever questions you have about the new technology’s benefits benefits.

Note, also, that this cost-benefit analysis requires not only doing the technical analysis but getting buy-in from the entire team tasked with maintaining that component.

Conclusion

To summarize:

Always start with a monolith.
When the monolith is too big, both in terms of team and of codebase, split the monolith in half until it doesn’t make sense to split it in half any more.
(Optional) Carefully evaluate services that want to adopt new technologies, and keep the costs of doing that in mind.

There is, of course, a world of complexity beyond this associated with managing the cost of a service-oriented architecture and solving specific technical problems that arise from that architecture.

If you remember the tetartolith, though, you should at least be able to get to the right number and size of services for your system.

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from more specificity on the sort of insight you've seen here.

AKA “my father” ↩
Denouncing “silos” within organizations is so common that it’s a tired trope at this point. There is no shortage of vaguely inspirational articles across the business trade-rag web and on LinkedIn exhorting us to “break down silos”, but silos are the point of having an organization. If everybody needs to talk to everybody else in your entire organization, if no silos exist to prevent the necessity of that communication, then you are definitionally operating that organization at minimal efficiency. What people actually want when they talk about “breaking down silos” is a re-org into a functional hierarchy rather than a role-oriented hierarchy (i.e., “this is the team that makes and markets the Foo product, this is the team that makes and markets the Bar product” as opposed to “this is the sales team”, “this is the engineering team”). But that’s a separate post, probably. ↩

by Glyph at July 20, 2023 08:59 PM

Planet Twisted

August 15, 2025

Perpetually Proffering Permuted Prompts

Income from Investment in Inference

Figuring out the Fraction is Frustrating

Challenges to Computing Checking Costs

Margins, Money, and Metacognition

Problematically Plausible Presentation

Mallory Mangling Mentorship

But That’s Not All

Problems with the Prediction of P

Surprisingly Small Space for Slip-Ups

Harm to the Humans

Abridging the Artificial Arithmetic (Alliteratively)

Acknowledgments

August 09, 2025

Acknowledgments

August 08, 2025

Ancient History

Ancienter History

Science!

Caveats

The screen is wide, though.

Code Isn’t Prose

What about soft-wrap?

So what’s the optimal line width?

Acknowledgments

June 05, 2025

The Problem

The Anti-Antis

The Aesthetics

The Affordances

The Economics

The Energy Usage

The Educational Impact

The Invasion of Privacy

The Stealing

The Fatigue

The Challenge

The Surrender

Acknowledgments

April 17, 2025

The History

The Problems

The Solutions

Using dataclass attributes to create an __init__ for you

Using classmethod factories to create objects

Using NewType to address object validity

In Summary - The New Best Practice

Acknowledgments

April 01, 2025

A Database File

35 Years Later

Aside: But... are they, though?

Knowing Things

Acknowledgments

January 15, 2025

Acknowledgments

January 10, 2025

But what about...

Lookups?

enum.Flag?

3rd-party usage?

A Case Study

Conclusion

Acknowledgments

December 16, 2024

Acknowledgments

November 11, 2024

A Mission For Progress

Shared Values — A Basis for Hope

Politics Matter

We Can Teach, Not Sell

The Challenge Is Enormous

Good Works

Jobs Jobs Jobs

Take Credit

Let’s You And Me — Yes YOU — Get Started

Acknowledgments

November 03, 2024

Using `dataclass` attributes to create an `init` for you

Using `classmethod` factories to create objects

Using `NewType` to address object validity

`enum.Flag`?

`/etc/` builds