Planet Twisted

August 02, 2020

Glyph Lefkowitz

Lenses

I suffer from ADHD.

photo of a man with his head in his hands Photo by Taylor Young on Unsplash

I want to be clear: when I say I suffer from this disorder, I am making a self-diagnosis. I’ve obliquely referred to suffering from ADHD in previous posts, but rarely at any length. The main reason for my avoidance of the topic is that it still makes me super uncomfortable to write publicly about a “self-diagnosis”, since there’s a tremendous amount of Internet quackery thanks to amateur diagnosticians.

This despite the fact that I’ve known for the past 15 years that I have ADHD.

I am absolutely not trying to set myself up as a maverick unlicensed freelance psychiatrist here. If you think you might have ADHD, or any other ailment, whether mental or physical, call your primary care physician. Don’t email me.

At the same time, for me, this diagnosis is not really ambiguous or in a gray area. This is me looking down and noticing I’ve only got one arm, and diagnosing myself as a one-armed person. I’ve taken numerous ADHD screening questionnaires and reliably scored well into the range of “there is no ambiguity whatsoever, you absolutely have ADHD”, so I feel confident to describe myself as having it.

Terminology aside, this post is about a set of cognitive and metacognitive issues that I have, and some tools that I found useful to remedy them. I think others might find those same tools useful in similar situations. So if you’re also uncomfortable with the inherently unreliable nature of self-diagnosis, or the clinical specificity of the term “ADHD” — and I absolutely don’t blame you if you are — I invite you to read “ADHD” as a shorthand for some character traits that I informally believe fit that label, and not a robust clinical analysis of myself or anyone else.

With that extended disclaimer out of the way, I’ll get started on the post itself; and where better to do that than at the start of my own challenges.

The ‘Laziness’ model

photo of a cat relaxing on a couch Photo by Zosia Korcz on Unsplash

Throughout my childhood, I was labeled an “underachiever”. I performed well on tests and didn’t do homework. I was frequently told by adults — especially my teachers — that I was “brilliant” but “lazy”.

Was I lazy? Is there even such a thing as “laziness”? Here’s a spoiler for you — “no”1 — but I didn’t know that at the time. All I knew was that I couldn’t seem to do certain things — boring things: homework, long division, and cleaning up my room, for a few examples. I couldn’t seem to do the things that my peers found routine and trivial.

This is a common enough experience that it shows up clearly even in systematic reviews and meta-analyses of adult sufferers of ADHD. Everybody tells you you’re lazy, and so you believe it. It sure looks like laziness from the outside!

In retrospect, that’s the interesting problem with this false diagnosis: “from the outside”. Assuming for the moment that laziness does in fact exist and is a salient character flaw, what would the experience of the interiority of such laziness actually feel like?

It seems unlikely that it would feel like I what I actually felt at the time:

  1. Frequently, suddenly remembering, in contexts where it wouldn’t help — walking to school, in an unrelated class, while walking to work — that I had to Do The Thing.

  2. Anxiously, yearningly, often desperately wishing I could Do The Thing.

  3. Trying to Do The Thing at the responsible time, finding that my mind would wander and I would lose several hours of time... sitting for hours, literally bored to tears, while I attempted and failed to Do The Thing.

  4. At long last, finally managing to start. Once I was truly exhausted and starting to panic, I’d drink a gallon of heavily-caffeinated and very sugary soda at 2 in the morning and finally finally find that I suddenly had the ability to Do The Thing, and white-knuckle my way through an all-nighter to finish The Thing. (This step was more common after I got to my late teens; before that, The Thing just wouldn’t get Done.)

Sitting up night after night destroying my mental and physical health, depriving myself of sleep, focusing with every ounce of my will on tasks that I absolutely hated doing but was forcing myself to complete at all costs: it doesn’t seem to line up with the popular conception of what “laziness” might be like! Yet, I absolutely believed that I was lazy. If I were not lazy, surely Doing The Thing wouldn’t be so difficult!

I took pains at the start of this post to point out that mental health diagnosis is usually best left to professionals. I think that at this point in the story I should emphasize that “I’m lazy” is also itself a self-diagnosis, and — at least in every case where I’ve ever heard it used — a much worse one than “I have ADHD”.

If you are not a licensed psychologist or psychiatrist, any time you decide with certainty that someone (even yourself!) has an intrinsic, persistent character flaw, you’re effectively diagnosing them. If you decide that they’re inherently lazy, or selfish, or arrogant, you’re effectively diagnosing them with a sort of personality disorder of your own invention.

So, although I didn’t see it at the time, laziness didn’t seem to describe me terribly well. What description fits better?

The ‘Attention Deficit’ model

photo of a squirrel in a grass field Photo by Tom Bradley on Unsplash

In my late 20s, my Uncle Joel gave me a gift that changed my life: the book “Driven to Distraction: Recognizing and Coping with Attention Deficit Disorder”2 by Edward M. Hallowell. The life-changing aspect of this book was not so much that it showed that there were other people “like me”, or that my problem had a name, but that it gave me a different, and more accurately predictive, model to understand my own behavior.

In other words, it allowed me to see — for the first time — that the scarcest resource limiting my efficacy wasn’t the will to do the work, but rather the ability to focus. With this enhanced understanding, I could select a more effective strategy for dealing with the problem.

I did select such a strategy! It worked very well — albeit with some caveats. I’ll get to those in a moment.

Although my limiting factor was the ability to pay attention, the problem that prevented me from recognizing this was one of metacognition — the way I was thinking about how I think.

My early model of my own mind was that I was a lazy person who just needed to do what I had assumed everyone else must be doing: forcing myself to do the tasks that I was having trouble completing. If I really wanted to get them done, then what possible other reason could there be for me to not do them?

The ‘laziness’ model didn’t generate particularly good predictions. For any given project at school, it would predict that I would not try very hard to do it, since the very dictionary definition of ‘lazy’ is “unwilling to work or use energy”. The observed behavior, by contrast, was constant, panicked, intense (albeit failed, or at least highly inefficient) uses of significant amounts of energy.

The main reason to have a model of a thing is to make predictions about that thing. If the predictions that a model gives you are consistently wrong, then the model isn’t directly useful. At that point, it’s time to discard it and find a better one. At the very least, it’s time to revise the model in question until it starts giving you more accurate, actionable information.

The ‘laziness’ model is wrong, but worse than that, it’s harmful. What it routinely predicts, regardless of context, is that I need more negative self-talk, more ‘motivation’ in the form of vicious self-criticism, more forcing myself to “just do it”. All of these things, particularly when performed habitually, cause real, significant harm.

If I gave myself the most negative self-talk I could muster, the most vicious criticism, and really put Maximum Effort into forcing myself to do the thing I wanted done... if it didn’t work, of course that just meant that I needed to engage in even more self-abuse! I could always try harder!

This is the worst way that a model can be inaccurate: an unfalsifiable, self-reinforcing prediction. I could never demonstrate to myself that I’d really been as unkind to myself as was possible; there was always room for escalation. Psychologically, it’s also the worst kind of behavioral advice, which is the kind that generates a self-reinforcing negative feedback loop.


Once I started putting my newfound knowledge into practice, the difference between interventions predicated on an understanding of the problem as “lack of usable attention span” and those based on “lack of willpower” was night and day. I stopped trying to white-knuckle my way through all of my challenges and developed non-judgmental ways to remind myself to do things.

I knew that I, personally, was never going to spontaneously remember to do things at the right time, so I developed ways of letting computers remind me. I knew that I’d never be able to stick with routine, repetitive tasks, so I made a unified list of all the tedious administrative tasks I need to perform. I can’t keep important dates and times in mind, so I rely completely upon my calendar.

Even given these successes, “it worked!” is a colossal oversimplification. Today, it’s about 15 years later, and I’m still sifting through the psychological rubble wrought by the destructive, maladaptive coping mechanisms that I just described, and still trying to find better ways to remain effective when I’m feeling distracted... which is most of the time.

Simply having a better model at the coarsest level is just the first step. Instantiating that model in a working, fleshed out technological system is a ton of work in its own right.3 But it’s work that starts having little successes, which is a lot easier to build on and maintain momentum with than the same failure repeated day after day.

Given that I was starting — nearly from scratch — at 25, and had a lifetime worth of bad habits to unlearn, constructing a workable system that addressed my personal organizational needs still took the better part of a decade.

So as I move into the next, slightly more prescriptive section here, I don’t want to give anybody the idea that I think this is easy.

Don’t give up!

Listen up, Simon. Don’t believe in yourself. Believe in me! Believe in the Kamina who believes in you!

Kamina, Episode 1,
Tengen Toppa Gurren Lagann

At the start of this post, I specifically mentioned that I hadn’t wanted to write at length about ADHD due to my discomfort with self-diagnosis. So, you might be wondering: what was it that overcame this resistance and prompted me to finally write about my own experiences with ADHD?

The original inspiration was a pattern of complaints about suffering from ADHD I see periodically — mainly on Twitter — that look roughly like this:

  • “ADHD means never being on time for a meeting and having no excuse, forever.”
  • “It’s great to have ADHD and never be able to complete a routine task. Sigh.”
  • “I can’t take out the trash and my roommates just can’t understand that this is just part of who I am and I will never get better.”
  • “Why can’t neurotypicals understand that I’m just never going to “get stuff done” like they can. It’s exhausting.”

These are paraphrased and anonymized on purpose; I really don’t want to direct any negative attention towards someone specific, particularly someone just venting about struggles.

Of course, no blog post in mid-2020 would be complete without some reference to the ... situation. The original inspiration for this post predates the dawn of the new hell-world we all now inhabit, but, to say the least, COVID-194 has presented some new challenges to the coping mechanisms I’m writing about here. (Still, I know that I’m considerably better off than the average American in this mess.)

The message I’m trying to get across here is hopeful — others suffering with executive-function deficits similar to mine might be able to do what I did and fix a lot of their problems with this one weird trick! — and the constant drumbeat of despair all around us right now makes that sort of message feel more urgent.

Posts like the ones I described above seem to represent a recurring pattern of despair, and they make me sad. Not because I can’t identify with them; I have absolutely had these feelings. Not even because they’re wrong, exactly: it really is harder for folks with ADHD to handle some of these situations, and the struggle really is lifelong.

They make me sad because they’re expressing a fatalistic perspective; a fixed mindset5 that precludes any hope of future improvement. The through line that I have seen from all of these posts is a familiar, specific kind of despair; a thought I’ve had myself multiple times:

When somebody that I care about asks me, ‘Can you do the dishes later?’, I want to say ‘yes’ and have them believe me. I want to be able to believe myself, and I don’t think I will ever be able to.

Unlike myself when I was younger, the authors of these posts already have a name for their problem: ADHD. Sometimes they’ve even tried some amount of therapy or even medication.

Even so, they’re still buying in to the maladaptive strategy of “just try harder”. Since they already know that ADHD is, at least in part, a structural brain difference, they despair of ever being able to actually do that though, which leaves “giving up” as the only viable strategy.

Don’t give up! I believe in you!

Different problems, different tools

I have had another lifelong problem since when I was young: I am severely nearsighted. Yet, I never developed any psychological hangups around that; nobody ever told me that I needed to buckle down and just squint harder. This problem was socially quite well understood, so… I got glasses. Then I could see, as long as I consistently used those glasses.

Nobody ever expected me to be able to see without glasses.

photo of a pair of eyeglasses resting on a book Photo by NordWood Themes on Unsplash

Calendars, to-do lists, and systems like Getting Things Done are the corrective lenses for the ADHD brain.6.

If a to-do list is a corrective lens for ADHD, one of the major issues around understanding how to use it is that the mass-market literature around to-do lists assumes a certain level of neurotypicality. Assistive devices may frequently be useful to non-disabled people, but their relationship to and use of such affordances is very different.

Through the Looking-Glass ...

Let’s stretch this lens metaphor into absurdity.

In our metaphorical world, ADHD is myopia, and so most — or at least many — folks are “sight-typical”. Productivity systems are our “lenses”.

If nearsightedness were as poorly understood as ADHD, and you were nearsighted, you wouldn’t be able to pop on down to Lenscrafters and pick up a pair of spectacles. You might realize that the problem was with your eyes, and think, “lenses might help me see farther”. Many kinds of lenses might be commercially available in such a world! Lenses for telescopes, cameras, microscopes...

The way that someone with 20/20 vision might use a lens to see farther is to use a telescope to see something really far away. But you, my hypothetically-nearsighted friend, don’t need a powerful zoom lens to take surveillance photographs from a helicopter. Even if you could make such lenses work to correct your vision, you wouldn’t want to carry a pair of 2-kilogram DSLR zoom lenses everywhere you go. You want eyeglasses, which are something different.

photo of a picture of a giraffe with DSLR lenses over its eyes Photo by James Bold on Unsplash

The lenses in eyeglasses are — while operating on fundamentally the same principles of optics as the lenses in a telescope or a microscope — constructed and packaged in a completely different way. But most importantly, the way you use them is to wear them every day, not to deploy them on special occasions in the rare event where you need to do something extreme, but all the time, every day, in the same way.

... and What I Found There

photo of a to-do list written in a notebook Photo by Glenn Carstens-Peters on Unsplash

A person with nominal executive function might use the occasional free-floating to-do list to track a big, complex project with a lot of small interrelated tasks. Most folks in the modern information-driven economy routinely need to do projects that are too complex to easily memorize all the required steps. Even doing your own personal taxes has enough steps to require at least a little bit of tracking.

Such a person could make a to-do list for that one project — their telescope, if you will — put it in a place where they’d remember to look at it when they’re working on that project, and then remember to check things off when they’re done.

They could have one to-do list on the fridge for groceries, a note on their phone for stuff to get for their spouse, and a wiki page outlining some tasks at work. They would probably have enough free-floating executive function to remember which list maps to which project and when each project is relevant, and remember to check each one at the appropriate time.

I spent a lot of time trying to make disconnected to-do lists like this work for me. They never have. Even when I’m feeling particularly productive there is a cycle of list-generation, that goes like this:

  1. When I want to work on the project in question, I can’t remember where the to-do list is, but I need to figure out what I need to do again.

  2. So I go and write a new to-do list, spend a bunch of time rewriting the one I’d already written but can’t quickly find. Then I do some work on the project, check off a few things, and put the list away.

  3. Later, I’ll find both lists, both half checked off, and now I waste a bunch of time trying to figure out which one is the right one.

  4. Repeat this process a few times, and now I have a dozen lists. The lists themselves start generating more work than the actual project, because now I am constantly re-making and finding lists, trying to figure out which one is the most up to date.

This is the simplest case, but the real problem happens at a higher level: one of the biggest problems caused by any executive function deficit like ADHD is the difficulty of task initiation.

The more irrelevant distractions I can see while I’m trying to work out what to do next, the harder that decision becomes. And there’s nothing quite so distracting as the detritus of a thousand half-finished to-do lists.

One List To Rule Them All

What I’ve found works for me is a single, primary to-do list that I can obsessively check in with every minute of every day, which subsumes every other list related to every other project in my life.

I’m hardly the only person to have this insight — if you start engaging with the “productivity” noosphere, reading all the books, listening to the podcasts, this is a recurring theme. You don’t just have an ‘app’ or a ‘list’, you have to have a System. It has to be reliable; you have to know you’re going to keep checking it, or it’s worthless for storing your commitments. But unfortunately this is frequently buried under a lot of other technical complexity about the fiddly details of how to set up one system or the other. It’s very easy to miss the forest for the trees.

Having ADHD means that I routinely forget what I’ve decided to do over the course of only a minute or two after I’ve decided to do it. Just this week, I had to remind myself no fewer than three times to write down “buy more olive oil” because I kept remembering that we were running low when I was in the kitchen and by the time I finished washing my hands to put it into my phone I’d already forgotten why I did that and went back to making dinner.

I need to write everything down. I’m not going to remember five or six, or even two or three places to check for what to do next. I need to have one place to check what comes next, and then build the habit of constantly going back to it, both to add new things and to see what needs to be done.

Technology can help. Technology might even be necessary — it is for me.

But if you’re considering trying this out for the first time, be mindful that piles of to-do apps can be just as distracting as piles of paper. The important thing is to clearly, singularly decide on the one place which is the ‘root’ of your task tracking system.

You can even do this with a pen and paper. Carry the same, single notebook with you everywhere, and make it absolutely clear that it is your primary list, which is where you have to put any references to other lists. Some people have a lot more success with something tactile, to engage all the senses.

For me personally, the high-tech portion of this strategy is indispensable. I use a combination of OmniFocus for things that have to be done and Apple’s built-in calendar application for places I have to be at a particular time.7

OmniFocus8 defines the core gameplay loop of my life. Rather than having to cultivate and retain an elaborate series of interlocking habits and rituals to remain functional, I have a single root habit which triggers every other habit.

That habit? Consulting the unified “what should I do next” perspective in OmniFocus. Every time I am even marginally distracted, I check that view again.

Any time I have trouble initiating a task, I start breaking down the top task in that list into smaller and smaller “next physical action”. I don’t even rely on myself to do this; since I know I’ll forget to break things down, I frequently make tasks that look like this:

  • thing I want to do
  • plan the thing I want to do
    • break down the planning task into tiny actions and write them down here
    • break down the task itself into tiny actions and write them down here

To reduce distraction, I routinely close down any windows that are not necessary for whatever I’m currently working on. Particularly, I routinely sweep to get rid of browser tabs, asking (as I would with an email) “does this window represent a task I should do?”. If yes, it goes in the task system, if no, I close it so it won’t distract me further.

To facilitate this clean-up, on every computer that I use, I have a global hot-key set up to turn the thing that I’m looking at — some selected text, an image, an email message, a browser tab, a chat message at work — into a task that I can look at later.

Everything I have to do on a regular basis is in this system as a recurring task; for example:

  • taking out the trash
  • doing the dishes
  • logging in to Jira at work to look for assigned tasks
  • checking my email
  • brushing my teeth

Yes, even basic personal hygiene is in here. Not because I’ll necessarily forget, or that it takes a lot of energy, but I don’t want to waste one iota of brainpower I could be devoting to my current task to worrying about whether I might need to do something else later. If I don’t see ‘brush teeth’ in my “what should I do next” view, then I know, with certainty, that I don’t need to be thinking about tooth-brushing right now.

The “what should I do next” view is available on all of my computers, on my tablet, on my phone, and it even dominates my watch-face; I check it more often than I check the time:

screenshot of an apple watch face displaying a to-do item saying “write ADHD blog post”

No single feature is a hard requirement of my system; I could get along without any one of them in a pinch. However, the way that they combine to constantly reinforce what the next thing I need to do is in any given context, at any given time, means that I need to expend less energy trying to consciously hang on to all the context.

Limitations and Risks

I don’t want to give an overly rosy view of this strategy. Getting a single unified to-do system that works for you is not the same as getting a brain that can remember to do stuff. So here are some caveats:

  1. Implementing and maintaining such a system is never easy. It just takes tasks like ‘making sure I renew my passport before I need to travel’, ‘show up on time for the meeting’ and ‘buy a gift at least a week before the wedding’ from totally impossible to possible to do at least somewhat reliably with a sustainable level of effort. The main thing that I believe is possible for everyone is being able to commit to simple future tasks.
  2. Building enough data about one’s own habits and procrastination triggers also takes time, and to make such a system effective, one needs to do that work as well. (A passive time-tracking tool like Screen Time on your phone or RescueTime on your workstation can be quite illuminating — and surprising.)
  3. The initial wave of relief I felt when I started tracking tasks masked a gradual increase in my general anxiety over time. Checking and re-checking the ‘what to do next’ list can become a bit of an anxious compulsion, a safety behavior that doesn’t always help me plan my day. As one builds the habit of routinely checking the list, it’s important to avoid developing constant anxiety about the list as the only motivation to do so.
  4. Similarly, it is important to learn to under-commit. Not only does one need to avoid putting an unrealistic amount of stuff into the system, everybody (but especially everybody with ADHD!) needs non-trivial chunks of unstructured, unplanned time, where the system will clearly say ‘nothing to do now, just relax’. The “poor self-observation” and “time blindness” symptoms of ADHD ensure that properly estimating things before committing is a constant challenge that never really goes away either.
  5. This strategy definitely won’t be sufficient for some folks. ADHD is a spectrum and there’s no precise mechanism to calibrate where you are on it. Some folks will respond really well to this strategy, some folks will need medication before it helps to a useful degree.

Finishing up (about finishing up)

If you’re suffering from ADHD and despairing that you will never finish a task or be on time to an appointment: you can. It’s possible to do it at least pretty reliably. I believe if you commit to one and only one task tracking system, and consistently use it every single day, all the time, you can commit to tasks and get them done.

If you do it consistently enough, it will eventually become muscle memory, and not something you need to consciously remember to do every day.

It’s still never going to be easy to Do The Thing, even if your digital brain can perfectly remember what The Thing is right now.

At the very least, it was possible for me to learn to trust myself when I say that I will do something in the future, by designing a system around my own limited attention, and if I can do it, I think you can too.


Acknowledgments

This was a big one! I’d like to particularly thank my Uncle Joel, without whom this post (and many of my other achievements) would not be possible for the reasons described above, as well as Moshe Zadka, Amber Brown, Tom Most, and Eevee for extensive feedback on previous drafts of this post.

Additionally, I’d like to thank David Reid for introducing me to many of the tools and techniques that I still use every day, and Cory Benfield, Jonathan Lange, and Hynek Schlawack for many illuminating conversations over the years about the specifics and detailed mechanics of the tools whose use I describe in this post.

Any errors, of course, remain my own.


  1. The broader topic of the nature of “character”, fundamental attribution error and the extent to which the entire concept of a “character flaw” is a cognitive illusion that arises from the expedient but cruel habit of ignoring the context in which someone is making decisions is more than enough fodder for another post, but here are some good articles covering a newly-emerging psychological consensus that laziness as we understand it doesn’t really exist, and that there are always mitigating factors

  2. Paid link. See disclosures

  3. It was 54 years from Einstein figuring out the photoelectric effect in 1905 to the first MOSFET in 1959; and another 45 before we got MicroSD cards out of the theory of quantum mechanics. 

  4. Hello, future archaeologists! If you’re reading this in the far-flung future, as of this writing, shit is just incredibly fucked up right now. Just incredibly, horrifically fucked up. 

  5. Growth Mindset is a useful concept, but it definitely has a lot of problems, and has been a particularly pointed casualty of the replication crisis. This is a pretty good post outlining its remaining utility, even in the face of its relatively small remaining effect size. 

  6. This is to say nothing of medication, which is also quite useful. More or less necessary, in fact, for some of those suffering from ADHD. One thing I want to be very careful to point out is that in this post I’m talking about my own experiences with ADHD here; without a proper diagnosis, I haven’t had the opportunity to try a pharmacological solution, so I can’t comment on its efficacy for me. However, there’s a school of thought that since some people can resolve some of their ADHD problems with non-medicative interventions, therefore all people should refrain from medication. I want to be as clear as possible that I do not endorse this point of view. 

  7. I’m not going to get into what I use for email here, since I’ve written about that before, and you can just go read that. 

  8. Since I know many of my readers are not in the Apple ecosystem, and might be motivated by this post to put some of these ideas into action, there are plenty of cross-platform apps with similar capabilities. You might check out Taskwarrior, Todoist, or Remember The Milk. There’s definitely something out there that can work for you! 

by Glyph at August 02, 2020 08:00 AM

July 24, 2020

Moshe Zadka

The Hardest Logic Puzzle Ever (In Python)

The Labyrinth is a children’s movie. The main character is 16 years old, and solving a logic puzzle that will literally decide if she lives or dies. In fiction, characters are faced with realistic challenges: ones they can solve, even if they have to make an effort.

So, it makes sense that the designer of the eponymous labyrinth did not consult logicians Richard Smullyan, George Boolos (no relation to the inventor of boolean algebra), and John McCarthy (yes, the same person who invented Lisp and suggested that a “2-month, 10-(person) study” would make significant headway in the study of Artificial Intelligence). Those three would suggest that the designer use The Hardest Logic Puzzle Ever.

There are persistent rumors of the movie getting a reboot, or perhaps a sequel. Like any good sequel, the protagonists should face newer and bigger challenges. In the interests of helping the screen writers for the sequel/reboot, here is my explanation of the “Hardest Logic Puzzle Ever”, together with clear code.

from __future__ import annotations

Do you remember movies from your childhood that just did not age that well? You cringe at some tasteless joke, you say “I can’t believe it was acceptable to show this back then”? I think we can agree we want to future-proof the script for a bit.

Using modern-style annotations will make our code much easier to maintain.

import attr

This movie will come out in the 2020s, and we should use 2020-era code to model it. The attrs library is almost always the right solution to implement classes.

import random

The protagonist will face unknown challenges. Randomness is a state of knowledge. From her perspective, the true state of the world is random.

from zope import interface
from typing import Callable, Mapping, Tuple

This is already a hard logic puzzle, no reason to make it harder on ourselves. Clear interface and type hints will make the logic easier to understand.

This is why zope.interface is appropriate for The Hardest Logic Puzzle Ever. We also need the Callable interface, since our protagonist will be asking the Gods questions in the form of functions, and Mapping, since she will eventually need to answer the question “which God is which”.

The Tuple type will most be used by the careful protagonist in her internal type hints. This is high stakes code, and she wants to make sure she gets it right.

In the HLPE (Hardest Logic Puzzle Ever), the ones who answer the questions are Gods. Just like in the Marvel Cinematic Universe, they might not be literal “Gods”, but clearly powerful aliens. As aliens, they have their own language. The words for “yes” and “no” are “da” and “ja” – but we do not know which is which.

I suggest that this will not be revealed in the script. This is good fodder for endless fan discussions later on Reddit. As such, the best way is to make sure we do not know the answer ourselves: make it a random language!

import enum

@enum.unique
class GodWords(enum.Enum):
    ja = "ja"
    da = "da"

def make_god_language() -> Mapping[bool, GodWords]:
    words = list(GodWords)
    random.shuffle(words)
    ret = {}
    return {True: words.pop(), False: words.pop()}

Shuffling the words for “yes” and “no” means that the .pop() call will get a random one. Now we have the language: it maps an abstract concept (a Python Boolean) to a string.

In the HLPE, there are three Gods, called “A”, “B”, and “C”. They can be asked any question that refers to the Gods.

@enum.unique
class GodNames(enum.Enum):
    A = "A"
    B = "B"
    C = "C"

class IGod(interface.Interface):
    def ask(question: Question) -> GodWords:
        """Ask"""

But what questions is the protagonist allowed to ask? Questions are something the audience-identification protagonist will ask. For this, a Protocol is appropriate. (Also, those are more convenient for describing functions, which we do not want to annotate with explicit implementation declarations.)

from typing_extensions import Protocol

class Question(Protocol):
    def __call__(self, gods: Dict[GodName, IGod]) -> bool:
        pass

Because of how annotations work, at this point we need not know what GodName or IGod are.

The simplest of the Gods is the one who speaks always truly (in the God language).

@interface.implementer(IGod)
@attr.s(auto_attribs=True, frozen=True)
class TrueGod:
    _gods: Mapping[GodNames, IGod]
    _language: Mapping[bool, GodWords]

    def ask(self, question: Question) -> GodWords:
        return self._language[bool(question(self._gods))]

The next God always lies.

@attr.s(auto_attribs=True, frozen=True)
class FalseGod:
    _gods: Mapping[GodNames, IGod]
    _language: Mapping[bool, GodWords]

    def ask(self, question: Question) -> GodWords:
        return self._language[not bool(question(self._gods))]

But how to implement Random? This is a harder question than it seems.

The original statement just said “whether Random speaks truly or falsely is a completely random matter”. What does that mean? If you ask it two questions, can it answer truthfully to one and lie to the other?

Boolos wrote a “clarification”: “Whether Random speaks truly or not should be thought of as depending on the flip of a coin hidden in (their) brain: if the coin comes down heads, (they) speaks truly; if tails, falsely.” This clarification fails to elucidate much: it does not answer, for example, the question above.

Finally, based on the suggested solution, and assuming that the obvious simpler solutions do not work, Raben and Raben suggested suggested the clear guideline: “Whether Random says ja or da should be thought of as depending on the flip of a coin hidden in his brain: if the coin comes down heads, he says ja; if tails, he says da.”

This is the guideline I have chosen to implement here.

@attr.s(auto_attribs=True, frozen=True)
class RandomGod:
    _gods: Mapping[GodNames, IGod]
    _language: Mapping[bool, GodWords]

    def ask(self, question: Question) -> GodWords:
        return self._language[random.choice([True, False])]

So much back and forth discussion, over two millenia, for such simple code.

I’m sure the God-like aliens would have used code to describe the puzzle from the beginning, avoiding the messiness of natural language.

Accessing the God classes themselves would be the height of hubris, but the goal of the protagonist is to answer “which God is which”. We will build a special enumeration of the Gods’ identities, to be used in the solution.

GodIdentities = enum.Enum('God Identities',
                          {klass.__name__: klass.__name__
                           for klass in [TrueGod, FalseGod, RandomGod]})

With the Gods’ personalities implemented, we can write the code that creates a random world that complies with the terms of the puzzle. Three Gods, known as “A”, “B”, and “C”, assigned to the names randomly, and speaking in a randomly generated language.

def make_gods() -> Mapping[GodNames, IGod]:
    language = make_god_language()
    gods = {}
    god_list = [klass(gods=gods, language=language)
                for klass in [TrueGod, FalseGod, RandomGod]]
    random.shuffle(god_list)
    for name in GodNames:
        gods[name] = god_list.pop()
    return gods

The code so far corresponds to the first part of the scene: where the protagonist comes to the place of the Gods, learns the local rules, and realizes that she must either solve the puzzle correctly, or fail.

This time she is granted the chance to ask three questions. However, there are many more unknowns: there are 6 possible assignments for the Gods, and two possible meanings for the language. Frankly, I doubt that I would be able to solve the puzzle on my feet, when I am afraid for my life.

But luckily, I have Wikipedia and time, and so I have written up the code to find a solution here.

Part of the solution is to ask a God a question about themselves: “if I asked you SOME QUESTION, would you say ja”. While no question asked of Random is useful (the answer is random) for either True or False, this question would result in “ja” the right answer is “yes”, and “da* if the right answer is”no". This means that wrapping questions like this means we have to care neither the identity of the God, nor about the details of the God language.

This makes it a useful abstraction!

def if_asked_a_question_would_you_say_ja_is_ja(
        god_name: GodNames,
        gods: Mapping[GodNames, IGod],
        question: Callable[[IGod, Mapping[GodNames, IGod]], bool],
    ) -> bool:
    you = gods[god_name]
    def add_you(gods):
        return question(you, gods)
    return you.ask(lambda gods: you.ask(add_you) == GodWords.ja) == GodWords.ja

This would be a wonderful chance for a flash-“forward” into a hypothetical scene: the movie goes into black-and-white, and the protagonist voice-overs: if “A” is “True”, what would happen if I ask them “is 1 equal 1”? What would happen if I ask them “if I asked you, ‘is 1 equal 1?’, would you answer ’ja? What if”A" is “False”?

def hypothetical():
    language = make_god_language()
    gods = {}
    hypothetical_true_god = TrueGod(gods=gods, language=language)
    hypothetical_false_god = FalseGod(gods=gods, language=language)
    gods[GodNames.A] = hypothetical_true_god
    gods[GodNames.B] = hypothetical_false_god
    for (name, identity) in [(GodNames.A, GodIdentities.TrueGod), (GodNames.B, GodIdentities.FalseGod)]:
        for question in [lambda x,y: 1==1, lambda x, y: 1==0]:
            objective_value = question(None, None)
            print(
                f"{identity}, asked {objective_value} question:",
                if_asked_a_question_would_you_say_ja_is_ja(name, gods, question)
            )
hypothetical()
del hypothetical
God Identities.TrueGod, asked True question: True
God Identities.TrueGod, asked False question: False
God Identities.FalseGod, asked True question: True
God Identities.FalseGod, asked False question: False

Movie goes back to color. A smile spreads on the protagonist’s face. She knows how to solve this.

The next challenge is to find a God that is not Random. If you ask God B about whether A is Random, then if the answer is “ja”, then it means either the correct answer is that A is Random or it means B is Random. Either way, C is not Random.

For similar reasons, if B answers “da”, then “A” is not Random.

Either way, we have found a non-Random God. Now we know that we can find the truth from them by using the wrapper!

So we ask them whether they’re True. Now we ask them about whether “B” is Random.

So we know:

  • One God who is not Random (who we will call “the interlocutor”, since we’ll spend the rest of the conversation with them)
  • Whether the interlocutor is True
  • Whether B is Random
def ask_questions(gods: Mapping[GodNames, IGod]) -> Tuple[GodNames, bool, bool]:
    is_a_random_according_to_b = if_asked_a_question_would_you_say_ja_is_ja(
        GodNames.B,
        gods,
        lambda you, gods: isinstance(gods[GodNames.A], RandomGod)
    )
    interlocutor = GodNames.C if is_a_random_according_to_b else GodNames.A
    is_interlocutor_true = if_asked_a_question_would_you_say_ja_is_ja(
        interlocutor,
        gods,
        lambda you, gods: isinstance(you, TrueGod)
    )
    is_b_random = if_asked_a_question_would_you_say_ja_is_ja(
        interlocutor,
        gods,
        lambda you, gods: isinstance(gods[GodNames.B], RandomGod)
    )
    return interlocutor, is_interlocutor_true, is_b_random

Once again, the movie goes into a black and white flash-forward as the protagonist plans her move.

def hypothetical():
    for experiment in range(1, 5):
        print("Experiment", experiment)
        gods = make_gods()
        interlocutor, is_interlocutor_true, is_b_random = ask_questions(gods)
        print(f"I think {interlocutor} is {is_interlocutor_true}God. They're {type(gods[interlocutor]).__name__}.")
        print(f"B is {'' if is_b_random else 'not '}RandomGod. They're {type(gods[GodNames.B]).__name__}.")
hypothetical()
del hypothetical
Experiment 1
I think GodNames.C is FalseGod. They're FalseGod.
B is not RandomGod. They're TrueGod.
Experiment 2
I think GodNames.A is FalseGod. They're FalseGod.
B is not RandomGod. They're TrueGod.
Experiment 3
I think GodNames.A is TrueGod. They're TrueGod.
B is RandomGod. They're RandomGod.
Experiment 4
I think GodNames.A is FalseGod. They're FalseGod.
B is not RandomGod. They're TrueGod.

Now that we know the answers, it is time to put them all together. We first record whether the interlocutor is True or False. If B is Random, we mark them as such. If not, we know that neither the interlocutor or B is Random, so the other God must be Random.

Now that we know two Gods’ identities, the last name that remains belongs to the last possible identity.

def analyze_answers(interlocutor: GodNames, is_interlocutor_true: bool, is_b_random: bool):
    solution = {}
    solution[interlocutor] = (GodIdentities.TrueGod if is_interlocutor_true
                              else GodIdentities.FalseGod)
    [other_god] = set(GodNames) - set([interlocutor, GodNames.B])
    random_god = GodNames.B if is_b_random else other_god
    solution[random_god] = GodIdentities.RandomGod
    [last_god] = set(GodNames) - set(solution.keys())
    [last_god_value] = set(GodIdentities) - set(solution.values())
    solution[last_god] = last_god_value
    return solution

Putting the questioning and the analysis together is straightforward.

def find_solution(gods: Mapping[GodNames, IGod]) -> Mapping[GodNames, GodIdentities]:
    interlocutor, is_interlocutor_true, is_b_random = ask_questions(gods)
    return analyze_answers(interlocutor, is_interlocutor_true, is_b_random)

Our little checker returns both a description of the situation, as well as whether the solution was correct.

def check_solution() -> Tuple[Mapping[str, str], Mapping[str, str], bool]:
    gods = make_gods()
    solution = find_solution(gods)
    reality = sorted([(name.value, type(god).__name__) for name, god in gods.items()])
    deduced = sorted([(name.value, identity.value) for name, identity in solution.items()])
    return reality, deduced, reality == deduced

The last step is to test our solution multiple times. Again, remember that there are 6 * 2 = 12 possible situations. However, if “B” is Random, we can get two different answers. This means the total number of options for a path is more than 12, but less than 24.

If we run the solution for a 1000 times, the probability that a given path will not be taken is less than (23/24)**1000 * 24. How much is it?

(23/24)**1000 * 24
7.885069901070409e-18

Good enough!

Now we can see if the script works. We lack Hollywood’s professional script doctors, but we do have a powerful Python interpreter.

for i in range(1000):
    reality, deduced, correct = check_solution()
    if not correct:
        raise ValueError("Solution is incorrect", reality, deduced)
    if i % 200 == 0:
        print("Solved correctly for", reality)
Solved correctly for [('A', 'RandomGod'), ('B', 'TrueGod'), ('C', 'FalseGod')]
Solved correctly for [('A', 'RandomGod'), ('B', 'FalseGod'), ('C', 'TrueGod')]
Solved correctly for [('A', 'RandomGod'), ('B', 'TrueGod'), ('C', 'FalseGod')]
Solved correctly for [('A', 'TrueGod'), ('B', 'FalseGod'), ('C', 'RandomGod')]
Solved correctly for [('A', 'TrueGod'), ('B', 'RandomGod'), ('C', 'FalseGod')]

We printed out every 200th situation, to have some nice output!

Python confirms it. Hollywood should buy our script.

Thanks to Glyph Lefkowitz for his feedback on the Labyrinth post, some of which inspired changes in this post. Thanks to Mark Williams for his feedback on an early draft. Any mistakes or issues that remain are my responsibility.

by Moshe Zadka at July 24, 2020 04:00 PM

July 22, 2020

Glyph Lefkowitz

I Want A New Duck

Get it?
Quack quack quack quack
Quack quack quack quack

Weird Al Yancovic,
I Want A New Duck

Mypy makes most things better

Mypy is a static type checker for Python. If you’re not already familiar, you should check it out; it’s rapidly becoming a standard for Python projects. All the cool kids are doing it. With Mypy, you get all the benefits of high-level dynamic typing for rapid experimentation, and all the benefits of rigorous type checking to complement your tests and improve reliability.1 The best of both worlds!

Mypy can change how you write Python code. In most cases, this is for the better. For example, I have opined on numerous occasions about how bad None is. But this can significantly change with Mypy. Now, when you return None, you can say -> Optional[str] and rest assured that all your callers will be quickly, statically checked for places where they might encounter an AttributeError on that None, which makes this a more appealing option than risking raising a runtime exception (which Mypy can’t check).

But sometimes, things can get worse

But in some cases, as you add type annotations, they can make your code more brittle, especially if you annotate with the most initially obvious types. Which is to say, you define some custom classes, and then say that the parameters to and return values from your functions and methods are simply instances of those classes you just defined.

Most Mypy tutorials give you a bunch of examples with str, int, List[int], maybe an Optional[float] or two, and then leave you to your own devices when it comes to defining your own classes; yet, huge amounts of real-world applications are custom classes.

So if you’re new to Mypy, particularly if you’re applying it to a large existing codebase, it’s quite natural to write a little code like this:

1
2
3
4
5
6
from dataclasses import dataclass
@dataclass
class Duck:
    quiet: bool = False
    def quack(self) -> None:
        print("Quack." if self.quiet else "QUACK!")

and then, later, write some code like this:

1
2
3
4
def duck_war(aggressor: Duck, defender: Duck) -> None:
    aggressor.quack()
    defender.quack()
    print("The only winning move is not to play.")

In untyped, pre-Mypy python, in addition to being a poignant message about the futility of escalating violence, duck_war is a very flexible function, regardless of where it’s defined. It can take anything with a quack() method.

But, while the strictness of the Mypy type-check here brings a level of safety — no None accidentally masquerading as a Duck here — it also adds a level of brittleness. Tests which use carefully-constructed fakes will now fail to type check, because duck_war technically insists upon only precisely instances of Duck and nothing else.

A few sub-optimal answers to this question

So when you want something else that has slightly different behavior — when you want a new Duck2, so to speak — what do you do?

There are a couple of anti-patterns you might arrive at to work around this as you begin your Mypy journey:

  1. add # type: ignore comments to all your tests, or remove their type signatures so they don’t get type checked. This solution throws the baby out with the bath water, as it eliminates any safety that any of these callers experience.
  2. add calls to cast(Duck, ...) around any things which you know are “enough like” a Duck for your purposes. This is much more fine-grained and targeted (and can be a great hack for working with libraries who provide type stubs which are too specific) but this also trades off a bit too much safety, since nothing at the point of the cast verifies anything unless you build your own ad-hoc system to do so.
  3. subclassing Duck. This can also be expedient, and not too bad if you have to; Mypy removes some of the sharpest edges from Python inheritance, providing some guard rails around overriding methods, but it remains a bad idea for all the usual reasons.

The good answer: typing.Protocol

Mypy has a feature, typing.Protocol , that provides a straightforward way to describe any object that has a quack() method.

You can do this like so:

1
2
3
4
5
from typing import Protocol

class Ducky(Protocol):
    def quack(self) -> None:
        "Quack."

Now, with only a small modification to its signature — while leaving the implementation the same — duck_war can now support anything sufficiently duck-like:

1
2
def duck_war(aggressor: Ducky, defender: Ducky) -> None:
    ...

In addition to making it possible for other code — for example, a unit test — to pass its own implementation of Ducky into duck_war without subclassing or tricking the type checker, this change also improves the safety of duck_war’s implementation itself. Previously, when it took a Duck, it would have been equally valid for duck_war to access the .quiet attribute of Duck as it would have been to access .quack, even though .quiet is ostensibly an internal implementation detail.

Now, we could add an underscore prefix to quiet to make it “private”, but the type checker will still happily let you access it. So a Protocol allows you to clearly reveal your intention about what types you expect your arguments to be.

Why isn’t everything like this?

Unfortunately, typing.Protocol began its life as typing_extensions.Protocol: a custom extended feature of the type system that wasn’t present in Mypy initially, and isn’t in the standard library until Python 3.8. Built-in types like Iterable and Sequence are type-checked as if they’re Protocols by being slightly special within Mypy, but it’s not clear to the casual user how this is happening.

However, other types, like io.TextIO, don’t quite behave this way, and some early-adopter projects for Mypy have types that are either too strict or too permissive because they predated this.

So I really wanted to write this post to highlight the Protocol style of describing types and encourage folks to use it.

In conclusion

The concept I’ve described above is not new in the world of type theory.

The way that typing works in Mypy with most types — builtins, custom classes, and abstract base classes — is known as nominal typing. Nominal as in “based on names”; if the object you have directly references the name of the type it’s being compared to, by being an instance of it, then it matches.

In other words: if it’s named “Duck”, it’s a duck. There are some advantages to nominal typing3, but this brittleness is not very Pythonic!

In contrast, the type of type-checking accomplished by Protocol is known as structural typing.4 Whether the caller matches a Protocol depends on the structure of your object — in other words, what methods and attributes it has.

In even other-er words - if it .quack()s like a duck, it is a duck.

If you’re just starting to use Mypy — particularly if you’re building a library that exports types that users are expected to implement — consider using Protocol to describe those types. With Protocol, while you get much-improved safety from type-checking, you don’t lose the wonderful flexibility and easy testability that duck typing has always given you in Python.


  1. And the early promise of using those type hints to making your code really fast with mypyc, although it’s still a bit too limited and poorly documented to start encouraging it too broadly... 

  2. I said the title of the post, in the post! I love it when that happens. 

  3. I have another post coming up about using zope.interface with Mypy, which combines the abstract typing of Protocol and the avoidance of traditional inheritance with the heightened safety that prevents accidentally matching similar signatures that are named the same but mean something else. 

  4. The official documentation for Protocol in Mypy itself is even titled “Protocols and structural subtyping”. 

by Glyph at July 22, 2020 06:43 AM

July 20, 2020

Hynek Schlawack

Python in GitHub Actions

GitHub’s own CI called GitHub Actions has been out of closed beta for a while and offers generous free quotas and a seamless integration with the rest of the site. Let’s have a look at how to use it for an open source Python package.

by Hynek Schlawack (hs@ox.cx) at July 20, 2020 12:00 AM

July 13, 2020

Moshe Zadka

Hey, Back Off!

The choice in parameters for back-off configuration is important. It can be the difference between a barely noticable blip in service quality and an hours-long site outage. In order to explore the consequences of the choice, I wrote a little fictional ditty about a fictional website.

I hope you enjoy escaping into this fictional reality as much as I enjoyed writing about it.

Your recipe site is different. After all, recipe sites are a dime-a-dozen. With today's modern technology, any kid can put a quick mock-up together with Django, React, and MongoDB to store recipes and retrieve them by various attributes.

In order to make your recipe site stand above the rest, you made sure it uses really cutting edge techniques. From details of the web requests coming in, using sophisiticated language parsing and machine learning algorithm, with just a few words about the user's likes and dislikes, you find the perfect recipe just for them.

HackerNews called it "just a bunch of buzzwords", of course. But once the graphs went up into the right, with 50% month-over-month growth rates, everyone explained that they knew that this one was different. Popularity sky-rocketed, the engineers worked on scaling up the site, and though it was not the world's most sophisticated microservice architecture, it was medium-service architecture, at least.

The web front end would call the machine learning cluster, running on special GPU machines, to get the appropriate keywords by which to look up the recipe. Maybe not a the kind of 50-microservices-architecture that takes three whiteboards to explain, but at least it was easy enough to scale up horizontally. You hired a great Site Reliability Engineer, who built a sophisticated continuous delivery machine. As your machine learning team fine-tuned the model, it would slowly roll out into the cluster, running continous A/B tests that would immediately roll back the change if the model performed worse than before.

The SRE also made sure the web front-end would back off of a malfunctioning machine learning node, as those do sometimes. Instead of immediately reconnecting, it would try reconnecting at intervals starting at 5 milliseconds and increasing by powers of 2: 10ms, 20ms, and so on.

Amazing.

You will never forget what happened next. A new model was rolled out. It looked great. Performed wonderfully. The only problem was that it had a hidden time-bomb. The model had a small overfit that happend to match the sum of the user's ID and the time. This, in itself, is the kind of overfit that happens every day. Unfortunately, if that value exceeded a threshold, it triggered a bug in the machine learning library. When that happened, the node would become flakey, and start dropping random connections. Even worse, for the specific input used in the health checks, it would work: all nodes appeared healthy in the monitoring dashboard, and to the service discovery framework.

A nightmare scenario.

One by one, nodes started dropping off, as users with higher IDs connected and the seconds ticked on mercilessly. The front end started backing off: by 5ms, then 10ms, then 20ms.

The exponential function explodes fast, so of course the backoff (with jitter, as is common practice) had an upper limit. When the Site Reliability Engineer set the upper limit, there was a lot of things on their mind. Decisions are hard to make. Sometimes, one neuron firing makes all the difference. But neurons are small, and electrons are smaller.

When we look at the firing of a neuron, we can no longer imagine the world is as Maxwell and Newton imagined it to be. Quantum effects must be taken into account. As per the obviously correct Many-Worlds interpretation of Quantum Mechanics, the one world had split into two that would never interact again.

In one world, the SRE had set the maximum to 1 second. In the other, to 10 minutes. For a while, these worlds seemed to parallel each other closely: sure, a few bits on in a YAML file were different, but what storm could come from such a small butterfly?

It was not until the disasterous model roll-out that the worlds would diverge wildly. In the world where the limit was set to one second, the front ends were hammering the nodes mercilessly. Rolling back the model would mean that a node would start getting too many connections, and before a few seconds had passed, it would fall back down.

Eventually, in this world, the whole front-end had to be brought down, the model fully rolled back, and only then the front-end brought back up. This was too bad, because the front-end had a bad-but-working model built-in, and some recipes would still be found while the "outage" was happening. Sure, they were not the "wow you read my mind" quality, but they were still decent results.

Bringing down the whole front-end cluster, and bringing it up again, turned the outage from a blip in recipe click-through rates into a three-hour, news-coverage-worthy outage.

In the other world, where the back-off had a maximum of ten minutes, things progressed much more smoothly. As a machine learning node's model was rolled back, it came back up, and machines started connecting to it. The first one to be brought up did go down from connection overload -- but took ten minutes to do so, enough time to ascertain that this solution was correct. A few more nodes were rolled back, and as the connections grew, the roll back flowed through the fleet.

The outage has been managed, and other than a few irate customers posting on Twitter about looking for vegan recipes and getting a "meat lover's delight", the outage mostly went unnoticed.

Back-off is important. Exponential back-off with jitter and a maximum is almost always the right solution. Yes, the exponent matters a little. The initial back off matters a little. But the maximum matters a lot. Set the maximum to "human time frames": a few minutes (1-30) is a good balance.

Even in a big cluster, an extra action every 10 minutes will probably not cause serious downstream repercussions. But even the most entitled customer can be mollified by a support technician for five minutes doing "everything possible to find a solution" while waiting the clock out on the 10 minute back-off clock.

Choose maximums for back-off carefully, or the site you bring down may be your own.

Thanks to Avy Faingezicht for his feedback on an earlier draft. Any mistakes or issues that remain are my reponsibility.

by Moshe Zadka at July 13, 2020 04:00 AM

July 05, 2020

Glyph Lefkowitz

Zen Guardian

There should be one — and preferably only one — obvious way to do it.

Tim Peters, “The Zen of Python”


Moshe wrote a blog post a couple of days ago which neatly constructs a wonderful little coding example from a scene in a movie. And, as we know from the Zen of Python quote, there should only be one obvious way to do something in Python. So my initial reaction to his post was of course to do it differently — to replace an __init__ method with the new @dataclasses.dataclass decorator.

But as I thought about the code example more, I realized there are a number of things beyond just dataclasses that make the difference between “toy”, example-quality Python, and what you’d do in a modern, professional, production codebase today.

So let’s do everything the second, not-obvious way!


There’s more than one way to do it

Larry Wall, “The Other Zen of Python”


Getting started: the __future__ is now

We will want to use type annotations. But, the Guard and his friend are very self-referential, and will have lots of annotations that reference things that come later in the file. So we’ll want to take advantage of a future feature of Python, which is to say, Postponed Evaluation of Annotations. In addition to the benefit of slightly improving our import time, it’ll let us use the nice type annotation syntax without any ugly quoting, even when we need to make forward references.

So, to begin:

1
from __future__ import annotations

Doors: safe sets of constants

Next, let’s tackle the concept of “doors”. We don’t need to gold-plate this with a full blown Door class with instances and methods - doors don’t have any behavior or state in this example, and we don’t need to add it. But, we still wouldn’t want anyone using using this library to mix up a door or accidentally plunge to their doom by accidentally passing "certian death" when they meant certain. So a Door clearly needs a type of its own, which is to say, an Enum:

1
2
3
4
5
from enum import Enum

class Door(Enum):
    certain_death = "certain death"
    castle = "castle"

Questions: describing type interfaces

Next up, what is a “question”? Guards expect a very specific sort of value as their question argument and we if we’re using type annotations, we should specify what it is. We want a Question type that defines arguments for each part of the universe of knowledge that these guards understand. This includes who they are themselves, who the set of both guards are, and what the doors are.

We can specify it like so:

1
2
3
4
5
6
7
from typing import Protocol, Sequence

class Question(Protocol):
    def __call__(
        self, guard: Guard, guards: Sequence[Guard], doors: Sequence[Door]
    ) -> bool:
        ...

The most flexible way to define a type of thing you can call using mypy and typing is to define a Protocol with a __call__ method and nothing else1. We could also describe this type as Question = Callable[[Guard, Sequence[Guard], Door], bool] instead, but as you may be able to infer, that doesn’t let you easily specify names of arguments, or keyword-only or positional-only arguments, or required default values. So Protocol-with-__call__ it is.

At this point, we also get to consider; does the questioner need the ability to change the collection of doors they’re passed? Probably not; they’re just asking questions, not giving commands. So they should receive an immutable version, which means we need to import Sequence from the typing module and not List, and use that for both guards and doors argument types.

Guards and questions: annotating existing logic with types

Next up, what does Guard look like now? Aside from adding some type annotations — and using our shiny new Door and Question types — it looks substantially similar to Moshe’s version:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
from dataclasses import dataclass

@dataclass
class Guard:
    _truth_teller: bool
    _guards: Sequence[Guard]
    _doors: Sequence[Door]

    def ask(self, question: Question) -> bool:
        answer = question(self, self._guards, self._doors)
        if not self._truth_teller:
            answer = not answer
        return answer

Similarly, the question that we want to ask looks quite similar, with the addition of:

  1. type annotations for both the “outer” and the “inner” question, and
  2. using Door.castle for our comparison rather than the string "castle"
  3. replacing List with Sequence, as discussed above, since the guards in this puzzle also have no power to change their environment, only to answer questions.
  4. using the [var] = value syntax for destructuring bind, rather than the more subtle var, = value form
1
2
3
4
5
6
7
8
9
def question(guard: Guard, guards: Sequence[Guard], doors: Sequence[Door]) -> bool:
    [other_guard] = (candidate for candidate in guards if candidate != guard)

    def other_question(
        guard: Guard, guards: Sequence[Guard], doors: Sequence[Door]
    ) -> bool:
        return doors[0] == Door.castle

    return other_guard.ask(other_question)

Eliminating global state: building the guard post

Next up, how shall we initialize this collection of guards? Setting a couple of global variables is never good style, so let’s encapsulate this within a function:

1
2
3
4
5
6
7
from typing import List

def make_guard_post() -> Sequence[Guard]:
    doors = list(Door)
    guards: List[Guard] = []
    guards[:] = [Guard(True, guards, doors), Guard(False, guards, doors)]
    return guards

Defining the main point

And finally, how shall we actually have this execute? First, let’s put this in a function, so that it can be called by things other than running the script directly; for example, if we want to use entry_points to expose this as a script. Then, let's put it in a "__main__" block, and not just execute it at module scope.

Secondly, rather than inspecting the output of each one at a time, let’s use the all function to express that the interesting thing is that all of the guards will answer the question in the affirmative:

1
2
3
4
5
6
def main() -> None:
    print(all(each.ask(question) for each in make_guard_post()))


if __name__ == "__main__":
    main()

Appendix: the full code

To sum up, here’s the full version:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Protocol, Sequence
from enum import Enum


class Door(Enum):
    certain_death = "certain death"
    castle = "castle"


class Question(Protocol):
    def __call__(
        self, guard: Guard, guards: Sequence[Guard], doors: Sequence[Door]
    ) -> bool:
        ...


@dataclass
class Guard:
    _truth_teller: bool
    _guards: Sequence[Guard]
    _doors: Sequence[Door]

    def ask(self, question: Question) -> bool:
        answer = question(self, self._guards, self._doors)
        if not self._truth_teller:
            answer = not answer
        return answer


def question(guard: Guard, guards: Sequence[Guard], doors: Sequence[Door]) -> bool:
    [other_guard] = (candidate for candidate in guards if candidate != guard)

    def other_question(
        guard: Guard, guards: Sequence[Guard], doors: Sequence[Door]
    ) -> bool:
        return doors[0] == Door.castle

    return other_guard.ask(other_question)


def make_guard_post() -> Sequence[Guard]:
    doors = list(Door)
    guards: List[Guard] = []
    guards[:] = [Guard(True, guards, doors), Guard(False, guards, doors)]
    return guards


def main() -> None:
    print(all(each.ask(question) for each in make_guard_post()))


if __name__ == "__main__":
    main()

Acknowledgments

I’d like to thank Moshe Zadka for the post that inspired this, as well as Nelson Elhage, Jonathan Lange, Ben Bangert and Alex Gaynor for giving feedback on drafts of this post.


  1. I will hopefully have more to say about typing.Protocol in another post soon; it’s the real hero of the Mypy saga, but more on that later... 

by Glyph at July 05, 2020 08:44 PM

July 03, 2020

Moshe Zadka

A Labyrinth of Lies

In the 1986 movie Labyrinth, a young girl (played by Jennifer Connelly) is faced with a dilemma. The adorable Jim Henson puppets explain to her that one guard always lies, and one guard always tells the truth. She needs to figure out which door leads to the castle at the center of the eponymous Labyrinth, and which one to certain death (dun-dun-dun!).

I decided that like any reasonable movie watcher, I need to implement this in Python.

First, I implemented two guards: one who always tells the truth, and one who always lies. The guards know who they are, and what the doors are, but can only answer True or False.

import dataclasses
from typing import List

guards = [None, None]
doors = ["certain death", "castle"]

@dataclasses.dataclass
class Guard:
    _truth_teller: bool
    _guards: List
    _doors: List[str]

    def ask(self, question):
        answer = bool(question(self, self._guards, self._doors))
        if not self._truth_teller:
            answer = not answer
        return answer

guards[0] = Guard(True, guards, doors)
guards[1] = Guard(False, guards, doors)

This being a children’s movie, the girl defeats all odds and figures out what to ask the guard: “would he (points to the other guard) tell me that this (points to the door on the left) door leads to the castle?”

def question(guard, guards, doors):
    other_guard, = (candidate for candidate in guards if candidate != guard)
    def other_question(ignored, guards, doors):
        return doors[0] == "castle"
    return other_guard.ask(other_question)

What would the truth-teller answer?

guards[0].ask(question)
True

And the liar?

guards[1].ask(question)
True

No matter who she asks, now she can count on a lie. After a short exposition, she confidently walks through the other door. It’s a piece of cake!

Thanks to Rina Arstain and Veronica Hanus for their feedback on an earlier draft. Thanks to Glyph Lefkowitz for the idea to use dataclasses. All mistakes and issues that remain are my responsibility.

Some related work in the field of formalizing logic puzzles:

by Moshe Zadka at July 03, 2020 05:30 PM

June 25, 2020

Itamar Turner-Trauring

Your dev environment matters less than you think

How do you setup your dev environment? Depending on your language there are many choices of editor, package manager, build tool, linter, on and on. And every article you find will have a different combination of suggested tools, each of which claiming that their list is The Right Way To Do Things.

So which do you choose?

The short answer: it doesn’t matter. Your choice of dev environment is meaningless.

The slightly less flippant answer is that, yes, there are some contraints on which tools you should pick, but otherwise you should just pick something and move on.

Let’s see why dev environments don’t matter that much in the end, and what limited constraints you should apply when choosing your tools.

Learning how to cook

Imagine you’re training to become a chef. You will need to learn how to use a knife correctly, to chop and dice safely and quickly.

And yes, you need a sharp knife. But when you’re starting out, it doesn’t matter which knife you use: just pick something sharp and good enough, and move on. After all, the knife is just a tool.

The people eating the food you cook don’t care about which knife you used: they care how the food tastes and looks.

After six months in the kitchen, you’ll start understanding how you personally use a knife, what cuisines you want to pursue, what techniques you want to vary. And then you’ll have the knowledge to pick a specific knife or knives exactly suited to your needs.

But remember: the people eating your food still won’t care which knife you used.

Choosing a dev environment

When you use a website, you don’t care which build tool the programmer used. When you run an app, you don’t care which editor they used. You want the software to work, to do what it says, to be easy to use, to get out of your way—and you don’t care how they did it.

And that applies just as much to the users of your code: they don’t care which tools you used.

And when you’re starting out, whether programming in general or a new language or framework, you don’t know how you will like to work. So instead of obsessing over finding the ideal development environment and toolchain, just pick tools that are good enough:

  • Popular: So you can easily find help.
  • Easy to get going: Your goal is ship useful code, and as a beginner time spent fiddling with your dev environment won’t help with that.

Once you have enough experience, you will start developing opinions. You might become choosy about which tools you use, or end up customizing them to your needs. You might even write an article about your particular dev environment and favorite tools.

But however strong your preferences are, chances are that given tools you don’t quite like as much, you will still do just fine. If you know what you’re doing, you can chop vegetables with any sharp knife, even if it’s not your favorite.



Tired of scrambling to get your job done?

If you were productive enough, you could take the afternoon off, confident you’d produced high value work. Not to mention having an easier time finding a new job when you need one.

Learn the secret skills of productive programmers.

June 25, 2020 04:00 AM

June 14, 2020

Moshe Zadka

Conditionally Logging Expensive Tasks

(I have shown this technique in my mailing list. If this kind of thing seems interesting, why not subscribe?)

Imagine you want to log something that is, potentially, expensive to calculate. For example, in DEBUG mode, you would like to count the classes of the objects in gc.get_objects() and log those counts: this is often a useful technique for diagnosing reference leaks. This is pretty heavy to calculate, and logging it always sounds wasteful.

The logging module has lazy interpolation.

A line like

logging.debug("total number: %r", 2500)

will not bother calculating the string interpolation total number: 2500 unless the logging level is set to output it.

However, calling logging.debug("total number: %r", get_object_counts()) will still call get_object_counts() regardless. It is not the cost of calculating the interpolation, but that of calculating the counts themselves that you would like to avoid.

One way to piggy-back on the lazy interpolation to do lazy evaluation is to write a custom class with a lazy __repr__():

class OnDemand:
    def __init__(self, callable):
        self.callable = callable
    def __repr__(self):
        return repr(self.callable())

This defines a class, OnDemand, which will only call the function when it needs its repr.

This allows us to write code like:

logging.debug("total number: %r", OnDemand(get_object_counts))`

Now, get_object_counts() will not be called at all: notice that the logging line does not call it. This makes the lazy evaluation explicit: we explicitly delay evaluation with OnDemand.

Here is a simple example of how it would work:

>>> def get_object_counts():
...     print("doing something expensive")
...     return 5
...
>>> debug_logging.debug("result is %r", OnDemand(get_object_counts))
doing something expensive
DEBUG:debug:result is 5
>>> warnonly_logging.debug("result is %r", OnDemand(get_object_counts))
>>> warnonly_logging.debug("result is %r", get_object_counts())
doing something expensive

Note that when the warnonly_logging is called with a .debug() method, the expensive calculation is not done: not only is the log message ignored, but the function that calculates the value is not done.

The last line shows that without the careful usage of OnDemand, the expensive calculation is done.

This should make it easy to sprinkle heavy calculations in logging.debug statements. Coupled with easy ways to trigger logging level changes (which are beyond the scope of this article) this gives a powerful way to get insight into your program's innards.

(Thanks to Adi Stav, Avy Faingezicht, Chris Withers, Dave Briccetti, and Lucas Wiman on their feedback on an early draft of this article. Any mistakes or issues that remain are my responsibility.)

by Moshe Zadka at June 14, 2020 03:30 AM

May 18, 2020

Itamar Turner-Trauring

To get a better programming job, explain your problem-solving skills

When you’re looking for a new programming job, how do you explain your value? The usual approach is a long list of technologies, but this leaves out a critical skill: your ability to solve problems.

If you can convey your level of skill at problem solving, you can get:

  • More job offers.
  • Jobs with technologies you don’t know.
  • A higher salary by getting slotted into a higher pay grade.

Often just a few extra words can make a big difference in demonstrating your skills. Let’s see how you can do it.

Three levels of problem solving skills

As I discussed elsewhere in more detail, problem solving comes in three stages, each approximately corresponding to a particular career stage (these names are due to Randall Koutnik):

  1. Staff or principal software engineers are Finders: they find new problems.
  2. Senior software engineers are Solvers: they solve already-identified problems.
  3. Junior software engineers are Implementers: they implement already-identified solutions.

The earlier you are in the problem-solving process, the more productive you are, and therefore the more valuable as an employee.

As a result, you need to communicate how advanced your skill is across these three levels to demonstrate your productivity. Everything from your resume to the stories you tell in interviews should communicate your level of skill.

Explaining your skill level

Explaining your skill level involves telling stories that use the correct words and sufficient information to demonstrate your skill. I’m going to use resumes as an example here, but you should ensure you do this in interviews as well—if you’re practicing with a friend, make sure they’re checking for this, it’s easy to leave the information out.

Consider the following entry from a resume:

Moved deployment from manually-managed hosts to a new Kubernetes cluster.

This experience entry uses an implementation-level verb: “moved”. Similarly, “coded”, “tested”, “wrote”, “fixed”, “optimized"—these are all about implementation. And maybe it’s implementation was tricky and difficult, and it’s good to convey that, but if you also solved the problem or identified the problem it’s impossible to tell from this phrasing.

Any task you write about in your resume and talk about in interviews was the result of someone identifying a problem, and someone coming up with the solution. If it was you, make sure you say that.

Let’s say in our example above you were the one tasked with figuring out an alternative to manually-managed hosts. If so, you need to add additional context and verbs that convey that:

Investigated alternatives to manually-managed hosts, decided on Kubernetes, and moved the deployment to a new cluster.

Now we can clearly see you solved the problem.

If you were the one who identified the problem, again you need to make sure you explicitly call that out:

Identified manually-managed hosts as an operational problem, and got management buy-in to change to a better system. Subsequently investigated alternatives, decided on Kubernetes, and moved the deployment to a new cluster.

Now we can see all the value you provided.

If you only identified a problem, that’s fine too, just say so:

Noticed a critical customer-facing bug that was impacting many users; after the team responsible for that area fixed it, they reported a $300,000 increase in revenue as a result.

Communicating value at different career stages

Now that you’ve seen how to phrase your skill level, let’s see how this works at different career stages.

Implementers

When you’re at the start of your career you will mostly be implementing other people’s solutions. However, in one or two cases you might be starting to solve problems, or even find problems. Make sure your resume highlights those instances, however small.

Sometimes this will happen in non-programming contexts. For example, I knew one early stage software engineer who made very insightful suggestions about hiring. Mention it anyway.

If you had another career before switching to programming, you’ll likely have plenty of examples. Make sure to highlight those even if they’re unrelated to coding; those problem-finding and solving skills will at least partially transfer.

Solvers

If you can solve problems on your own, you want to both:

  1. Communicate this fact.
  2. Highlight the places where you did identify problems, even if it’s happened in only a few cases.

Review your resume and make sure all the relevant entries explicitly talk about the ways in which you came up with the solution. If you already have a suitable job title, like senior software engineer, then getting the phrasing right isn’t quite as important—but it’s still worth doing.

If you still have a junior job title but your skills have progressed, it’s doubly important to ensure you’re highlighting your problems solving skills.

Once you’ve done that, try to expand on any places where you were involved in identifying problems.

Finders

Your goal as a Finder is to ensure you’re not confused with a Solver. It’s very easy to phrase things in a way that doesn’t make clear you identified the problem—after all, identifying the problem may only have a taken a few minutes.

But those initial few minutes where you noticed something needs to be done are quite often the most value parts of the whole process, so make sure you explicitly talk about finding the problem. This is even more important if your current job title doesn’t reflect your actual level of skill.

What to do next

Even if submitting resumes isn’t the best way to find a job, you still need one, and writing it is a good way to rehearse for an interview.

So get your resume out, and for each experience entry make sure it’s clear whether you implemented the solution, solved the problem, and/or found the problem. This will take you an hour, no more, and at the end of this process you’ll have a much easier time communicating some of your most valuable non-technological skills.

And if you’d like to learn some more of those skills, check out my new book, The Secret Skills of Productive Programmers.



Tired of scrambling to get your job done?

If you were productive enough, you could take the afternoon off, confident you’d produced high value work. Not to mention having an easier time finding a new job when you need one.

Learn the secret skills of productive programmers.

May 18, 2020 04:00 AM

May 14, 2020

Itamar Turner-Trauring

How to prepare for losing your programming job

Another week has passed, and another 3 million people in the US have filed for unemployment. While the current situation hasn’t impacted programming jobs quite as much, it’s just a matter of time before the economic damage hits most everywhere. There will be layoffs, and plenty of them, and occasionally whole companies shutting down.

So even if your job is secure now, you might still lose it in the future. How can you prepare? What can you do to reduce your future risks?

The first thing you need to do is come up with a plan, which is what this article is all about. In particular, you will want to:

  • Try to make sure you have the necessary financial resources.
  • Make your future job hunt easier, by building a network, making sure your skills are up-to-date, and making sure you have visible public proof of your skills.
  • Come up with a series of fallback plans if things don’t go well.

Let’s go over these one-by-one.

Money in the bank

If you lose your job, you lose your paycheck—but you still have to pay your bills. And after the dot-com bust, the last big tech recession, it took years for all the jobs to come back

If you have at least six months of living expenses in cash, that’s a good start. If not, it’s best to think about how to get there.

There are two sides to this:

  1. If possible, you need to cut your expenses, which will both allow you to save and reduce how much money you need for each unsalaried month. See this more detailed article.
  2. Ensuring your financial assets, if you have any, aren’t correlated with your job.
    1. If you own stock in your own company, you are making a double bet: if the company goes down, you will lose money and your job.
    2. If you work for a startup that needs to raise money soon, a crashing stock market will also greatly reduce the viability of your current job.
    3. More broadly, if you own stocks and to a lesser extent corporate bonds, how correlated are they with your ability to keep a job?
    4. Even more broadly, how much of your net worth is tied to the tech industry, or the economy as a whole?

In short, you want cash on hand, and plenty of it.

Making your future job hunt easier

Searching for a job will be much easier if you:

  • Know lots of people.
  • Have useful skills.
  • Can visibly demonstrate you have those skills.

Let’s cover those one by one.

Knowing lots of people

Applying for a job by sending in your resume is the hardest way to get hired. It’s much easier if you know someone who can vouch for you, can get you past the initial screen, or can fill you in on what the hiring manager really wants.

So the more people you know, the better off you are. Elsewhere I have a guest post about (social) networking, but that can take time and is harder during a pandemic. But there are still a few easy things you can do in the short term:

  • Join a public Slack or two for the technology area you specialize in. You can help answer people’s questions, see when people mention they’re hiring, and more broadly get a better sense of the zeitgeist, which is useful for building your skills (see below).
  • Keep the contact info for former co-workers. This can be done via LinkedIn, for example, and often there will be an ex-employee Slack. If there isn’t one, you can start it—especially if your company is having initial rounds of layoffs. This too can often be very educational, as former employees might be more forthcoming.
  • Find ways to help other people. Can you teach useful skills? Join a local mutual aid organization?

Build useful skills

If you’ve been working at the same job for a while, it’s easy for your technical skills to get a little stale. Unless you’re working at the right place, hang out with the right people, or do the right things, you might not be aware of the latest technology, or you might be using out-of-date practices.

So you’ll want to update your skills a little. As always, doing this extensively outside of your job may not be possible, so try to:

  1. Spend an hour a week, ideally during work hours, getting up-to-date on the latest technologies. The goal here is breadth, not depth: sign up for a newsletter for your technology stack (he’s a partial list), skim the topics at a relevant conference, maybe watch a talk or two. I cover learning for breadth here, but the basic idea is that knowing a tool exists and what it does can take very little time, and is quite valuable on its own: both on the job, but also in interviews (“I haven’t used it myself, but I believe tool X is how you would solve this”).
  2. Try to learn more technologies on the job, because that is the best place to do so.

Create visible proof of skills

Having skills is one thing, proving you have them is another. It is therefore quite useful during a job hunt to have some visible, public proof you have these skills. For example:

Open source: When I moved to the US in my 20s, my work on an open source project made it much easier for me to get job interviews, and eventually job offers. It wasn’t just that my resume said that I knew computer networking, I could point to a publicly available project used by real people and say “I worked on that”.

Even if you share code that isn’t widely used, it can still be useful as proof of skill.

Conference talks: Speaking publicly about a particular skill, technology, or project is a great way to get public proof of skills. With conferences moving online, speaking at conferences is now much easier. You don’t have to travel or pay for you travel, and you don’t have to get approval from your manager to lose work. If there’s a topic where you know enough to help someone else, look for conferences on the topic and submit a proposal.

Blogging: Have something to share, or learning something new? Write it down and share it publicly. Writing well is an immensely useful skill in general, so this will also count as improving your skills. You can write for your own blog, or you can propose a blog post on your company’s tech blog, if they have one.

Fallback plans

In an ideal world you would lose your job, start a job search, find a new job within a month, and everything will be fine. Sadly we don’t always live in an ideal world.

So if you live in a country like the US that a shitty social net it’s worth coming up with a series of fallback plans, if only for your own peace of mind.

For example, how can you make your money last longer?

  1. As soon as you lose your job, apply for unemployment.
  2. Cut additional costs.
  3. If time stretches on and you still don’t have a job, figure out ways to reduce housing costs. Are you young and have the ability to move back in with your parents? Have more room than you need and the option of adding roommates? All in a pandemic-safe way, of course.
  4. Any ways you can make money some other way, if it’s really taking too long?

If you can’t find a job immediately, you will have probably have more time to upgrade your skills.

  1. Which skills are worth working on?
  2. What’s the best way to improve them?

You’ll also want to meet more people who can help you find a job.

  1. Can you go to online meetups?
  2. Find more places to interact with people online?

Write this all down, and when you’re worried you’ll at least have the comfort of knowing there will be some things you can do if and when your job goes away.

We’re all in this together

As with most big problems, there is only so much you can do as an individual: to meaningfully improve the situations we need to work together, whether in mutual aid groups or via political organization. On the other hand, you need to ensure that you as an individual are doing OK; you can’t help others if you’re collapsing under your own troubles.

And since this can all be overwhelming, start with a few simple actions:

  1. Cut an expense or two.
  2. Get in touch with some old co-workers.
  3. Sign up for a newsletter.
  4. Start writing down your fallback plans.

And then, once you have things under control emotionally, when you have a plan and you know what you’re doing next, start thinking about how you can help others, and work with other people to improve things for everyone.



Tired of scrambling to get your job done?

If you were productive enough, you could take the afternoon off, confident you’d produced high value work. Not to mention having an easier time finding a new job when you need one.

Learn the secret skills of productive programmers.

May 14, 2020 04:00 AM

May 04, 2020

Hynek Schlawack

Why You Should Document Your Tests

Some projects have the policy that all tests must have an explanatory comment – including all of mine. At first, I found that baffling. If that’s you right now, this article is for you.

by Hynek Schlawack (hs@ox.cx) at May 04, 2020 12:00 AM

April 27, 2020

Moshe Zadka

Numbers in Python

Numbers in Python come in all shapes and forms. The reason different kind of representations of numbers exist is because they all have different trade-offs. These trade-offs are often surprising!

Integers

The most surprising things about integers is how easily they stop being integers. Dividing two integers, for example, 4/3, gives a float, and (4/3)*3 is the float 4.0. Even if a program has no floating point numbers coming in, all that is needed for floating point numbers to exist somewhere is a division operation.

Floats

Floats do not behave like numbers. Numbers obey certain mathematical properties: subtraction is the inverse to addition, addition is associative, and more.

For example

>>> 1 + 2 - 2 - 1
0
>>> 0.1 + 0.2 - 0.2 - 0.1
2.7755575615628914e-17

adding two numbers, and then subtracting them one at a time, does not result in the same value.

They do not obey the associative law of addition, a + (b + c) = (a + b) + c:

>>> a = 2**-53
>>> (a + a) + 1 == a + (a + 1)
False

These show just two of the corner cases that floating point numbers exhibit, which can be surprising. A full treatise on the ways that floating point behavior can be surprising is too big to fit in the margin of this blog post.

Fractions

Many algorithms that look straightforward "explode" with exact fractions. Explosion usually starts as time explosion: the algorithm becomes "quadratic": the time it takes is proportional not to the input length, but to the scare of the input's length. In other words, doubling the input size quadruples the time it takes.

If enough time is spent, memory explosion is also possible: the space requirements increase, until all memory fills up.

One weird protection against memory explosion is that usually it will take too long to get it, and the program will be killed for "hanging".

One such "algorithm" is addition.

>>> print(set(type(p) for p in primes))
>>> one = fractions.Fraction(1)
>>> before = datetime.now()
>>> res = sum(one/p for p in primes[:10000])
>>> after = datetime.now()
>>> print("It took", after-before)
>>> print("Size of output", len(str(res)))
>>> print("Approximate value", float(res))
{<class 'int'>}
It took 0:01:16.033260
Size of output 90676
Approximate value 2.7092582487972945

This is just adding the inverses to some primes (I removed the first few from the list, and then chopped the list to be the next 10,000). On a nice laptop designed as a gaming rig, adding 10,000 numbers took over a minute, and resulted in an output that was over 90K!

In comparison, running the same algorithm with floats is much more efficient:

>>> print(set(type(p) for p in primes))
>>> before = datetime.now()
>>> res = sum(1/p for p in primes[:10000])
>>> after = datetime.now()
>>> print("It took", after-before)
>>> print("Size of output", len(str(res)))
>>> print("Approximate value", float(res))
{<class 'int'>}
It took 0:00:00.000480
Size of output 17
Approximate value 2.709258248797317

The time it took is less than a millisecond, and some of that is possibly measurement error from datetime. This is around 10,000 times faster. The output can be saved in 17 bytes: a mere 1000 reduction in space. However, the result is inaccurate:

Approximate value 2.7092582487972945
Approximate value 2.709258248797317
                    1234567891234

The results are off by less than 1e-14. This would be like getting the distance to the moon wrong by one millimeter. In cases that do not involve sending a rocket to the moon with less than a millimeter (one grain of sand) tolerance, floats give a result that is precise enough and several orders of magnitude more efficient.

A lot of the responses to this were along the lines of "fractions are slow because they are implemented in Python". Python can be responsible for a 10x slowdown, but not 10,000x. There is a third-party module, quicktions, which implements fractions using Cython.

Using quicktions was, indeed, quicker. It took the time down from a minute and sixteen seconds to to a minute and fifteen seconds on my laptop.

Fundamentally, the problem is that this is a quadratic algorithm. I chose the inputs carefully: the worst case behavior for fraction addition is on prime numbers. But unless you can predict the inputs to an algorithm, you cannot rely on anything but the worst-case behavior.

Decimals

Decimal numbers are useful when managing financial transactions. This is for the most boring reason possible: the laws governing finance are specified in decimals. However, all decimal point calculations in Python are governed by hidden global state: the context. The context determines precision, and is taken from the caricature of how action at a distance is problematic for APIs.

Quoting the documentation (for Python 3.8):

>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')

In practice, code might have hundreds of lines between setting the precision and doing a calculation. The calculation can be in another function, or even another file.

The only safe way to use decimal numbers in Python is with localcontext:

>>> getcontext().prec = 6
>>> # 6853 lines elided
... with localcontext() as ctx:
...     ctx.prec = 10
...     Decimal(1) / Decimal(7)
...
Decimal('0.1428571429')

As long as you are careful to use localcontext, decimals work pretty well. It is thread-safe and signal-safe.

Summary

Before you do things with numbers in your code, stop and think. What types should you use? What do you want to happen? What tolerances are important?

Not thinking means letting the corner cases in the code just happen.

(Thanks to Adi Stav, Aaron Hall, and Avy Faingezicht for their feedback on an earlier draft. All issues and mistakes that remain are my responsibility.)

by Moshe Zadka at April 27, 2020 12:00 AM

My Little Pony -- DevOps is Magic

(This article is based on the one I originally published on OpenSource.com.)

In 2010, the My Little Pony franchise was rebooted with the animated show My Little Pony: Friendship is Magic. The combination of accessibility to children with the sophisticated themes the show tackled garnered a following that cut across ages. I was swept up in the wave and discovered there is a lot to learn about DevOps from the show.

The show begins with Twilight Sparkle reading obscure documentation, only to realize that Equestria, where the show is set, is due to suffer a calamity. Though Nightmare Moon has been imprisoned for a thousand years, there is a prophecy she will return.

Lesson 1: Technical debt matters.

Document technical debt. Pay attention to the signs of risk no matter how infrequently they occur. Have a plan to resolve it.

Twilight Sparkle goes to her manager with the news, only to be told that it is not a current priority. She is sent to Ponyville to prepare for the coming celebration, instead.

Lesson 2: Communication with management is key.

Twilight Sparkle communicated her priority but did not convince her management that it was more important than the celebration.

We all need to make clear what the business case is for resolving critical issues. It is also not straightforward to explain technical debt in business terms. If management does not agree on the severity, find new ways to communicate the risk, and team up with others who speak that language.

As the prophecy has foreseen, Nightmare Moon returns and declares eternal night. Twilight quickly understands that she cannot resolve the issue by herself, and she recruits the ponies who will become, with her, the "Mane Six." They each stand for a different element of harmony — Applejack stands for Honesty, Fluttershy for Kindness, Pinkie Pie for Laughter, Rarity for Generosity, Rainbow Dash for Loyalty, and Twilight Sparkle herself for Magic.

Lesson 3: Few are the issues that can be resolved by one person.

When facing an outage, reach out to other people with complementary skills who can help you. It is best if they are different than you: different backgrounds leads to differing perspectives, and that can lead to better problem-solving.

Lesson 4: When resolving an outage, honest communication is key.

Throughout the struggle against the eternal night, the Mane Six have to speak openly and honestly about what's not working. Their blameless communication is part of problem-solving.

Lesson 5: When resolving an outage, kindness to yourself and to others is crucial.

Though tempers flare hot in the land of Equestria, we all benefit from coming back to working together.

Lesson 6: Laughter is important.

Even when everything comes crashing down, remember to take a break, drink a glass of water, and take a deep breath. Stressing out does not help anything.

Lesson 7: Be generous.

Even if you are not on-call right now, if your help is needed to resolve a problem, help out as you hope your colleagues will do for you.

Lesson 8: Be loyal.

An outage is not a time to settle rivalries between teams. Focus on how to collaborate and resolve the outage as a team.

Lesson 9: Though people skills are important, you have to understand the technology on a deep level.

Keep your tech skills sharp. Expertise is not only the ability to learn; it is knowing when that information is needed. Part of being an expert is practice.

After the issue is resolved, Princess Celestia realizes that the Mane Six are crucial to the long-term survival of Equestria, and tells Twilight Sparkle to stay in Ponyville and keep researching the magic of friendship.

Lesson 10: After an outage is resolved, conduct a review, take concrete lessons, and act on them.

I could go on, episode by episode, detailing lessons relevant for DevOps, but I will wrap up with one of my favorite ones.

In the "Winter Wrap-Up" episode, all the ponies in Ponyville help in preparing for the spring. As per tradition, they do not use magic, leaving Twilight Sparkle to wonder how she can contribute. Eventually, she realizes that she can help by making a checklist to make sure everything is done in the right order.

Lesson 11: When automation is impossible or inadvisable, write a solid checklist, and follow it. Do not depend on your memory.

Twilight Sparkle and the Mane Six overcome great obstacles as a team, and now have a system to improve as a team. I hope you, too, can help bring a collaborative DevOps culture to your work.

by Moshe Zadka at April 27, 2020 12:00 AM

April 22, 2020

Moshe Zadka

Goodbye, John H. Conway

John H. Conway passed away ten days ago, and I think it's only now I can write a proper eulogy.

I was first introduced to his work, if not his name, when I was at the end of elementary school. I am sure everyone has heard about the Game of Life, but did you know it had a 1D version? The 1D version is significantly simpler, but has the advantage that on a grid paper, you can just play with yourself manually by putting a generation on each line.

This was 12 year old me's "fidget spinner", how I kept myself calm in classes. Starting with an initial configuration and letting it evolve.

Later on, when I went to college, I got to borrow his amazing book, "On Numbers and Games". Now, I am definitely the sort of person who reads math books for fun, but most of them are not fun. They are dry, poorly written, and make leaps all the time. "ONAG" was the exact opposite. It's a short, delightful book, that tries to get across the thinking, the intuition, the methods, and, yes, the joy.

Fast forward a decade or two, and again I found myself enamored with another one of his inventions: the Look-and-Say sequence. My old interview coding question was getting too popular on the interview-question-sites, and I was getting worried. Writing code for the look-and-say sequence is reasonably straightforward, but does require basic skills: looping while keeping a bunch of state variables.

Then I read about his work on the look-and-say sequence, and was utterly amazed and delighted by it. Atoms and decay and asymptotic growth!

Throughtout his career, I think what made his things special is that he embodies the truest mathematician spirit, which is also the truest geek spirit: starting out with something simple, and then nerding out about it until you have built a whole universe.

Whether it is a place where guns shoot spaceships at 3/8 the speed of light, an algebraic field so vast it includes all other ordered fields and also all infinities, or a concept of numbers atomically decaying, he was a master at whipping out mathematicially consistent fictional worlds.

Goodbye John H. Conway, you were taken from us too soon.

by Moshe Zadka at April 22, 2020 12:00 AM

April 20, 2020

Itamar Turner-Trauring

The secret skills of productive programmers

This article was written during abnormal circumstances, with much of the planet under lockdown due to the COVID-19 pandemic. Parents with children at home have far less time, and pretty much everyone is feeling stressed and distracted.

Under more normal circumstances there are only so many hours in the day to do your job; now it’s even worse. And yet work needs to get done: code needs to get written, features need to be shipped, bugs need to be fixed.

Faced with an ever growing list of tasks, how do you get everything done?

The short answer is, you can’t. You will never get everything done.

What you can do, though, is choose the right work, the most valuable work, the most useful work, the work with most leverage. Choose the right work and you can gets orders of magnitude improvement in your output.

Let’s see how.

The goal: increased output

Your output as a programmer is based both on your productivity and on how much time you work:

Output = Productivity × Time Worked

The first thing to notice is that there is a hard limit on how much increasing your working hours can help. After all, there are only 168 hours in a week.

If you never slept, ate, or did anything but work—and this will literally kill you—you can work 4.2× as much as a 40-hour workweek, and that’s it. And even with smaller increases in work hours, the gains quickly decline. As you work more hours you’ll become fatigued and make more mistakes; beyond a certain point those extra work hours will decrease your productivity, canceling out any gains.

What is output for a programmer?

Since increasing working hours isn’t really an option, the key to increasing your output is increasing your productivity. Productivity is the output you produce in each fixed unit of time, for example:

Productivity = Output per week

If you’re going to improve your productivity, you need to understand how to measure output.

The obvious measure is how much code you write: the more code, the better. This measure is obvious, popular, and completely wrong.

All other things being equal, is it better to implement the same feature with 10 lines of code, or 10,000 lines of code? If we measure output by code produced, the latter solution is better, but in most cases a 10 line solution is preferable to 10,000 lines. More code means higher maintenance costs, not to mention more opportunities for defects.

Your job as a programmer is not writing code, your job is solving problems: software is a tool, a means to an end. Software becomes valuable because of the problems it solves.

As a rough measure, your output as a programmer can be measured by the problems you solve: the more significant the problems you can solve, the better.

If you work for a business, significance eventually translates directly or indirectly into monetary terms: money made or money saved. In other areas you can come up with domain-specific concrete measures of usefulness: number of people served, carbon emissions reduced, number of scientists using your software, and so on.

Note: If the problems you solve produce negative value you will become anti-productive: the better you are at your job, the more damage you will cause.

If making money hurts people or the environment, your work may be productive for your employer but anti-productive for society as a whole. So make sure you’re carefully considering the ethical consequences of your actions as a worker.

How to increase productivity

Given the above, here’s how you can increase your productivity:

  1. Find the most significant problem you can work on.
  2. Come up with the most efficient solution to that problem.
  3. Implement the solution with minimum wasted time.

Let’s go through these steps one by one, and see why they’re key to productivity.

1. Find the most significant problem

Let’s consider our formula for productivity again:

Productivity = Significant problems solved / Week

There are many problems you could be working on, so first you have to choose one. If you could solve either of these problems, should you be working on:

  1. Implementing a particular missing feature; this will increase revenue by $50,000.
  2. Fixing a bug that was decreasing customer retention; this will increase revenue by $1,000,000.

All other things being equal, the second problem is obviously the one you should be focusing on. Even if it takes 10× as long to solve and implement that bug fix, it should still be the highest priority:

Productivity of #1 =    $50,000 /  1 Week  =  $50,000 / Week
Productivity of #2 = $1,000,000 / 10 Weeks = $100,000 / Week

Here’s the issue: in order to fix that expensive bug and improve customer retention, you need to know the problem exists. If no one ever notices that customers are leaving, if no one ever finds that bug, if no one realizes the connection between the two—then that problem will never be solved.

And that’s why finding problems is the first and most valuable step in increasing productivity.

2. Come up with an efficient solution

Once you’ve identified the most significant problem—or once your manager assigns you a problem they identified—you need to come up with a solution.

Which solution do you think is better?

  1. Takes 1000 lines of code and 4 weeks to implement.
  2. Takes 100 lines of code and 3 days to implement.

All other things being equal, the second solution is obviously better. But again, you need to find that solution.

If you only ever find that first solution, then no matter how efficiently you implement it, no matter how focused you are, no matter how much you manage to speed things up—you’re still implementing a much less efficient solution.

And that’s why identifying better solution is the second most valuable step in increasing productivity.

3. Implement the solution without wasting a time

Once you’ve identified a problem and chosen the solution, there is only so much leverage you have to improve productivity. You obviously want to avoid getting stuck and spinning your wheels, because wasted time reduces your productivity.

But given a particular solution, there’s only so much waste you can reduce, only so fast you can go:

Wasted time → $50,000 / 2 weeks = $25,000 / week
No waste    → $50,000 / 1 week  = $50,000 / week

Efficient implementation is the last and least valuable way of increasing productivity.

Technological skills aren’t enough

While you get the most increased productivity from identifying problems and the least from efficient implementation, your career as a programmer progresses in the opposite direction:

  1. Junior engineers implement solutions.
  2. Senior engineers find solutions and implement them.
  3. Principal or staff engineers identify problems, find solutions, and implement them.

So becoming more productive isn’t just about helping your employer’s bottom line, it’s also about learning the skills that will give you more pay and more influence.

Critically, technological skills are necessary but not sufficient to increase your productivity:

  • Your JavaScript skills don’t matter if you can never meet deadlines.
  • Your testing skills don’t matter if you can’t convince your manager of the value of testing.
  • Your software architecture skills don’t matter if no one has ever heard of your product.

Why these skills are “secret”

Most discussions of programming productivity tend to end up focusing purely on technology, coding, and design skills, and skip over these problem-solving skills. Of course, this isn’t a conspiracy of silence, no one is deliberately hiding the existence of the skills.

My guess is that experienced programmers still have to learn new technologies, so they’re more likely to realize the need to explain those particular skills. But having learned them once, they apply skills like timeboxing, or considering multiple different solutions to a problem, without even noticing they’re doing it. And so they end up talking about problem-solving skills rather less, and about technological skills rather more.

How do you learn these skills?

This article is an excerpt from my book, The Secret Skills of Productive Programmers, covering the non-technical skills you need to get better at identifying problems, solving problems, and implementing them on schedule.

Elsewhere on this site you’ll find many free articles on building up your skills.



Tired of scrambling to get your job done?

If you were productive enough, you could take the afternoon off, confident you’d produced high value work. Not to mention having an easier time finding a new job when you need one.

Learn the secret skills of productive programmers.

April 20, 2020 04:00 AM

April 14, 2020

Moshe Zadka

Using Twisted to Massively Parallelize Web Clients

The Twisted Requests (treq) package is an HTTP client built on the popular Twisted library that is used for web requests. Async libraries offer the ability to do large amounts of network requests in parallel with relatively little CPU impact. This can be useful in HTTP clients that need to make several requests before they have all the information they need.

This post shows an example of a problem like this, and how to solve it using treq.

I enjoy playing the real-time strategy game Clash Royale. Clash Royale is a mobile strategy player-vs-player game where players play cards in an arena to win. Each card has different strengths and weaknesses, and different players prefer different cards. Clash Royale remembers which card a player plays the most; this is their "favorite" card. Players come together in clans where they can help each other. Supercell, Clash Royale's developer, released an HTTP-based API where different statistics can be queried.

How can we write a program that will output the most popular favorite cards in a clan?

If you want to follow along, you will need to register an account. If you register an account, create an API token via the Clash Royale developer portal. Then choose "Create New Key" under your profile, and enter a name, description, and a valid IP address. (An exact address is required.) Since you should never save an API key in your code, keep it as a separate file in ~/.crtoken:

$ ls ~/.crtoken
/home/moshez/.crtoken

To make it easier to see what is going on, let's start with this introductory program that prints Hello world, and then we'll talk through what it does:

import collections, json, os, sys, urllib.parse
from twisted.internet import task, defer
import treq

with open(os.path.expanduser("~/.crtoken")) as fpin:
    token = fpin.read().strip()

def main(reactor):
    print("Hello world")
    return defer.succeed(None)

task.react(main, sys.argv[1:])

This imports many more modules than we need for the "Hello world" example. We will need these modules for the final version of the program, which will accomplish the more complex task of asynchronously querying an API. After the import, the program reads the token from the file and stores it in the variable token. (We are not going to do anything with the token right now, but it's good to see that code.) Next there is a main function that accepts a Twisted reactor. A reactor is sort of like an interface to the machinery of the Twisted package. In this case the function main is sent as a parameter to task.react, and, which will call main with the reactor and any arguments we give -- the command-line arguments, in this case.

The main function returns a defer.succeed(None). This is how it returns a value of the right type: a deferred value, but one that already has been "fired" or "called." Because of that, the program will exit immediately after printing Hello world, as we need.

Next, we will look at the concepts of async functions and ensureDeferred:

async def get_clan_details(clan):
     print("Hello world", clan)

def main(reactor, clan):
    return defer.ensureDeferred(get_clan_details(clan))

task.react(main, sys.argv[1:])

In this program, which should start with the same imports, we moved all the logic to the async function get_clan_details. Just like a regular function, an async function has an implicit return None at the end. However, async functions, sometimes called co-routines, are a different type than Deferred. In order to let Twisted, which has existed since Python 1.5.2, use this modern feature, we must adapt the co-routine using ensureDeferred.

While we could write all the logic without using co-routines, using the async syntax will allow us to write code that is easier to understand, and we will need to move a lot less of the code into embedded callbacks.

The next concept to introduce is that of await. Later, we will await a network call, but for simplicity, right now, we will await on a timer. Twisted has a special function, task.deferLater, which will call a function with given parameters after some time has passed.

The following program will take five seconds to complete:

async def get_clan_details(clan, reactor):
     out = await task.deferLater(
         reactor,
         5,
         lambda clan: f"Hello world {clan}",
         clan
     )
     print(out)

def main(reactor, clan):
    return defer.ensureDeferred(get_clan_details(clan, reactor))

task.react(main, sys.argv[1:])

A note about types: task.deferLater returns a Deferred, as do most Twisted functions that do not have the value already available. When running the Twisted event loop, we can await on both Deferred values and co-routines.

The function task.deferLater will wait five seconds and then call our lambda, calculating the string to print out.

Now we have all the Twisted building blocks needed to write an efficient clan-analysis program!

Since we will be using the global reactor, we no longer need to accept the reactor as a parameter in the function that calculates these statistics. The way to use the token is as a "bearer" token in the headers:

async def get_clan_details(clan):
    headers={b'Authorization': b'Bearer '+token.encode('ascii')}

We want clan tags to be sent, which will be strings. Clan tags begin with #, so they must be quoted before they're put in URLs. This is because # has the special meaning "URL fragment":

async def get_clan_details(clan):
     # ...
     clan = urllib.parse.quote(clan)

The first step is to get the details of the clan, including the clan members:

async def get_clan_details(clan):
     # ...
     res = await treq.get("https://api.clashroyale.com/v1/clans/" + clan,
                          headers=headers)

Notice that we have to await the treq.get call. We have to be explicit about when to wait and get information since it is an asynchronous network call. Just using the await syntax to call a Deferred function does not let us take full power of asynchronicity (we will see how to do it later).

Next, after getting the headers, we need to get the content. The treq library gives us a helper method that parses the JSON directly:

async def get_clan_details(clan):
     # ...
     content = await res.json()

The content includes some metadata about the clan, which is not interesting for our current purposes, and a memberList field that contains the clan members. Note that while it has some data about the players, the current favorite card is not part of it. It does include the unique "player tag" that we can use to retrieve further data.

We collect all player tags, and, since they also begin with #, we URL-quote them:

async def get_clan_details(clan):
     # ...
     player_tags = [urllib.parse.quote(player['tag'])
                    for player in content['memberList']]

Finally, we come to the real power of treq and Twisted: generating all requests for player data at once! That can really speed up tasks like this one, which queries an API over and over again. In cases of APIs with rate-limiting, this can be problematic.

There are times when we need to be considerate to our API owners and not run up against any rate limits. There are techniques to support rate-limiting explicitly in Twisted, but they are beyond the scope of this post. (One important tool is defer.DeferredSemaphore.)

async def get_clan_details(clan):
     # ...
     requests = [treq.get("https://api.clashroyale.com/v1/players/" + tag,
                          headers=headers)
                 for tag in player_tags]

Remember that requests do not return the JSON body directly. Earlier, we used await so that we did not have to worry about exactly what the requests return. They actually return a Deferred. A Deferred can have an attached callback that will modify the Deferred. If the callback returns a Deferred, the final value of the Deferred will be the value of the returned Deferred.

So, to each deferred, we attach a callback that will retrieve the JSON of the body:

async def get_clan_details(clan):
     # ...
     for request in requests:
         request.addCallback(lambda result: result.json())

Attaching callbacks to Deferreds is a more manual technique, which makes code that is harder to follow but uses the async features more efficiently. Specifically, because we are attaching all the callbacks at the same time, we do not need to wait for the network calls, which potentially can take a long time, to indicate how to post-process the result.

From Deferreds to values

We cannot calculate the most popular favorite cards until all results have been gathered. We have a list of Deferreds, but what we want is a Deferred that gets a list value. This inversion is exactly what the Twisted function defer.gatherResults does:

async def get_clan_details(clan):
     # ...
     all_players = await defer.gatherResults(requests)

This seemingly innocent call is where we use the full power of Twisted. The defer.gatherResults function immediately returns a deferred that will fire only when all the constituent Deferreds have fired and will fire with the result. It even gives us free error-handling: if any of the Deferreds error out, it will immediately return a failed deferred, which will cause the await to raise an exception.

Now that we have all the players' details, we need to munch some data. We get to use one of Python's coolest built-ins, collections.Counter. This class takes a list of things and counts how many times it has seen each thing, which is exactly what we need for vote counting or popularity contests:

async def get_clan_details(clan):
     # ...
     favorite_card = collections.Counter([player["currentFavouriteCard"]["name"]
                                          for player in all_players])

Finally, we print it:

async def get_clan_details(clan):
     # ...
     print(json.dumps(favorite_card.most_common(), indent=4))

So, putting it all together, we have:

import collections, json, os, sys, urllib.parse
from twisted.internet import task, defer
import treq

with open(os.path.expanduser("~/.crtoken")) as fpin:
    token = fpin.read().strip()


async def get_clan_details(clan):
     headers = headers={b'Authorization': b'Bearer '+token.encode('ascii')}
     clan = urllib.parse.quote(clan)
     res = await treq.get("https://api.clashroyale.com/v1/clans/" + clan,
                          headers=headers)
     content = await res.json()
     player_tags = [urllib.parse.quote(player['tag'])
                    for player in content['memberList']]
     requests = [treq.get("https://api.clashroyale.com/v1/players/" + tag,
                          headers=headers)
                 for tag in player_tags]
     for request in requests:
         request.addCallback(lambda result: result.json())
     all_players = await defer.gatherResults(requests)
     favorite_card = collections.Counter([player["currentFavouriteCard"]["name"]
                                          for player in all_players])
     print(json.dumps(favorite_card.most_common(), indent=4))

def main(reactor, clan):
    return defer.ensureDeferred(get_clan_details(clan))

task.react(main, sys.argv[1:])

Thanks to the efficiency and expressive syntax of Twisted and treq, this is all the code we need to make asynchronous calls to an API. If you were wondering about the outcome, my clan's list of favorite cards is Wizard, Mega Knight, Valkyrie, and Royal Giant, in descending order.

(This post is based on the article I wrote for opensource.com)

by Moshe Zadka at April 14, 2020 03:00 AM

April 05, 2020

Moshe Zadka

Comfort with Small Mistakes

It has been a long time since I learned how to program, and it is easy to forget some of the hard-won lessons in the beginning. Easy until I try to teach people to program. There is a lot of accidental and inherent complexity in programming, but I am ready for that: I remember to explain how carefully to follow the syntax, and the kind of syntactical gotchas that are easy to fail.

But there is one metalesson that is much easier to forget, and much harder to learn, and to teach. Humans are used to small mistakes having reasonably small consequences. But even in cases with catatrophic consequences, the consequences look related to the mistakes.

However, in programming, small mistakes can lead not just to big consequences, but to weird consequences. A missing comma might mean that things work fine in the testing environment, but in production, every third request gets a slightly wrong result.

This really stumps people. They copy code somewhat imperfectly from the board, or make a small mistake when they edit it to change from "Hello world" to "Goodbye world", and suddenly, a completely unrelated part of the program starts going haywire.

This happens to old hands too. The number of times I have edited code and ran the tests, only to discover a clearly unrelated test failing, is not small. The difference that comes with experience is that I take a deep breath, think "I've got this", and start down the troubleshooting path.

The troubleshooting can include any number of things: I might go in with a debugger, add print statements, do a bisect-diff to figure out what caused the problem, try random things to see what happens, or just trace the execution path carefully.

The troubleshooting process does not matter as much as what comes before it: the deep breath. This is my time to accept the problem has happened, and that I am in for a process which can take two hours, and at the end of which my entire productivity will be "added missing semicolon". Sometimes a breath is not enough, and I need to get up and get some tea. But the most important, and almost invisible step, at the beginning is to step back, remember that this is, weirdly, part of the job, and to become comfortable doing it.

If you want to be any kind of software developer, accept it now. Much of your life will be seeing weird consequences, and tracing it back to a small mistake. Eventually, like everyone, you will succeed. Flush with victory, make a note of your success somewhere: anywhere where the overhead of writing it is low, be it an e-mail to yourself or a note-keeping app.

If you ever decide to teach, or write a blog, this is an unending source of content.

(Thanks to Veronica Hanus for her feedback on an early draft. All issues and mistakes that remain are my responsibility.)

by Moshe Zadka at April 05, 2020 05:20 AM

March 23, 2020

Twisted Matrix Laboratories

Twisted Drops Python 2.7 Support

With the open-source Python community at large dropping Python 2.7 support in their projects, Twisted has decided to do the same. Twisted 20.3.0, the most recently released version, is the final release to offer Python 2.7 support.

Despite the break, the compatibility policy still applies. This means that if your code works with Twisted 20.3 on Python 2.7 and 3.5+, that updating your Twisted on Python 3 up to a theoretical 21.3 would not require changes that would make Python 2.7 + Twisted 20.3 stop working, despite a theoretical Twisted 21.3 not supporting 2.7. (This is, of course, in an ideal situation -- regressions and changes that are excepted from the policy such as security fixes do occur. Testing your applications on Twisted prereleases can help catch places where this happens, so, please do!)

- Amber (HawkOwl)

by Amber Brown (HawkOwl) (noreply@blogger.com) at March 23, 2020 07:42 PM

Twisted 20.3.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 20.3! The highlights of this release are:
  • curve25519-sha256 key exchange algorithm support in Conch.
  • "openssh-key-v1" key format support in Conch.
  • Security fixes to twisted.web, including preventing request smuggling attacks and rejecting malformed headers. CVE-2020-10108 and CVE-2020-10109 were assigned for these issues, see the NEWS file for full details.
  • twist dns --secondary now works on Python 3.
  • The deprecation of twisted.news.
  • ...and various other fixes, with 28 tickets closed in total. 
This is the final Twisted release to support Python 2.7.

You can find the downloads at <https://pypi.org/project/Twisted> (or alternatively <https://twistedmatrix.com/trac/wiki/Downloads>). The NEWS file is also available at <https://github.com/twisted/twisted/blob/twisted-20.3.0/NEWS.rst>.
Many thanks to everyone who had a part in this release — the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!
- hawkowl


by Unknown (noreply@blogger.com) at March 23, 2020 07:28 PM

Hynek Schlawack

Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server’s TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today’s needs and scores a straight “A” on Qualys’s SSL Server Test.

by Hynek Schlawack (hs@ox.cx) at March 23, 2020 12:00 AM

March 13, 2020

Moshe Zadka

Or else:

This was originally sent to my newsletter. I send one e-mail, always about Python, every other Sunday. If this blog post interests you, consider subscribing.

The underappreciated else keyword in Python has three distinct uses.

if/else

On an if statement, else will contain code that runs if the condition is false.

if anonymize:
    print("Hello world")
else:
    print("Hello, name")

This is probably the least surprising use.

loop/else

The easiest to explain is while/else: it works the same as if/else, and runs when the condition is false.

However, it does not run if the loop was broken out of using break or an exception: it serves as something that runs on normal loop termination.

for/else functions in the same way: it runs on normal loop termination, and not if the loop was broken out of using a break.

For example, searching for an odd element in a list:

for x in numbers:
    if x % 2 == 1:
        print("Found", x)
        break
else:
    print("No odd found")

This is a powerful way to avoid sentinel values.

try/except/else

When writing code that might raise an exception, we want to be able to catch it -- but we want to avoid catching unanticipated exceptions. This means we want to protect as little code with try as possible, but still have some code that runs only in the normal path.

try:
    before, after = things
except ValueError:
    part1 = things[0]
    part2 = 0
    after = 0
else:
    part1, part2 = before

This means that if things does not have two items, this is a valid case we can recover from. However, if it does have two items, the first one must also have two items. If this is not the case, this snippet will raise ValueError.

by Moshe Zadka at March 13, 2020 02:00 AM

February 23, 2020

Hynek Schlawack

Python in Production

I’m missing a key part from the public Python discourse and I would like to help to change that.

by Hynek Schlawack (hs@ox.cx) at February 23, 2020 04:45 PM

Python Packaging Metadata

Since this topic keeps coming up, I’d like to briefly share my thoughts on Python package metadata because it’s – as always – more complex than it seems.

by Hynek Schlawack (hs@ox.cx) at February 23, 2020 12:00 AM

February 20, 2020

Moshe Zadka

Forks and Threats

What is a threat? From a game-theoretical perspective, a threat is an attempt to get a better result by saying: "if you do not give me this result, I will do something that is bad for both of us". Note that it has to be bad for both sides: if it is good for the threatening side, they would do it anyway. While if it is good for the threatened side, it is not a threat.

Threats rely on credibility and reputation: the threatening side has to be believed for the threat to be useful. One way to gain that reputation is to follow up on threats, and have that be a matter of public record. This means that the threatening side needs to take into account that they might have to act on the threat, thereby doing something against their own interests. This leads to the concept of a "credible" or "proportionate" threat.

For most of our analysis, we will use the example of a teacher union striking. Similar analysis can be applied to nuclear war, or other cases. People mostly have positive feelings for teachers, and when teacher unions negotiate, they want to take advantage of those feelings. However, the one thing that leads people to be annoyed with teachers is a strike: this causes large amounts of unplanned scheduling crisis in people's lives.

In our example, a teacher union striking over, say, a minor salary raise disagreement is not credible: the potential harm is small, while the strike will significantly harm the teachers' image.

However, strikes are, to a first approximation, the only tool teacher unions have in their arsenal. Again, take the case of a minor salary raise. Threatening with a strike is so disproportional that there is no credibility. We turn to one of the fundamental insights of game theory: rational actors treat utility as linear in probability. So, while starting a strike that is twice as long is not twice as bad, increasing the probability of starting a strike from 0 to 1 is twice as bad (exactly!) as increasing the probability from 0 to 0.5.

(If you are a Bayesian who does not believe in 0 and 1 as probabilities, note that the argument works with approximations too: increasing the probability from a small e to 0.5 is approximately twice as bad as increasing it from e to 1-e.)

All one side has is a strike. Assume the disutility of a strike to that side is -1,000,000. Assume the utility of winning the salary negotiation is 1. They can threaten that if their position is not accepted, they will generate a random number, and if it is below 1/1,000,000, they will start the strike. Now the threat is credible. But to be gain that reputation, this number has to be generated in public, in an uncertain way: otherwise, no reputation is gained for following up on threats.

In practice, usually the randomness is generated by "inflaming the base". The person in charge will give impassioned speeches on how important this negotiation is. With some probability, their base will pressure them to start the strike, without them being able to resist it.

Specifically, note that often a strike is determined by a direct vote of the members, not the union leaders. This means that union leaders can credibly say, "please do not vote for the strike, we are against it". With some probability, that depends on how much they inflamed the base, the membership will ignore the request. The more impassioned the speech, the higher the probability. By limiting their direct control over the decision to strike, union leaders gain the ability to threaten probabilistically.

Nuclear war and union strikes are both well-studied topics in applied game theory. The explanation above is a standard part of many text books: in my case, I summarized the explanation from Games of Strategy, pg. 487.

What is not well studied are the dynamics of open source projects. There, we have a set of owners who can directly influence such decisions as which patches land, and when versions are released. More people will offer patches, or ask for a release to happen. The only credible threat they have is to fork the project if they do not like how it is managed. But forking is often a disproportinate threat: a patch not landing often just means an ugly work-around in user code. There is a cost, but the cost of maintaining a fork is much greater.

But similar to a union strike, or launching a nuclear war, we can consider a "probabilistic fork". Rant on twitter, or appropriate mailing lists. Link to the discussion, especially to places which make the owners not in the best light. Someone might decide to "rage-fork". More rants, or more extreme rants, increase the probability. A fork has to be possible in the first place: this is why the best way to evaluate whether something is open source is to consider "how possible is a fork".

This is why the possibility of a fork changes the dynamics of a project, even if forks are rare: because the main thing that happens are "low-probability maybe-forks".

by Moshe Zadka at February 20, 2020 04:00 AM

February 17, 2020

Glyph Lefkowitz

Modularity for Maintenance

Never send a human to do a machine’s job.

One of the best things about maintaining open source in the modern era is that there are so many wonderful, free tools to let machines take care of the busy-work associated with collaboration, code-hosting, continuous integration, code quality maintenance, and so on.

There are lots of great resources that explain how to automate various things that make maintenance easier.

Here are some things you can configure your Python project to do:

  1. Continuous integration, using any one of a number of providers:
    1. GitHub Actions
    2. CircleCI
    3. Azure Pipelines
    4. Appveyor
    5. GitLab CI&CD
    6. Travis CI
  2. Separate multiple test jobs with tox
  3. Lint your code with flake8
  4. Type-Check your code with Mypy
  5. Auto-update your dependencies, with one of:
    1. pyup.io
    2. requires.io, or
    3. Dependabot
  6. automatically find common security issues with Bandit
  7. check the status of your code coverage, with:
    1. Coveralls, or
    2. Codecov
  8. Auto-format your code with:
    1. Black for style
    2. autopep8 to fix common errors
    3. isort to keep your imports tidy
  9. Help your developers remember to do all of those steps with pre-commit
  10. Automatically release your code to PyPI via your CI provider
    1. including automatically building any C code for multiple platforms as a wheel so your users won’t have to
    2. and checking those build artifacts:
      1. to make sure they include all the files they should, with check-manifest
      2. and also that the binary artifacts have the correct dependencies for Linux
      3. and also for macOS
  11. Organize your release notes and versioning with towncrier

All of these tools are wonderful.

But... let’s say you1 maintain a few dozen Python projects. Being a good maintainer, you’ve started splitting up your big monolithic packages into smaller ones, so your utility modules can be commonly shared as widely as possible rather than re-implemented once for each big frameworks. This is great!

However, every one of those numbered list items above is now a task per project that you have to repeat from scratch. So imagine a matrix with all of those down one side and dozens of projects across the top - the full Cartesian product of these little administrative tasks is a tedious and exhausting pile of work.

If you’re lucky enough to start every project close to perfect already, you can skip some of this work, but that partially just front-loads the tedium; plus, projects tend to start quite simple, then gradually escalate in complexity, so it’s helpful to be able to apply these incremental improvements one at a time, as your project gets bigger.

I really wish there were a tool that could take each of these steps and turn them into a quick command-line operation; like, I type pyautomate pypi-upload and the tool notices which CI provider I use, whether I use tox or not, and adds the appropriate configuration entries to both my CI and tox configuration to allow me to do that, possibly prompting me for a secret. Same for pyautomate code-coverage or what have you. All of these automations are fairly straightforward; almost all of the files you need to edit are easily parse-able either as yaml, toml, or ConfigParser2 files.

A few years ago, I asked for this to be added to CookieCutter, but I think the task is just too big and complicated to reasonably expect the existing maintainers to ever get around to it.

If you have a bunch of spare time, and really wanted to turbo-charge the Python open source community, eliminating tons of drag on already-over-committed maintainers, such a tool would be amazing.


  1. and by you, obviously, I mean “I” 

  2. “INI-like files”, I guess? what is this format even called? 

by Glyph at February 17, 2020 12:09 AM

January 07, 2020

Hynek Schlawack

Better Python Object Serialization

The Python standard library is full of underappreciated gems. One of them allows for simple and elegant function dispatching based on argument types. This makes it perfect for serialization of arbitrary objects – for example to JSON in web APIs and structured logs.

by Hynek Schlawack (hs@ox.cx) at January 07, 2020 12:00 AM

December 31, 2019

Moshe Zadka

Meditations on the Zen of Python

(This is based on the series published in opensource.com as 9 articles: 1, 2, 3, 4, 5, 6, 7, 8, 9)

Python contributor Tim Peters introduced us to the Zen of Python in 1999. Twenty years later, its 19 guiding principles continue to be relevant within the community.

The Zen of Python is not "the rules of Python" or "guidelines of Python". It is full of contradiction and allusion. It is not intended to be followed: it is intended to be meditated upon.

In this spirit, I offer this series of meditations on the Zen of Python.

Beautiful is better than ugly.

It was in Structure and Interpretation of Computer Programs (SICP) that the point was made: "Programs must be written for people to read and only incidentally for machines to execute." Machines do not care about beauty, but people do.

A beautiful program is one that is enjoyable to read. This means first that it is consistent. Tools like Black, flake8, and Pylint are great for making sure things are reasonable on a surface layer.

But even more important, only humans can judge what humans find beautiful. Code reviews and a collaborative approach to writing code are the only realistic way to build beautiful code. Listening to other people is an important skill in software development.

Finally, all the tools and processes are moot if the will is not there. Without an appreciation for the importance of beauty, there will never be an emphasis on writing beautiful code.

This is why this is the first principle: it is a way of making "beauty" a value in the Python community. It immediately answers: "Do we really care about beauty?" We do.

Explicit is better than implicit.

We humans celebrate light and fear the dark. Light helps us make sense of vague images. In the same way, programming with more explicitness helps us make sense of abstract ideas. It is often tempting to make things implicit.

"Why is self explicitly there as the first parameter of methods?"

There are many technical explanations, but all of them are wrong. It is almost a Python programmer's rite of passage to write a metaclass that makes explicitly listing self unnecessary. (If you have never done this before, do so; it makes a great metaclass learning exercise!)

The reason self is explicit is not because the Python core developers did not want to make a metaclass like that the "default" metaclass. The reason it is explicit is because there is one less special case to teach: the first argument is explicit.

Even when Python does allow non-explicit things, such as context variables, we must always ask: Are we sure we need them? Could we not just pass arguments explicitly? Sometimes, for many reasons, this is not feasible. But prioritizing explicitness means, at least, asking the question and estimating the effort.

Simple is better than complex.

When it is possible to choose at all, choose the simple solution. Python is rarely in the business of disallowing things. This means it is possible, and even straightforward, to design baroque programs to solve straightforward problems.

It is worthwhile to remember at each point that simplicity is one of the easiest things to lose and the hardest to regain when writing code.

This can mean choosing to write something as a function, rather than introducing an extraneous class. This can mean avoiding a robust third-party library in favor of writing a two-line function that is perfect for the immediate use-case. Most often, it means avoiding predicting the future in favor of solving the problem at hand.

It is much easier to change the program later, especially if simplicity and beauty were among its guiding principles, than to load the code down with all possible future variations.

Complex is better than complicated.

This is possibly the most misunderstood principle because understanding the precise meanings of the words is crucial. Something is complex when it is composed of multiple parts. Something is complicated when it has a lot of different, often hard to predict, behaviors.

When solving a hard problem, it is often the case that no simple solution will do. In that case, the most Pythonic strategy is to go "bottom-up." Build simple tools and combine them to solve the problem.

This is where techniques like object composition shine. Instead of having a complicated inheritance hierarchy, have objects that forward some method calls to a separate object. Each of those can be tested and developed separately and then finally put together.

Another example of "building up" is using singledispatch, so that instead of one complicated object, we have a simple, mostly behavior-less object and separate behaviors.

Flat is better than nested.

Nowhere is the pressure to be "flat" more obvious than in Python's strong insistence on indentation. Other languages will often introduce an implementation that "cheats" on the nested structure by reducing indentation requirements. To appreciate this point, let's take a look at JavaScript.

JavaScript is natively async, which means that programmers write code in JavaScript using a lot of callbacks.

a(function(resultsFromA) {
  b(resultsFromA, function(resultsfromB) {
    c(resultsFromC, function(resultsFromC) {
      console.log(resultsFromC)
   }
  }
}

Ignoring the code, observe the pattern and the way indentation leads to a right-most point. This distinctive "arrow" shape is tough on the eye to quickly walk through the code, so it's seen as undesirable and even nicknamed "callback hell." However, in JavaScript, it is possible to "cheat" and not have indentation reflect nesting.

a(function(resultsFromA) {
b(resultsFromA,
  function(resultsfromB) {
c(resultsFromC,
  function(resultsFromC) {
    console.log(resultsFromC)
}}}

Python affords no such options to cheat: every nesting level in the program must be reflected in the indentation level. So deep nesting in Python looks deeply nested. That means "callback hell" was a worse problem in Python than in JavaScript: nesting callbacks mean indenting with no options to "cheat" with braces.

This challenge, in combination with the Zen principle, has led to an elegant solution by a library I worked on. In the Twisted framework, we came up with the deferred abstraction, which would later inspire the popular JavaScript promise abstraction. In this way, Python's unwavering commitment to clear code forces Python developers to discover new, powerful abstractions.

future_value = future_result()
future_value.addCallback(a)
future_value.addCallback(b)
future_value.addCallback(c)

(This might look familiar to modern JavaScript programmers: Promises were heavily influenced by Twisted's deferreds.)

Sparse is better than dense.

The easiest way to make something less dense is to introduce nesting. This habit is why the principle of sparseness follows the previous one: after we have reduced nesting as much as possible, we are often left with dense code or data structures. Density, in this sense, is jamming too much information into a small amount of code, making it difficult to decipher when something goes wrong.

Reducing that denseness requires creative thinking, and there are no simple solutions. The Zen of Python does not offer simple solutions. All it offers are ways to find what can be improved in the code, without always giving guidance for "how."

Take a walk. Take a shower. Smell the flowers. Sit in a lotus position and think hard, until finally, inspiration strikes. When you are finally enlightened, it is time to write the code.

Readability counts.

In some sense, this middle principle is indeed the center of the entire Zen of Python. The Zen is not about writing efficient programs. It is not even about writing robust programs, for the most part. It is about writing programs that other people can read.

Reading code, by its nature, happens after the code has been added to the system. Often, it happens long after. Neglecting readability is the easiest choice since it does not hurt right now. Whatever the reason for adding new code -- a painful bug or a highly requested feature -- it does hurt. Right now.

In the face of immense pressure to throw readability to the side and just "solve the problem," the Zen of Python reminds us: readability counts. Writing the code so it can be read is a form of compassion for yourself and others.

Special cases aren't special enough to break the rules.

There is always an excuse. This bug is particularly painful; let's not worry about simplicity. This feature is particularly urgent; let's not worry about beauty. The domain rules covering this case are particularly hairy; let's not worry about nesting levels.

Once we allow special pleading, the dam wall breaks, and there are no more principles; things devolve into a Mad Max dystopia with every programmer for themselves, trying to find the best excuses.

Discipline requires commitment. It is only when things are hard, when there is a strong temptation, that a software developer is tested. There is always a valid excuse to break the rules, and that's why the rules must be kept the rules. Discipline is the art of saying no to exceptions. No amount of explanation can change that.

Although, practicality beats purity.

"If you think only of hitting, springing, striking, or touching the enemy, you will not be able actually to cut him.", Miyamoto Musashi, The Book of Water

Ultimately, software development is a practical discipline. Its goal is to solve real problems, faced by real people. Practicality beats purity: above all else, we must solve the problem. If we think only about readability, simplicity, or beauty, we will not be able to actually solve the problem.

As Musashi suggested, the primary goal of every code change should be to solve a problem. The problem must be foremost in our minds. If we waver from it and think only of the Zen of Python, we have failed the Zen of Python. This is another one of those contradictions inherent in the Zen of Python.

Errors should never pass silently...

Before the Zen of Python was a twinkle in Tim Peters' eye, before Wikipedia became informally known as "wiki," the first WikiWiki site, C2, existed as a trove of programming guidelines. These are principles that mostly came out of a Smalltalk programming community. Smalltalk's ideas influenced many object-oriented languages, Python included.

The C2 wiki defines the Samurai Principle: "return victorious, or not at all." In Pythonic terms, it encourages eschewing sentinel values, such as returning None or -1 to indicate an inability to complete the task, in favor of raising exceptions. A None is silent: it looks like a value and can be put in a variable and passed around. Sometimes, it is even a valid return value.

The principle here is that if a function cannot accomplish its contract, it should "fail loudly": raise an exception. The raised exception will never look like a possible value. It will skip past the returned_value = call_to_function(parameter) line and go up the stack, potentially crashing the program.

A crash is straightforward to debug: there is a stack trace indicating the problem as well as the call stack. The failure might mean that a necessary condition for the program was not met, and human intervention is needed. It might mean that the program's logic is faulty. In either case, the loud failure is better than a hidden, "missing" value, infecting the program's valid data with None, until it is used somewhere and an error message says "None does not have method split," which you probably already knew.

Unless explicitly silenced.

Exceptions sometimes need to be explicitly caught. We might anticipate some of the lines in a file are misformatted and want to handle those in a special way, maybe by putting them in a "lines to be looked at by a human" file, instead of crashing the entire program.

Python allows us to catch exceptions with except. This means errors can be explicitly silenced. This explicitness means that the except line is visible in code reviews. It makes sense to question why this is the right place to silence, and potentially recover from, the exception. It makes sense to ask if we are catching too many exceptions or too few.

Because this is all explicit, it is possible for someone to read the code and understand which exceptional conditions are recoverable.

In the face of ambiguity, refuse the temptation to guess.

What should the result of 1 + "1" be? Both "11" and 2 would be valid guesses. This expression is ambiguous: there is no single thing it can do that would not be a surprise to at least some people.

Some languages choose to guess. In JavaScript, the result is "11". In Perl, the result is 2. In C, naturally, the result is the empty string. In the face of ambiguity, JavaScript, Perl, and C all guess.

In Python, this raises a TypeError: an error that is not silent. It is atypical to catch TypeError: it will usually terminate the program or at least the current task (for example, in most web frameworks, it will terminate the handling of the current request).

Python refuses to guess what 1 + "1" means. The programmer is forced to write code with clear intention: either 1 + int("1"), which would be 2 or str(1) + "1", which would be "11"; or "1"[1:], which would be an empty string. By refusing to guess, Python makes programs more predictable.

There should be one -- and preferably only one -- obvious way to do it.

Prediction also goes the other way. Given a task, can you predict the code that will be written to achieve it? It is impossible, of course, to predict perfectly. Programming, after all, is a creative task.

However, there is no reason to intentionally provide multiple, redundant ways to achieve the same thing. There is a sense in which some solutions are "better" or "more Pythonic."

Part of the appreciation for the Pythonic aesthetic is that it is OK to have healthy debates about which solution is better. It is even OK to disagree and keep programming. It is even OK to agree to disagree for the sake of harmony. But beneath it all, there has to be a feeling that, eventually, the right solution will come to light. There must be the hope that eventually we can live in true harmony by agreeing on the best way to achieve a goal.

Although that way may not be obvious at first (unless you're Dutch).

This is an important caveat: It is often not obvious, at first, what is the best way to achieve a task. Ideas are evolving. Python is evolving. The best way to read a file block-by-block is, probably, to wait until Python 3.8 and use the walrus operator.

This common task, reading a file block-by-block, did not have a "single best way to do it" for almost 30 years of Python's existence.

When I started using Python in 1998 with Python 1.5.2, there was no single best way to read a file line-by-line. For many years, the best way to know if a dictionary had a key was to use .haskey until the in operator became the best way.

It is only by appreciating that sometimes, finding the one (and only one) way of achieving a goal can take 30 years of trying out alternatives that Python can keep aiming to find those ways. This view of history, where 30 years is an acceptable time for something to take, often feels foreign to people in the United States, when the country has existed for just over 200 years.

The Dutch, whether it's Python creator Guido van Rossum or famous computer scientist Edsger W. Dijkstra, have a different worldview according to this part of the Zen of Python. A certain European appreciation for time is essential.

Now is better than never.

There is always the temptation to delay things until they are perfect. They will never be perfect, though. When they look "ready" enough, that is when it is time to take the plunge and put them out there. Ultimately, a change always happens at some now: the only thing that delaying does is move it to a future person's "now."

Although never is often better than right now.

This, however, does not mean things should be rushed. Decide the criteria for release in terms of testing, documentation, user feedback, and so on. "Right now," as in before the change is ready, is not a good time.

This is a good lesson not just for popular languages like Python, but also for your personal little open source project.

If the implementation is hard to explain, it's a bad idea.

The most important thing about programming languages is predictability. Sometimes we explain the semantics of a certain construct in terms of abstract programming models, which do not correspond exactly to the implementation. However, the best of all explanations just explains the implementation.

If the implementation is hard to explain, it means the avenue is impossible.

If the implementation is easy to explain, it may be a good idea.

Just because something is easy does not mean it is worthwhile. However, once it is explained, it is much easier to judge whether it is a good idea.

This is why the second half of this principle intentionally equivocates: nothing is certain to be a good idea, but it always allows people to have that discussion.

Namespaces in Python

Python uses namespaces for everything. Though simple, they are sparse data structures -- which is often the best way to achieve a goal.

Modules are namespaces. This means that correctly predicting module semantics often just requires familiarity with how Python namespaces work. Classes are namespaces. Objects are namespaces. Functions have access to their local namespace, their parent namespace, and the global namespace.

The simple model, where the . operator accesses an object, which in turn will usually, but not always, do some sort of dictionary lookup, makes Python hard to optimize, but easy to explain.

Indeed, some third-party modules take this guideline and run with it. For example, the variants package turns functions into namespaces of "related functionality." It is a good example of how the Zen of Python can inspire new abstractions.

by Moshe Zadka at December 31, 2019 06:30 AM

December 18, 2019

Moshe Zadka

Precise Unit Tests with PyHamcrest

(This is based on my article on opensource.com)

Unit test suites help maintain high-quality products by signaling problems early in the development process. An effective unit test catches bugs before the code has left the developer machine, or at least in a continuous integration environment on a dedicated branch. This marks the difference between good and bad unit tests: good tests increase developer productivity by catching bugs early and making testing faster. Bad tests decrease developer productivity.

Productivity decreases when testing incidental features. The test fails when the code changes, even if it is still correct. This happens because the output is different, but in a way that is not part of the function's contract.

A good unit test, therefore, is one that helps enforce the contract to which the function is committed.

If a good unit test breaks, the contract is violated and should be either explicitly amended (by changing the documentation and tests), or fixed (by fixing the code and leaving the tests as is).

A good unit test is also strict. It does its best to ensure the output is valid. This helps it catch more bugs.

While limiting tests to enforce only the public contract is a complicated skill to learn, there are tools that can help.

One of these tools is Hamcrest, a framework for writing assertions. Originally invented for Java-based unit tests, today the Hamcrest framework supports several languages, including Python.

Hamcrest is designed to make test assertions easier to write and more precise.

def add(a, b):
    return a + b

from hamcrest import assert_that, equal_to

def test_add():
    assert_that(add(2, 2), equal_to(4))

This is a simple assertion, for simple functionality. What if we wanted to assert something more complicated?

def test_set_removal():
    my_set = {1, 2, 3, 4}
    my_set.remove(3)
    assert_that(my_set, contains_inanyorder([1, 2, 4]))
    assert_that(my_set, is_not(has_item(3)))

Note that we can succinctly assert that the result has 1, 2, and 4 in any order since sets do not guarantee order.

We also easily negate assertions with is_not. This helps us write precise assertions, which allow us to limit ourselves to enforcing public contracts of functions.

Sometimes, however, none of the built-in functionality is precisely what we need. In those cases, Hamcrest allows us to write our own matchers.

Imagine the following function:

def scale_one(a, b):
    scale = random.randint(0, 5)
    pick = random.choice([a,b])
    return scale * pick

We can confidently assert that the result divides into at least one of the inputs evenly.

A matcher inherits from hamcrest.core.base_matcher.BaseMatcher, and overrides two methods:

class DivisibleBy(hamcrest.core.base_matcher.BaseMatcher):

    def __init__(self, factor):
        self.factor = factor

    def _matches(self, item):
        return (item % self.factor) == 0

    def describe_to(self, description):
        description.append_text('number divisible by')
        description.append_text(repr(self.factor))

Writing high-quality describe_to methods is important, since this is part of the message that will show up if the test fails.

def divisible_by(num):
    return DivisibleBy(num)

By convention, we wrap matchers in a function. Sometimes this gives us a chance to further process the inputs, but in this case, no further processing is needed.

def test_scale():
    result = scale_one(3, 7)
    assert_that(result,
                any_of(divisible_by(3),
                       divisible_by(7)))

Note that we combined our divisible_by matcher with the built-in any_of matcher to ensure that we test only what the contract commits to.

While editing the article, I heard a rumor that the name "Hamcrest" was chosen as an anagram for "matches". Hrm...

>>> assert_that("matches", contains_inanyorder(*"hamcrest")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moshez/src/devops-python/build/devops/lib/python3.6/site-packages/hamcrest/core/assert_that.py", line 43, in assert_that
    _assert_match(actual=arg1, matcher=arg2, reason=arg3)
  File "/home/moshez/src/devops-python/build/devops/lib/python3.6/site-packages/hamcrest/core/assert_that.py", line 57, in _assert_match
    raise AssertionError(description)
AssertionError:
Expected: a sequence over ['h', 'a', 'm', 'c', 'r', 'e', 's', 't'] in any order
      but: no item matches: 'r' in ['m', 'a', 't', 'c', 'h', 'e', 's']

Researching more, I found the source of the rumor: it is an anagram for "matchers".

>>> assert_that("matchers", contains_inanyorder(*"hamcrest"))
>>>

If you are not yet writing unit tests for your Python code, now is a good time to start. If you are writing unit tests for your Python code, using Hamcrest will allow you to make your assertion precise—neither more nor less than what you intend to test. This will lead to fewer false negatives when modifying code and less time spent modifying tests for working code.

by Moshe Zadka at December 18, 2019 05:00 AM

November 27, 2019

Itamar Turner-Trauring

Job negotiation for programmers: the basic principles

You need to negotiate at a new job: for your salary, or benefits, or my personal favorite, a shorter workweek. You’re not sure what to do, or how to approach it, or what to say when the company says “how much do you want?” or “here’s our offer—what do you say?”

Here’s the thing: that final conversation about salary might be the most nerve-wracking part, but the negotiation process starts much much earlier. Which means you can enter that final conversation having positioned yourself for success—and feeling less stressed about it too.

The way you can do that is following certain basic principles, which I’ll be covering in this article. I’m going to be focusing on salary negotiation as an example, but the same principles will apply when negotiating for a shorter workweek.

In particular, I’ll be talking about:

  1. An example from early in my career when I negotiated very very badly.
  2. The right way to negotiate, based on four principles:
    1. Employment is a negotiated relationship.
    2. Knowledge is power.
    3. Negotiate from a position of strength.
    4. Use the right tactics.

The wrong way to negotiate

Before moving on to the principles of negotiation, let me share a story of how I negotiated badly.

During my first real job search I interviewed at a company in New York City that was building a financial trading platform. They were pretty excited about some specific technologies I’d learned while working on Twisted, an open source networking framework. They offered me a job, I accepted, and my job search was over.

But then they sent me their intellectual property agreement, and I actually read legal documents; you should read them too. The agreement would have given the company ownership over any open source work I did, including work on Twisted. I wanted to ensure I could keep doing open source development, especially given that was their reason for hiring me in the first place. I asked for an exemption covering Twisted, they wouldn’t agree, and so we went back and forth trying to reach an agreement.

Eventually they came back with a new offer: in return for not working on Twisted I’d get a 20% salary increase over their initial offer. I thought about it briefly, then said no and walked away from the job. Since I had neither a CS degree—I’d dropped out—nor much of an employment history, open source contribution was important to my career. It was how I’d gotten contracting work, and it was the reason they’d offered me this job. And I enjoyed doing it, too, so I wasn’t willing to give it up.

I posted about this experience online, and an employee of ITA Software, which was based in the Boston area, suggested they were happy to support contributions to open source projects. It seemed worth a try, so I applied for the position. And when eventually I got a job offer from ITA and they asked me for my salary requirements, I asked for the second offer I’d gotten, the one that was 20% higher than my original offer. They accepted, and I’ve lived in the Boston area ever since.

As we go through the principles below, I’ll come back to this story and point out how they were (mis)applied in my two negotiations.

The four principles of negotiation

You can think of the negotiation process as building on four principles:

  1. Employment is a negotiated relationship.
  2. Knowledge is power.
  3. Negotiate from a position of strength.
  4. Use the right tactics.

Let’s go through them one by one.

Principle #1: Employment is a negotiated relationship

If you’re an employee, your employment relationship was negotiated. When you got a job offer and accepted it, that was a negotiation, even if you didn’t push back at all. Your choice isn’t between negotiating and not negotiating: it’s between negotiating badly, or negotiating well.

Negotiate actively

If you don’t actively try to negotiate, if you don’t ask for what you want, if you don’t ask for what you’re worth—you’re unlikely to get it. Salaries, for example, are a place where your interests and your employer’s are very much at odds. All things being equal, if you’re doing the exact same work and have the same likelihood of leaving, would your employer prefer to pay you less or more? Most employers will pay you less if they can, and I almost had to learn that the hard way.

Applying the principle: In my story above, I never proactively negotiated. Instead, I accepted a job offer from the financial company without any sort of additional demands. If they were happy to offer me a 20% raise just to quit open source, I probably could have gotten an even higher salary if I’d just asked in the first place.

Negotiation starts early, and never ends

Not only do you need to negotiate actively, you also need to realize that negotiation starts much earlier than you think, and ends only when you leave to a different job:

  • The minute you start thinking about applying to a company, you’ve started the negotiation process; as you’ll see, you’ll want to do research before you even talk to them.
  • Your interview is part of your negotiation, and you can in fact negotiate the interview process itself (e.g. suggest sharing a code sample instead of doing a whiteboard puzzle).
  • As an employee you will continue to negotiate: if you always say “yes” when your boss asks you to work long hours, your contract for a 3-day weekend will mean nothing.

In short, your whole relationship as an employee is based on negotiation.

Distinguish between friend and foe

A negotiation involves two sides: yours, and the company’s. When you’re negotiating it’s important to remember that anyone who works for the company is on the company’s side. Not yours.

I once had to negotiate the intellectual property agreement at a new job. My new employer was based in the UK, and it had a US subsidiary organized by a specialist company. These subsidiary specialists had provided the contract I was signing.

When I explained the changes I wanted to make, the manager at the subsidiary specialist told me that my complaint had no merit, because the contract had been written by the “best lawyers in Silicon Valley.” But the contract had been written by lawyers working for the company, not for me. If his claim had been true (spoiler: they were not in fact the best lawyers in Silicon Valley), that would have just made my argument stronger. The better the company’s lawyers, the more carefully I ought to have read the contract, and the more I ought to have pushed back.

The contracts the company wants you to sign? They were written by lawyers working for the company.

Human Resources works for the company, as does the in-house recruiter. However friendly they may seem, they are not working for you. And third-party recruiters are paid by the company, not you. It’s true that sometimes their commission is tied to your salary, which means they would rather you get paid more. But since they get paid only once per candidate, volume is more important than individual transactions: it’s in their best interest to get you hired as quickly as possible so they can move on to placing the next candidate.

Since all these people aren’t working for you, during a negotiation they’re working against you.

The only potential exception to this rule are friends who also work for the company, and aren’t directly involved in the negotiation process: even if they are constrained in some ways, they’re probably still on your side. They can serve as a backchannel for feedback and other information that the company can’t or won’t share.

Principle #2: Knowledge is power

The more you know about the situation, the better you’ll do as a negotiator. More knowledge gives more power: to you, but also to the company.

Know what you want

The first thing you need to do when negotiating is understand what you want.

  • What is your ideal outcome?
  • What can you compromise on, and what can’t you compromise on?
  • What is the worst outcome you’re willing to accept?

Do your research

You also want to understand where the other side is coming from:

  • What is the company’s goal, and the negotiator’s goal? For example, if you discover their goal is minimizing hassle, you might be able to get what you want by making the process a little smoother.
  • What resources are available to them? An unfunded startup has different resources than a large company, for example.
  • Has the company done something similar in the past, or will your request be unprecedented? For example, what hours do other employees in similar positions work? How much are other employees paid?
  • What do other companies in the area or industry provide?
  • How is this particular business segment doing: are they losing money, or doing great?

The more you understand going in, the better you’ll do, and that means doing your research before negotiation starts.

Applying the principle: In my story above I never did any research about salaries, either in NY or in Boston. As a result, I had no idea I was being offered a salary far below market rates.

As a comparison, here’s a real example of how research can help your negotiation, from an engineer named Adam:

Adam: “Being informed on salaries really helped my negotiating position. When my latest employer made me an offer I asked them why it was lower than their average salary on Glassdoor.com. The real reason was likely ‘we offer as little as possible to get you on board.’ They couldn’t come up with a convincing reason and so the salary was boosted 10%.”

Glassdoor is a site that allows employees to anonymously share salaries and job reviews. Five minutes of research got Adam a 10% raise: not bad at all!

Listen and empathize

If you only had to make yourself happy this wouldn’t be a negotiation: you need to understand the other side’s needs and wants, what they’re worrying about, what they’re feeling. That means you need to listen, not just talk: if you do, you will often gather useful information that can help you make yourself more valuable, or address a particular worry. And you need to feel empathy towards the person you’re talking to: you don’t need to agree or subordinate yourself to their goals, but you do need to understand how they’re feeling.

Share information carefully

Sharing information at the wrong time during a negotiation can significantly weaken your position. For example, sharing your previous salary will often anchor what the company is willing to offer you:

Adam: “I graduated from university and started working at the end of 2012. At my first job I worked for way under my market rate. I knew this and was OK with it because they were a good company.

Then I switched jobs in 2013. What I hadn’t accounted for was that my salary at my first job was going to limit my future salary prospects. I had to fight hard for raises at my next job before I was in line with people straight out of school, because they didn’t want to double my salary at my previous company.”

In general, when interviewing for a job you shouldn’t share your previous salary, or your specific salary demands—except of course when it is helpful to do so. For example, let’s say you’re moving from Google to a tiny bootstrapped startup, and you know you won’t be able to get the same level of salary. Sharing your current salary can help push your offer higher, or used as leverage to get shorter hours: “I know you can’t offer me my previous salary of $$$, but here’s something you could do—”. Just make sure not to share it too early, or they might decide you’d never accept any offer at all and stop the interview process too early.

Most of the time, however, you shouldn’t share either your previous salary or specific salary requirements. If the company insists on getting your previous salary, you can:

  • If you work somewhere with relevant laws (e.g. California and Massachusetts), point out that this question is illegal. Asking about salary expectations is not illegal in these jurisdictions, so be careful about the distinction.
  • Ask for the company’s salary range for the position, as well as the next level up in the salary tree. Chances are they will refuse to share, in which case you can correspondingly refuse to share your information.
  • Say something like “I expect to be paid industry-standard pay for my experience.”

Applying the principle: I shouldn’t have told ITA Software my salary requirement. Instead, I should have gotten them to make the first offer, which would have given me more information about what they were willing to pay.

Principle #3: Negotiate from a position of strength

The stronger your negotiation position, the more likely you are to get what you want. And this is especially important when you’re asking for something abnormal, like a 3-day weekend.

Have a good fallback (BATNA)

If negotiation fails, what will you do? Whatever it is, that is your fallback, sometimes known as the “Best Alternative to a Negotiated Agreement” (BATNA). The better your fallback, the better your alternative, the stronger your negotiating position is. Always figure out your fallback in advance, before you start negotiating.

For example, imagine you’re applying for a new job:

  • If you’re unemployed and have an empty bank account, your fallback might be moving in with your parents. This does not give you a strong negotiating position.
  • If you’re employed, and more or less content with your current job, your fallback is staying where you are. That makes your position much stronger.

If you have a strong fallback, you can choose to walk away at any time, and this will make asking for more much easier.

Provide and demonstrate value

The more an organization wants you as an employee, the more they’ll be willing to offer you. The people you’re negotiating with don’t necessarily know your value: you need to make sure they understand why you’re worth what you’re asking.

For example, when you’re interviewing for a job, you need to use at least part of the interview to explain your value to your prospective employer: your accomplishments and skills. Once you’ve established the value of your skills, asking for more—more money, unusual terms—can actually make you seem more valuable. And having another job offer—or an existing job—can also help, by showing you are in demand.

Finally, remember that your goal is to make sure the other side’s needs are met—not at your own expense, but if they don’t think hiring you is worth it, you aren’t going to get anything. Here’s how Alex, another programmer I talked to, explains how he learned this:

Alex: “Think about the other person and how they’re going to react, how you can try to manage that proactively. You need to treat your negotiating partner as a person, not a program.

Initially I had been approaching it adversarially, 'I need to extract value from you, I have to wrestle you for it’ but it’s more productive to negotiate with an attitude of 'we both need to get our needs met.’ The person you’re talking to is looking to hire someone productive who can create value, so figure out how can you couch what you want in a way that proactively addresses the other person’s concerns.”

Principle #4: Use the right tactics

Once you’ve realized you’re negotiating, have done your research, and are negotiating from a position of strength, applying the right negotiation tactics will increase your chances of success even more.

Ask for more than you want

Obviously you don’t want to ask for less than what you want. But why not ask for exactly what you want?

First, it might turn out that the company is willing to give you far more than you expected or thought possible.

Second, if you ask for exactly what you want there’s no way for you to compromise without getting less than what you want. By asking for more, you can compromise while still getting what you wanted.

Applying the principle: If I’d wanted a $72,000 salary, and research suggested that was a fair salary, I should have asked for $80,000. If I was lucky the company would have said yes; if they wanted to negotiate me down, I would have no problems agreeing to a lower number so long as it was above $72,000.

Negotiate multiple things at once

Your goal when negotiating is not to “win.” Rather, your goal is to reach an agreement that passes your minimal bar, and gets you as much as is feasible. Feasibility means you also need to take into account what the other side wants as well. If you’ve reached an impasse, and you still think you can make a deal that you like, try to come up with creative ways to work out a solution that they will like.

If you only negotiate one thing at once, every negotiation has a winner and a loser. For example, if all you’re negotiating is salary, either you’re making more money, or the company is saving money: it’s a zero-sum negotiation. This limits your ability to come up with a solution that maximizes value for you while still meeting the other side’s needs.

Applying the principle: In my story above, the financial company wanted intellectual property protection, I wanted to be able to write open source, and we were at an impasse. So they expanded the scope of the negotiation to include my salary, which allowed them to make tradeoffs between the two—more money for me in return for what they wanted. If I’d cared less about working on open source I might have accepted that offer.

Never give an answer immediately

During the actual negotiation you should never decide on the spot, nor are you required to. If you get a job offer you can explain that you need a little time to think about it: say something like “I have to run this by my spouse/significant other/resident expert.” This will give you the time to consider your options in a calmer state of mind, and not just blurt out “yes” at the first semi-decent offer.

Having someone else review the offer is a good idea in general; a friend of mine ran her job offers by her sister, who had an MBA. But it’s also useful to mention that other person as someone who has to sign off on the offer. That gives you the ability to say you’d like to accept an offer, but your spouse/expert thinks you can do better.

Notice that the employer almost always has this benefit already. Unless you’re negotiating with the owner of the business, you’re negotiating with an agent: someone in HR, say. When you make a demand, the HR person might say “I have go to check with the hiring manager”, and when they come back with less than you wanted it’s not their fault, they’re just passing on the bad news. The implication is that the low offer is just the way it is, and there’s nothing they can do about.

Don’t fall for this trick: they often can change the offer.

Beyond negotiating for salary

You can negotiate for a higher salary—or rather, you should negotiate for a higher salary. The Adam I interviewed in this article is now a partner in DangoorMendel, who can help you negotiate a higher salary.

But salary isn’t the only thing you can negotiate for. You can also negotiate for a shorter workweek.

And yes, this is harder, but it’s definitely possible.

In fact, this article is an excerpt from a book I wrote to help you do just that: You Can Negotiate a 3-Day Weekend.



Tired of scrambling to get your job done?

If you were productive enough, you could take the afternoon off, confident you’d produced high value work. Not to mention having an easier time finding a new job when you need one.

Learn the secret skills of productive programmers.

November 27, 2019 05:00 AM

November 18, 2019

Moshe Zadka

Raise Better Exceptions in Python

There is a lot of Python code in the wild which does something like:

raise ValueError("Could not fraz the buzz:"
                 f"{foo} is less than {quux}")

This is, in general, a bad idea. It does not matter if the exception is fairly generic, like ValueError or specific like CustomFormatParsingException.

Exceptions are not program panics. Program panics are things which should "never happen", and usually abort either the entire program, or at least an execution thread.

While exceptions sometimes do terminate the program, or the execution thread, with a traceback, they are different in that they can be caught.

The code that catches the exception will sometimes have a way to recover: for example, maybe it’s not that important for the application to fraz the buzz if foo is 0. In that case, the code would look like:

try:
    some_function()
except ValueError as exc:
    if ???

Oh, right. We do not have direct access to foo. If we formatted better, using repr, at least we could tell the difference between 0 and "0": but we still would have to start parsing the representation out of the string.

Because of this, in general, it is better to raise exceptions like this:

raise ValueError("Could not fraz the buzz: foo is too small", foo, quux)

Note that all the exceptions defined in core Python already allow any number of arguments. Those arguments are available as exc.args, if exc is the exception object. If you do end up defining your custom exceptions, the easiest thing is to avoid overriding the __init__: this keeps this behavior.

Raising exceptions this way gives exception handling a lot of power: it can introspect foo, introspect quux and introspect the string. If by some reason the exception class is raised and we want to verify the reason, checking string equality, while not ideal, is still better than trying to match string parts or regular expression matching.

When the exception is presented to the user interface, in that case, it will not look as nice. Exceptions, in general, should reach the UI only in extreme circumstances. In those cases, having something that has as much information is useful for root cause analysis.

This is an update of an older blog post. Thanks to Mark Rice and Ben Nuttall for their improvement suggestions. All mistakes that are left are my responsibility.

by Moshe Zadka at November 18, 2019 06:00 AM

November 11, 2019

Twisted Matrix Laboratories

Twisted 19.10.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 19.10! The highlights of this release are:
  • Security fixes for HTTP/2 -- CVE-2019-9512 (Ping Flood), CVE-2019-9514 (Reset Flood), and CVE-2019-9515 (Settings Flood).  Thanks to Jonathan Looney and Piotr Sikora.
  • HTTP/2 fixes regarding timeouts.
  • trial's assertResultOf, failureResultOf, and successResultOf, now accept Deferred-awaiting coroutines.
  • Various other bug fixes for POP3, conch.ssh.keys, and twisted.web.client.FileBodyProducer.
You can find the downloads at <https://pypi.python.org/pypi/Twisted> (or alternatively <http://twistedmatrix.com/trac/wiki/Downloads>). The NEWS file is also available at <https://github.com/twisted/twisted/blob/twisted-19.10.0/NEWS.rst>.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

- hawkowl

by Amber Brown (HawkOwl) (noreply@blogger.com) at November 11, 2019 04:34 AM

November 06, 2019

Hynek Schlawack

Python Application Dependency Management in 2018

We have more ways to manage dependencies in Python applications than ever. But how do they fare in production? Unfortunately this topic turned out to be quite polarizing and was at the center of a lot of heated debates. This is my attempt at an opinionated review through a DevOps lens.

by Hynek Schlawack (hs@ox.cx) at November 06, 2019 12:00 AM

October 18, 2019

Moshe Zadka

An introduction to zope.interface

This has previously been published on opensource.com.

The Zen of Python is loose enough and contradicts itself enough that you can prove anything from it. Let's meditate upon one of its most famous principles: "Explicit is better than implicit."

One thing that traditionally has been implicit in Python is the expected interface. Functions have been documented to expect a "file-like object" or a "sequence." But what is a file-like object? Does it support .writelines? What about .seek? What is a "sequence"? Does it support step-slicing, such as a[1:10:2]?

Originally, Python's answer was the so-called "duck-typing," taken from the phrase "if it walks like a duck and quacks like a duck, it's probably a duck." In other words, "try it and see," which is possibly the most implicit you could possibly get.

In order to make those things explicit, you need a way to express expected interfaces. One of the first big systems written in Python was the Zope web framework, and it needed those things desperately to make it obvious what rendering code, for example, expected from a "user-like object."

Enter zope.interface, which was part of Zope but published as a separate Python package. The zope.interface package helps declare what interfaces exist, which objects provide them, and how to query for that information.

Imagine writing a simple 2D game that needs various things to support a "sprite" interface; e.g., indicate a bounding box, but also indicate when the object intersects with a box. Unlike some other languages, in Python, attribute access as part of the public interface is a common practice, instead of implementing getters and setters. The bounding box should be an attribute, not a method.

A method that renders the list of sprites might look like:

def render_sprites(render_surface, sprites):
    """
    sprites should be a list of objects complying with the Sprite interface:
    * An attribute "bounding_box", containing the bounding box.
    * A method called "intersects", that accepts a box and returns
      True or False
    """
    pass # some code that would actually render

The game will have many functions that deal with sprites. In each of them, you would have to specify the expected contract in a docstring.

Additionally, some functions might expect a more sophisticated sprite object, maybe one that has a Z-order. We would have to keep track of which methods expect a Sprite object, and which expect a SpriteWithZ object.

Wouldn't it be nice to be able to make what a sprite is explicit and obvious so that methods could declare "I need a sprite" and have that interface strictly defined? Enter zope.interface.

from zope import interface

class ISprite(interface.Interface):

    bounding_box = interface.Attribute(
        "The bounding box"
    )

    def intersects(box):
        "Does this intersect with a box"

This code looks a bit strange at first glance. The methods do not include a self, which is a common practice, and it has an Attribute thing. This is the way to declare interfaces in zope.interface. It looks strange because most people are not used to strictly declaring interfaces.

The reason for this practice is that the interface shows how the method will be called, not how it is defined. Because interfaces are not superclasses, they can be used to declare data attributes.

One possible implementation of the interface can be with a circular sprite:

@implementer(ISprite)
@attr.s(auto_attribs=True)
class CircleSprite:
    x: float
    y: float
    radius: float

    @property
    def bounding_box(self):
        return (
            self.x - self.radius,
            self.y - self.radius,
            self.x + self.radius,
            self.y + self.radius,
        )

    def intersects(self, box):
        # A box intersects a circle if and only if
        # at least one corner is inside the circle.
        top_left, bottom_right = box[:2], box[2:]
        for choose_x_from (top_left, bottom_right):
            for choose_y_from (top_left, bottom_right):
                x = choose_x_from[0]
                y = choose_y_from[1]
                if (((x - self.x) ** 2 + (y - self.y) ** 2) <=
                    self.radius ** 2):
                     return True
        return False

This explicitly declares that the CircleSprite class implements the interface. It even enables us to verify that the class implements it properly:

from zope.interface import verify

def test_implementation():
    sprite = CircleSprite(x=0, y=0, radius=1)
    verify.verifyObject(ISprite, sprite)

This is something that can be run by pytest, nose, or another test runner, and it will verify that the sprite created complies with the interface. The test is often partial: it will not test anything only mentioned in the documentation, and it will not even test that the methods can be called without exceptions! However, it does check that the right methods and attributes exist. This is a nice addition to the unit test suite and -- at a minimum -- prevents simple misspellings from passing the tests.

If you have some implicit interfaces in your code, why not document them clearly with zope.interface?

by Moshe Zadka at October 18, 2019 03:00 AM

October 16, 2019

Hynek Schlawack

Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

by Hynek Schlawack (hs@ox.cx) at October 16, 2019 12:00 AM

October 13, 2019

Glyph Lefkowitz

Mac Python Distribution Post Updated for Catalina and Notarization

I previously wrote a post about shipping a PyGame app to users on macOS. It’s now substantially updated for the new Notarization requirements in Catalina. I hope it’s useful to somebody!

by Glyph at October 13, 2019 09:10 PM

October 07, 2019

Glyph Lefkowitz

The Numbers, They Lie

It’s October, and we’re all getting ready for Halloween, so allow me to me tell you a horror story, in Python:

1
2
>>> 0.1 + 0.2 - 0.3
5.551115123125783e-17

some scary branches

Some of you might already be familiar with this chilling tale, but for those who might not have experienced it directly, let me briefly recap.

In Python, the default representation of a number with a decimal point in it is something called an “IEEE 754 double precision binary floating-point number”. This standard achieves a generally useful trade-off between performance, correctness, and is widely implemented in hardware, making it a popular choice for numbers in many programming language.

However, as our spooky story above indicates, it’s not perfect. 0.1 + 0.2 is very slightly less than 0.3 in this representation, because it is a floating-point representation in base 2.

If you’ve worked professionally with software that manipulates money1, you typically learn this lesson early; it’s quite easy to smash head-first into the problem with binary floating-point the first time you have an item that costs 30 cents and for some reason three dimes doesn’t suffice to cover it.

There are a few different approaches to the problem; one is using integers for everything, and denominating your transactions in cents rather than dollars. A strategy which requires less weird unit-conversion2, is to use the built-in decimal module, which provides a floating-point base 10 representation, rather than the standard base-2, which doesn’t have any of these weird glitches surrounding numbers like 0.1.

This is often where a working programmer’s numerical education ends; don’t use floats, they’re bad, use decimals, they’re good. Indeed, this advice will work well up to a pretty high degree of application complexity. But the story doesn’t end there. Once division gets involved, things can still get weird really fast:

1
2
3
>>> from decimal import Decimal
>>> (Decimal("1") / 7) * 14
Decimal('2.000000000000000000000000001')

The problem is the same: before, we were working with 1/10, a value that doesn’t have a finite (non-repeating) representation in base 2; now we’re working with 1/7, which has the same problem in base 10.

Any time you have a representation of a number which uses digits and a decimal point, no matter the base, you’re going to run in to some rational values which do not have an exact representation with a finite number of digits; thus, you’ll drop some digits off the (necessarily finite) end, and end up with a slightly inaccurate representation.

But Python does have a way to maintain symbolic accuracy for arbitrary rational numbers -- the fractions module!

1
2
3
4
5
>>> from fractions import Fraction
>>> Fraction(1)/3 + Fraction(2)/3 == 1
True
>>> (Fraction(1)/7) * 14 == 2
True

You can multiply and divide and add and subtract to your heart’s content, and still compare against zero and it’ll always work exactly, giving you the right answers.

So if Python has a “correct” representation, which doesn’t screw up our results under a basic arithmetic operation such as division, why isn’t it the default? We don’t care all that much about performance, right? Python certainly trades off correctness and safety in plenty of other areas.

First of all, while Python’s willing to trade off some storage or CPU efficiency for correctness, precise fractions rapidly consume huge amounts of storage even under very basic algorithms, like consuming gigabytes while just trying to maintain a simple running average over a stream of incoming numbers.

But even more importantly, you’ll notice that I said we could maintain symbolic accuracy for arbitrary rational numbers; but, as it turns out, a whole lot of interesting math you might want to do with a computer involves numbers which are irrational: like π. If you want to use a computer to do it, pretty much all trigonometry3 involves a slightly inaccurate approximation unless you have a literally infinite amount of storage.

As Morpheus put it, “welcome to the desert of the ”.


  1. or any proxy for it, like video-game virtual currency 

  2. and less time saying weird words like “nanodollars” to your co-workers 

  3. or, for that matter, geometry, or anything involving a square root 

by Glyph at October 07, 2019 06:25 AM

October 05, 2019

Glyph Lefkowitz

A Few Bad Apples

I’m a little annoyed at my Apple devices right now.

Time to complain.

“Trust us!” says Apple.

“We’re not like the big, bad Google! We don’t just want to advertise to you all the time! We’re not like Amazon, just trying to sell you stuff! We care about your experience. Magical. Revolutionary. Courageous!”

But I can’t hear them over the sound of my freshly-updated Apple TV — the appliance which exists solely to play Daniel Tiger for our toddler — playing the John Wick 3 trailer at full volume automatically as soon as it turns on.

For the aforementioned toddler.

I should mention that it is playing this trailer while specifically logged in to a profile that knows their birth date1 and also their play history2.


I’m aware of the preferences which control autoplay on the home screen; it’s disabled now. I’m aware that I can put an app other than “TV” in the default spot, so that I can see ads for other stuff, instead of the stuff “TV” shows me ads for.

But the whole point of all this video-on-demand junk was supposed to be that I can watch what I want, when I want — and buying stuff on the iTunes store included the implicit promise of no advertisements.

At least Google lets me search the web without any full-screen magazine-style ads popping up.

Launch the app store to check for new versions?

apple arcade ad

I can’t install my software updates without accidentally seeing HUGE ads for new apps.

Launch iTunes to play my own music?

apple music ad

I can’t play my own, purchased music without accidentally seeing ads for other music — and also Apple’s increasingly thirsty, desperate plea for me to remember that they have a streaming service now. I don’t want it! I know where Spotify is if I wanted such a thing, the whole reason I’m launching iTunes is that I want to buy and own the music!

On my iPhone, I can’t even launch the Settings app to turn off my WiFi without seeing an ad for AppleCare+, right there at the top of the UI, above everything but my iCloud account. I already have AppleCare+; I bought it with the phone! Worse, at some point the ad glitched itself out, and now it’s blank, and when I tap the blank spot where the ad used to be, it just shows me this:

undefined is not an insurance plan

I just want to use my device, I don’t need ad detritus littering every blank pixel of screen real estate.

Knock it off, Apple.


  1. less than 3 years ago 

  2. Daniel Tiger, Doctor McStuffins, Word World; none of which have super significant audience overlap with the John Wick franchise 

by Glyph at October 05, 2019 06:32 PM

September 24, 2019

Jp Calderone

Tahoe-LAFS on Python 3 - Call for Porters

Hello Pythonistas,

Earlier this year a number of Tahoe-LAFS community members began an effort to port Tahoe-LAFS from Python 2 to Python 3.  Around five people are currently involved in a part-time capacity.  We wish to accelerate the effort to ensure a Python 3-compatible release of Tahoe-LAFS can be made before the end of upstream support for CPython 2.x.

Tahoe-LAFS is a Free and Open system for private, secure, decentralized storage.  It encrypts and distributes your data across multiple servers.  If some of the servers fail or are taken over by an attacker, the entire file store continues to function correctly, preserving your privacy and security.

Foolscap, a dependency of Tahoe-LAFS, is also being ported.  Foolscap is an object-capability-based RPC protocol with flexible serialization.

Some details of the porting effort are available in a milestone on the Tahoe-LAFS trac instance.

For this help, we are hoping to find a person/people with significant prior Python 3 porting experience and, preferably, some familiarity with Twisted, though in general the Tahoe-LAFS project welcomes contributors of all backgrounds and skill levels.

We would prefer someone to start with us as soon as possible and no later than October 15th. If you are interested in this opportunity, please send us any questions you have, as well as details of your availability and any related work you have done previously (GitHub, LinkedIn links, etc). If you would like to find out more about this opportunity, please contact us at jessielisbetfrance at gmail (dot) com or on IRC in #tahoe-lafs on Freenode.

by Jean-Paul Calderone (noreply@blogger.com) at September 24, 2019 04:59 PM

September 17, 2019

Moshe Zadka

Adding Methods Retroactively

The following post was originally published on OpenSource.com as part of a series on seven libraries that help solve common problems.

Imagine you have a "shapes" library. We have a Circle class, a Square class, etc.

A Circle has a radius, a Square has a side, and maybe Rectangle has height and width. The library already exists: we do not want to change it.

However, we do want to add an area calculation. If this was our library, we would just add an area method, so that we can call shape.area(), and not worry about what the shape is.

While it is possible to reach into a class and add a method, this is a bad idea: nobody expects their class to grow new methods, and things might break in weird ways.

Instead, the singledispatch function in functools can come to our rescue:

@singledispatch
def get_area(shape):
    raise NotImplementedError("cannot calculate area for unknown shape",
                              shape)

The "base" implementation for the get_area function just fails. This makes sure that if we get a new shape, we will cleanly fail instead of returning a nonsense result.

@get_area.register(Square)
def _get_area_square(shape):
    return shape.side ** 2
@get_area.register(Circle)
def _get_area_circle(shape):
    return math.pi * (shape.radius ** 2)

One nice thing about doing things this way is that if someone else writes a new shape that is intended to play well with our code, they can implement the get_area themselves:

from area_calculator import get_area

@attr.s(auto_attribs=True, frozen=True)
class Ellipse:
    horizontal_axis: float
    vertical_axis: float

@get_area.register(Ellipse)
def _get_area_ellipse(shape):
    return math.pi * shape.horizontal_axis * shape.vertical_axis

Calling get_area is straightforward:

print(get_area(shape))

This means we can change a function that has a long if isintance()/elif isinstance() chain to work this way, without changing the interface. The next time you are tempted to check if isinstance, try using singledispatch!

by Moshe Zadka at September 17, 2019 01:00 AM