Planet Twisted

July 21, 2017

Itamar Turner-Trauring

Incremental results, not incremental implementation

Update: Added section on iterative development.

You’re working on a large project, bigger than you’ve ever worked on before: how do you ship it on time? How do you ensure it has all the functionality it needs? How do you design something that is too big to fit in your head?

My colleague Rafi Schloming, speaking in the context of the transition to microservices, suggests that focusing on incremental results is fundamentally better than focusing on incremental implementation. This advice will serve you well in most large projects, and to explain why I’d like to tell you the story of a software project I built the wrong way.

A real world example

The wrong way…

I once built a system for efficiently sending streams of data from one source to many servers; the resulting software was run by the company’s ops team. Since I was even more foolish than I am now, I implemented it in the following order, based on the architecture I had come up with:

  1. First I implemented a C++ integration layer for the Python networking framework I was using, so I could write higher performance code.
  2. Then I implemented the messaging protocol and system, based on a research paper I’d found.
  3. Finally, I handed the code over to ops.

As you can see, I implemented my project based on its architecture: first the bottom layer, then the layers that built on top of it. Unfortunately, since I hadn’t consulted ops enough about the design they then had to make some changes on their own. As a result, it took six months to a year until the code was actually being used in production.

…and the right way

How would I have built my tool to deliver incremental results?

  1. Build a working tool in pure Python. This would probably have been too slow for some of the higher-speed message streams.
  2. Hand initial tool over to ops. Ops could then start using it for slower streams, and provide feedback on the design.
  3. Next I would have fixed any problems reported by ops.
  4. Finally, I would rewrite the core networking in C++ for performance.

Notice that this is seemingly less efficient than my original plan, since it involves re-implementing some code. Nonetheless I believe it would have resulted in the project going live much sooner.

Why incremental results are better

Incremental results means you focus on getting results as quickly as possible, even if you can’t get all the desired results with initial versions. That means:

  • Faster feedback: You can start using the software earlier, and therefore get feedback earlier. In my case I would have learned about ops’ use cases and problems months earlier, and could have incorporated their suggestions into my design. Instead, they had to patch the code themselves.
  • Fewer unnecessary features: By focusing on results you’re less likely to get distracted by things you think you need. I believed that Python wouldn’t have been fast enough, so I spent a lot of time upfront on C++. And the C++ version was definitely quite fast, faster than pure Python could have been. But maybe a Python version would have been fast enough.
  • Less cancellation risk: The faster you deliver a project, the faster you can demonstrate results, and so the less risk of your big project being canceled half-way.
  • Less deployment risk: Instead of turning on a single, giant deliverable, you will start by deploying a simpler version, and then upgrading it over time. That means more operational knowledge of the software, and less risk when you turn it on the first time.

Beyond iterative development

“Iterative development” is a common, and good, suggestion for software development, but it’s not quite the same as focusing on incremental results. In iterative development you build your full application end-to-end, and then in each released iteration you make the functionality work better. In that sense, the better alternative I was suggesting above could be seen as simply suggesting iterative development. But incremental results is a more broadly applicable idea than iterative development.

Incremental results are the goal; iterative development is one possible technique to achieve that goal. Sometimes you can achieve incremental results without iterative development:

  • If each individual feature provides value on its own then you can get incremental results with less iterative, more cumulative development. That is, you don’t need to start with an end-to-end product and then flesh out the details; you can just deliver one feature at a time.
  • Incremental results aren’t just about your development process, they are also about how results are delivered. For example, if you’re streaming a website to a browser there are two ways to send images: from top to bottom, or starting with a blurry image and getting progressively sharper. With a fast connection either choice works. With a slow connection progressive sharpening is superior because it provides information much sooner: incremental results.

Whenever you can, aim for incremental results: it will reduce the risks, and make your project valuable much earlier. It may mean some wasted effort, yes, as you re-implement certain features, but that waste is usually outweighed by the reduced risk and faster feedback you’ll get from incremental results.

PS: I’ve made lots of other mistakes in my career. If you’d like to learn how to avoid them, sign up for my newsletter, where every week I write up one of my mistakes and how you can avoid it.

July 21, 2017 04:00 AM

July 20, 2017

Moshe Zadka

Anatomy of a Multi-Stage Docker Build

Docker, in recent versions, has introduced multi-stage builds. This allows separating the build environment from the runtime environment much more easily than before.

In order to demonstrate this, we will write a minimal Flask app and run it with Twisted using its WSGI support.

The Flask application itself is the smallest demo app, straight from any number of Flask tutorials:

# src/msbdemo/wsgi.py
from flask import Flask
app = Flask("msbdemo")
@app.route("/")
def hello():
    return "If you are seeing this, the multi-stage build succeeded"

The setup.py file, similarly, is the minimal one from any number of Python packaging tutorials:

import setuptools
setuptools.setup(
    name='msbdemo',
    version='0.0.1',
    url='https://github.com/moshez/msbdemo',
    author='Moshe Zadka',
    author_email='zadka.moshe@gmail.com',
    packages=setuptools.find_packages(),
    install_requires=['flask'],
)

The interesting stuff is in the Dockerfile. It is interesting enough that we will go through it line by line:

FROM python:2.7.13

We start from a "fat" Python docker image -- one with the Python headers installed, and the ability to compile extensions.

RUN virtualenv /buildenv

We create a custom virtual environment for the build process.

RUN /buildenv/bin/pip install pex wheel

We install the build tools -- in this case, wheel, which will let us build wheels, and pex, which will let us build single file executables.

RUN mkdir /wheels

We create a custom directory to put all of our wheels. Note that we will not install those wheels in this docker image.

COPY src /src

We copy our minimal Flask-based application's source code into the docker image.

RUN /buildenv/bin/pip wheel --no-binary :all: \
                            twisted /src \
                            --wheel-dir /wheels

We build the wheels. We take care to manually build wheels ourselves, since pex, right now, cannot handle manylinux binary wheels.

RUN /buildenv/bin/pex --find-links /wheels --no-index \
                      twisted msbdemo -o /mnt/src/twist.pex -m twisted

We build the twisted and msbdemo wheels, together with any recursive dependencies, into a Pex file -- a single file executable.

FROM python:2.7.13-slim

This is where the magic happens. A second FROM line starts a new docker image build. The previous images are available -- but only inside this Dockerfile -- for copying files from. Luckily, we have a file ready to copy: the output of the Pex build process.

COPY --from=0 /mnt/src/twist.pex /root

The --from=0 indicates copying from a previously built image, rather than from the so-called "build context". In theory, any number of builds can take place in one Dockerfile. While only the last one will actually result in a permanent image, the others are all available as targets for --from copying. In practice, two stages are usually enough.

ENTRYPOINT ["/root/twist.pex", "web", "--wsgi", "msbdemo.wsgi.app", \
            "--port", "tcp:80"]

Finally, we use Twisted as our WSGI container. Since we bound the Pex file to the -m twisted package execution, all we need to do is run the web plugin, ask it to run a wsgi container, and give it the logical (module) path to our WSGI app.

Using Docker multi-stage builds has allowed us to create a Docker container for production with:

  • A smaller footprint (using the "slim" image as base)
  • Few layers (only adding two layers to the base slim image)

The biggest benefit is that it let us do so with one Dockerfile, with no extra machinery.
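
For reference, the complete Dockerfile, assembled from the snippets above, looks like this (the second FROM line is where the runtime stage begins):

FROM python:2.7.13
RUN virtualenv /buildenv
RUN /buildenv/bin/pip install pex wheel
RUN mkdir /wheels
COPY src /src
RUN /buildenv/bin/pip wheel --no-binary :all: \
                            twisted /src \
                            --wheel-dir /wheels
RUN /buildenv/bin/pex --find-links /wheels --no-index \
                      twisted msbdemo -o /mnt/src/twist.pex -m twisted

FROM python:2.7.13-slim
COPY --from=0 /mnt/src/twist.pex /root
ENTRYPOINT ["/root/twist.pex", "web", "--wsgi", "msbdemo.wsgi.app", \
            "--port", "tcp:80"]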

by Moshe Zadka at July 20, 2017 04:30 AM

July 18, 2017

Glyph Lefkowitz

Beyond ThunderDock

This weekend I found myself pleased to receive a Kensington SD5000T Thunderbolt 3 Docking Station.

Some of its functionality was a bit of a weird surprise.

The Setup

Due to my ... accretive history with computer purchases, I have 3 things on my desk at home: a USB-C macbook pro, a 27" Thunderbolt iMac, and an older 27" Dell display, which is old enough at this point that I can’t link it to you. Please do not take this to be some kind of totally sweet setup. It would just be somewhat pointlessly expensive to replace this jumble with something nicer. I purchased the dock because I want to have one cable to connect me to power & both displays.

For those not familiar, iMacs of a certain vintage [1] can be jury-rigged to behave as Thunderbolt displays with limited functionality (no access from the guest system to the iMac’s ethernet port, for example), using Target Display Mode, which extends their useful lifespan somewhat. (This machine is still, relatively speaking, a powerhouse, so it’s not quite dead yet; but it’s nice to be able to swap in my laptop and use the big screen.)

The Link-up

On the back of the Thunderbolt dock, there are 2 Thunderbolt 3 ports. I plugged the first one into a Thunderbolt 3 to Thunderbolt 2 adapter which connects to the back of the iMac, and the second one into the Macbook directly. The Dell display plugs into the DisplayPort; I connected my network to the Ethernet port of the dock. My mouse, keyboard, and iPhone were plugged into the USB ports on the dock.

The Problem

I set it up and at first it seemed to be delivering on the “one cable” promise of thunderbolt 3. But then I switched WiFi off to test the speed of the wired network and was surprised to see that it didn’t see the dock’s ethernet port at all. Flipping wifi back on, I looked over at my router’s control panel and noticed that a new device (with the expected manufacturer) was on my network. nmap seemed to indicate that it was... running exactly the network services I expected to see on my iMac. VNCing into the iMac to see what was going on, I popped open the Network system preference pane, and right there alongside all the other devices, was the thunderbolt dock’s ethernet device.

The Punch Line

Despite the miasma of confusion surrounding USB-C and Thunderbolt 3 [2], the surprise here is that apparently Thunderbolt is Thunderbolt, and (for this device at least) Thunderbolt devices connected across the same bus can happily drive whatever they’re plugged in to. The Thunderbolt 2 to 3 adapter isn’t just a fancy way of plugging in hard drives and displays with the older connector; as far as I can tell all the functionality of the Thunderbolt interface remains intact as both “host” and “guest”. It’s like having an ethernet switch for your PCI bus.

What this meant is that when I unplugged everything and then carefully plugged in the iMac before the Macbook, it happily lit up the Dell display, and connected to all the USB devices plugged into the USB hub. When I plugged the laptop in, it happily started charging, but since it didn’t “own” the other devices, nothing else connected to it.

Conclusion

This dock works a little bit too well; when I “dock” now I have to carefully plug in the laptop first, give it a moment to grab all the devices so that it “owns” them, then plug in the iMac, then use this handy app to tell the iMac to enter Target Display mode.

On the other hand, this does also mean that I can quickly toggle between “everything is plugged in to the iMac” and “everything is plugged in to the MacBook” just by disconnecting and reconnecting a single cable, which is pretty neat.


  1. Sadly, not the most recent fancy 5K ones. 

  2. which are, simultaneously, both the same thing and not the same thing. 

by Glyph at July 18, 2017 07:11 AM

Moshe Zadka

Bash is Unmaintainable Python

(Thanks to Aahz, Roy Williams, Yarko Tymciurak, and Naomi Ceder for feedback. Any mistakes that remain are mine alone.)

In the post about building Docker applications, I had the following Python script:

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call(["docker", "pull", orig])
    subprocess.check_call(["docker", "tag", orig, image])
    subprocess.check_call(["docker", "push", image])

I showed this script to two audiences, in two versions of the talk. One, a Python beginner audience, mostly new to Docker. Another, a Docker-centric audience, with varying levels of familiarity with Python. I gave excuses for why this script is in Python, rather than the obvious choice of shell scripting for automating command-line utilities.

None of the excuses were the true reason.

Note that in a talk, things are simplified. Typical scripts in the real world would not be 10 lines or so. They start out 10 lines, of course, but then have to account for edge cases, extra use cases, random bugs in the services that need to be worked around, and so on. I am more used to writing scripts for production than writing scripts for talks.

The true reason the script is in Python is that I have started doing all my "shell" scripting in Python recently, and I am never going back. Unix shell scripting is pretty much writing in unmaintainable Python. Before making the case for that, I am going to take a step in the other direction. The script above took care to only use the standard library. If it could take advantage of third party libraries, I would have written it this way:

import datetime, subprocess
import seashore
xctr = seashore.Executor(seashore.Shell())
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    xctr.docker.pull(orig)
    xctr.docker.tag(orig, image)
    xctr.docker.push(image)

But what if I went the other way?

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call("docker pull " + orig, shell=True)
    subprocess.check_call("docker tag " + orig + " " + image, shell=True)
    subprocess.check_call("docker push " + image, shell=True)

Note that using shell=True is discouraged, and is generally a bad idea. We will revisit why later. If I were using Python 3.6, I could even have the last three lines be:

subprocess.check_call(f"docker pull {orig}", shell=True)
subprocess.check_call(f"docker tag {orig} {image}", shell=True)
subprocess.check_call(f"docker push {image}", shell=True)

or I could even combine them:

subprocess.check_call(f"docker pull {orig} && "
                      f"docker tag {orig} {image} && "
                      f"docker push {image}", shell=True)

What about calculating the tag?

tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                              "sed -e 's/ /T/' -e 's/:/-/g' "
                                  "-e 's/\./-/g' -e 's/\+.*//'",
                              shell=True).decode().strip()

Putting it all together, we would have

import subprocess
tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                              "sed -e 's/ /T/' -e 's/:/-/g' "
                                  "-e 's/\./-/g' -e 's/\+.*//'",
                              shell=True).decode().strip()
for ext in ['', '-slim']:
    image = f"moshez/python36{ext}:{tag}"
    orig = f"python:3.6{ext}"
    subprocess.check_call(f"docker pull {orig} && "
                          f"docker tag {orig} {image} && "
                          f"docker push {image}", shell=True)

None of the changes we made were strictly improvements. They mostly made the code harder to read and more fragile. But now that we have done them, it is straightforward to convert it to a shell script:

#!/bin/sh
set -e
tag=$(date --utc --rfc-3339=ns |
      sed -e 's/ /T/' -e 's/:/-/g' \
      -e 's/\./-/g' -e 's/\+.*//')
for ext in '' '-slim'
do
    image="moshez/python36$ext:$tag"
    orig="python:3.6$ext"
    docker pull $orig
    docker tag $orig $image
    docker push $image
done

Making our script worse and worse makes a Python script into a shell script. Not just a shell script -- this is arguably idiomatic shell. It uses -e, long options for legibility, and so on. Note that the shell does not even have a way to express a notion like shell=False. In a script without arguments, like this one, this merely means changes are dangerous. In a script with arguments, it means that input handling safely is difficult (and unlikely to happen). Indeed, this is why shell=False is the default, and recommended, approach in Python.
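
As a small illustration of that difference (my own example, not part of the original script): with shell=False the argument list is passed to the program verbatim, while shell=True hands the same text to /bin/sh to re-parse.

import subprocess

filename = "weird; name.txt"

# shell=False: the argument is passed literally; one oddly-named file is created.
subprocess.check_call(["touch", filename])
subprocess.check_call(["ls", "-l", filename])

# shell=True: the same text would be re-parsed by the shell, so
#     touch weird; name.txt
# runs *two* commands ("touch weird", then "name.txt"), which is not what we
# meant -- and hostile input could do far worse.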

This case -- a script that does little but automate unix commands -- is the primary use-case of shell scripts. It stands to reason that the reverse process -- making a shell script into Python -- would have the reverse effect: making for more maintainable, less fragile code.

As an exercise in "going the other way", we will start with a simplified version of a shell script:

set -e

if [ $# != 3 ]; then
    echo "Invalid arguments: $*";
    exit 1;
fi;

PR_NUMBER="$1"; shift;
TICKET_NUMBER="$1"; shift;
BRANCH_NAME="$1"; shift;


repo="git@github.com:twisted/twisted.git";
wc="$(dirname "$(dirname "$0")")/.git";

if [ ! -d "${wc}" ]; then
  wc="$(mktemp -d -t twisted.XXXX)";

  git clone --depth 1 --progress "${repo}" "${wc}";

  cloned=true;
else
  cloned=false;
fi;

cd "${wc}";

git fetch origin "refs/pull/${PR_NUMBER}/head";
git push origin "FETCH_HEAD:refs/heads/${TICKET_NUMBER}-${BRANCH_NAME}";

if ${cloned}; then
  rm -fr "${wc}";
fi;

What would it look like, with Python and seashore?

import os
import shutil
import sys
import tempfile

import seashore

if len(sys.argv) != 4:
    sys.exit("Invalid arguments: " + ' '.join(sys.argv))

PR_NUMBER, TICKET_NUMBER, BRANCH_NAME = sys.argv[1:]

xctr = seashore.Executor(seashore.Shell())
repo="git@github.com:twisted/twisted.git";
wc=os.path.dirname(os.path.dirname(sys.argv[0])) + '/.git'
if not os.path.isdir(wc):
    wc = tempfile.mkdtemp(prefix='twisted')
    xctr.git.clone(repo, wc, depth=1, progress=None)
    cloned = True
else:
    cloned = False

xctr = xctr.chdir(wc)
xctr.git.fetch("origin", f"refs/pull/{PR_NUMBER}/head")
xctr.git.push("origin",
              f"FETCH_HEAD:refs/heads/{TICKET_NUMBER}-{BRANCH_NAME}")
if cloned:
    shutil.rmtree(wc)

The code is no longer than the shell script, is more explicit, and -- had we wanted to -- would be easier to refactor into unit-testable functions.

If this is, indeed, the general case, we can skip that stage entirely: write the script in Python to begin with. When it inevitably increases in scope, it will already be in a language that supports modules and unit tests.

by Moshe Zadka at July 18, 2017 05:20 AM

July 16, 2017

Itamar Turner-Trauring

Beyond fad frameworks: which programming skills are in demand?

Which programming skills should you spend your limited time and energy on? Which engineering skills are really in demand? There will always be another fad framework that will soon fade from memory; the time you spend learning it might turn out to be wasted. And job listings ask for ever-changing, not very reasonable combinations of skills: “We want 5 years experience with AngularJS, a deep knowledge of machine learning, and a passion for flyfishing!”

Which skills are really in demand, which will continue to be in demand, and which can safely be ignored? The truth is that the skills employers want are not the skills they actually need: the gap between the two can be a problem, but if you present yourself right it can also be an opportunity.

What employers want

What employers typically want is someone who will get going quickly, with as short a ramp-up time as possible and as little training as possible. While perhaps short-sighted, this certainly seems to be the default mindset. There are two failure modes:

  1. Over-focusing on implementation skills, rather than problem solving skills: “We use AngularJS, therefore we must hire someone who already knows AngularJS!” If it turns out AngularJS doesn’t work when the requirements change, hiring only for AngularJS skills will prove problematic.
  2. Hiring based on a hypothetical solution: “I hear that well-known company succeeded using microservices, so we should switch to microservices. We must hire someone who already knows microservices!” If that turns out to be the wrong solution, hiring someone to implement it will not turn out well.

What employers need

What employers actually need is someone who will identify and solve their problems. An organization’s goal is never really to use AngularJS or switch to microservices: it’s to sell a product or service, help some group of people, promote some point of view, and so on. Employers need employees who will help further these goals.

That doesn’t necessarily require knowing the employer’s existing technology stack, or having a working knowledge of trendy technologies. Someone who can quickly learn the existing codebase and technologies, identify the big-picture problems, and then come up with and implement a good solution: that is what employers really need.

This can build on a broad range of skills, including:

What you should do

Given this gap between what employers want and what they need, what should you do?

  1. Learn the problem solving skills that employers will always need. That means gathering requirements, coming up with efficient solutions, project management, and so on.
  2. Learn some long-lasting popular technologies in-depth. Relational databases have been around since the 1980s and aren’t going anywhere: if you really understand how to structure data, the concurrency model, the impact of disk storage and layout, and so on, learning some other database like MongoDB will be easy (although perhaps a little horrifying). Similarly, decades after their creation, languages like Python or Java are still going strong, and if you know one well you’ll have an easy time learning many other languages.
  3. Dabble, minimally, in some trendy technologies. If you occasionally spend 30 minutes going through the tutorial for the web framework of the month, when it’s time to interview you can say “I played with it a little.” This will also help you with the next item.
  4. Learn how to learn new technologies quickly.

Then, when it’s time to look for a job, ignore the list of technology requirements when applying, presuming you think you can do the job: it’s what the company wants, not what they need.

Interviewing is about marketing, about how you present yourself. So in your cover letter, and when you interview, emphasize all the ways you can address what they want in other ways, and try to point out ways in which you can help do what they actually need:

  • Getting started quickly with minimal training: “I can learn new codebases and technologies quickly, as I did at my last job when I joined the Foo team, learned how to use Bar in a month, and built Baz in just two months.”
  • Needs that are implicit in the company’s situation: “I see you’re a growing company; I have previous experience helping an engineering team grow under similar circumstances.”
  • Needs that are implicit in the job description: “I identified this big costly problem, not unlike yours, and solved it like this, using these technologies.”

Learning every new web framework isn’t necessary to get a job. Yes, technologies do change over the years: that means you need to be able to learn new technologies quickly. But just as important as technology skills are those that will make you valuable—the ability to identify and solve problems—and the skill that will make your value clear: the ability to market yourself.

July 16, 2017 04:00 AM

July 10, 2017

Itamar Turner-Trauring

Stop writing software, start solving problems

As software engineers we often suffer from an occupational hazard: we enjoy programming. Somewhere in college or high school we discovered that writing code is fun, and so we chose a profession that allowed us to both get paid and enjoy ourselves. And what could be wrong with that?

The problem is that our job as software engineers is not to write code: our job is to solve problems. And if we get distracted by the fun of programming we often do worse at solving those problems.

The siren call of programming

I’ve been coding since 1995, and I have to admit, I enjoy programming. My working life is therefore a constant struggle against the temptation to do so.

Recently, for example, I encountered a problem in the Softcover book publishing system, which converts Markdown into various e-book formats. I’ve been working on The Programmer’s Guide to a Sane Workweek (intro email course here), and I’ve reached the point of needing to render the text into a nicely laid out PDF.

Softcover renders Markdown blockquotes like this:

> This is my story.

into LaTex \quote{} environments like this:

\begin{quote}
This is my story.
\end{quote}

I wanted the output to be a custom LaTeX environment, so I could customize the PDF output to look a particular way:

\begin{mycustomquote}
This is my story.
\end{mycustomquote}

This is the point where programming began calling out to me: “Write code! Contribute to the open source community! Submit a patch upstream!” I would need to learn the Softcover code base just enough to find the relevant transformation, learn just enough more Ruby to modify the code, figure out how to make the output customizable, write a test or three, and then submit a patch. This probably would have taken me an afternoon. It would have been fun, and I would have felt good about myself.

But my goal is not to write software: my goal is to solve problems, and the problem in this case is spitting out the correct LaTeX so I can control my book’s formatting. And so instead of spending an afternoon on it, I spent five minutes writing the following Makefile:

build-pdf:
	rm -rf generated_polytex/*.tex
	softcover build:pdf
	sed 's/{quote}/{mycustomquote}/g' -i generated_polytex/*.tex
	softcover build:pdf

This is a horrible hack: I’m relying on the fact that building a PDF generates TeX files if they don’t already exist, but uses existing ones if they are there and newer than the source. So I build the PDF, modify the generated TeX files in place, and then rebuild the PDF with the modified files.

I would never do anything like this if I was building a production system used by customers. But this isn’t a production system, and there are no customers: it’s a script only I will ever run, and I run it manually. It’s not elegant, but then it doesn’t have to be.

I solved my problem, and I solved it efficiently.

Stop writing code, start solving problems

Don’t write code just because it’s fun—instead, solve the problem the right way:

  • Sometimes that means writing no code at all, because you can solve the problem with some Post-It notes on the wall.
  • Sometimes that means writing boring tests, even though it’s no fun to write tests.
  • Sometimes that means reusing someone else’s library, even though it’s much more fun to write your own version.

You can write software for fun, of course: programming makes a fine hobby. But when you’re working, when you’re trying to get a product shipped, when you’re trying to get a bug fixed: be a professional, and focus on solving the problem.

PS: Coding for fun when I should’ve been solving problems is just one of the many programming mistakes I’ve made over the years. Sign up for my Software Clown newsletter and every week you’ll hear the story of one of my engineering or career mistakes and how you can avoid it.

July 10, 2017 04:00 AM

July 07, 2017

Itamar Turner-Trauring

Don't crank out code at 2AM, especially if you're the CTO

Dear HubSpot CTO,

Yesterday over on the social medias you wrote that there’s “nothing quite as satisfying as cranking out code at 2am for a feature a customer requested earlier today.” I’m guessing that as CTO you don’t get to code as much these days, and I don’t wish to diminish your personal satisfaction. But nonetheless cranking out code at 2AM is a bad idea: it’s bad for your customers, it sets a bad example for your employees, and as a result it’s bad for your company.

An invitation to disaster

Tired people make mistakes. This is not controversial: lack of sleep has been tied to everything from medical errors to the Exxon Valdez and Challenger disasters (see Evan Robinson on the evils of crunch mode for references).

If you’re coding and deploying at 2AM:

  • You’re more likely to write buggy code.
  • You’re more likely to make a mistake while deploying, breaking a production system.
  • If you do deploy successfully, but you’ve deployed buggy code, you’ll take longer to fix the problem… and the more time it takes the more likely you are to make an operational mistake.

And that’s just the short term cost. When you do start work the next day you’ll also be tired, and correspondingly less productive and more likely to make mistakes.

None of this is good for your customers.

Encouraging a culture of low productivity

If you’re a random developer cranking out code at 2AM the worst you can do is harm your product or production environment. If you’re the CTO, however, you’re also harming your organization.

By touting a 2AM deploy you’re encouraging your workers to work long hours, and to write and deploy code while exhausted. Which is to say, you’re encouraging your workers to engage in behavior that’s bad for the company. Tired workers are less productive. Tired workers make more mistakes. Is that really what you want from your developers?

Don’t crank out code at 2AM: it’s bad for you and your customers. And if you must, don’t brag about it publicly. At best it’s a guilty pleasure; bragging makes it a public vice.

Regards,

—Itamar

PS: While I’ve never cranked out code at 2AM, I’ve certainly made my own share of mistakes as a programmer. If you’d like to learn from my failings sign up for my newsletter where each week I cover one of my mistakes and how you can avoid it.

July 07, 2017 04:00 AM

June 27, 2017

Itamar Turner-Trauring

It may not be your fault, but it's always your responsibility

If you’re going to be a humble programmer, you need to start with the assumption that every reported bug is your fault. This is a good principle, but what if it turns out the user did something stupid? What if it really is a third-party library that is buggy, not your code?

Even if a bug isn’t your fault it’s still your responsibility, and in this post I’ll explain how to apply that in practice.

First, discover the source of the problem

A user has reported a bug: they’re using some code you wrote and something has gone horribly wrong. What can the source of the problem be?

  • User error: they did something they shouldn’t have, or they’re confused about what should have happened.
  • Environmental problems: their dependencies are slightly different than yours, their operating system is slightly different, and so on.
  • Third party bugs: someone else’s code is buggy, not yours.
  • A bug in your code: you made a mistake somewhere.

A good starting assumption is that you are at fault, that your code is buggy. It’s hard, I know: I often find myself assuming other people’s code is the problem, only to find it was my own mistake. But precisely because it’s so hard to blame oneself it’s better to start with that as the presumed cause, to help overcome the bias against admitting a mistake.

If something is a bug in your code then you can go and fix it. But sometimes users will have problems that aren’t caused by a bug in your code: sometimes users do silly things, or a library you depend on has a bug. What then?

Then, take responsibility

Even if the fault was elsewhere, you are still responsible, and you can take appropriate action.

User error

If the user made a mistake, or had a misunderstanding, that implies your design is at fault. Maybe your API encourages bad interaction patterns, maybe your error handling isn’t informative enough, maybe your user interface doesn’t ask users to confirm that yes, they want to delete all their data. Whatever the problem, user mistakes are something you can try to fix with a better design:

  • Give the API guide rails to keep users from doing unsafe operations.
  • Create better error messages, allowing the user to diagnose mistakes on their own.
  • Make the UI prevent dangerous operations.
  • Add an onboarding system to a complex UI.
  • Try to remove the UI altogether and just do the right thing.

If a better design is impossible, the next best thing to do is write some documentation, and explain why the users shouldn’t do that, or document a workaround. The worst thing to do is to dismiss user error as the user’s problem: if one person made a mistake, probably others will as well.

Environmental problems

If your code doesn’t work in a particular environment, well, that’s your responsibility too:

  • You can package your software in a more isolated fashion, so the environment affects it less.
  • You can make your software work in more environments.
  • You can add a sanity check on startup that warns users if their environment won’t work.

If all else fails, write some documentation.

Third party bugs

Found a bug in someone else’s code?

  • Stop supporting older versions of a library if it introduces bugs.
  • If it’s a new bug you’ve discovered, file a bug report so they can fix it.
  • Add a workaround to your code.

And again, if all else fails, write some documentation explaining a workaround.

It’s always your responsibility

Releasing code into the world is a responsibility: you are telling people they can rely on you. When a user reports a problem, there’s almost always something you can do. So take your responsibility seriously and fix the problem, regardless of whose fault it is.

Best of all is avoiding problems in the first place: I’ve made many mistakes you can avoid by signing up for my weekly newsletter. Every week I’ll share an engineering or career mistake and how you can avoid it.

June 27, 2017 04:00 AM

June 26, 2017

Moshe Zadka

Imports at a Distance

(Thanks to Mark Williams for feedback and research)

Imagine the following code:

# mymodule.py
import toplevel.nextlevel.lowmodule

def _func():
    toplevel.nextlevel.lowmodule.dosomething(1)

def main():
    _func()

Assuming toplevel.nextlevel.lowmodule does define a function dosomething, this code seems to work just fine.

However, imagine that later we decide to move _func to a different module:

# utilmodule.py
import toplevel

def _func():
    toplevel.nextlevel.lowmodule.dosomething(1)

This code will probably still work, as long as at some point, before calling _func, we import mymodule.

This introduces a subtle action-at-a-distance: the code will only stop working when we remove the import from mymodule -- or any other modules which import lowmodule.

Even unit tests will not necessarily catch the problem, depending on the order of imports of the unit tests. Static analyzers, like pylint and pyflakes, also cannot catch it.

The only safe thing to do is to eschew this import style completely, and always do from toplevel.nextlevel import lowmodule.
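
For the example above, that means writing utilmodule.py like this, so it no longer depends on some other module having imported lowmodule first:

# utilmodule.py
from toplevel.nextlevel import lowmodule

def _func():
    lowmodule.dosomething(1)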

Addendum

Why is this happening?

Python package imports are a little subtle.

import toplevel

Does three things:

  • (Once only) Creates a toplevel entry in sys.modules
  • (Once only) Executes toplevel/__init__.py inside the namespace of toplevel
  • Creates a variable called toplevel and assigns it the module.

The things marked "once only" will only happen the first time toplevel is imported.

import toplevel.nextlevel

Does the same three things (with toplevel) as well as:

  • (Once only) Creates a toplevel.nextlevel entry in sys.modules
  • (Once only) Executes toplevel/nextlevel/__init__.py inside the namespace of toplevel.nextlevel
  • (Once only) Creates a variable nextlevel in the namespace of toplevel, and binds the toplevel.nextlevel module to it.

The third one is the most interesting one -- note that the first time toplevel.nextlevel is imported, a nextlevel variable is created in the namespace of toplevel, so that every subsequent place that imports toplevel can access nextlevel for "free".
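
A quick interactive session (assuming a toplevel/nextlevel package like the one above is importable) makes the sys.modules entries and the nextlevel binding visible:

>>> import sys
>>> import toplevel.nextlevel
>>> 'toplevel' in sys.modules, 'toplevel.nextlevel' in sys.modules
(True, True)
>>> sys.modules['toplevel'].nextlevel is sys.modules['toplevel.nextlevel']
True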

by Moshe Zadka at June 26, 2017 05:20 AM

June 25, 2017

Moshe Zadka

X Why Zip

PEP 441 resulted in the creation of the zipapp module. The PEP says "Python has had the ability to execute directories or ZIP-format archives as scripts since version 2.6 [...] This feature is not as popular as it should be mainly because it was not promoted as part of Python 2.6." So far, so true -- the first time I saw the feature used in production, in Facebook, I was so shocked I had to take a Sweet Stop break.

However, more than a year before the PEP was created, and even longer before it was implemented, the PEX format was contributed to the Python community by Twitter. It was, indeed, not well promoted: the lightning talk by Brian Wickman (the creator of PEX) wouldn't be given for two more years.

However, at this point in time, PEX is a superior solution to zipapp in every single way:

  • It supports both Python 2 and Python 3.
  • It supports C extensions and non-zip-safe archives.
  • It has been used in production, by many people.

The only advantage zipapp has? It is in the standard library. This used to be a big advantage. However, Python packaging is good now and the biggest difference is that a module in the standard library can change in backwards-compatible ways extremely slowly, and in ways that evolve bad interfaces even slower. A module on PyPI can get regular updates, regular releases and, most importantly, if it is bad -- it can be supplanted by a new module, and see users naturally move to the new solution.

ZipApp is a non-solution for a non-problem. The solution for the problem of this feature not being well known is to talk about it more. I, and other people, have given multiple talks that involved the awesomeness of PEX (in SF Python, SF SRE, PyBay) and have written multiple posts and proofs of concept on my GitHub.

I have used PEX in production in three different companies, teaching my colleagues about it as a side-effect.

I wish more people would be giving talks, and writing posts. Using the standard library to reimplement a popular tool -- one that can iterate faster because it is not bound to the Python release cycle -- does not help anyone.

by Moshe Zadka at June 25, 2017 05:20 AM

June 23, 2017

Hynek Schlawack

Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

by Hynek Schlawack (hs@ox.cx) at June 23, 2017 12:00 AM

June 21, 2017

Itamar Turner-Trauring

The bad reasons you're forced to work long hours

Working long hours is unproductive, unhealthy, and unfortunately common. I strongly believe that working fewer hours is good for you and your employer, yet many companies and managers force you to work long hours, even as it decreases worker productivity.

So why do they do it? Let’s go over some of the reasons.

Leading by example

Some managers simply don’t understand that working long hours is counter-productive. Consider the founders of a startup. They love their job: the startup is their baby, and they are happy to work long hours to ensure it succeeds. That may well be inefficient and counter-productive, but they won’t necessarily realize this.

The employees that join afterwards take their cue from the founders: if the boss is working long hours, it’s hard not to do so yourself. And since the founders love what they’re building it never occurs to them that long hours might not be for everyone, or might even be an outright negative for the company. Similar situations can also happen in larger organizations, when a team lead or manager puts in long hours out of a sense of dedication.

A sense of entitlement

A less tractable problem is a manager who thinks they own your life. Jason Fried describes this as a Managerial Entitlement Complex: the idea that if someone is paying you a salary they are entitled to every minute of your time.

In this situation the problem isn’t ignorance on the part of your manager. The problem is that your manager doesn’t care about you as a human being or even as an employee. You’re a resource provided by the human resources department, just like the office printer is provided by the IT department.

Control beats profits

Another problem is the fact that working hours are easy to measure, and therefore easy to control. When managers or companies see their employees as a cost center (and at least in the US the corporate culture is heavily biased against labor costs) the temptation to “control costs” by measuring and maximizing hours can be hard to resist.

Of course, this results in less output, and so it is not rational behavior if the goal is to maximize profits. Would companies actually choose labor control over productivity? Evidence from other industries suggests they would.

Up until the 1970s many farms in California forced their workers to use a short hoe, which involved bending over continuously. The result was a high rate of worker injuries. Employers liked the short hoe because they could easily control farm workers’ labor: because of the way the workers bent over when using the short hoe it was easy to see whether or not they were working.

After a series of strikes and lawsuits by the United Farm Workers the short hoe was banned. The result? According to the CEO of a large lettuce grower, productivity actually went up.

(I learned this story from the book Solving the Climate Crisis through Social Change, by Gar W. Lipow. The book includes a number of other examples and further references.)

Bad incentives, or Cover Your Ass

Bad incentives in one part of the company can result in long hours in another. Consider this scenario: the sales team, which is paid on commission, has promised a customer to deliver a series of features in a month. Unfortunately implementing those features will take 6 months. The sales team doesn’t care: they’re only paid for sales, and delivering the product isn’t their problem.

Now put yourself in the place of the tech lead or manager whose team has to implement those features. You can try to push back against the sales team’s promises, but in many companies that will result in being seen as “not a team player.” And when the project fails you and your team will be blamed by sales for not delivering on the company’s promises.

When you’ve been set up to fail, your primary goal is to demonstrate that the inevitable failure was not your fault. The obvious and perhaps only way for you to do this is to have your team work long hours, a visible demonstration of commitment and effort. “We did everything we could! We worked 12 hour days, 6 days a week but we just couldn’t do it.”

Notice that in this scenario the manager may be good at their job; the issue is the organization as a whole.

Hero syndrome

Hero syndrome is another organizational failure that can cause long working hours. Imagine you’re an engineer working for a startup that’s going through a growth spurt. Servers keep going down under load, the architecture isn’t keeping up, and there are lots of other growing pains. One evening the whole system goes down, and you stay up until 4AM bringing it back up. At the next company event you are lauded as a hero for saving the day… but no one devotes any resources to fixing the underlying problems.

The result is hero syndrome: the organization rewards those who save the day at the last minute, rather than work that prevents problems in the first place. And so they end up with a cycle of failure. Tired engineers making mistakes, lack of resources to build good infrastructure, and rewards for engineers who work long hours to try to duct tape a structure that is falling apart.

Avoiding bad companies

Working long hours is not productive. But since many companies don’t understand this, when you’re looking for a new job be on the lookout for the problems I described above. And if you’d like more tips to help you work a sane, productive workweek, check out my email course, the Programmer’s Guide to a Sane Workweek.

June 21, 2017 04:00 AM

June 19, 2017

Hynek Schlawack

Why Your Dockerized Application Isn’t Receiving Signals

Proper cleanup when terminating your application isn’t less important when it’s running inside of a Docker container. Although it only comes down to making sure signals reach your application and handling them, there’s a bunch of things that can go wrong.

by Hynek Schlawack (hs@ox.cx) at June 19, 2017 12:00 AM

June 14, 2017

Itamar Turner-Trauring

Lawyers, bad jokes and typos: how not to name your software

When you’re writing software you’ll find yourself naming every beast of the field, and every fowl of the air: projects, classes, functions, and variables. There are many ways to fail at naming projects, and when you do the costs of a bad name can haunt you for years.

To help you avoid these problems, let me share some of the bad naming schemes I have been responsible for, observed, or had inflicted on me. You can do better.

Five naming schemes to avoid

They’re gonna try to take it

Rule #1: don’t give your software the same name as a heavy metal band, or more broadly anyone who can afford to have a lawyer on retainer.

Long ago, the open source Twisted project had a sub-package for managing other subprocesses. Twisted’s main package is called twisted, and the author decided to call this package twisted.sister. This was a mistake.

One day my company, which advertised Twisted consulting services, received a cease and desist letter from the lawyers of the band Twisted Sister. They indicated that Twisted’s use of the name Twisted Sister was a violation of the band’s intellectual property, demanded we stop immediately, after which they wanted to discuss damages. Since my company didn’t actually own Twisted this was a little confusing, but I passed this on to the project.

The project wrote the lawyers explaining that Twisted was run by hobbyists, just so it was clear there was no money to be had. Twisted also changed the package name from twisted.sister to twisted.sibling: none of us believed the lawyers’ claim had any validity, but no one wanted to deal with the hassle of fighting them.

A subject of ridicule

Rule #2: don’t pick a name that will allow people to make bad jokes about you.

Continuing with the travails of the Twisted project, naming the project “Twisted” was a mistake. Python developers have, until recent years, not been very comfortable with asynchronous programming, and Twisted is an async framework. Unfortunately, having a name with negative connotations meant this discomfort was verbalized in a way that associated it with the project.

“Twisted” led people to say things like “Twisted is so twisted,” over and over and over again. Other async libraries for Python, like asyncore or Tornado, had neutral names and didn’t suffer from this problem.

Bad metaphors

Rule #3: if you’re going to use an extended metaphor, pick one that makes sense.

Continuing to pick on Twisted yet again (sorry!), one of Twisted’s packages is a remote method invocation library, similar to Java RMI. The package is called twisted.spread, the wire format is twisted.spread.banana, the serialization layer is twisted.spread.jelly, and the protocol itself is twisted.spread.pb.

This naming scheme, based on peanut butter and jelly sandwiches, has a number of problems. To begin with, PB&J is very American, and software is international. As a child and American emigrant living in a different country, I was ridiculed by my friends for the peanut butter and banana sandwiches my mother made.

Minor personal traumas aside, this naming scheme has no relation to what the software actually does. Silliness is a fine thing, but names should also be informative. The Homebrew project almost falls into this trap, with formulas and taps and casks and whatnot. But while the metaphor is a little unstable on its feet, it’s not quite drunk enough to completely fall over.

Typos

Rule #4: avoid names with common typos.

One of my software projects is named Crochet. Other people—and I make this typo too, to be fair—will mistakenly write “crotchet” instead, which the dictionary describes as “a perverse fancy; a whim which takes possession of the mind; a conceit.”

Bonus advice: you may wish to avoid whims or conceits when naming your software projects.

I can’t even

Rule #5: avoid facepalms.

I used to work for a company named ClusterHQ, and our initial product was named Flocker. When the company shut down the CEO wrote a blog post titled ClusterF**ed.

Why you shouldn’t listen to my advice

Twisted has many users, from individuals to massive corporations. My own project, Crochet, has a decent number. ClusterHQ shut down for reasons that had nothing to do with its name. So it’s not clear any of this makes a difference.

You should certainly avoid names that confuse your users, and you’ll be happier if you can avoid lawsuits. But if you’re going to be writing software all day, you should enjoy yourself while you do. If your programming language supports Unicode symbols, why not use emoji in your project name?

🙊🙉🙈 has a nice sound to it.

By the way, if you’d like to learn how to avoid my many mistakes, subscribe to my weekly newsletter. Every week I share one of my programming or career mistakes and how you can avoid it.

June 14, 2017 04:00 AM

June 12, 2017

Hynek Schlawack

Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server’s TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today’s needs and scores a straight “A” on Qualys’s SSL Server Test.

by Hynek Schlawack (hs@ox.cx) at June 12, 2017 10:00 AM

June 11, 2017

Twisted Matrix Laboratories

Twisted 17.5.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 17.5!

The highlights of this release are:

  • twisted.python.url has been spun out into the new 'hyperlink' package; importing twisted.python.url is now a compatibility alias
  • Initial support for OpenSSL 1.1.0.
  • Fixes around the reactor DNS resolver changes in 17.1, solving all known regressions
  • Deferred.asFuture and Deferred.fromFuture, to allow you to map asyncio Futures to Twisted Deferreds and vice versa, for use with the Python 3+ asyncioreactor in Twisted (a minimal sketch follows this list)
  • Support for TLS 1.3 ciphersuites, in advance of a released OpenSSL to enable the protocol
  • Further Python 3 support in twisted.web, initial support in twisted.mail.smtp.
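
As a minimal sketch of the new asyncio interop (my own example, not from the release notes; it assumes Python 3.5+ and installs the asyncio reactor before the reactor is imported):

import asyncio

from twisted.internet import asyncioreactor
asyncioreactor.install()

from twisted.internet import defer, task


async def double(x):
    await asyncio.sleep(0.1)
    return x * 2


def main(reactor):
    # Wrap an asyncio coroutine (via a Future) in a Deferred.
    d = defer.Deferred.fromFuture(asyncio.ensure_future(double(21)))
    d.addCallback(print)  # prints 42
    return d


task.react(main)

Deferred.asFuture(loop) goes in the other direction, handing a Deferred to asyncio code as a Future.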

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by Amber Brown (noreply@blogger.com) at June 11, 2017 01:22 AM

June 10, 2017

Moshe Zadka

Nitpicks are for Robots

Many projects and organizations have a coding standard. For Python, much of the time, the standard is a variant of PEP 8. During code reviews, often the reviewer will point out where the code disagrees with the standard, and ask for it to be fixed. Sometimes, if the reviewer is self-aware enough, they will precede those comments with nit or nitpick.

As Raymond Hettinger points out in Beyond PEP8, this gives a reviewer "something to do" without needing a lot of understanding, and will sometimes be used by people who feel overwhelmed at their job. Often these people need to be helped (mentored, tutored or sometimes just encouraged) but those code reviews mask their difficulties with an illusion of productivity.

Moreover, this often feels to the original coder like a put-down attempt ("I know the standard better than you") or simply an attempt to slow them down and derail them. It does not matter that this is rarely the real intention (although, in sufficiently toxic teams, it can be): it is hard to see a comment on every other line saying "there is a missing space after '3'" or "funcName should be func_name" without feeling that enough is enough, and that the reviewer is just trying to make themselves feel important.

The proper solution for that is to automate the nitpicks. There are many possible tools to use there: flake8 (or a custom flake8 plugin), pylint (or a custom pylint plugin) or even, if your needs are unique enough, write a custom tool.

Ideally, there is already CI in place for the tool, using Tox and some CI runner that integrates with your development life cycle (otherwise, how would the unit tests run?), so extending Tox with a new environment for running whatever automatic nitpicker (or nitpickers) you choose should be straightforward.

When adding a new nitpicking rule, some sort of grandfathering strategy needs to be added. My favorite one, unfortunately, takes some time to implement: have Tox run the linting tool, itself, in "ignore process exit code" mode. Capture the output, and compare it to a list of exceptions kept in a special "list of exceptions" file. Now -- and this is the important thing -- any deviation is an error. In particular, anything that fixes a nitpick should also remove it from the list of exceptions.
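
As a sketch of that strategy (my own code, not from this post; it assumes flake8 as the nitpicker and a checked-in baseline file called lint-exceptions.txt), the comparison step might look something like this:

# lint_gate.py -- a minimal sketch of the grandfathering approach described above.
import subprocess
import sys

# Run the linter ourselves, ignoring its exit code; only the output matters.
process = subprocess.run(
    ["flake8", "src"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
current = set(line for line in process.stdout.splitlines() if line.strip())

with open("lint-exceptions.txt") as f:
    allowed = set(line.strip() for line in f if line.strip())

new_problems = current - allowed
stale_exceptions = allowed - current

if new_problems:
    print("New nitpicks (fix them, or add them to lint-exceptions.txt in this diff):")
    print("\n".join(sorted(new_problems)))
if stale_exceptions:
    print("Fixed nitpicks still listed in lint-exceptions.txt (remove them):")
    print("\n".join(sorted(stale_exceptions)))

sys.exit(1 if (new_problems or stale_exceptions) else 0)

Since linter output includes file names and line numbers, the exception entries will churn as surrounding code moves; that is part of the cost of this approach.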

Of course, this means that it is possible to add to the list of exceptions. But now it is clearly visible in the code diff. The code review will naturally ask "why", since clearly conventions have been intentionally violated. In a rare case, there might be a reason that is good. But the important thing: now the question is not arbitrary, asked by a human with possibly impure motivations -- and so, will be answered in honesty and with good intentions assumed.

by Moshe Zadka at June 10, 2017 05:20 AM

June 05, 2017

Jp Calderone

Twisted Web in 60 Seconds: HTTP/2

Hello, hello. It's been a long time since the last entry in the "Twisted Web in 60 Seconds" series. If you're new to the series and you like this post, I recommend going back and reading the older posts as well.

In this entry, I'll show you how to enable HTTP/2 for your Twisted Web-based site. HTTP/2 is the latest entry in the HTTP family of protocols. It builds on work from Google and others to address performance (and other) shortcomings of the older HTTP/1.x protocols in widespread use today.

Twisted implements HTTP/2 support by building on the general-purpose H2 Python library. In fact, all you have to do to have HTTP/2 for your Twisted Web-based site (starting in Twisted 16.3.0) is install the dependencies:

$ pip install twisted[http2]

Your TLS-based site is now available via HTTP/2! A future version of Twisted will likely extend this to non-TLS sites (which requires the Upgrade: h2c handshake) with no further effort on your part.

by Jean-Paul Calderone (noreply@blogger.com) at June 05, 2017 05:56 PM

June 03, 2017

Jonathan Lange

SPAKE2 in Haskell: What is SPAKE2?

Last post, I discussed how I found myself implementing SPAKE2 in Haskell. Here, I want to discuss what SPAKE2 is, and why you might care.

I just want to send a file over the internet

Long ago, Glyph lamented that all he wanted to do was send a file over the internet. Since then, very little has changed.

If you want to send a file to someone, you either attach it to an email, or you upload it to some central, cloud-based service that you both have access to: Google Drive; Dropbox; iCloud; etc.

But what if you don’t want to do that? What if you want to send a file directly to someone, without intermediaries?

First, you need to figure out where they are. That’s not what SPAKE2 does.

Once you have figured out where they are, you need:

  1. Their permission
  2. Assurance that you are sending the file to the right person
  3. A secure channel to send the actual data

SPAKE2 helps with all three of these.

What is SPAKE2?

SPAKE2 is a password-authenticated key exchange (PAKE) protocol.

This means it is a set of steps (a protocol) to allow two parties to share a session key (“key exchange”) based on a password that they both know (“password-authenticated”).

There are many such protocols, but as mentioned last post, I know next to nothing about cryptography, so if you want to learn about them, you’ll have to go elsewhere.

SPAKE2 is designed under a certain set of assumptions and constraints.

First, we don’t know if the person we’re talking to is the person we think we are talking to, but we want to find out. That is, we need to authenticate them, and we want to use the password to do this (hence “password-authenticated”).

Second, the shared password is expected to be weak, such as a PIN code, or a couple of English words stuck together.

What does this mean?

These assumptions have a couple of implications.

First, we want to give our adversary as few chances as possible to guess the password. The password is precious, we don’t want to lose it. If someone discovers it, they could impersonate us or our friends, and gain access to precious secrets.

Specifically, this means we want to prevent offline dictionary attacks (where the adversary can snapshot some data and run it against all common passwords at their leisure) against both eavesdropping adversaries (those snooping on our connection) and active adversaries (people pretending to be our friend).

Second, we don’t want to use the password as the key that encrypts our payload. We need to use it to generate a new key, specific to this session. If we re-use the password, eventually we’ll send some encrypted content for which the plaintext is known; an eavesdropper will find this and be able to brute force the password at their leisure.

How does SPAKE2 solve this?

To explain how SPAKE2 solves this, it can help to go through a couple of approaches that definitely do not work.

For example, we could just send the password over the wire. This is a terrible idea. Not only does it expose the password to eavesdroppers, but it also gives us no evidence that the other side knows the password. After all, we could send them the password, and they could send it right back.

We need to send something over the wire that is not the password, but that could only have been generated with the password.

So perhaps our next refinement might be that we send our name, somehow cryptographically signed with the password.

This is better than just sending the password, but still leaves us exposed to offline dictionary attacks. After all, our name is well-known in plain text, so an eavesdropper can look out for it in the protocol, snaffle up the ciphertext, and then run a dictionary against it at their leisure. It also leaves open the question of how we will generate a session key.

SPAKE2 goes a few steps further than this. Rather than sending a signed version of some known text, each side sends an “encrypted” version of a random value, using the password as a key.

Each side then decrypts the value it receives from the other side, and then uses its random value and the other random value as inputs to a hash function that generates a session key.

If the passwords are the same, the session key will be the same, and both sides will be able to communicate.

That is the shorter answer for “How does SPAKE2 work?”. The longer answer involves rather a lot of mathematics.

Show me the mathematics

When I was learning SPAKE2, this was a bit of a problem for me. I hit three major hurdles.

  1. Notation—maths just has obscure notation
  2. Terminology—maths uses non-descriptive words for concepts
  3. Concepts—some are merely unfamiliar, others genuinely difficult

In this post, I want to help you over all of these hurdles, such that you can go and read papers and blog posts by people who actually understand what they are talking about. This means that I’ll try to go out of my way to explain the notation and terminology while also going through the core concepts.

I want to stress that I am not an expert. What you’re reading here is me figuring this out for myself, with a little help from my friends and the Internet.

Overview

We can think of SPAKE2 as having five stages:

  1. Public system parameters, established before any exchange takes place
  2. A secret password shared between two parties
  3. An exchange of data
  4. Using that exchange to calculate a key
  5. Generating a session key

We’ll deal with each in turn.

System parameters

First, we start with some system parameters. These are things that both ends of the SPAKE2 protocol need to have baked into their code. As such, they are public.

These parameters are:

  • a group, \(G\), of prime order \(p\)
  • a generator element, \(g\), from that group
  • two arbitrary elements, \(M\), \(N\), of the group

What’s a group? A group \((G, +)\) is a set together with a binary operator such that:

  • adding any two members of the group gets you another member of the group (closed under \(+\))
  • the operation + is associative, that is \(X + (Y + Z) = (X + Y) + Z\) (associativity)
  • there’s an element, \(0\), such that \(X + 0 = X = 0 + X\) for any element \(X\) in the group (identity)
  • for every element, \(X\), there’s an element \(-X\), such that \(X + (-X) = 0\) (inverse)

It’s important to note that \(+\) stands for a generic binary operation with these properties, not necessarily any kind of addition, and \(0\) stands for the identity, rather than the numeral 0.

To get a better sense of this somewhat abstract concept, it can help to look at a few concrete examples. These don’t have much to do with SPAKE2 per se, they are just here to help explain groups.

The integers with addition form a group with \(0\) as the identity, because you can add and subtract (i.e. add the negative) them and get other integers, and because addition is associative.

The integers with multiplication are not a group, because what’s the inverse of 2?

But the rational numbers greater than zero with multiplication do form a group, with 1 as the identity.

Likewise, integers with multiplication modulo some fixed number do form a group—a finite group. For example, for integers with multiplication modulo 7, the identity is 1, multiplication is associative, and the inverse of 2 is 4, because \((2 \cdot 4) \mod 7 = 1\).
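
As a quick sanity check of that example (my own addition, not part of the original), you can verify the inverses with a few lines of Python:

# Multiplication modulo 7: 1 is the identity, and every nonzero element has an inverse.
for x in range(1, 7):
    inverse = next(y for y in range(1, 7) if (x * y) % 7 == 1)
    print(x, "has inverse", inverse)  # e.g. 2 has inverse 4, since (2 * 4) % 7 == 1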

But but! When we are talking about groups in the abstract, we’ll still call the operation \(+\) and the identity \(0\), even if the concrete operation happens to be multiplication.

But but but! This is not at all a universally followed convention, so when you are reading about groups, you’ll often see the operation written as a product (e.g. \(XY\) or \(X \cdot Y\) instead of \(X + Y\)) and the identity written as \(1\).

Still with me?

Why groups?

You might be wondering why we need this “group” abstraction at all. It might seem like unnecessary complexity.

Abstractions like groups are a lot like the programming concept of an abstract interface. You might write a function in terms of an interface because you want to allow for lots of different possible implementations. Doing so also allows you to ignore details about specific concrete implementations so you can focus on what matters—the external behaviour.

It’s the same thing here. The group could be an elliptic curve, or something to do with prime numbers, or something else entirely—SPAKE2 doesn’t care. We want to define our protocol to allow lots of different underlying implementations, and without getting bogged down in how they actually work.

For SPAKE2, we have an additional requirement for the group: it is finite and has a prime number of elements. We’ll use \(p\) to refer to this number—this is what is meant by “of prime order \(p\)” above.

Due to the magic of group theory [1], this gives \(G\) some bonus properties:

  • it is cyclic: we can generate all of the elements of the group by picking one (not the identity) and adding it to itself over and over
  • it is abelian, that is \(X + Y = Y + X\), for any two elements \(X\), \(Y\) in \(G\) (commutativity)

Which explains what we mean by “a generator element”, \(g\): it’s just an element from the group that’s not the identity. We can use it to make any other element of the group by adding it to itself. If the group weren’t cyclic, we could, in general, only use \(g\) to generate a subgroup.

The process of adding an element to itself over and over is called scalar multiplication [2]. In mathematical notation, we write it like this:

\begin{equation*} Y = nX \end{equation*}

Or slightly more verbosely like:

\begin{equation*} Y = n \cdot X \end{equation*}

Where \(n\) is an integer and \(X\) is a member of the group, and \(Y\) is the result of adding \(X\) to itself \(n\) times.

If \(n\) is 0, \(Y\) is \(0\). If \(n\) is 1, \(Y\) is \(X\).
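
In code, scalar multiplication is just repeated application of the group operation. Here is a tiny, deliberately naive sketch of mine (real implementations use faster double-and-add style algorithms), where add and identity stand for whatever the concrete group provides:

def scalar_multiply(n, X, add, identity):
    # Add X to itself n times, starting from the group identity.
    result = identity
    for _ in range(n):
        result = add(result, X)
    return result

# For the "integers with addition" group this is ordinary multiplication:
assert scalar_multiply(5, 3, lambda a, b: a + b, 0) == 15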

Just as sometimes the group operator is written with product notation rather than addition, so too is scalar multiplication sometimes written with exponentiation, to denote multiplying a thing by itself \(n\) times. e.g.

\begin{equation*} Y = X^n \end{equation*}

I’m going to stick to the \(n \cdot X\) notation in this post, and I’m always going to put the scalar first.

Also, I am mostly going to use upper case (e.g. \(X\), \(Y\)) to refer to group elements (with the exception of the generator element, \(g\)) and lower case (e.g. \(n\), \(k\)) to refer to scalars. I’m going to try to be consistent, but it’s always worth checking for yourself.

Because the group \(G\) is cyclic, if we have some group element \(X\) and a generator \(g\), there will always be a number, \(k\), such that:

\begin{equation*} X = k \cdot g \end{equation*}

Here, \(k\) would be called the discrete log of \(X\) with respect to base \(g\). “Log” is a nod to exponentiation notation, “discrete” because this is a finite group.

Time to recap.

SPAKE2 has several public parameters, which are

  • a group, \(G\), of prime order \(p\), which means it’s cyclic, abelian, and we can do scalar multiplication on it
  • a generator element, \(g\), from that group, that we will do a lot of scalar multiplication with
  • two arbitrary elements, \(M\), \(N\), of the group, where no one knows the discrete log [3] with respect to \(g\).

Shared secret password

The next thing we need to begin a SPAKE2 exchange is a shared secret password.

In human terms, this will be a short string of bytes, or a PIN.

In terms of the mathematical SPAKE2 protocol, this must be a scalar, \(pw\).

How we go from a string of bytes to a scalar is completely out of scope for the SPAKE2 paper. This confused me when trying to implement SPAKE2 in Haskell, and I don’t claim to fully understand it now.

We HKDF expand the password in order to get a more uniform distribution of scalars [4]. This still leaves us with a byte string, though.

To turn that into an integer (i.e. a scalar), we simply base256 decode the byte string.

This gives us \(pw\), which we use in the next step.
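
Here is my rough sketch of that password-to-scalar step; the info label and the final reduction modulo the group order are my assumptions, not taken from the paper or from python-spake2:

import hashlib
import hmac

def hkdf_expand(key, info, length):
    # Minimal RFC 5869 "expand" step using HMAC-SHA256 (no extract step, for brevity).
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(key, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

def password_to_scalar(password_bytes, group_order):
    expanded = hkdf_expand(password_bytes, b"SPAKE2 pw", 32)  # illustrative label
    return int.from_bytes(expanded, "big") % group_order      # base-256 decode, then reduce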

Data exchange

At this point, the user has entered a password and we’ve converted it into a scalar.

We need some way to convince the other side that we know the password without actually sending the password to them.

This means two things:

  1. We have to send them something based on the password
  2. We get to use all of the shiny mathematics we introduced earlier

The process for both sides is the same, but each side needs to know who’s who. One side is side A, and the other is side B, and how they figure out which is which happens outside the protocol.

Each side draws a random scalar between \(0\) and \(p\): \(x\) for side A, \(y\) for side B. They then use that to generate an element: \(X = x \cdot g\) for side A, \(Y = y \cdot g\) for side B.

They then “blind” this value by adding it to one of the elements that make up the system parameters, scalar multiplied by \(pw\), our password.

Thus, side A makes \(X^{\star} = X + pw \cdot M\) and side B makes \(Y^{\star} = Y + pw \cdot N\).

They then each send this to the other side and wait to receive the equivalent message.

Again, the papers don’t say how to encode the message, so python-spake2 just base-256 encodes it and has side A prepend the byte A (0x41) and side B prepend B (0x42).

Calculating a key

Once each side has the other side’s message, they can start to calculate a key.

Side A calculates its key like this:

\begin{equation*} K_A = x \cdot (Y^{\star} - pw \cdot N) \end{equation*}

The bit inside the parentheses is side A recovering \(Y\), since we defined \(Y^{\star}\) as:

\begin{equation*} Y^{\star} = Y + pw \cdot N \end{equation*}

We can rewrite that in terms of \(Y\) by subtracting \(pw \cdot N\) from both sides:

\begin{equation*} Y = Y^{\star} - pw \cdot N \end{equation*}

Which means that, as long as both sides have the same value for \(pw\), side A can swap in \(Y\) like so:

\begin{align*} K_A& = x \cdot Y \\ & = x \cdot (y \cdot g) \\ & = xy \cdot g \end{align*}

Remember that when we created \(Y\) in the first place, we did so by multiplying our generator \(g\) by a random scalar \(y\).

Side B calculates its key in the same way:

\begin{align*} K_B& = y \cdot (X^{\star} - pw \cdot M) \\ & = y \cdot X \\ & = y \cdot (x \cdot g) \\ & = yx \cdot g \\ & = xy \cdot g \end{align*}

Thus, if both sides used the same password, \(K_A = K_B\).
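
To make the whole flow concrete, here is a toy, utterly insecure sketch of my own (not from the post), using the order-11 subgroup of squares modulo 23, written multiplicatively; all the constants are illustrative only:

import random

p, q, g = 23, 11, 4    # modulus, prime group order, generator of the subgroup of squares
M, N = 3, 9            # two other elements of that subgroup (toy stand-ins for M and N)
pw = 7                 # the shared password, already turned into a scalar

def blind(element, blinder):
    return (element * pow(blinder, pw, p)) % p

def unblind(element, blinder):
    # "Subtracting" pw * blinder in multiplicative notation: multiply by the modular inverse.
    return (element * pow(pow(blinder, pw, p), p - 2, p)) % p

x = random.randrange(1, q)               # side A's random scalar
X_star = blind(pow(g, x, p), M)          # side A's message

y = random.randrange(1, q)               # side B's random scalar
Y_star = blind(pow(g, y, p), N)          # side B's message

K_A = pow(unblind(Y_star, N), x, p)      # side A recovers Y, then raises it to x
K_B = pow(unblind(X_star, M), y, p)      # side B recovers X, then raises it to y

assert K_A == K_B == pow(g, (x * y) % q, p)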

Generating a session key

Both sides now have:

  • \(X^{\star}\)
  • \(Y^{\star}\)
  • Either \(K_A\) or \(K_B\)
  • \(pw\), or at least their own opinion of what \(pw\) is

To these we add the heretofore unmentioned \(A\) and \(B\), which are meant to be identifiers for side A and side B respectively. Each side presumably communicates these to the other out of band, outside of SPAKE2.

We then hash all of these together, using a hash algorithm, \(H\), that both sides have previously agreed upon, so that:

\begin{equation*} SK = H(A, B, X^{\star}, Y^{\star}, pw, K) \end{equation*}

Where \(K\) is either \(K_A\) or \(K_B\).

I don’t really understand why this step is necessary—why not use \(K\)? Nor do I understand why each of the inputs to the hash is necessary—\(K\) is already derived from \(X^{\star}\), why do we need \(X^{\star}\)?

In the code, we change this ever so slightly:

\begin{equation*} SK = H(H(pw), H(A), H(B), X^{\star}, Y^{\star}, K) \end{equation*}

Basically, we hash all of the variable length fields to make them fixed length to avoid collisions between values. [5]

python-spake2 uses SHA256 as the hash algorithm for this step. I’ve got no idea why this and not, say, HKDF.
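
A sketch of that hashing step as I read it (the argument names are mine, and the byte encodings of the group elements are glossed over):

import hashlib

def session_key(pw_bytes, id_a, id_b, x_star_bytes, y_star_bytes, k_bytes):
    h = hashlib.sha256
    # Hash the variable-length inputs individually, then hash the whole transcript.
    transcript = (h(pw_bytes).digest() + h(id_a).digest() + h(id_b).digest()
                  + x_star_bytes + y_star_bytes + k_bytes)
    return h(transcript).digest()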

And this is the session key. SPAKE2 is done!

Did SPAKE2 solve our problem?

We wanted a way of authenticating a remote connection using a password, without having to share that password, and without using that password to encrypt known plaintext. We’ve done that.

The SPAKE2 protocol above will result in two sides negotiating a shared session key, sending only randomly generated data over the wire.

Is it vulnerable to offline dictionary attacks? No. The value we send over the wire is just a random group element encrypted with the password. Even if an eavesdropper gets that value and runs a dictionary against it, they’ll have no way of determining whether they’ve cracked it or not. After all, one random value looks very much like another.

Where to now?

I’m looking forward to learning about elliptic curves, and to writing about what it was like to use Haskell in particular to implement SPAKE2.

I learned a lot implementing SPAKE2, then learned a lot more writing this post, and have much to learn still.

But perhaps the biggest thing I’ve learned is that although maths isn’t easy, it’s at least possible, and that sometimes, if you want to send a file over the Internet, what you really need is a huge pile of math.

Let me know if I’ve got anything wrong, or if this inspires you to go forth and implement some crypto papers yourself.

Thanks

This post owes a great deal to Brian Warner’s “Magic Wormhole” talk, to Jake Edge’s write-up of that talk, and to Michel Abdalla and David Pointcheval’s paper “Simple Password-Based Encrypted Key Exchange Protocols”.

Bice Dibley, Chris Halse Rogers, and JP Viljoen all read early drafts and provided helpful suggestions. This piece has been much improved by their input. Any infelicities are my own.

I wouldn’t have written this without being inspired by Julia Evans. Julia often shares what she’s learning as she learns it, and does a great job at making difficult topics seem approachable and fun. I highly recommend her blog, especially if you’re into devops or distributed systems.

[1]I used to know the proof for this but have since forgotten it, so I’m taking this on faith for now.
[2]With scalar multiplication, we aren’t talking about a group, but rather a \(\mathbb{Z}\)-module. But at this point, I can’t even, so look it up on Wikipedia if you’re interested.
[3]Taking this on faith too.
[4]Yup, faith again.
[5]I only sort of understand why this is necessary.

by Jonathan Lange at June 03, 2017 11:00 PM

June 01, 2017

Glyph Lefkowitz

The Sororicide Antipattern

“Composition is better than inheritance.” This is a true statement. “Inheritance is bad.” Also true. I’m a well-known compositional extremist. There’s a great talk you can watch if I haven’t talked your ear off about it already.

Which is why I was extremely surprised in a recent conversation when my interlocutor said that while inheritance might be bad, composition is worse. Once I understood what they meant by “composition”, I was even more surprised to find that I agreed with this assertion.

Although inheritance is bad, it’s very important to understand why. In a high-level language like Python, with first-class runtime datatypes (i.e.: user defined classes that are objects), the computational difference between what we call “composition” and what we call “inheritance” is a matter of where we put a pointer: is it on a type or on an instance? The important distinction has to do with human factors.

First, a brief parable about real-life inheritance.


You find yourself in conversation with an indolent heiress-in-waiting. She complains of her boredom whiling away the time until the dowager countess finally leaves her her fortune.

“Inheritance is bad”, you opine. “It’s better to make your own way in life”.

“By George, you’re right!” she exclaims. You weren’t expecting such an enthusiastic reversal.

“Well,”, you sputter, “glad to see you are turning over a new leaf”.

She crosses the room to open a sturdy mahogany armoire, and draws forth a belt holstering a pistol and a menacing-looking sabre.

“Auntie has only the dwindling remnants of a legacy fortune. The real money has always been with my sister’s manufacturing concern. Why passively wait for Auntie to die, when I can murder my dear sister now, and take what is rightfully mine!”

Cinching the belt around her waist, she strides from the room animated and full of purpose, no longer indolent or in-waiting, but you feel less than satisfied with your advice.

It is, after all, important to understand what the problem with inheritance is.


The primary reason inheritance is bad is confusion between namespaces.

The most important role of code organization (division of code into files, modules, packages, subroutines, data structures, etc) is division of responsibility. In other words, Conway’s Law isn’t just an unfortunate accident of budgeting, but a fundamental property of software design.

For example, if we have a function called multiply(a, b) - its presence in our codebase suggests that if someone were to want to multiply two numbers together, it is multiply’s responsibility to know how to do so. If there’s a problem with multiplication, it’s the maintainers of multiply who need to go fix it.

And, with this responsibility comes authority over a specific scope within the code. So if we were to look at an implementation of multiply:

def multiply(a, b):
    product = a * b
    return product

The maintainers of multiply get to decide what product means in the context of their function. It’s possible, in Python, for some other function to reach into multiply with frame objects and mangle the meaning of product between its assignment and return, but it’s generally understood that it’s none of your business what product is, and if you touch it, all bets are off about the correctness of multiply. More importantly, if the maintainers of multiply wanted to bind other names, or change around existing names, like so, in a subsequent version:

def multiply(a, b):
    factor1 = a
    factor2 = b
    result = a * b
    return result

It is the maintainer of multiply’s job, not the caller of multiply, to make those decisions.

The same programmer may, at different times, be both a caller and a maintainer of multiply. However, they have to know which hat they’re wearing at any given time, so that they can know which stuff they’re still responsible for when they hand over multiply to be maintained by a different team.

It’s important to be able to forget about the internals of the local variables in the functions you call. Otherwise, abstractions give us no power: if you have to know the internals of everything you’re using, you can never build much beyond what’s already there, because you’ll be spending all your time trying to understand all the layers below it.

Classes complicate this process of forgetting somewhat. Properties of class instances “stick out”, and are visible to the callers. This can be powerful — and can be a great way to represent shared data structures — but this is exactly why we have the ._ convention in Python: if something starts with an underscore, and it’s not in a namespace you own, you shouldn’t mess with it. So: other._foo is not for you to touch, unless you’re maintaining type(other). self._foo is where you should put your own private state.

So if we have a class like this:

class A(object):
    def __init__(self):
        self._note = "a note"

we all know that A()._note is off-limits.

But then what happens here?

class B(A):
    def __init__(self):
        super().__init__()
        self._note = "private state for B()"

B()._note is also off limits for everyone but B, except... as it turns out, B doesn’t really own the namespace of self here, so it’s clashing with what A wants _note to mean. Even if, right now, we were to change it to _note2, the maintainer of A could, in any future release of A, add a new _note2 variable which conflicts with something B is using. A’s maintainers (rightfully) think they own self, B’s maintainers (reasonably) think that they do. This could continue all the way until we get to _note7, at which point it would explode violently.
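
A two-line demonstration of the clash (mine, not from the original):

b = B()
print(b._note)  # "private state for B()" -- A's idea of _note has been silently overwritten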


So that’s why inheritance is bad. It’s a bad way for two layers of a system to communicate because it leaves each layer nowhere to put its internal state that the other doesn’t need to know about. So what could be worse?

Let’s say we’ve convinced our junior programmer who wrote A that inheritance is a bad interface, and they should instead use the panacea that cures all inherited ills, composition. Great! Let’s just write a B that composes in an A in a nice clean way, instead of doing any gross inheritance:

class Bprime(object):
    def __init__(self, a):
        for var in dir(a):
            setattr(self, var, getattr(a, var))

Uh oh. Looks like composition is worse than inheritance.


Let’s enumerate some of the issues with this “solution” to the problem of inheritance:

  • How do we know what attributes Bprime has?
  • How do we even know what type a is?
  • How is anyone ever going to grep for relevant methods in this code and have them come up in the right place?

We briefly reclaimed self for Bprime by removing the inheritance from A, but what Bprime does in __init__ to replace it is much worse. At least with normal, “vertical” inheritance, IDEs and code inspection tools can have some idea where your parents are and what methods they declare. We have to look aside to know what’s there, but at least it’s clear from the code’s structure where exactly we have to look aside to.

When faced with a class like Bprime though, what does one do? It’s just shredding apart some apparently totally unrelated object, there’s nearly no way for tooling to inspect this code to the point that they know where self.<something> comes from in a method defined on Bprime.

The goal of replacing inheritance with composition is to make it clear and easy to understand what code owns each attribute on self. Sometimes that clarity comes at the expense of a few extra keystrokes; an __init__ that copies over a few specific attributes, or a method that does nothing but forward a message, like def something(self): return self.other.something().
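
For contrast with Bprime, here is a sketch (mine) of that boring, explicit style of composition; the attribute and method names are hypothetical:

class Bcomposed(object):
    def __init__(self, a):
        self._a = a                          # the wrapped A instance; its private state stays its own
        self.public_thing = a.public_thing   # hypothetical attribute, copied deliberately

    def something(self):
        # Forward one message explicitly, so readers and tools can see where it goes.
        return self._a.something()           # hypothetical method on A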

Automatic composition is just lateral inheritance. Magically auto-proxying all methods [1], or auto-copying all attributes, saves a few keystrokes at the time some new code is created at the expense of hours of debugging when it is being maintained. If readability counts, we should never privilege the writer over the reader.


  1. It is left as an exercise for the reader why proxyForInterface is still a reasonably okay idea even in the face of this criticism. [2]

  2. Although ironically it probably shouldn’t use inheritance as its interface. 

by Glyph at June 01, 2017 06:25 AM

Itamar Turner-Trauring

The best place to practice your programming skills

If you want to be a better programmer, what is the best way to practice and improve your skills? You can improve your skills by working on a side project, but what if you don’t have the free time or motivation? You can learn from a mentor or more experienced programmers, but what if you don’t work with any?

There is one more way you can practice skills, and in many ways it’s the ideal way. It doesn’t require extra time, or experienced colleagues: all it takes is a job as a software engineer. The best way to improve your skills is by practicing them on the job.

Now, when you see the word “skills” you might immediately translate that into “tools and technologies”. And learning new technologies on the job is not always possible, especially when your company is stuck on one particular tech stack. But your goal as a programmer isn’t to use technologies.

Your goal is to solve problems, and writing software is a technique that helps you do that. Your job might be helping people plan their travel, or helping a business sell more products, or entertaining people. These goals might require technology, but the technology is a means, not an end. So while understanding particular technologies is both useful and necessary, it is just the start of the skills you need as a programmer.

The skills you need

Many of the most important skills you need as a programmer have nothing to do with the particulars of any technology, and everything to do with learning how to better identify and solve problems.

Here are some of the skills that will help you identify problems:

  • Business goals: you choose what problems to focus on based on your organization’s goals.
  • Root cause analysis: when a problem is presented you don’t just accept it, but rather dig in and try to figure out the underlying problem.
  • Identifying development bottlenecks: you notice when software development is slowing down and try to figure out why.

And some of the skills you’ll need to solve problems once they’re found:

  • Self-management: you can organize your own time while working on a particular solution. You stay focused and on task, and if things are taking you too long you’ll notice and ask for help.
  • Planning your own software project: given a problem statement and a high-level solution, you can break up the task into the necessary steps to implement that solution. You can then take those steps and figure out the best order in which to implement them.
  • Debugging: when faced with a bug you don’t understand you are able to figure out what is causing it, or at least how to work around it.

These are just some of the skills you can practice and improve during your normal workday. And while there are ways your organization can help your learning—like using newer technologies, or having expert coworkers—you can practice these particular skills almost anywhere.

The best practice is the real thing

Every time you solve a problem at work you are also practicing identifying and solving problems, and practicing under the most realistic of situations. It’s a real problem, you have limited resources, and there’s a deadline.

Compare this to working on a side project. With a side project you have come up with the goal on your own, and there’s no real deadline. It’s possible to make side projects more realistic, but it’s never quite the same as practicing the real thing.

Here’s how to practice your skills at work:

  1. Pick a particular skill you want to work on.
  2. Pay attention to how you apply it: when do you use this skill? Are you conscious of when it’s necessary and how you’re applying it? Even just realizing something is a skill and then paying attention to how and when you use it can help you improve it.
  3. Figure out ways to measure the skill’s impact, and work on improving the way you apply it. With debugging, for example, you can see how long it takes you to discover a problem. Then notice what particular techniques and questions speed up your debugging, and make sure you apply them consistently.
  4. Learn from your colleagues. Even if they’re no more experienced than you, they still have different experiences, and might have skills and knowledge you don’t.
  5. Try to notice the mistakes you make, and the cues and models that would have helped you avoid a particular mistake. This is what I do in my weekly newsletter where I share my own mistakes and what I’ve learned from them.
  6. If possible, find a book on the topic and skim it. You can skim a book in just an hour or two and come away with some vague models and ideas for improvement. As you do your job and notice yourself applying the skill those models will come to mind. You can then read a small, focused, and relevant part of the book in detail at just the right moment: when you need to apply the skill.

Practicing on the job

Your job is always the best place to practice your skills, because that is where you will apply your skills. The ideal, of course, is to find a job that lets you try new technologies and learn from experienced colleagues. But even if your job doesn’t achieve that ideal, there are many critical skills that you can practice at almost any job. Don’t waste your time at work: every workday can also be a day where you are learning.

June 01, 2017 04:00 AM

May 30, 2017

Hynek Schlawack

On Conference Speaking

I’ve seen quite a bit of the world thanks to being invited to speak at conferences. Since some people are under the impression that serial conference speakers possess some special talents, I’d like to demystify my process by walking you through my latest talk from start to finish.

by Hynek Schlawack (hs@ox.cx) at May 30, 2017 12:00 AM

May 26, 2017

Jonathan Lange

SPAKE2 in Haskell: the journey begins

There’s a joke about programmers that’s been doing the rounds for the last couple of years:

We do these things not because they are easy, but because we thought they would be easy.

This is about how I became the butt of a tired, old joke.

My friend Jean-Paul decided to start learning Haskell by writing a magic-wormhole client.

magic-wormhole works in part by negotiating a session key using SPAKE2: a password-authenticated key exchange protocol, so one of the first things Jean-Paul needed was a Haskell implementation.

Eager to help Jean-Paul on his Haskell journey, I volunteered to write a SPAKE2 implementation in Haskell. After all, there’s a pure Python implementation, so all I’d need to do is translate it from Python to Haskell. I know both languages pretty well. How hard could it be?

Turns out there are a few things I hadn’t really counted on.

I know next to nothing about cryptography

Until now, I could summarise what I knew about cryptography into two points:

  1. It works because factoring prime numbers is hard
  2. I don’t know enough about it to implement it reliably, I should use proven, off-the-shelf components instead

This isn’t really a solid foundation for implementing crypto code. In fact, it’s a compelling argument to walk away while I still had the chance.

My ignorance was a particular liability here, since python-spake2 assumes a lot of cryptographic knowledge. What’s HKDF? What’s Ed25519? Is that the same thing as Curve25519? What’s NIST? What’s q in this context?

python-spake2 also assumes a lot of knowledge about abstract algebra. This is less of a problem for me, since I studied a lot of that at university. However, it’s still a problem. Most of that knowledge has sat unused for fifteen or so years. Dusting off those cobwebs took time.

My rusty know-how was especially obvious when reading the PDFs that describe SPAKE2. Mathematical notation isn’t easy to read, and every subdiscipline has its own special variants (“Oh, obviously q means the size of the subgroup. That’s just convention.”)

For example, I know that what’s in spake2/groups.py is the multiplicative group of integers modulo n, and I know what “the multiplicative group of integers modulo n” means, but I understand about 2% of the Wikipedia page on the subject, and I have even less understanding about how the group is relevant to cryptography.

The protocol diagrams that appear in the papers I read were a confusing mess of symbols at first. It took several passes through the text, and a couple of botched explanations to patient listeners on IRC before I really understood them. These diagrams now seem clear & succinct to me, although I’m sure they could be written better in code.

python-spake2 is idiosyncratic

The python-spake2 source code is made almost entirely out of object inheritance and mutation, which makes it hard for me to follow, and hard to transliterate into Haskell, where object inheritance and mutation are hard to model.

This is a very minor criticism. With magic-wormhole and python-spake2, Warner has made creative, useful software that solves a difficult problem and meets a worthwhile need.

crypto libraries rarely have beginner-friendly documentation

python-spake2 isn’t alone in assuming cryptographic knowledge. The Haskell library cryptonite is much the same. Most documentation I could find about various topics on the web pointed to cr.yp.to pages, which either link to papers or C code.

I think this is partly driven by a concern for user safety, “if you don’t understand it, you shouldn’t be using it”. Maybe this is a good idea. The problem is that it can be hard to know where to start in order to gain that understanding.

To illustrate, I now sort of get how an elliptic curve might form a group, but have forgotten enough algebra to not know about what subgroups there are, how that’s relevant to the implementation of ed25519, how subgroups and groups relate to fields, to say nothing of how elliptic curve cryptography actually works.

I don’t really know where to go to remedy this ignorance, although I’m pretty sure doing so is within my capabilities, I just need to find the right book or course to actually teach me these things.

Protocols ain’t protocols

The mathematics of SPAKE2 are fairly clearly defined, but there is a large gap between “use this group element” and sending some bits over the wire.

python-spake2 doesn’t clearly distinguish between the mathematics of SPAKE2 and the necessary implementation decisions it makes in order to be a useful networked protocol.

This meant that when translating, it was hard to tell what was an essential detail and what was accidental detail. As Siderea eloquently points out, software is made of decisions. When writing the Haskell version, which decisions do I get to make, and which are forced upon me? Must this salt be the empty string? Can I generate the “blind” any way I want?

Eventually, I found a PR implementing SPAKE2 (and SPAKE2+, SPAKE2-EE, etc.) in Javascript. From the discussion there, I was able to synthesize a rough standard for implementing.

Jean-Paul helped by writing an interoperability test harness, which gave me an easy way to experiment with design choices.

Conclusion

Happily, as of this weekend, I’ve been able to overcome my lack of knowledge of cryptography, the idiosyncrasies of python-spake2, the documentation quirks of crypto libraries, and the lack of a standard for SPAKE2 on the network to implement SPAKE2 in Haskell, first with NIST groups, then with Ed25519.

No doubt much could be better—I would very much welcome feedback, whether it’s about my Haskell, my mathematics, or my documentation—but I’m pretty happy with the results.

This has been a fun, stretching, challenging exercise. Even though it took more time and was more difficult than I expected, it has been such a privilege to be able to tackle it. Not only have I learned much, but I also feel much more confident in my ability to learn hard things.

I hope to follow up with more posts, covering:

  • just what is SPAKE2, and why should I care?
  • how can I use SPAKE2 (and especially, haskell-spake2)?
  • what was it like to write a Haskell version of a Python library?
  • what’s up with Ed25519? (this is somewhat ambitious)

by Jonathan Lange at May 26, 2017 11:00 PM

May 11, 2017

Hynek Schlawack

Please Fix Your Decorators

If your Python decorator unintentionally changes the signatures of my callables or doesn’t work with class methods, it’s broken and should be fixed. Sadly most decorators are broken because the web is full of bad advice.

by Hynek Schlawack (hs@ox.cx) at May 11, 2017 12:00 PM

April 30, 2017

Moshe Zadka

My Little Subclass: Inheritance is Magic

Learning about Python Method Resolution Order with Twilight Sparkle and her friends.

(Thanks to Ashwini Oruganti for her helpful suggestions.)

The show "My Little Pony: Friendship is Magic" is the latest reimagination of the "My Little Pony" brand as a TV show. The show takes place, mostly, in Ponyville and features the struggles of the so-called "Mane Six" characters. Ponyville was founded by Earth ponies, but is populated by Unicorns and Pegasus ponies as well.

At the end of season 4, Twilight Sparkle becomes an "Alicorn" princess. Alicorns have both horns, like unicorns, and wings, like the pegasus ponies. In the My Little Pony show, horns are used to do magic -- most commonly, telekinetically moving objects.

When reaching puberty, ponies discover their "special talent" via a cutie mark -- a magical tattoo on their flank.

from __future__ import print_function
# Base class for all ponies.
class Pony(object):
    def __init__(self, name, cutie_mark_ability):
        self.name = name
        self.cutie_mark_ability = cutie_mark_ability

    def move(self):
        return "Galloping"

    def carry(self):
        return "Carrying on my back"

    def special_abilities(self):
        return [self.cutie_mark_ability]
# Earth ponies. Just regular ponies.
class Earth(Pony):
    pass
# Apple Pie is an Earth pony. Let's define her!
Apple_Pie = Earth("Apple Pie", "farming apples")
# This is a little helper function to help ponies
# introduce themselves.
def introduce(pony):
    print("Hi, I'm", pony.name, ",", type(pony).__name__, "pony")
    print("Moving:", pony.move())
    print("Carrying:", pony.carry())
    print("Abilities:")
    for ability in pony.special_abilities():
        print("\t", ability)
introduce(Apple_Pie)
Hi, I'm Apple Pie , Earth pony
Moving: Galloping
Carrying: Carrying on my back
Abilities:
     farming apples
# Pegasus ponies have wings and can fly.
class Pegasus(Pony):

    def move(self):
        return "Flying"

    def special_abilities(self):
        return super(Pegasus, self).special_abilities() + ["flying"]
# Rainbow Dash is a pegasus. Let's define her!
Rainbow_Dash = Pegasus("Rainbow Dash", "rainbow boom")

introduce(Rainbow_Dash)
Hi, I'm Rainbow Dash , Pegasus pony
Moving: Flying
Carrying: Carrying on my back
Abilities:
     rainbow boom
     flying
# Unicorn ponies have horns and can do magic.
class Unicorn(Pony):

    def carry(self):
        return "Lifting with horn"

    def special_abilities(self):
        return super(Unicorn, self).special_abilities() + ["magic"]
Rarity = Unicorn("Rarity", "finding gems")

introduce(Rarity)
Hi, I'm Rarity , Unicorn pony
Moving: Galloping
Carrying: Lifting with horn
Abilities:
     finding gems
     magic
# Alicorn princesses have wings and horns
class Alicorn(Pegasus, Unicorn):

    def special_abilities(self):
        return super(Alicorn, self).special_abilities() + ["ruling"]

# Twilight Sparkle is an alicorn princess.
Twilight_Sparkle = Alicorn("Twilight Sparkle", "learning magic")

introduce(Twilight_Sparkle)
Hi, I'm Twilight Sparkle , Alicorn pony
Moving: Flying
Carrying: Lifting with horn
Abilities:
     learning magic
     magic
     flying
     ruling

Pun fully intended -- this is magic! To understand why, let us think of how the inheritance hierarchy of the Alicorn class looks like:

    Pony
    /   \
   /     \
Unicorn Pegasus
   \     /
    \   /
    Alicorn

The Unicorn class has a move method it inherits from Pony. But magically, the Alicorn class knows it should get the more specialized one from Pegasus. Could it be that when inheriting, Python looks right to left? But then the carry method would not work correctly. Lastly, what about special_abilities? How did it track all classes correctly?

The answer is the MRO -- method resolution order.

Alicorn.mro()
[__main__.Alicorn, __main__.Pegasus, __main__.Unicorn, __main__.Pony, object]

Let us first explain the last question. The meaning of super(klass, instance).method() is "find the next time the method appears in the instance's class, after klass does". Because it looks at the MRO of the instance's class, it knows to jump from Alicorn to Pegasus to Unicorn and finally to Pony.
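
A small check of that claim (my own addition): super() keeps walking the MRO of the instance's class, not the base classes of the class you name.

print(Alicorn.mro().index(Pegasus))                         # 1
print(super(Pegasus, Twilight_Sparkle).special_abilities())
# ['learning magic', 'magic'] -- resolved via Unicorn, the class after Pegasus in
# Alicorn's MRO, even though Unicorn is not a base class of Pegasus.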

The MRO also helps to explain the first question, but the answer is more complex. Let us focus on the carry method.

Pegasus.carry == Pony.carry
True

This is the first clue -- the methods compare equal. Which brings us to our next point: let's list all candidate methods.

candidates = set()
for klass in Alicorn.mro()[1:]:
    try:
        candidates.add(klass.carry)
    except AttributeError:
        pass
candidates
{<unbound method Unicorn.carry>, <unbound method Pegasus.carry>}

(We do not see Pony.carry in the set because the Pegasus.carry appears first, Pony.carry compares equal, and candidates is a set.)

We only care about the last time a method appears in the MRO: if you think of it as "distance", we are looking for the closest method. Since in Python, it is easier to find the first time an element appears in a list, we will calculate the reversed MRO:

ordered_candidates = [getattr(klass, 'carry', None) for klass in Alicorn.mro()[1:]]
reverse_ordered_candidates = ordered_candidates[::-1]
reverse_ordered_candidates
[None,
 <unbound method Pony.carry>,
 <unbound method Unicorn.carry>,
 <unbound method Pegasus.carry>]

Now we find the candidate with the highest reverse-distance (lowest distance) using Python's max:

max(candidates, key=reverse_ordered_candidates.index)
<unbound method Unicorn.carry>

The same logic would explain the move method, and the details of the special_abilities method.

The Python Method Resolution Order (MRO) is subtle, and it is easy to misunderstand. Hopefully, the examples here, and the sample code, give a way to think about the MRO and understand why, exactly, it is the way it is.

by Moshe Zadka at April 30, 2017 05:20 AM

April 19, 2017

Glyph Lefkowitz

So You Want To Web A Twisted

As a rehearsal for our upcoming tutorial at PyCon, Creating And Consuming Modern Web Services with Twisted, Moshe Zadka, we are doing a LIVE STREAM WEBINAR. You know, like the kids do, with the video games and such.

As the webinar gods demand, there is an event site for it, and there will be a live stream.

This is a practice run, so expect “alpha” quality content. There will be an IRC channel for audience participation, and the price of admission is good feedback.

See you there!

by Glyph at April 19, 2017 03:29 AM

April 17, 2017

Itamar Turner-Trauring

Learning without a mentor: how to become an expert programmer on your own

If you’re an intermediate or senior programmer you may hit the point where you feel you’re no longer making progress, where you’re no longer learning. You’re good at what you do, but you don’t know what to learn next, or how: there are too many options, it’s hard to get feedback or even tell you’re making progress.

A mentor can help, if they’re good at teaching… but what do you do if you don’t have a mentor? How do you become a better programmer on your own?

In order to learn without a mentor you need to be able to recognize when you’re learning and when you’re not, and then you need to choose a new topic and learn it.

How to tell if you’re learning

If you’re not getting any feedback from an expert it can be tricky to tell whether you’re actually learning or not. And lacking that knowledge it’s easy to get discouraged and give up.

Luckily, there’s an easy way to tell whether you’re learning or not: learning is uncomfortable. If you’re in your comfort zone, if you’re thinking to yourself “this isn’t so hard, I can just do this” you’re not learning. You’re just going over things you already know; that’s why it feels comfortable.

If you’re irritated and a little confused, if you feel clumsy and everything seems harder than it should be: now you’re learning. That feeling of clumsiness is your brain engaging with new material it doesn’t quite understand. You’re irritated because you can’t rely on existing knowledge. It’s hard because it’s new. If you feel this way don’t stop: push through until you’re feeling comfortable again.

You don’t want to take this too far, of course. Pick a topic that is too far out of your experience and it will be so difficult you will fail to learn anything, and the experience may be so traumatic you won’t want to learn anything.

Choosing something to learn

When choosing something to learn you want something just outside your comfort zone: close enough to your existing knowledge that it won’t overwhelm you, far enough that it’s actually new. You also want to pick something you’ll be able to practice: without practice you’ll never get past the point of discomfort.

Your existing job is a great place to practice new skills because it provides plenty of time to do so, and you’ll also get real-world practice. That suggests picking new skills that are relevant to your job. As an added bonus this may give you the opportunity to get your employer to pay for initial training or materials.

Let’s consider some of the many techniques you can use to learn new skills on the job.

Teaching

If you have colleagues you work with you will occasionally see them do something you think is obviously wrong, or miss something you think is the obviously right thing to do. For example, “obviously you should never do file I/O in a class constructor.”

When this happens the tempting thing to do, especially if you’re in charge, is to just tell them to change to the obviously better solution and move on. But it’s worth resisting that urge, and instead taking the opportunity to turn this into a learning experience, for them and for you.

The interesting thing here is the obviousness: why is something obvious to you, and not to them? When you learn a subject you go through multiple phases:

  • Conscious ignorance: you don’t know anything.
  • Conscious knowledge: you know how to do the task, but you have to think it through.
  • Unconscious knowledge: you just know what to do.

When you have unconscious knowledge you are an expert: you’ve internalized a model so well you apply it automatically. There are two problems with being an expert, however:

  • It’s hard for you to explain why you’re making particular decisions. Since your internal model is unconscious you can’t clearly articulate why or how you made the decision.
  • Your model is rarely going to be the best possible model, and since you’re applying it unconsciously you may have stopped improving it.

Teaching means taking your unconscious model and turning it into an explicit conscious model someone else can understand. And because teaching makes your mental model conscious you also get the opportunity to examine your own assumptions and improve your own understanding, ending up with a better model.

You don’t have to teach only colleagues, of course: you can also write a blog post, or give a talk at a meetup or at a conference. The important thing is to notice the places you have unconscious knowledge and try to make it conscious by explaining it to others.

Switching jobs

While learning is uncomfortable, I personally suffer from a countervailing form of discomfort: I get bored really easily. As soon as I become comfortable with a new task or skill I’m bored, and I hate that.

In 2004 I joined a team writing high performance C++. As soon as I’d gotten just good enough to feel comfortable… I was bored. So I came up with slightly different tasks to work on, tasks that involved slightly different skills. And then I was bored again, so I moved on to a different team in the company, where I learned even more skills.

Switching tasks within your own company, or switching jobs to a new company, is a great way to get out of your comfort zone and learning something new. It’s daunting, of course, because you always end up feeling clumsy and incompetent, but remember: that discomfort means you’re learning. And every time you go through the experience of switching teams or jobs you will become better at dealing with this discomfort.

Learning from experts: skills, not technologies

Another way to learn is to learn from experts that don’t work with you. It’s tempting to try to find experts who will teach you new technologies and tools, but skills are actually far more valuable.

Programming languages and libraries are tools. Once you’re an experienced enough programmer you should be able to just pick them up as needed if they become relevant to your job.

Skills are trickier: testing, refactoring, API design, debugging… skills will help you wherever you work, regardless of technology. But they’re also easier to ignore or miss. There are skills we’d all benefit from that we don’t even know exist.

So read a book or two on programming, but pick a book that will teach you a skill, not a technology. Or try to find the results of experts breaking down their unconscious models into explicit models.

Conclusion: learning on your own

You don’t need a mentor to learn. You can become a better software engineer on your own by:

  • Recognizing when you’re feeling comfortable and not learning.
  • Finding ways to learn, including teaching, switching jobs and learning skills from experts (and I have some more suggestions here).
  • Forcing yourself through the discomfort of learning: if you feel incompetent you’re doing the right thing.

April 17, 2017 04:00 AM

April 12, 2017

Moshe Zadka

PYTHONPATH Considered Harmful

(Thanks to Tim D. Smith and Augie Fackler for reviewing a draft. Any mistakes that remain are mine.)

The environment variable PYTHONPATH seems harmless enough. The official documentation refers to its function as "Augment the default search path for module files." However, in practice, setting this variable in a shell can lead to many confusing things.

For one, most directories are poorly suited to be on the Python search path. Consider, for example, the root directory of a typical Python project: it contains setup.py -- and so, if it were added to the current search path, import setup would become possible. (This is one reason to have src/ directories.) Often, directories added unwisely to the Python search path cause files to be imported from unexpected locations, leading to surprising conflicts.

In addition, it now means that commands such as pip freeze do not describe reality correctly: the fact that foo==1.2.4 appears in the output does not mean that import foo will import version 1.2.4 -- and only careful checking of the __file__ attribute will reveal the discrepancy.

Specifically, any bug reporting template that asks the output of pip freeze to be attached is rendered nearly useless.

Worse, active commands in pip would seem to have no effect: pip install --upgrade or pip uninstall would have no effect on the valiant foo package, safely ensconced in a directory on PYTHONPATH.

Finally, PYTHONPATH changes paths for both Python 2 and 3. This means code intended for one interpreter will end up in the other.

Manipulating PYTHONPATH should be reserved for specialized launchers which need to start a Python program despite a complicated system configuration. In those cases, PYTHONPATH should be set inside the launcher, so as not to affect any code outside it.
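
A sketch of that kind of launcher (my own; the extra path and the module name are hypothetical), which keeps PYTHONPATH confined to the child process:

import os
import subprocess
import sys

env = dict(os.environ)
env["PYTHONPATH"] = "/opt/legacy-app/vendored"   # hypothetical extra search path
# Only the child process sees the modified environment; the shell stays untouched.
subprocess.check_call([sys.executable, "-m", "legacy_app"], env=env)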

As always, when needing specialized launchers in order to run code, consider that "starting up should not be the most complicated thing that your program does". It is quite possible that other, existing tools are already good enough. Maybe virtualenv, pip install -e, or Pex can solve the problem?

Postscript

In order to write this post, I tested the effect of setting PYTHONPATH to various values. One of those was left over after my testing, leading to fifteen minutes of wasted debugging effort. Harmful indeed!

by Moshe Zadka at April 12, 2017 05:20 AM

April 06, 2017

Itamar Turner-Trauring

You don't need a Computer Science degree

If you never studied Computer Science in school you might believe that’s made you a worse programmer. Your colleagues who did study CS know more about algorithms and data structures than you do, after all. What did you miss? Are you really as good?

My answer: you don’t need to worry about it, you’ll do just fine. Some of the best programmers I’ve worked with never studied Computer Science at all.

A CS education has its useful parts, but real-world programming includes a broad array of skills that no one person can master. Programming is a team effort, and different developers can and should have different skills and strengths for the team to succeed. Where a CS degree does give you a leg up is when you’re trying to get hired early in your career, but that is a hurdle that many people overcome.

What you learn in Comp Sci

I myself did study CS and Mathematics in college, dropped out of school, and then later went back to school and got a liberal arts degree. Personally I feel the latter was more useful for my career.

Much of CS is devoted to theory, and that theory doesn’t come up in most programming. Yes, proving that all programming languages are equivalent is interesting, but that one sentence is all that I’ve taken away from a whole semester’s class on the subject. And yes, I took multiple classes on data structures and algorithms… but I’m still no good at implementing new algorithms.

I ended up being bored in most of my CS classes. I dropped out to get a programming job where I could just write code, which I enjoyed much more. In some jobs that theory I took would be quite useful, but for me at least it has mostly been irrelevant.

Writing software in the real world

It’s true that lacking a CS education you might not be as good at data structures. But chances are you have some other skill your colleagues lack, a unique strength that you contribute to your team.

Let me give a concrete example: at a previous job we gave all candidates a take home exercise, implementing a simple Twitter-like server. I reviewed many of the solutions, and each solution had different strengths. For example:

  • Doing a wonderful job packaging a solution and making it easy to deploy.
  • Choosing a data structure that made access to popular feeds faster, for better scalability.
  • Discussing how the API was badly designed and could result in lost messages, and suggesting improvements.
  • Discussing the ways in which the design was likely to cause business problems.
  • Adding monitoring, and explaining how to use the monitoring to decide when to scale up the service.

Packaging, data structures, network API design, big picture thinking, operational experience: these are just some of the skills that contribute to writing software. No one person can have them all. That means you can focus on your own strengths, because you’re part of a team. Here’s what that’s meant in my case:

  • I’m pretty bad at creating new data structures or algorithms, and mostly it doesn’t matter. Sometimes you’re creating new algorithms, it’s true. But most web developers, for example, just need to know that hashmaps give faster lookups than iterating over a list. But I’ve had coworkers who were good at it… and they worked on those problems when they came up.
  • In my liberal arts degree I learned how to craft better abstractions, and writing as a form of thinking. This has proven invaluable when designing new products and solving harder problems.
  • From the friends I made working on open source projects I learned about testing, and how to build robust software. Many of them had no CS education at all, and had learned on their own, from books and forums and their own practice.

Job interviews

The one place where a Computer Science degree is unquestionably useful is getting a job early in your career. When you have no experience the degree will help; once you have experience your degree really doesn’t matter.

It’s true that many companies have an interview process that focuses on algorithmic puzzles. Unfortunately interviewing skills are often different than job skills, and need to be strengthened separately. And my CS degree doesn’t really help me, at least: I’ve forgotten most of what I’ve learned, and algorithms were never my strength. So whenever I’m interviewing for jobs I re-read an old algorithms textbook and do some practice puzzles.

In short: don’t worry about lacking a CS degree. Remember that you’ll never be able to know everything, or do everything: it’s the combined skills of your whole team that matters. Focus on your strengths, improve the skills you have, and see what new skills you can learn from your teammates.

April 06, 2017 04:00 AM

March 26, 2017

Itamar Turner-Trauring

Why and how you should test your software

This was the second draft of a talk I’ll be giving at PyCon 2017. You can now watch the actual talk instead.

March 26, 2017 04:00 AM

March 20, 2017

Itamar Turner-Trauring

Dear recruiter, "open floor space" is not a job benefit

I was recently astonished by a job posting for a company that touted their “open floor space.” Whoever wrote the ad appeared to sincerely believe that an open floor office was a reason to work for the company, to be mentioned alongside a real benefit like work/life balance.

While work/life balance and an open floor plan can co-exist, an open floor plan is a management decision much more akin to requiring long hours: both choose control over productivity.

The fundamental problem facing managers is that productivity is hard to measure. Faced with the inability to measure productivity, managers may feel compelled to measure time spent working. Never mind that it’s counter-productive: at least it gives management control, even if it’s control over the wrong thing.

Here is a manager explaining the problem:

I [would] like to manage [the] team’s output rather than managing their time, because if they are forced to spend time inside the office, it doesn’t mean they are productive or even working. At the same time it’s hard to manage output, because software engineering tasks are hard to estimate and things can go out of the track easily.

In this case at least the manager involved understands that what matters is output, not hours in the office. But not everyone is as insightful.

Choosing control

In cases where management does fall into the trap of choosing control over productivity, the end result is a culture where the only thing that matters is hours in the office. Here’s a story I heard from a friend about a startup they used to work at:

People would get in around 8 or 9, because that’s when breakfast is served. They work until lunch, which is served in the office, then until dinner, which is served in the office. Then they do social activities in the office, video games or board games, and then keep working until 10PM or later. Their approach was that you can bring your significant other in for free dinner, and therefore why leave work? Almost like your life is at the office.

Most of the low level employees, especially engineers, didn’t feel that this was the most productive path. But everyone knew this was the most direct way to progress, to a higher salary, to becoming a team lead. The number of hours in the office is a significant part of how your performance is rated.

I don’t think people can work 12 hours consistently. And when you’re not working and you’re in an open plan office, you distract people. It’s not about hours worked, it’s about hours in office, there’s ping pong tables… so there’s always someone asking you to play ping pong or distracting you with a funny story. They’re distracting you, their head wasn’t in the zone, but they had to be in the office.

A team of 10 achieved what a team of 3 should achieve.

Control through visibility

Much like measuring hours in the office, an open floor office is designed for control rather than productivity. A manager can easily see what each developer is doing: are they coding? Are they browsing the web? Are they spending too much time chatting to each other?

In the company above the focus on working hours was apparently so strong that the open floor plan was less relevant. But I’ve no doubt there are many companies where you’ll start getting funny looks from your manager if you spend too much time appearing to be “unproductive.”

To be fair, sometimes open floor plans are just chosen for cheapness, or thoughtlessly copied from other companies because all the cool kids are doing it. Whatever the motivation, they’re still bad for productivity. Programming requires focus, and concentrated thought, and understanding complex systems that cannot fit in your head all at once and so constantly need to be swapped in and out.

Open floor spaces create exactly the opposite of the environment you need for programming: they’re full of noise and distraction, and headphones only help so much. I’ve heard of people wearing hoodies to block out not just noise but also visual distraction.

Dear recruiter

Work/life balance is a real job benefit, for both employers and employees: it increases productivity while allowing workers space to live outside their job. But “open office space” is not a benefit for anyone.

At worst it means your company is perversely sabotaging its employees in order to control them better. At best it means your company doesn’t understand how to enable its employees to do their job.

March 20, 2017 04:00 AM

March 18, 2017

Moshe Zadka

Shipping Python Applications in Docker

Introduction

When looking at open source examples or tutorials, one will often see Dockerfiles that look like this:

FROM python:3.6
COPY setup.py /mypackage/
COPY src /mypackage/src
COPY requirements.txt /
RUN pip install -r /requirements.txt
RUN pip install /mypackage
ENTRYPOINT ["/usr/bin/mypackage-console-script", "some", "arguments"]

This typical naive example has multiple problems, which this article will show how to fix.

  • It uses a container with a functioning build environment
  • It has the potential for "loose" and/or incomplete requirements.txt, so that changes in PyPI (or potential compromise of it) change what is installed in the container.
  • It is based on python:3.6, a label which can point to different images at different times (for example, when Python 3.6 has a patch release).
  • It does not use virtual environments, leading to potential bad interactions.

In general, when thinking about the Docker container image as the "build" output of our source tree, it is good to aim for a reproducible build: building the same source tree should produce equivalent results. (Literally bit-wise identical results are a good thing to aim for, but there are many potential sources of spurious bitwise changes, such as dates embedded in zip files.) One important reason is bug fixes: for example, a bug fix changes a single line of Python code. That is usually a safe change, reasonably easy to understand and to test for regressions. However, if rebuilding also updates Python from 3.6.1 to 3.6.2, or Twisted from 17.1 to 17.2, regressions are more likely and the change has to be tested more carefully.

Managing Base Images

Docker images are built in layers. In a Dockerfile, the FROM line imports the layers from the source image. Every other line adds exactly one layer.

Container registries store images. Internally, they deduplicate identical layers. Images are indexed by user (or organization), repository name and tag. These are not immutable: it is possible to upload a new image, and point an old user/repository/tag at it.

In theory, it is possible to achieve immutability by content-addressing. However, registries usually garbage collect layers with no pointers. The only safe thing to do is to own images on one's own user or organization. Note that to "own" the images, all we need to do is to push them under our own name. The push will be fast -- since the actual content is already on the Docker registry side, it will short-circuit the actual upload.

Especially when working on a typical home internet connection, or an internet cafe connection, download speeds are usually reasonable but upload speeds are slow. This push does not depend on fast upload speed, since it only sends small hashes. However, it makes sure the images are kept forever, and that we have a stable pointer to them.

The following code shows an example of fork-tagging in this way into a user account. We use time, precise to the microsecond, to name the images. This means that running this script is all but guaranteed to result in unique image labels, and they even sort correctly.

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call(["docker", "pull", orig])
    subprocess.check_call(["docker", "tag", orig, image])
    subprocess.check_call(["docker", "push", image])

(This assumes the Docker client is already pre-authenticated, via running docker login.)

It produces images that look like:

moshez/python36-slim:2017-03-10T02-18-12-843046
   b5b6550a858c
   198.6 MB
moshez/python36:2017-03-10T02-18-12-843046
   a1782fa44ef7
   687.2 MB

The script is safe to run -- it will not clobber existing labels. It will require a source change to use the new images -- usually by changing a Dockerfile.

Python Third Party Dependencies

When installing dependencies, it is good to ensure that those cannot change from build to build. Modern pip gives us great tools to do that, but they are rarely used. The following describes how to achieve reproducible builds while still allowing for easy upgrades.

"Loose" requirements specify programmer intent. They indicate which dependencies the programmer cares about: e.g., wanting a minimum version of a library. They are handwritten by the developer. Often that "hand-writing" is little more than a single line naming the "main" package (the one that contains the code written in-house). It will usually be a bit more than that: for example, pex and docker-py to support the build process.

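As an illustrative sketch (the package names here are made up), such a loose requirements file might contain little more than:

# requirements.loose.txt -- illustrative only
mypackage        # the in-house "main" package
pex              # used by the build process
docker           # docker-py, also used by the build process
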
Strict dependencies indicate the entire dependency chain: complete, with specific versions and hashes. Note that both must be checked in. Re-generating the strict dependencies is a source-tree change operation, no different from changing the project's own Python code, because it changes the build output.

The workflow for regenerating the strict dependencies might look like:

$ git checkout -b updating-third-party
$ docker build -t temp-image -f harden.docker .
$ docker run --rm temp-image > requirements.strict.txt
$ git commit -a -m 'Update 3rd party'
$ git push
$ # Follow code workflow

There is no way, currently, to statically analyze dependencies. Docker lets us install the packages in an ephemeral environment, analyze the dependencies and generate the hashes, and then output the results. The output can be checked in, while the actual installation is discarded together with the container: this is what the --rm flag above does.

For the workflow above to work, we need a Dockerfile and a script. The Dockerfile is short, since most of the logic is in the script. We use the reproducible base images (i.e., the images we fork-tagged), copy the inputs, and let the entry point run the script.

# harden.docker
FROM moshez/python36:2017-03-09T04-50-49-169150
COPY harden-requirements requirements.loose.txt /
ENTRYPOINT ["/harden-requirements"]

The script itself is divided into two parts. The first part installs the loose requirements in a virtual environment, and then runs pip freeze. This gives us the "frozen" requirements -- with specific versions, but without hashes. We use --all to force freezing of packages pip thinks we should not. One of those packages is setuptools, which pex needs specific versions of.

# harden-requirements
import subprocess, sys
subprocess.check_output([sys.executable, "-m", "venv",
                         "/envs/loose"])
subprocess.check_output(["/envs/loose/bin/pip", "install", "-r",
                         "/requirements.loose.txt"])
frozen = subprocess.check_output(["/envs/loose/bin/pip",
                                  "freeze", "--all"])
with open("/requirements.frozen.txt", 'wb') as fp:
    fp.write(frozen)

The second part takes the frozen requirements, and uses pip-compile (part of the pip-tools package) to generate the hashes. The --allow-unsafe flag is the scarier-looking, but semantically equivalent, version of pip freeze's --all.

subprocess.check_output(["/envs/loose/bin/pip", "install",
                         "pip-tools"])
output = subprocess.check_output(["/envs/loose/bin/pip-compile",
                                  "--allow-unsafe",
                                  "--generate-hashes",
                                  "requirements.frozen.txt"])
for line in output.decode('utf-8').splitlines():
    print(line)

Docker-in-Docker

(Thanks to Glyph Lefkowitz for explaining that well.)

Too many examples in the wild use the same container to build and deploy. Those end up shipping a container full of build tools, such as make and gcc, to production. That has many downsides -- size, performance, security and isolation.

In order to properly separate those two tasks, we will need to learn how to run containers from inside containers. Running a container as a daughter of a container is a bad idea. Instead, we take advantage of the client/server architecture of Docker.

Contrary to some misconception, docker run does not run a container. It connects to the Docker daemon, which runs the container. The client knows how to connect to the daemon based on environment variables. If the environment is vanilla, the client has a default: it connects to the UNIX domain socket at /var/run/docker.sock. The daemon always listens locally on that socket.

Thus, we can run a container from inside a container that has access to the UNIX domain socket. Such a socket is simply a file, in keeping with UNIX's "everything is a file" philosophy. Therefore, we can pass it in to the container by using host volume mounting.

$ docker run -v /var/run/docker.sock:/var/run/docker.sock ...

A Docker client running inside such a container will connect to the outside daemon, and can cause it to run other containers.
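
As a minimal sketch (the image and command below are only placeholders), a Python script running inside such a container can use docker-py to start a sibling container via the host's daemon:

# sibling.py -- illustrative sketch
import docker

# from_env() honours DOCKER_HOST if set, and otherwise falls back to the
# default UNIX domain socket -- the one mounted in above.
client = docker.from_env()

# The container below is started by the *host's* daemon, so it is a
# sibling of the container this code runs in, not a child.
output = client.containers.run("alpine", ["echo", "hello from a sibling"],
                               remove=True)
print(output.decode("utf-8"))

The build script later in this article relies on exactly this setup.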

Python Output Formats

Wheel

Wheels are specially structured zip files. They are designed so a simple unzip is all that is needed to install them. The pip program will build, and cache, wheels for any package it installs.

Pex

Pex is an executable Python program. It does not embed the interpreter, or the standard library. It does, however, embed all third-party packages: running the Pex file on a vanilla Python installation works, as long as the Python versions are compatible.

Pex works because running python somefile.zip works by adding somefile.zip to sys.path, and then running __main__.py from the archive as its main file. The Zip format is end-based: adding an arbitrary prefix to the content does not change the semantics of a zip file. Pex adds the prefix #!/usr/bin/env python.
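
To see that property in action, here is a small illustrative sketch (not part of any build in this article) that makes a runnable zip by hand:

# zip_prefix_demo.py -- illustrative only
import os
import stat
import zipfile

# An archive whose __main__.py is what Python will run.
with zipfile.ZipFile('demo.zip', 'w') as zf:
    zf.writestr('__main__.py', 'print("hello from inside the zip")\n')

# Prepend a shebang; the zip index lives at the end of the file,
# so the archive stays valid.
with open('demo.zip', 'rb') as f:
    payload = f.read()
with open('demo', 'wb') as f:
    f.write(b'#!/usr/bin/env python\n' + payload)
os.chmod('demo', os.stat('demo').st_mode | stat.S_IEXEC)

Afterwards both python demo and ./demo print the message, which is the same trick pex relies on.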

Building a Simple Service Container Image

The following will use the remotemath package, which does slow remote arithmetic operations (multiplication and negation). It does not have much utility in production, but is useful for pedagogical purposes such as this one.

In general, build systems can get complicated. For our examples, however, a simple Python script is all the build infrastructure we need.

Much like in the requirements hardening example, we will follow the pattern of creating a docker container image and running it immediately. Note that the first two lines do not depend on the so-called "context" -- the files that are being copied into the docker container from the outside. Because of this, the docker build process will know to cache them, leading to fast rebuilds.

However, building like this means we do not have to mount files into the docker container, and so do not have to deal with the sometimes subtle semantics of such mounting.

# build.docker
FROM moshez/python36:2017-03-09T04-50-49-169150
RUN python3 -m venv /buildenv && mkdir /wheelhouse
COPY requirements.strict.txt build-script remotemath.docker /
ENTRYPOINT ["/buildenv/bin/python", "/build-script"]

This Docker file depends on two things we have not mentioned: a build script and the remotemath.docker file. The build script itself has one subtlety -- because it calls out to pip, it cannot import the docker module until it has been installed. Indeed, since it starts out in a vanilla virtual environment, it does not have access to any non-built-in packages. We do need to make sure the docker-py PyPI package makes it into the requirements file.

# build-script
import os, shutil, subprocess, sys

def run(cmd, *args):
    # Run a command from the build virtualenv.
    args = [os.path.join('/buildenv/bin', cmd)] + list(args)
    return subprocess.check_call(args)

os.makedirs('/mnt/output')
shutil.copy('/remotemath.docker',
            '/mnt/output/remotemath.docker')
run('pip', 'install', '--require-hashes',
                      '-r', 'requirements.strict.txt')
run('pip', 'wheel', '--require-hashes',
                    '-r', 'requirements.strict.txt',
                    '--wheel-dir', '/wheelhouse/')
run('pex', '-o', '/mnt/output/twist.pex',
           '-m', 'twisted',
           '--repo', '/wheelhouse', '--no-index',
           'remotemath')

# docker-py can only be imported now, after pip has installed it above.
import docker
client = docker.from_env()
client.images.build(
    path='/mnt/output/',
    dockerfile='/mnt/output/remotemath.docker',
    tag='moshez/remotemath:{}'.format(sys.argv[1]),
    rm=True,
)
## TODO: Push here

Docker has no way to "build and run" in one step. We build an image, tagged as temp-image, like we did for hardening. When builds are likely to run on a shared machine, this tag needs to be a uuid instead, with some garbage collection mechanism for old images.

$ docker build -t temp-image -f build.docker .
$ docker run --rm -it \
         -v /var/run/docker.sock:/var/run/docker.sock \
         temp-image $(git rev-parse HEAD)

The Dockerfile for the production service is also short: here, we take advantage of two nice properties of the python -m twisted command (wrapped into pex as twist.pex):

  • It accepts a port on the command-line. Since arguments to Docker run are appended to the command-line, this can be run as docker run moshez/remotemath:TAG --port=tcp:8080.
  • It reaps adopted children, and thus can be used as PID 1.

# remotemath.docker
FROM moshez/python36-slim:2017-03-09T04-50-49-169150
COPY twist.pex /
ENTRYPOINT ["/twist.pex", "remotemath"]

Building a Container Image With Custom Code

So far we have packaged a ready-made application. That was non-trivial in itself, but now let us write an application.

Pyramid App

We are using the Pyramid framework to write an application. It will have one page, which will use the remote math service to calculate 3*4.

# setup.py
import setuptools
setuptools.setup(name='fancycalculator',
    packages=setuptools.find_packages(where='src'),
    package_dir={"": "src"},
    install_requires=['pyramid', 'requests'])
# src/fancycalculator/app.py
import os, requests, pyramid.config, pyramid.response
def multiply(request):
    res = requests.post(os.environ['REMOTE_MATH']+'/multiply', json=[3, 4])
    num, = res.json()
    return pyramid.response.Response('Result:{}'.format(num))
cfg = pyramid.config.Configurator()
cfg.add_route('multiply', '/')
cfg.add_view(multiply, route_name='multiply')
app = cfg.make_wsgi_app()

Changes

In many cases, one logical source code repository will be responsible for more than one Docker image. Obviously, in the case of a monorepo, this will happen. But even when using a repo per logical unit, often two services share so much code that it makes sense to separate them into two images.

For example, take the admin panel of a web application and the application itself. It might make sense to build them as two images, because they will not share all the code, and running them in separate containers allows separating permissions effectively. However, they share enough code, of no general interest to others, that building them out of the same source repository is probably a good idea.

This is what we will do in this example. We will change our build files, above, to also build the "fancy calculator" Docker image.

Two files need append-only changes: build.docker needs to add the new source files and build-script needs to actually build the new container.

# build.docker
RUN mkdir /fancycalculator
COPY setup.py /fancycalculator/
COPY src /fancycalculator/src/
COPY fancycalculator.docker /
# build-script
run('pip', 'wheel', '/fancycalculator',
    '--wheel-dir', '/wheelhouse')
run('pex', '-o', '/mnt/output/fc-twist.pex',
    '-m', 'twisted',
    '--repo', '/wheelhouse', '--no-index',
    'twisted', 'fancycalculator')
client.images.build(path='/mnt/output/',
    dockerfile='/mnt/output/fancycalculator.docker',
    tag='moshez/fancycalculator:{}'.format(sys.argv[1]),
    rm=True,
)

Again, in a more realistic scenario where a build system is already being used, it might be that a new build configuration file needs to be added -- and if all the source code is already in a subdirectory, that directory can be added as a whole in the build Dockerfile.

We need to add a Dockerfile -- the one for production fancycalculator. Again, using the Twisted WSGI container at the root allows us to avoid some complexity.

# fancycalculator.docker
FROM moshez/python36-slim:2017-03-09T04-50-49-169150
COPY fc-twist.pex /twist.pex
ENTRYPOINT ["/twist.pex", "web", "--wsgi", \
            "fancycalculator.app.app"]

Running Containers

Orchestration Framework

There are several popular orchestration frameworks: Kubernetes (sometimes shortened to k8s), Docker Swarm, Mesosphere and Nomad. Of those, Swarm is probably the easiest to get up and running.

In any case, it is best to make sure specific containers are written in an OF-agnostic manner. The easiest way to achieve that is to have containers expect to connect to a DNS name for other services. All Orchestration Frameworks support some DNS integration in order to do service discovery.

Configuration and Secrets

Secrets can be bootstrapped from the secret system the Orchestration Framework uses. When not using an Orchestration Framework, it is possible to mount secret files directly into containers with -v. The reason "bootstrapped" is used is that it is often easier to have the secret at that level be a PyNaCl private key, and to add application-level secrets by encrypting them with the corresponding PyNaCl public key (which can be checked into the repository) and putting them directly in the image. This increases developer velocity by allowing developers to add secrets directly without giving them any special privileges.
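
As a sketch of that scheme (key management is heavily simplified here for illustration), PyNaCl's SealedBox can be used roughly like this:

# secrets_sketch.py -- illustrative only
from nacl.public import PrivateKey, SealedBox

# The bootstrap secret: generated once, and provided to the container by
# the Orchestration Framework (or a -v mount).
bootstrap_private = PrivateKey.generate()
# The corresponding public key is safe to check into the repository.
bootstrap_public = bootstrap_private.public_key

# A developer encrypts an application-level secret with the public key...
encrypted = SealedBox(bootstrap_public).encrypt(b"database-password")
# ...and the encrypted blob can be baked directly into the image.

# At runtime, the application decrypts it with the bootstrap private key.
assert SealedBox(bootstrap_private).decrypt(encrypted) == b"database-password"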

Storage

In general, much of Docker is used for "stateless" services. When needing to keep state, there are several options. One is to avoid an Orchestration Framework for the stateful services: deploy with docker, but mount the area where long-term data is kept from the host. Backup, redundancy and fail-over solutions still need to be implemented, but this is no worse than without containers, and containers at least give an option of atomic upgrades.

by Moshe Zadka at March 18, 2017 05:20 AM

March 12, 2017

Itamar Turner-Trauring

Unit testing, Lean Startup, and everything in-between

This was my first draft of a talk I gave at PyCon 2017. You can now watch the actual talk instead.

March 12, 2017 05:00 AM

March 08, 2017

Moshe Zadka

Pelican -- an Experiment

This blog is an experiment -- static blog, hosted on my own domain. An Orbifold is, roughly speaking, something that locally looks like Euclidean space divided by a finite linear group.

This is Moshe'z space, to talk on how things look like -- locally and globally.

by Moshe Zadka at March 08, 2017 06:20 AM

March 05, 2017

Itamar Turner-Trauring

Why you're (not) failing at your new job

It’s your first month at your new job, and you’re worried you’re on the verge of getting fired. You don’t know what you’re doing, everyone is busy and you need to bother them with questions, and you’re barely writing any code. Any day now your boss will notice just how bad a job you’re doing… what will you do then?

Luckily, you’re unlikely to be fired, and in reality you’re likely doing just fine. What you’re going through happens to almost everyone when they start a new job, and your panicked feeling will eventually pass.

New jobs make you incompetent

Just like you, every time I’ve started a new job I’ve had to deal with feeling incompetent.

  • In 2004 I started a new job as a C++ programmer, writing software for the airline industry. On my first day of work the VP of Engineering told me he didn’t expect me to be fully productive for 6 months. And for good reason: I barely knew any C++, I knew nothing about airlines, and I had to learn a completely new codebase involving both.
  • In 2012 I quit my job to become a consultant. Every time I got a new client I had to learn a new codebase and a new set of problems and constraints.
  • At my latest job I ended up writing code in 3 programming languages I didn’t previously know. And once again I had to learn a new codebase and a new business logic domain.

Do you notice the theme?

Every time you start a new job you are leaving behind the people, processes and tools you understand, and starting from scratch. A new language, new frameworks, new tools, new codebases, new ways of doing things, people you don’t know, business logic you don’t understand, processes you’re unfamiliar with… of course it’s scary and uncomfortable.

Luckily, for the most part this is a temporary feeling.

This too will pass

Like a baby taking their first steps, or a child on their first bike ride, those first months on a new job will make you feel incompetent. Soon the baby will be a toddler, the child will be riding with confidence. You too will soon be productive and competent.

Given some time and effort you will eventually learn what you need to know:

…the codebase.

…the processes, how things are done and maybe even why.

…the programming language.

…who to ask and when to ask them.

…the business you are operating in.

Since that’s a lot to learn, it will take some time, but unless you are working for an awful company that is expected and normal.

What you can do

While the incompetent phase is normal and unavoidable, there is still something you can do about it: learn how to learn better. Every time you start a new job you’re going to be learning new technologies, new processes, new business logic. The most important skill you can learn is how to learn better and faster.

The faster you learn the faster you’ll get past the feeling of incompetence when you start a new job. The faster you learn the faster you can become a productive employee or valued consultant.

Some skills are specific to programming. For example, when learning new programming languages, I like skimming a book or tutorial first before jumping in: it helps me understand the syntax and basic concepts. Plus having a mental map of the book helps me know where to go back to when I’m stuck. Other skills are more generic, e.g. there is considerable research on how learning works that can help you learn better.

Finally, another way I personally try to learn faster is by turning my mistakes into educational opportunities. Past and present, coding or career, every mistake I make is a chance to figure out what I did wrong and how I can do better. If you’d like to avoid my past mistakes, sign up to get a weekly email with one of my mistakes and what you can learn from it.

March 05, 2017 05:00 AM

February 19, 2017

Itamar Turner-Trauring

When AI replaces programmers

The year is 2030, and artificial intelligence has replaced all programmers. Let’s see how this brave new world works out:

Hi! I’m John the Software Genie, here to help you with all your software needs.

Hi, I’d like some software to calculate the volume of my house.

Awesome! May I have access to your location?

Why do you need to access my location?

I will look up your house in your city’s GIS database and use its dimensions to calculate its volume.

Sorry, I didn’t quite state that correctly. I want some software that will calculate the dimensions of any house.

Awesome! What is the address of this house?

No, look, I don’t want anything with addresses. You can have multiple apartments in a house, and anyway some structures don’t have an address, or are just being designed… and the attic and basement doesn’t always count… How about… I want software that calculates the volume of an abstract apartment.

Awesome! What’s an abstract apartment?

Grrrr. I want software that calculates the sum of the volumes of some rooms.

Awesome! Which rooms?

You know what, never mind, I’ll use a spreadsheet.

I’m sorry Dave, I can’t let you do that.

What.

Just a little joke! I’m sorry you decided to go with a spreadsheet. Your usage bill for $153.24 will be charged to your credit card. Have a nice day!

Back to the present: I’ve been writing software for 20 years, and I find the idea of being replaced by an AI laughable.

Processing large amounts of data? Software’s great at that. Figuring out what a human wants, or what a usable UI is like, or what the real problem you need to solve is… those are hard.

Imagine what it would take for John the Software Genie to learn from that conversation. I’ve made my share of mistakes over the years, but I’ve learned enough that these days I can gather requirements decently. How do you teach an AI to gather requirements?

We might one day have AI that is as smart as a random human, an AI that can learn a variety of skills, an AI that can understand what those strange and pesky humans are talking about. Until that day comes, I’m not worried about being replaced by an AI, and you shouldn’t worry either.

Garbage collection didn’t make programmers obsolete just because it automated memory management. In the end automation is a tool to be controlled by human understanding. As a programmer you should focus on building the skills that can’t be automated: figuring out the real problems and how to solve them.

February 19, 2017 05:00 AM

February 11, 2017

Twisted Matrix Laboratories

Twisted 17.1.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 17.1!

The highlights of this release are:

  • twisted.web.client.Agent now supports IPv6! It's also now the primary web client in Twisted, with twisted.web.client.getPage being deprecated in favour of it and Treq.
  • twisted.web.server has had many cleanups revolving around timing out inactive clients.
  • twisted.internet.ssl.CertificateOptions has had its method argument deprecated, in favour of the new raiseMinimumTo, lowerMaximumSecurityTo, and insecurelyLowerMinimumTo arguments, which take TLSVersion arguments. This allows you to specify a range of TLS versions you wish to negotiate, rather than forcing yourself to any one version.
  • twisted.internet.ssl.CertificateOptions will use OpenSSL's MODE_RELEASE_BUFFERS, which will let it free unused memory that was held by idle TLS connections.
  • You can now call the new twist runner with python -m twisted.
  • twisted.conch.ssh now has some ECDH key exchange support and supports hmac-sha2-384.
  • Better Unicode support in twisted.internet.reactor.spawnProcess, especially on Windows on Python 3.6.
  • More Python 3 porting in Conch, and more under-the-hood changes to facilitate a Twisted-wide jump to new-style classes only on Python 2 in 2018/2019. This release has also been tested on Python 3.6 on Linux.
  • Lots of deprecated code removals, to make a sleeker, less confusing Twisted.
  • 60+ closed tickets.

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by Amber Brown (noreply@blogger.com) at February 11, 2017 10:08 AM

February 10, 2017

Glyph Lefkowitz

Make Time For Hope

Pandora hastened to replace the lid! but, alas! the whole contents of the jar had escaped, one thing only excepted, which lay at the bottom, and that was HOPE. So we see at this day, whatever evils are abroad, hope never entirely leaves us; and while we have THAT, no amount of other ills can make us completely wretched.

It’s been a rough couple of weeks, and it seems likely to continue to be so for quite some time. There are many real and terrible consequences of the mistake that America made in November, and ignoring them will not make them go away. We’ll all need to find a way to do our part.

It’s not just you — it’s legit hard to focus on work right now. This is especially true if, as many people in my community are, you are trying to motivate yourself to work on extracurricular, after-work projects that you used to find exciting, and instead find it hard to get out of bed in the morning.

I have no particular position of authority to advise you what to do about this situation, but I need to give a little pep talk to myself to get out of bed in the morning these days, so I figure I’d share my strategy with you. This is as much in the hope that I’ll follow it more closely myself as it is that it will be of use to you.

With that, here are some ideas.

It’s not over.

The feeling that nothing else is important any more, that everything must now be a life-or-death political struggle, is exhausting. Again, I don’t want to minimize the very real problems that are coming or the need to do something about them, but, life will go on. Remind yourself of that. If you were doing something important before, it’s still important. The rest of the world isn’t going away.

Make as much time for self-care as you need.

You’re not going to be of much use to anyone if you’re just a sobbing wreck all the time. Do whatever you can do to take care of yourself and don’t feel guilty about it. We’ll all do what we can, when we can.1

You need to put on your own oxygen mask first.

Make time, every day, for hope.

“You can stand anything for 10 seconds. Then you just start on a new 10 seconds.”

Every day, set aside some time — maybe 10 minutes, maybe an hour, maybe half the day, however much you can manage — where you’re going to just pretend everything is going to be OK.2

Once you’ve managed to securely fasten this self-deception in place, take the time to do the things you think are important. Of course, for my audience, “work on your cool open source code” is a safe bet for something you might want to do, but don’t make the mistake of always grimly setting your jaw and nose to the extracurricular grindstone; that would just be trading one set of world-weariness for another.

After convincing yourself that everything’s fine, spend time with your friends and family, make art, or heck, just enjoy a good movie. Don’t let the flavor of life turn to ash on your tongue.

Good night and good luck.

Thanks for reading. It’s going to be a long four years3; I wish you the best of luck living your life in the meanwhile.


  1. I should note that self-care includes just doing your work to financially support yourself. If you have a job that you don’t feel is meaningful but you need the wages to survive, that’s meaningful. It’s OK. Let yourself do it. Do a good job. Don’t get fired. 

  2. I know that there are people who are in desperate situations who can’t do this; if you’re an immigrant in illegal ICE or CBP detention, I’m (hopefully obviously) not talking to you. But, luckily, this is not yet the majority of the population. Most of us can, at least some of the time, afford to ignore the ongoing disaster. 

  3. Realistically, probably more like 20 months, once the Rs in congress realize that he’s completely destroyed their party’s credibility and get around to impeaching him for one of his numerous crimes. 

by Glyph at February 10, 2017 07:58 AM

Itamar Turner-Trauring

Buggy Software, Loyal Users: Why Bug Reporting is Key To User Retention

Your software has bugs. Sorry, mine does too.

Doesn’t matter how much you’ve tested it or how much QA has tested it, some bugs will get through. And unless you’re NASA, you probably can’t afford to test your software enough anyway.

That means your users will be finding bugs for you. They will discover that your site doesn’t work on IE 8.2. They clicked a button and a blank screen came up. Where is that feature you promised? WHY IS THIS NOT WORKING?!

As you know from personal experience, users don’t enjoy finding bugs. Now you have buggy software and unhappy users. What are you going to do about it?

Luckily, in 1970 the economist Albert O. Hirschman came up with an answer to that question.

Exit, Voice and Loyalty

In his classic treatise Exit, Voice and Loyalty, Hirschman points out that users who are unhappy with a product have exactly two options. Just two, no more:

  1. Exiting, i.e. giving up on your product.
  2. Voicing their concerns.

Someone who has given up on your software isn’t likely to tell you about their issues. And someone who cares enough to file a bug is less likely to switch away from your software. Finally, loyal users will stick around and use their voice when otherwise they would choose exit.

Now, your software has no purpose without users. So chances are you want to keep them from leaving (though perhaps you’re better off without some of them - see item #2).

And here’s the thing: there’s only two choices, voice or exit. If you can convince your users to use their voice to complain, and they feel heard, your users are going to stick around.

Now at this point you might be thinking, “Itamar, why are you telling me obvious things? Of course users will stick around if we fix their bugs.” But that’s not how you keep users around. You keep users around by allowing them to express their voice, by making sure they feel heard.

Sometimes you may not fix the bug, and still have satisfied users. And sometimes you may fix a bug and still fail at making them feel heard; better than not fixing the bug, but you can do better.

To make your users feel heard when they encounter bugs you need to make sure:

  1. They can report bugs with as little effort as possible.
  2. They hear back from you.
  3. If you choose to fix the bug, you can actually figure out the problem from their bug report.
  4. The bug fix actually gets delivered to them.

Let’s go through these requirements one by one.

Bug reporting

Once your users notice a problem you want them to communicate it to you immediately. This will ensure they choose the path of voice and don’t contemplate exiting to your shiny competitor at the domain next door.

Faster communication also makes it much more likely the bug report will be useful. If the bug occurred 10 seconds ago the user will probably remember what happened. If it’s a few hours later you’re going to hear something about how “there was a flying moose on the screen? or maybe a hyena?” Remember: users are human, just like you and me, and humans have a hard time remembering things (and sometimes we forget that.)

To ensure bugs get reported quickly (or at all) you want to make it as easy as possible for the user to report the problem. Each additional step, e.g. creating an account in an issue tracker, means more users dropping out of the reporting process.

In practice many applications are designed to make it as hard as possible to report bugs. You’ll be visiting a website, when suddenly:

“An error has occurred, please refresh your browser.”

And of course the page will give no indication of how or where you should report the problem if it reoccurs, and do the people who run this website even care about your problem?

Make sure that’s not how you’re treating your users when an error occurs.

Improving bug reporting

So let’s say you include a link to the bug report page, and let’s say the user doesn’t have to jump through hoops and email verification to sign up and then fill out the 200 fields that JIRA or Bugzilla think are really important and would you like to tell us about your most common childhood nightmare from the following list? We’ll assume that bug reporting is easy to find and easy to fill out, as it should be.

But… you need some information from the user. Like, what version of the program they were using, the operating system and so on and so forth. And you do actually need that information.

But the user just wants to get this over with, and every additional piece of information you ask for is likely to make them give up and go away. What to do?

Here’s one solution: include it for them.

I’ve been using the rather excellent Spacemacs editor, and as usual when I use new software I had to report a bug. So it goes.

Anyway, to report a bug in Spacemacs you just hit the bug reporting key combo. You get a bug reporting page. In your editor. And it’s filled in the Spacemacs version and the Emacs version and all the configuration you made. And then you close the buffer and it pre-populates a new GitHub issue with all this information and you hit Submit and you’re done.

This is pretty awesome. No need to find the right web page or copy down every single configuration and environmental parameter to report a bug: just run the error-reporting command.

Spacemacs screenshot

Also, notice that this is a lot better than automatic crash reporting. I got to participate and explain the bits that were important to me, it’s not the “yeah we reported the crash info and let’s be honest you don’t really believe anyone looks at these do you?”-vibe that one gets from certain software.

You can do much the same thing with a command-line tool, or a web-based application. Every time an error occurs or the user is having issues (e.g. the user ran yourcommand --help):

  1. Ask for the user’s feedback in the terminal, or within the context of the web page, and then file the bug report for them.
  2. Gather as much information as you can automatically and include it in the bug report, so the user only has to report the part they care about (a rough sketch follows below).
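
Here is a rough sketch of what such an error handler could look like for a command-line tool; the project name, version, and repository URL are all made up:

# report_bug.py -- illustrative sketch only
import platform
import sys
import traceback
import urllib.parse
import webbrowser

ISSUES_URL = "https://github.com/example/yourcommand/issues/new"  # hypothetical

def report_bug(exc):
    # Gather the environment automatically, so the user does not have to.
    details = "\n".join([
        "yourcommand version: 1.2.3",   # hypothetical version lookup
        "Python: " + sys.version.split()[0],
        "Platform: " + platform.platform(),
        "",
        "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
    ])
    # Ask only for the part the user actually knows: what they were doing.
    description = input("What were you trying to do when this happened? ")
    body = description + "\n\n" + details
    webbrowser.open(ISSUES_URL + "?" + urllib.parse.urlencode({"body": body}))

if __name__ == "__main__":
    try:
        raise RuntimeError("the flying moose struck again")  # stand-in for real work
    except Exception as error:
        print("Sorry, something went wrong; opening a pre-filled bug report.")
        report_bug(error)

The details will vary, but the principle is the same: the environment is collected automatically, and the user only supplies the part that only they know.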

Responding to the bug report

The next thing you have to do is actually respond to the bug report. As I write this the Spacemacs issue I filed is sitting there all sad and lonely and unanswered, and it’s a little annoying. I do understand, open source volunteer-run project and all. (This is where loyalty comes in.)

But for a commercial product I am paying for I want to know that my problem has been heard. And sometimes the answer might be “sorry, we’re not going to fix that this quarter.” And that’s OK, at least I know where I stand. But silence is not OK.

Respond to your bug reports, and tell your users what you’re doing about it. And no, an automated response is not good enough.

Diagnosing the bug

If you’ve decided to fix the bug you can now proceed to do so… if you can figure out what the actual problem is. If diagnosing problems is impossible, or even just too expensive, you’re not going to fix the problem. And that means your users will continue to be affected by the bug.

Let’s look at a common bug report:

I clicked Save, and your program crashed and deleted a day’s worth of my hopes and dreams.

This is pretty bad: an unhappy user, and as is often the case the user simply doesn’t have the information you need to figure out what’s going on.

Even if the bug report is useful it can often be hard to reproduce the problem. Testing happens in controlled environments, which makes investigation and reproduction easier. Real-world use happens out in the wild, so good luck reproducing that crash.

Remember how we talked about automatically including as much information as possible in the bug report? You did that, right? And included all the relevant logs?

If you didn’t, now’s the time to think about it: try to ensure that logs, core dumps, and so on and so forth will always be available for bug reports. And try to automate submitting them as much as possible, while still giving the user the ability to report their specific take on the problem.

(And don’t forget to respect your user’s privacy!)

Distributing the fix

So now you’ve diagnosed and fixed your bug: problem solved! Or rather, problem almost solved. If the user hasn’t gotten the bug fix all this work has been a waste of time.

You need fast releases, and you need automated updates where possible. If it takes too long for fixes to reach your users then for practical purposes your users are not being heard. (There are exceptions, as always: in some businesses your users will value stability over all else.)

A virtuous circle

If you’ve done this right:

  1. Your users will feel heard, and choose voice over exit.
  2. You will get the occasional useful bug report, allowing you to improve your product.
  3. All your users will quickly benefit from those improvements.

There is more to be done, of course: many bugs can and should be caught in advance. And many users will never tell you their problems unless you ask.

But it’s a good start at making your users happy and loyal, even when they encounter the occasional bug.

February 10, 2017 05:00 AM

January 31, 2017

Jonathan Lange

Announcing haskell-cli-template

Last October, I announced servant-template, a cookiecutter template for creating production-ready Haskell web services.

Almost immediately after making it, I wished I had something for building command-line tools quickly. I know stack comes with a heap of them, but:

  • it’s hard to predict what they’ll do
  • adding a new template requires submitting a PR
  • cookiecutter has existed for ages and is pretty much better in every way

So I made haskell-cli-template. It’s very simple, it just makes a Haskell command-line project with some tests, command-line parsing, and a CircleCI build.

I wanted to integrate logging-effect, but after a few months away from it my tired little brain wasn’t able to figure it out. I like command-line tools with logging controls, so I suspect I’ll add it again in the future.

Let me know if you use haskell-cli-template to make anything cool, and please feel free to fork and extend.

by Jonathan Lange at January 31, 2017 12:00 AM