Planet Twisted

October 19, 2016

Itamar Turner-Trauring

Why Pylint is both useful and unusable, and how you can actually use it

This is a story about a tool that caught a production-impacting bug the day before we released the code. This is also the story of a tool no one uses, and for good reason. By the time you're done reading you'll see why this tool is useful, why it's unusable, and how you can actually use it with your Python project.

(Not a Python programmer? The same problems and solutions likely apply to tools in your ecosystem as well.)

Pylint saves the day

If you're coding in Haskell the compiler's got your back. If you're coding in Java the compiler will usually lend a helping hand. But if you're coding in a dynamic language like Python or Ruby you're on your own: you don't have a compiler to catch bugs for you.

The next best thing is a lint tool that uses heuristics to catch bugs in your code. One such tool is Pylint, and here's how I started using it.

One day at work we realized our builds had been consistently failing for a few days, and it wasn't the usual intermittent failures. After a few days of investigating, my colleague Tom Prince discovered the problem. It was Python code that looked something like this:

for volume in get_volumes():
    do_something(volume)

for volme in get_other_volumes():
    do_something_else(volume)
Notice the typo in the second for loop. Combined with the fact that Python leaks variables from blocks, the last value of volume from the first for loop was used for every iteration of the second loop.
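The leak is easy to demonstrate in isolation (a quick sketch, not the actual code from our build):

```python
volumes = ["vol-a", "vol-b"]
for volume in volumes:
    pass  # imagine real per-volume work here

# A for loop does not create a new scope in Python, so after the loop
# the variable still holds the last value it was assigned.
print(volume)  # → vol-b
```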

To see if we could prevent these problems in the future I tried Pylint, re-introduced the bug... and indeed it caught the problem. I then looked at the rest of the output to see what else it had found.

What it had found was a serious bug. It was in code I had written a few days earlier, and the bug completely broke an important feature we were going to ship to users the very next day. Here's a heavily simplified minimal reproducer for the bug:

list_of_printers = []
for i in [1, 2, 3]:
    def printer():
        print(i)
    list_of_printers.append(printer)

for func in list_of_printers:
    func()
The intended result of this reproducer is to print:

1
2
3

But what will actually get printed with this code is:

3
3
3
When you define a nested function in Python that refers to a variable in the outside scope it binds not the value of a variable but the variable itself. In this case that means the i inside printer() ended up always getting the last value of the variable i in the for loop.
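The usual fix is to bind the current value at definition time, for example with a default argument:

```python
list_of_printers = []
for i in [1, 2, 3]:
    # Default argument values are evaluated when the function is defined,
    # so each printer captures the value i has right now.
    def printer(i=i):
        print(i)
    list_of_printers.append(printer)

for func in list_of_printers:
    func()  # prints 1, then 2, then 3
```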

And luckily Pylint caught that bug before it shipped; pretty great, right?

Why no one uses Pylint

Pylint is useful, but many projects don't use it. For example, I went and checked just now, and neither Twisted nor Django nor Flask nor Sphinx seem to use Pylint. Why wouldn't these large, sophisticated Python projects use a tool that would automatically catch bugs for them?

One problem is that it's slow, but that's not the real problem; you can always just run it on the CI system with the other slow tests. The real problem is the amount of output.

Here's what I mean: I ran pylint on a checkout of Twisted and got 28,000 lines of output (at which point pylint crashed, but I'll assume that's fixed in newer releases). Let me say that again: 28,000 errors or warnings.

That's insane.

And to be fair, Twisted has a coding standard that doesn't match the Python mainstream, but massive amounts of noise have been my experience with other projects as well. Pylint has a lot of useful errors... but also a whole lot of utterly useless garbage assumptions about how your code should look. And fundamentally it treats them all the same: there's a distinction between warnings and errors, but in practice both useful and useless checks end up in the warning category.

For example:

W:675, 0: Class has no __init__ method (no-init)

That's not a useful warning. Now imagine a few thousand of those.

How you should use Pylint

So here we have a tool that is potentially useful, but unusable in practice. What to do? Luckily Pylint has some functionality that can help: you can configure it with a whitelist of lint checks.

First, set up Pylint to do nothing:

  1. Make a list of all the features you plausibly want to enable from the Pylint docs and configure .pylintrc to whitelist them.
  2. Comment them all out.

At this point Pylint will do no checks. Next:

  1. Uncomment a small batch of checks, and run pylint.
  2. If the resulting errors are real problems, fix them. If the errors are utter garbage, delete those checks from the configuration.
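Concretely, the end result is a .pylintrc that disables everything and whitelists checks a batch at a time; something like this sketch (the check names here are just examples, not a recommendation):

```ini
[MESSAGES CONTROL]
# Start from nothing:
disable=all
# The batches vetted so far:
enable=unused-variable,
       undefined-loop-variable,
       cell-var-from-loop
# Next batch to try: redefined-outer-name, no-init
```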

At this point you have a small number of probably useful checks that are passing: you can run pylint and you will only be told about new problems. In other words, you have a useful tool.

Repeat this process a few times, or once a week, enabling a new batch of checks each time until you run out of patience or you run out of Pylint checks to enable.

The end result will be something like this configuration or this configuration; both projects are open source under the Apache 2.0 license, so you can use those as a starting point.

Go forth and lint

Here's my challenge to you: if you're a Python programmer, go set up Pylint on a project today. It'll take an hour to get some minimal checks going, and one day it will save you from a production-impacting bug. If you're not a Python programmer you can probably find an equivalent tool for your language; go set that up.

And if you're the author of a lint tool, please, try to come up with better defaults. It's better to catch 60% of bugs and have 10,000 software projects using your tool than to catch 70% of bugs and have almost no one use it.

October 19, 2016 04:00 AM

Glyph Lefkowitz

docker run glyph/rproxy

Want to TLS-protect your co-located stack of vanity websites with Twisted and Let's Encrypt using HawkOwl's rproxy, but can't tolerate the bone-grinding tedium of a pip install? I built a docker image for you, so it's now as simple as:

$ mkdir -p conf/certificates
$ cat > conf/rproxy.ini << EOF
> [rproxy]
> certs=certificates
> http_ports=80
> https_ports=443
> [hosts]
> mysite.com_host=<other container host>
> mysite.com_port=8080
> EOF
$ docker run --restart=always -v "$(pwd)"/conf:/conf \
    -p 80:80 -p 443:443 \
    glyph/rproxy

There are no docs to speak of, so if you're interested in the details, see the tree on github I built it from.

Modulo some handwaving about docker networking to get that <other container host> IP, that's pretty much it. Go forth and do likewise!

by Glyph at October 19, 2016 12:32 AM

October 18, 2016

Glyph Lefkowitz

As some of you may have guessed from the unintentional recent flurry of activity on my Twitter account, Twitterfeed, the service I used to use to post blog links automatically, is getting end-of-lifed. I've switched to a different service for the time being, unless they send another unsolicited tweetstorm out on my behalf...

Sorry about the noise! In the interests of putting some actual content here, maybe you would be interested to know that I was recently interviewed for PyDev of the Week?

by Glyph at October 18, 2016 08:37 PM

October 15, 2016

Jonathan Lange

servant-template: production-ready Haskell web services in 5 minutes

If you want to write a web API in Haskell, then you should start by using my new cookiecutter template at gh:jml/servant-template. It’ll get you a production-ready web service in 5 minutes or less.

Whenever you start any new web service and you actually care about getting it working and available to users, it’s very useful to have:

  • logging
  • monitoring
  • continuous integration
  • tests
  • deployment
  • command-line parsing

These are largely boring, but nearly essential. Logs and monitoring give you visibility into the code’s behaviour in production, tests and continuous integration help you make sure you don’t break it, and, of course, you need some way of actually shipping code to users. As an engineer who cares deeply about running code in production, these are pretty much the bare minimum for me to be able to deploy something to my users.

The cookiecutter template at gh:jml/servant-template creates a simple Haskell web API service that does all of these things.

As the name suggests, all of this enables writing a servant server. Servant lets you declare web APIs at the type level and then use those API specifications to write servers. It’s hard to overstate just how useful it is for writing RESTful APIs.

Get started with:

$ cookiecutter gh:jml/servant-template
project_name [awesome-service]: awesome-service
$ cd awesome-service
$ stack test
$ make image
$ docker run awesome-service:latest --help
awesome-service - TODO fill this in

Usage: awesome-service --port PORT [--access-logs ARG] [--log-level ARG]
  One line description of project

Available options:
  -h,--help                Show this help text
  --port PORT              Port to listen on
  --access-logs ARG        How to log HTTP access
  --log-level ARG          Minimum severity for log messages
  --ghc-metrics            Export GHC metrics. Requires running with +RTS.
$ docker run -p 8080:80 awesome-service --port 80
[2016-10-16T20:50:07.983292987000] [Informational] Listening on :80

For this to work, you’ll need to have Docker installed on your system. I’ve tested it on my Mac with Docker Machine, but haven’t yet with Linux.

You might have to run stack docker pull before make image, if you haven’t already used stack to build things from within Docker.

Once it’s up and running, you can browse to http://localhost:8080/ (or http://$(docker-machine ip):8080/ if you’re on a Mac), and you’ll see a simple HTML page describing the API and giving you a link to the /metrics page, which is where all the Prometheus metrics are exported.

There you have it, a production-ready web service. At least for some values of “production-ready”.

Of course, the API it offers is really simple. You can make it your own by editing the API definition and the server implementation. Note that these two are in separate libraries to make it easier to generate client code.

The template comes with a test suite that uses servant-quickcheck to guarantee that none of your endpoints return 500s, take longer than 100ms to serve, and that all the 201s include Location headers.

If you’re so inclined, you could push the created Docker image to a repository somewhere—it’s around 25MB when built. Then, people could use it and no one would have to know that it’s Haskell, they’d just notice a fast web service that works.

As the README says, I’ve made a few questionable decisions when building this. If you disagree, or think I could have done anything better I’d love to know. If you use this to build something cool, or even something silly, please let me know on Twitter.

by jml at October 15, 2016 11:00 PM

October 14, 2016

Itamar Turner-Trauring

How to find a programming job you won't hate

Somewhere out there is a company that wants to hire you as a software engineer. Working for that company is a salesperson whose incentives were set by an incompetent yet highly compensated upper management. The salesperson has just made a sale, and in return for a large commission has promised the new customer twice the features in half the time.

The team that wants to hire you will spend the next three months working evenings and weekends. And then, with a job badly done, they'll move on to the next doomed project.

You don't want to work for this company, and you shouldn't waste your time applying there.

When you're looking for a new programming job you want to find it quickly:

  • If your current job sucks you want to find a new place before you hit the unfortunate gotta-quit-today moment.
  • If you're not working you don't want your savings to run out. You have been saving money, right?
  • Either way, looking for a job is no fun.

Assuming you can afford to be choosy, you'll want to speed up the process by filtering out as many companies as possible in advance. There are many useful ways to filter your list down: your technical interests, the kinds of company you want to work for, location.

In this post, however, I'd like to talk about ways to filter out companies you'd hate. That is, companies with terrible work conditions.

Talk to your friends

Some companies have a bad reputation, some have a great reputation. But once a company is big enough, different teams can end up with very different work environments.

Talking to someone who actually works at a company will give you much better insight about how things work more locally. They can tell you which groups to avoid, and which groups have great leadership.

For example, Amazon does not have a very good reputation as a workplace, but I know someone who enjoys his job there and his very reasonable working hours.


Glassdoor

For companies where you don't have contacts, Glassdoor can be a great resource. Glassdoor is a site that lets employees post anonymous salaries and reviews of their company.

The information is anonymous, so you have to be a little skeptical, especially when there's only a few reviews. And you need to pay attention to the reviewer's role, location, and the year it was posted. Once you take all that into account the reviews can often be very informative.

During my last job search I found one company in the healthcare area with many complaints of long working hours. One of Glassdoor's features is a way for a company to reply to reviews. In this case the CEO himself answered, explaining that they work hard because "sick patients can't wait."

Personally I'd rather not work for someone who confuses working long hours with increased output or productivity.

Read company materials

After you've checked out Glassdoor the next thing to look at is the job posting itself, along with the company's website. These are often written by people other than the engineering team, but you can still learn a lot from them.

Sometimes you'll get the sense the company is actually a great place to work for. For example, Memrise has this to say in their Software Engineering postings:

If you aren’t completely confident that you fit our exact criteria, please get in touch immediately. Humility is a wonderful thing and we’re not interested in hiring ‘rockstars’ or ‘ninjas’.

On the other hand, consider a job post I found for an Automation Test Engineer. First we learn:

Must be able to execute scripts during off hours if required.

This is peculiar; if they're automated why does a person need to run them manually? Later on we read:

This isn’t the job for someone looking for a traditional 8-5 position, but it’s a great role for someone who is hungry for a terrific opportunity in a fast-paced, state of the art environment.

Apparently they consider working 8-5 traditional, they will work their employees much longer hours, and they think they're "state of the art" even though they haven't heard of cron.

Notice, by the way, that it's worth reading all of a company's job postings. Other job postings from the same company are less informative about working conditions than the one I just quoted.


Ask questions

Finally, if a company has passed the previous filters and you've gotten an interview, make sure you ask about working conditions. Tactfully, of course, and once you've demonstrated your value; but if you don't ask, you won't know until it's too late. Here are some sample questions to get you started:

  • What's your typical work day like?
  • How many hours do you end up working?
  • How do you manage project deadlines?

Depending on the question you might want to ask individual contributors rather than managers. But I've had managers tell me outright they want employees to work really long hours.


There are many bad software jobs out there. But you don't need to work evenings or weekends to succeed as a programmer.

If you want to find a programming job with a sane workweek, a job you'll actually enjoy, sign up for the free email course below for more tips and tricks.

October 14, 2016 04:00 AM

October 09, 2016

Thomas Vander Stichele

Puppet/puppetdb/storeconfigs validation issues

Over the past year I’ve chipped away at setting up new servers for apestaart and managing the deployment in puppet as opposed to a by now years old manual single server configuration that would be hard to replicate if the drives fail (one of which did recently, making this more urgent).

It’s been a while since I felt like I was good enough at puppet to love and hate it in equal parts, but mostly manage to control a deployment of around ten servers at a previous job.

Things were progressing an hour or two here and there at a time, and accelerated when a friend in our collective was launching a new business for which I wanted to make sure he had a decent redundancy setup.

I was saving the hardest part for last – setting up Nagios monitoring with Matthias Saou’s puppet-nagios module, which needs External Resources and storeconfigs working.

Even on the previous server setup based on CentOS 6, that was a pain to set up – needing MySQL and ruby’s ActiveRecord. But it sorta worked.

It seems that for newer puppet setups, you’re now supposed to use something called PuppetDB, which is not in fact a database on its own as the name suggests, but requires another database. Of course, it chose to need a different one – Postgres. Oh, and PuppetDB itself is in Java – now you get the cost of two runtimes when you use puppet!

So, to add useful Nagios monitoring to my puppet deploys, which without it are quite happy to be simple puppet apply runs from a local git checkout on each server, I now need storeconfigs, which needs puppetdb, which pulls in Java and Postgres. And that’s just so a system that handles distributed configuration can actually be told about the results of that distributed configuration and create a useful feedback cycle allowing it to do useful things with the observed result.

Since I test these deployments on local vagrant/VirtualBox machines, I had to double their RAM because of this – even just the puppetdb java server by default starts with 192MB reserved out of the box.

But enough complaining about these expensive changes – at least there was a working puppetdb module that managed to set things up well enough.

It was easy enough to get the first host monitored, and apart from some minor changes (like updating the default Nagios config template from 3.x to 4.x), I had a familiar Nagios view working showing results from the server running Nagios itself. Success!

But all runs from the other vm’s did not trigger adding any exported resources, and I couldn’t find anything wrong in the logs. In fact, I could not find /var/log/puppetdb/puppetdb.log at all…

fun with utf-8

After a long night of experimenting and head scratching, I chased down a first clue in /var/log/messages saying puppet-master[17702]: Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB

I traced that down to puppetdb/char_encoding.rb, and with my limited ruby skills, I got a dump of the offending byte sequence by adding this code:

Puppet.warning "Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB"'/tmp/ruby', 'w') { |file| file.write(str) }
Puppet.warning "THOMAS: is here"

(I tend to use my name in debugging to have something easy to grep for, and I wanted some verification that the File dump wasn’t triggering any errors)
It took a little time at 3AM to remember where these /tmp files end up thanks to systemd, but once found, I saw it was a json blob with a command to “replace catalog”. That could explain why my puppetdb didn’t have any catalogs for other hosts. But file told me this was a plain ASCII file, so that didn’t help me narrow it down.

I brute forced it by just checking my whole puppet tree:

find . -type f -exec file {} \; > /tmp/puppetfile
grep -v ASCII /tmp/puppetfile | grep -v git

This turned up a few UTF-8 candidates. Googling around, I was reminded about how terrible utf-8 handling was in ruby 1.8, and saw information that puppet recommended using ASCII only in most of the manifests and files to avoid issues.

It turned out to be a config from a webalizer module:

webalizer/templates/webalizer.conf.erb: UTF-8 Unicode text

While it was written by a Jesús with a unicode name, the file itself didn’t have his name in it, and I couldn’t obviously find where the UTF-8 chars were hiding. One StackOverflow post later, I had nailed it down – UTF-8 spaces!

00004ba0 2e 0a 23 c2 a0 4e 6f 74 65 20 66 6f 72 20 74 68 |..#..Note for th|
00004bb0 69 73 20 74 6f 20 77 6f 72 6b 20 79 6f 75 20 6e |is to work you n|

The offending character is c2 a0 – the non-breaking space

I have no idea how that slipped into a comment in a config file, but I changed the spaces and got rid of the error.

Puppet’s error was vague, did not provide any context whatsoever (Where do the bytes come from? Dump the part that is parseable? Dump the hex representation? Tell me the position in it where the problem is?), did not give any indication of the potential impact, and in a sea of spurious puppet warnings that you simply have to live with, is easy to miss. One down.

However, still no catalogs on the server, so still only one host being monitored. What next?

users, groups, and permissions

Chasing my next lead turned out to be my own fault. After turning off SELinux temporarily, checking all permissions on all puppetdb files to make sure that they were group-owned by puppetdb and writable for puppet, I took the last step of switching to that user role and trying to write the log file myself. And it failed. Huh? And then id told me why – while /var/log/puppetdb/ was group-writeable and owned by puppetdb group, my puppetdb user was actually in the www-data group.

It turns out that I had tried to move some uids and gids around after the automatic assignment puppet does gave different results on two hosts (a problem I still don’t have a satisfying answer for, as I don’t want to hard-code uids/gids for system accounts in other people’s modules), and clearly I did one of them wrong.

I think a server that for whatever reason cannot log should simply not start, as this is a critical error if you want a defensive system.

After fixing that properly, I now had a puppetdb log file.

resource titles

Now I was staring at an actual exception:

2016-10-09 14:39:33,957 ERROR [c.p.p.command] [85bae55f-671c-43cf-9a54-c149cedec659] [replace catalog] Fatal error on attempt 0
java.lang.IllegalArgumentException: Resource '{:type "File", :title "/var/lib/puppet/concat/thomas_vimrc/fragments/75_thomas_vimrc-\" allow adding additional config through .vimrc.local_if filereadable(glob(\"~_.vimrc.local\"))_\tsource ~_.vimrc.local_endif_"}' has an invalid tag 'thomas:vimrc-" allow adding additional config through .vimrc.local
if filereadable(glob("~/.vimrc.local"))
source ~/.vimrc.local
'. Tags must match the pattern /\A[a-z0-9_][a-z0-9_:\-.]*\Z/.
    at com.puppetlabs.puppetdb.catalogs$validate_resources.invoke(catalogs.clj:331) ~[na:na]

Given the name of the command (replace catalog), I felt certain this was going to be the problem standing between me and multiple hosts being monitored.

The problem was a few levels deep, but essentially I had code creating fragments of vimrc files using the concat module, and was naming the resources with file content as part of the title. That’s not a great idea, admittedly, but no other part of puppet had ever complained about it before. Even the files on my file system that store the fragments, which get their filename from these titles, happily stored with a double quote in its name.

So yet again, puppet’s lax approach to specifying types of variables at any of its layers (hiera, puppet code, ruby code, ruby templates, puppetdb) in any of its data formats (yaml, json, bytes for strings without encoding information) triggers errors somewhere in the stack without informing whatever triggered that error (ie, the agent run on the client didn’t complain or fail).

Once again, puppet has given me plenty of reasons to hate it with a passion, tipping the balance.

I couldn’t imagine doing server management without a tool like puppet. But you love it when you don’t have to tweak it much, and you hate it when you’re actually making extensive changes. Hopefully after today I can get back to the loving it part.


by Thomas at October 09, 2016 08:31 PM

October 07, 2016

Itamar Turner-Trauring

More learning, less time: how to quickly gather new tools and techniques

Update: Added newsletters to the list.

Have you ever worked hard to solve a problem, only to discover a few weeks later an existing design pattern that was even better than your solution? Or built an internal tool, only to discover an existing tool that already solved the problem?

To be a good software engineer you need a good toolbox. That means software tools you can use when appropriate, design patterns so you don't have to reinvent the wheel, testing techniques... the list goes on. Learning all existing tools and techniques is impossible, and just keeping up with every newly announced library would be a full time job.

How do you learn what you need to know to succeed at your work? And how can you do so without spending a huge amount of your free time reading and programming just to keep up?

A broad toolbox, the easy way

To understand how you can build your toolbox, consider the different levels of knowledge you can have. You can be an expert on a subject, or you can have some basic understanding, or you might just have a vague awareness that the subject exists.

For our purposes building awareness is the most important of the three. You will never be an expert in everything, and even basic understanding takes some time. But broad awareness takes much less effort: you just need to remember small amounts of information about each tool or technique.

You don't need to be an expert on a tool or technique, or even use it at all. As long as you know a tool exists you'll be able to learn more about it when you need to.

For example, there is a tool named Logstash that moves server logs around. That's pretty much all you have to remember about it, and it takes just 3 seconds to read that previous sentence. Maybe you'll never use that information... or maybe one day you'll need to get logs from a cluster of machines to a centralized location. At that point you'll remember the name "Logstash", look it up, and have the motivation to actually go read the documentation and play around with it.

Design patterns and other techniques take a bit more effort to gain useful awareness, but still, awareness is usually all you need. For example, property-based testing is hugely useful. But all it takes is a little reading to gain awareness, even if it will take more work to actually use it.

The more tools and techniques you are aware of the more potential solutions you will have to the problems you encounter while programming. Being aware of a broad range of tools and techniques is hugely valuable and easy to achieve.

Building your toolbox

How do you build your toolbox? How do you find the tools and techniques you need to be aware of? Here are three ways to do so quickly and efficiently.


Newsletters

A great way to learn about new tools and techniques is newsletters like Ruby Weekly. There are newsletters on many languages and topics, from DevOps to PostgreSQL.

Newsletters typically include not just links but also short descriptions, so you can skim them and gain awareness even without reading all the articles. In contrast, sites like Reddit or Hacker News only include links, so you gain less information unless you spend more time reading.

The downside of newsletters is that they focus on the new. You won't hear about a classic design pattern or a standard tool unless someone happens to write a new blog post about it. You should therefore rely on additional sources as well.

Conference proceedings

Another broader source of tools and techniques are conferences. Conference talks are chosen by a committee with some understanding of the conference subject. Often they can be quite competitive: I believe the main US Python conference accepts only a third of proposals. And good conferences will aim for a broad range of talks, within the limits of their target audience. As a result conferences are a great way to discover relevant, useful tools and techniques, both new and old.

Of course, going to a conference can be expensive and time consuming. Luckily you don't have to go to the conference to benefit.

Just follow this quick procedure:

  1. Find a conference relevant to your interests. E.g. if you're a Ruby developer find a conference like RubyConf.
  2. Skim the talk descriptions; they're pretty much always online.
  3. If something sounds really interesting, there's a decent chance you can find a recording of the talk, or at least the slides.
  4. Mostly however you just need to see what people are talking about and make a mental note of things that sound useful or interesting.

For example, skimming the RubyConf 2016 program I see there's something called OpenStruct for dynamic data objects, FactoryGirl which is apparently a testing-related library, a library for writing video games, an explanation of hooks and so on. I'm not really a Ruby programmer, but if I ever want to write a video game in Ruby I'll go find that talk.

Meetups and user groups

Much like conferences, meetups are a great way to learn about a topic. And much like conferences, you don't actually have to go to the meetup to gain awareness.

For example, the Boston Python Meetup has had talks in recent months about CPython internals, microservices, BeeKeeper which is something for REST APIs, the Plone content management system, etc..

I've never heard of BeeKeeper before, but now I know its name and subject. That's very little information, gained very quickly... but next time I'm building a REST API with Python I can go look it up and see if it's useful.

If you don't know what a "REST API" is, well, that's another opportunity for growing your awareness: do a Google search and read a paragraph or two. If it's relevant to your job, keep reading. Otherwise, make a mental note and move on.

Book catalogs

Since your goal is awareness, not in-depth knowledge, you don't need to read a book to gain something: the title and description may be enough. Technical book publishers are in the business of publishing relevant books, so browsing their catalog can be very educational.

For example, the Packt book catalog will give you awareness of a long list of tools you might find useful one day. You can see that "Unity" is something you use for game development, "Spark" is something you use for data science, etc.. Spend 20 seconds reading the Spark book description and you'll learn Spark does "statistical data analysis, data visualization, predictive modeling" for "Big Data". If you ever need to do that you now have a starting point for further reading.

Using your new toolbox

There are only so many hours in the day, so many days in a year. That means you need to work efficiently, spending your limited time in ways that have the most impact.

The techniques you've just read do exactly that: you can learn more in less time by spending the minimum necessary to gain awareness. You only need to spend the additional time to gain basic understanding or expertise for those tools and techniques you actually end up using. And having a broad range of tools and techniques means you can get more done at work, without reinventing the wheel every time.

You don't need to work evenings or weekends to be a successful programmer! This post covers just some of the techniques you can use to be more productive within the limits of a normal working week. To help you get there I'm working on a book, The Programmer's Guide to a Sane Workweek.

Sign up in the email subscription form below to learn more about the book, and to get notified as I post more tips and tricks on how you can become a better software engineer.

October 07, 2016 04:00 AM

September 24, 2016

Hynek Schlawack

Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

by Hynek Schlawack at September 24, 2016 12:00 PM

September 17, 2016

Glyph Lefkowitz

Hitting The Wall

I’m an introvert.

I say that with a full-on appreciation of just how awful thinkpieces on “introverts” are.

However, I feel compelled to write about this today because of a certain type of social pressure that a certain type of introvert faces. Specifically, I am a high-energy introvert.

Cementing this piece’s place in the hallowed halls of just awful thinkpieces, allow me to compare my mild cognitive fatigue with the plight of those suffering from chronic illness and disability1. There’s a social phenomenon associated with many chronic illnesses, “but you don’t LOOK sick”, where well-meaning people will look at someone who is suffering, with no obvious symptoms, and imply that they really ought to be able to “be normal”.

As a high-energy introvert, I frequently participate in social events. I go to meet-ups and conferences and I engage in plenty of public speaking. I am, in a sense, comfortable extemporizing in front of large groups of strangers.

This all sounds like extroverted behavior, I know. But there’s a key difference.

Let me posit two axes for personality type: on the X axis, “introvert” to “extrovert”, and on the Y, “low energy” up to “high energy”.

The X axis describes what kinds of activities give you energy, and the Y axis describes how large your energy reserves are for the other type.

Notice that I didn’t say which type of activity you enjoy.

Most people who would self-describe as “introverts” are in the low-energy/introvert quadrant. They have a small amount of energy available for social activities, which they need to frequently re-charge by doing solitary activities. As a result of frequently running out of energy for social activities, they don’t enjoy social activities.

Most people who would self-describe as “extroverts” are also on the “low-energy” end of the spectrum. They have low levels of patience for solitary activity, and need to re-charge by spending time with friends, going to parties, etc, in order to have the mental fortitude to sit still for a while and focus. Since they can endlessly get more energy from the company of others, they tend to enjoy social activities quite a bit.

Therefore we have certain behaviors we expect to see from “introverts”. We expect them to be shy, and quiet, and withdrawn. When someone who behaves this way has to bail on a social engagement, this is expected. There’s a certain affordance for it. If you spend a few hours with them, they may be initially friendly but will visibly become uncomfortable and withdrawn.

This “energy” model of personality is of course an oversimplification - it’s my personal belief that everyone needs some balance of privacy and socialization and solitude and eventually overdoing one or the other will be bad for anyone - but it’s a useful one.

As a high-energy introvert, my behavior often confuses people. I’ll show up at a week’s worth of professional events, be the life of the party, go out to dinner at all of them, and then disappear for a month. I’m not visibly shy - quite the opposite, I’m a gregarious raconteur. In fact, I quite visibly enjoy the company of friends. So, usually, when I try to explain that I am quite introverted, this claim is met with (quite understandable) skepticism.

In fact, I am quite functionally what society expects of an “extrovert” - until I hit the wall.

In endurance sports, one is said to “hit the wall” at the point where all the short-term energy reserves in one’s muscles are exhausted, and there is a sudden, dramatic loss of energy. Regardless, many people enjoy endurance sports; part of the challenge of them is properly managing your energy.

This is true for me and social situations. I do enjoy social situations quite a bit! But they are nevertheless quite taxing for me, and without prolonged intermissions of solitude, eventually I get to the point where I can no longer behave as a normal social creature without an excruciating level of effort and anxiety.

Several years ago, I attended a prolonged social event2 where I hit the wall, hard. The event itself was several hours too long for me, involved meeting lots of strangers, and in the lead-up to it I hadn’t had a weekend to myself for a few weeks due to work commitments and family stuff. Towards the end I noticed I was developing a completely flat affect, and had to start very consciously performing even basic body language, like looking at someone while they were talking or smiling. I’d never been so exhausted and numb in my life; at the time I thought I was just stressed from work.

Afterwards though, I started having a lot of weird nightmares, even during the daytime. This concerned me, since I’d never had such a severe reaction to a social situation, and I didn’t have good language to describe it. It was also a little perplexing that what was effectively a nice party, the first half of which had even been fun for me, would cause such a persistent negative reaction after the fact. After some research, I eventually discovered that such involuntary thoughts are a hallmark of PTSD.

While I’ve managed to avoid this level of exhaustion before or since, this was a real learning experience for me that the consequences of incorrectly managing my level of social interaction can be quite severe.

I’d rather not do that again.

The reason I’m writing this, though3, is not to avoid future anxiety. My social energy reserves are quite large enough, and I now have enough self-knowledge, that it is extremely unlikely I’d ever find myself in that situation again.

The reason I’m writing is to help people understand that I’m not blowing them off because I don’t like them. Many times now, I’ve declined or bailed an invitation from someone, and later heard that they felt hurt that I was passive-aggressively refusing to be friendly.

I certainly understand this reaction. After all, if you see someone at a party and they’re clearly having a great time and chatting with everyone, but then when you invite them to do something, they say “sorry, too much social stuff”, that seems like a pretty passive-aggressive way to respond.

You might even still be skeptical after reading this. “Glyph, if you were really an introvert, surely, I would have seen you looking a little shy and withdrawn. Surely I’d see some evidence of stage fright before your talks.”

But that’s exactly the problem here: no, you wouldn’t.

At a social event, since I have lots of energy to begin with, I’ll build up a head of steam on burning said energy that no low-energy introvert would ever risk. If I were to run out of social-interaction-juice, I’d be in the middle of a big crowd telling a long and elaborate story when I find myself exhausted. If I hit the wall in that situation, I can’t feel a little awkward and make excuses and leave; I’ll be stuck creepily faking a smile like a sociopath and frantically looking for a way out of the conversation for an hour, as the pressure from a large crowd of people rapidly builds up months worth of nightmare fuel from my spiraling energy deficit.

Given that I know that’s what’s going to happen, you won’t see me when I’m close to that line. You won’t be at my desk when I silently sit and type for a whole day, or on my couch when I quietly read a book for ten hours at a time. My solitary side is, by definition, hidden.

But, if I don’t show up to your party, I promise: it’s not you, it’s me.

  1. In all seriousness: this is a comparison of kind and not of degree. I absolutely do not have any illusions that my minor mental issues are a serious disability. They are - by definition, since I do not have a diagnosis - subclinical. I am describing a minor annoyance and frequent miscommunication in this post, not a personal tragedy. 

  2. I’ll try to keep this anonymous, so hopefully you can’t guess - I don’t want to make anyone feel bad about this, since it was my poor time-management and not their (lovely!) event which caused the problem. 

  3. ... aside from the hope that maybe someone else has had trouble explaining the same thing, and this will be a useful resource for them ... 

by Glyph at September 17, 2016 09:18 PM

September 16, 2016

Itamar Turner-Trauring

Introducing the Programmer's Guide to a Sane Workweek

I'm working on a book: The Programmer's Guide to a Sane Workweek, a guide to how you can achieve a saner, shorter workweek. If you want a free course based on the book, sign up using the email subscription form at the end of the post. Meanwhile, here's the first excerpt from the book:

  • Are you tired of working evenings and weekends, of late projects and unrealistic deadlines?
  • Do you have children you want to see for more than just an hour in the evening after work?
  • Or do you want more time for side projects or to improve your programming skills?
  • In short, do you want a sane workweek?

A sane workweek is achievable: for the past 4 years I've been working less than 40 hours a week.

Soon after my daughter was born I quit my job as a product manager at Google and became a part-time consultant, writing software for clients. I wrote code for 20-something hours each week while our child was in daycare, and I spent the rest of my time taking care of our kid.

Later I got a job with one of my clients, a startup, where I worked as an employee but had a 28-hour workweek. These days I work at another startup, with a 35-hour workweek.

I'm not the only software engineer who has chosen to work a saner, shorter workweek. There are contractors who work part-time, spending the rest of their time starting their own business. There are employees with specialized skills who only work two days a week. There are even entrepreneurs who have deliberately created a business that isn't all-consuming.

Would you like to join us?

If you're a software developer working crazy hours then this book can help you get to a saner schedule. Of course what makes a schedule sane or crazy won't be the same for me as it is for you. You should spend some time thinking about what exactly it is that you want.

How much time do you want to spend working each week?

  • 40 hours?
  • 32 hours?
  • 20 hours?
  • Or do you never want to work again?

Depending on what you want there are different paths you can pursue.

Some paths to a saner workweek

Here are some ways you can reduce your workweek; I'll cover them in far more detail in later chapters of the book:

Normalizing your workweek

If you're working a lot more than 40 hours a week you always have the option of unilaterally normalizing your hours. That is, reducing your hours down to 40 hours or 45 hours or whatever you think is fair. Chances are your productivity and output will actually increase. You might face problems, however, if your employer cares more about hours "worked" than about output.

Reducing overhead

Chances are that the hours your employer counts as work are just part of the time you spend on your job. In particular, commuting can take another large bite out of your free time. Cut down on commuting and long lunch breaks and you've gotten some of that time back without any reduction in the hours your boss cares about.

Negotiating a shorter workweek at your current job

If you want a shorter-than-normal workweek you can try to negotiate that at your current job. Your manager doesn't want to replace a valued, trained employee: hiring new people is expensive and risky. That means you have an opening to negotiate shorter hours. This is one of the most common ways software engineers I know have reduced their hours.

Find a shorter workweek at a new job

If you're looking for a 40-hour workweek this is mostly about screening for a good company culture as part of your interview process. If you want a shorter-than-normal workweek you will need to negotiate a better job offer. Negotiation usually focuses on salary, but you can sometimes negotiate shorter working hours instead. This path can be tricky; I've managed to do it, but I've also been turned down, and I know of other people who have failed. It's easier if you've already worked for the company as a consultant, so they know what they're getting. Alternatively, if your previous (ideally, your current) job gave you a shorter workweek you'll have better negotiating leverage.

Long-term contracts

Instead of working as an employee you can take on long-term contract work, often through an agency. The contract can specify how many hours you will work, and shorter workweeks are sometimes possible. You can even get paid overtime!


Consulting
Instead of taking on long-term work, which is similar in many ways to being an employee, you go out and find project work for yourself. That means you need to spend something like half your time on marketing. By marketing well and providing high value to your clients you can charge high rates, allowing you to work reasonable hours.

Product business

All the paths so far involved exchanging money for time, in one form or another. As a software engineer you have another choice: you can make a product once and easily sell that same product multiple times. That means your income is no longer directly tied to how many hours you work. You'll need marketing and other business skills to do so, and you won't just be writing code.

Early retirement

Finally, if you don't want to work ever again there is the path of early retirement. That doesn't mean you can't make money afterwards; it means you no longer have to make a living, because you've earned enough that your time is your own. To get there you'll need very low living expenses, and a high savings rate while you're still working. Luckily programmers tend to get paid well.

Which path will you take?

Each of these paths has its own set of requirements and trade-offs, so it's worth considering which one fits your needs. At different times of your life you might prefer one path, and later you might prefer another. For example, I've worked as both a consultant and a part-time employee.

What kind of work environment do you want right now?

  • Do you want to work from your spare bedroom?
  • Do you like having co-workers?
  • Do you want to start your own business?
  • Do you want to just code, or do you want to expand your skills beyond that?

A later chapter will cover choosing your path in more detail. For now, take a little time to think it through and imagine what your ideal job would be like. Combine that with your weekly hours goal and you should get some sense of which path is best for you.

It won't be easy

Working a sane workweek is not something corporate culture encourages, at least in the US. That means you won't be following the default, easy path that most workers do: you're going to need to do some work to get to your destination. In later chapters I'll explain how you can acquire the prerequisites for your chosen path, but for now here's a summary:

  • You'll need to get your engineering skills to a place where you're both productive and can work independently. As an employee this will help you negotiate with your employer. As a contractor or consultant it will help get you work.
  • You'll need to reduce your living expenses. You can then afford to work fewer hours, and the larger your savings in the bank the more time you can take to look for a new job. Plus it makes for a better negotiating position.
  • You'll need to be able to negotiate successfully, whether it's with your employer or with clients.
  • Finally, you'll need to have the self-confidence or stubbornness to choose and stick to a path that most people don't take.

How much do you really want to work a sane workweek? Do you care enough to make the necessary effort?

It won't be easy, but I think it's worth it.

Shall we get started? Sign up below to get a free course that will take you through the first steps of your journey.

September 16, 2016 04:00 AM

September 15, 2016

Moshe Zadka

Post-Object-Oriented Design

In the beginning came the so-called “procedural” style. Data was data, and behavior, implemented as procedures, was a separate thing. Object-oriented design is the idea of bundling data and behavior into a single thing, usually called a “class”. In return for having to tie the two together, the thought went, we would get polymorphism.

Polymorphism is pretty neat. We send different objects the same message, for example, “turn yourself into a string”, and they respond appropriately — each according to their uniquely defined behavior.

But what if we could separate the data and behavior, and still get polymorphism? This is the idea behind post-object-oriented design.

In Python, we achieve this with two external packages. One is the “attr” package, which provides a useful way to define bundles of data that still exhibit the minimum amount of behavior we do want: initialization, string representation, hashing and more.

The other is the “singledispatch” package (available as functools.singledispatch in Python 3.4+).

import attr
import singledispatch

In order to be specific, we imagine a simple protocol. The low-level details of the protocol do not concern us, but we assume some lower-level parsing allows us to communicate in dictionaries back and forth (perhaps serialized/deserialized using JSON).

Our protocol is one to send changes to a map. The only two messages are “set”, to set a key to a given value, and “delete”, to delete a key.

messages = [
    {
        'type': 'set',
        'key': 'language',
        'value': 'python'
    },
    {
        'type': 'delete',
        'key': 'human'
    }
]
We want to represent those as attr-based classes.

@attr.s
class Set(object):
    key = attr.ib()
    value = attr.ib()

@attr.s
class Delete(object):
    key = attr.ib()

print(Set(key='language', value='python'))
Set(key='language', value='python')

When incoming dictionaries arrive, we want to convert them to the logical classes. This code could not be simpler, in this example. (The reason is mostly because the protocol is simple.)

def from_dict(dct):
    tp = dct.pop('type')
    name_to_klass = dict(set=Set, delete=Delete)
    try:
        klass = name_to_klass[tp]
    except KeyError:
        raise ValueError('unknown type', tp)
    return klass(**dct)

Note how we take advantage of the fact that attr-based classes accept correctly-named keyword arguments.

from_dict(dict(type='set', key='name', value='myname')), from_dict(dict(type='delete', key='data'))
(Set(key='name', value='myname'), Delete(key='data'))

But this was easy! There was no need for polymorphism: we always get one type in (dictionaries), and we consult a mapping to decide which type to produce.

However, for serialization, we do need polymorphism. Enter our second tool — the singledispatch package. The default function is equivalent to a method defined on “object”: the ultimate super-class. Since we do not want to serialize generic objects, our default implementation errors out.

@singledispatch.singledispatch
def to_dict(obj):
    raise TypeError("cannot serialize", obj)

Now, we implement the actual serializers. The names of the functions are not important. To emphasize they should not be used directly, we make them “private” by prepending an underscore.

@to_dict.register(Set)
def _to_dict_set(st):
    return dict(type='set', key=st.key, value=st.value)

@to_dict.register(Delete)
def _to_dict_delete(dlt):
    return dict(type='delete', key=dlt.key)

Indeed, we do not call them directly.

print(to_dict(Set(key='k', value='v')))
{'type': 'set', 'value': 'v', 'key': 'k'}
print(to_dict(Delete(key='kk')))
{'type': 'delete', 'key': 'kk'}

However, arbitrary objects cannot be serialized.

try:
    to_dict(object())
except TypeError as e:
    print(e)
('cannot serialize', <object object at 0x7fbdb254ac60>)

Now that the structure of adding such an “external method” has been shown, another example can be given: “act on”: applying the changes requested to an in-memory map.

@singledispatch.singledispatch
def act_on(command, d):
    raise TypeError("Cannot act on", command)

@act_on.register(Set)
def act_on_set(st, d):
    d[st.key] = st.value

@act_on.register(Delete)
def act_on_delete(dlt, d):
    del d[dlt.key]

d = {}
act_on(Set(key='name', value='woohoo'), d)
print("After setting")
print(d)
act_on(Delete(key='name'), d)
print("After deleting")
print(d)
After setting
{'name': 'woohoo'}
After deleting
{}

In this case, we kept the functionality “near” the code. However, note that the functionality could be implemented in a different module: these functions, even though they are polymorphic, follow Python namespace rules. This is useful: several different modules could implement “act_on”: for example, an in-memory map (as we defined above), a module using Redis or a module using a SQL database.
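To make the idea concrete, here is a self-contained sketch of my own (not from the original post) of another such "external method". It uses the stdlib functools.singledispatch mentioned above and plain classes, so it runs without the attr package; the `describe` function and its helpers are hypothetical names chosen for illustration.

```python
# A second "external method" for the same command classes: instead of
# applying commands to a map, this one renders them as human-readable
# strings, showing how a separate module can add its own polymorphic
# behavior without touching the classes.
from functools import singledispatch


class Set:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class Delete:
    def __init__(self, key):
        self.key = key


@singledispatch
def describe(command):
    # Default implementation: refuse unknown types, like to_dict above.
    raise TypeError("Cannot describe", command)


@describe.register(Set)
def _describe_set(st):
    return "set %s=%s" % (st.key, st.value)


@describe.register(Delete)
def _describe_delete(dlt):
    return "delete %s" % dlt.key


print(describe(Set("language", "python")))  # set language=python
print(describe(Delete("human")))            # delete human
```

The dispatching function and its registered implementations live entirely outside the classes, so any module can define its own without coordination.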

Actual methods are not completely obsolete. It would still be best to make methods do anything that would require private attribute access. In simple cases, as above, there is no difference between the public interface and the public implementation.

by moshez at September 15, 2016 06:03 AM

September 09, 2016

Itamar Turner-Trauring

How to choose a side project

If you're a programmer just starting out you'll often get told to work on side projects, beyond what you do at school or work. But there are so many things you could be doing: what should you be working on? How do you choose a side project you will actually finish? How will you make sure you're learning something?

Keep in mind that you don't actually have to work on side projects to be a good programmer. I know many successful software engineers who code only at their job and spend their free time on other hobbies. But if you do want to work on software in your spare time there are two different approaches you can take.

To understand these approaches let's consider a real side project that managed to simultaneously both succeed and fail.

Long ago, in an Internet far far away

Back in 2000 my friend Glyph started a small project called Twisted Reality. It was supposed to be a game engine, with the goal of implementing a particularly complex and sophisticated game.

Since the game had a chat system, a web server, and other means of communication, it grew a networking engine. Glyph and his friends hung out on the Internet Relay Chat (IRC) Python channel, and every time someone asked a networking question they'd tell them "use Twisted Reality!" Over time more people would show up needing a small feature added to the networking engine, so they'd submit a patch. That's how I became a Twisted Reality contributor.

Eventually the networking engine grew so big that Twisted Reality was split into two projects: the Twisted networking framework and the Reality game engine. These days Twisted is used by companies like Apple, Cisco and Yelp, and is still going strong. The game engine has been through multiple rewrites, but the game it was designed for has never been built.

Approach #1: solving a problem

The difference between Twisted, a successful side project, and the game that never got written is that Twisted solved a specific, limited problem. If you need to write some networking code in Python then Twisted will help you get it done quickly and well. The game, however, was so ambitious that it was never started: there was always another simulation feature to be added to the game engine first.

If you are building a side project choose one that solves a specific, limited problem. For example, let's say you feel you're wasting time playing on Facebook when you should be doing homework.

  1. "Build the best time tracking app ever" is neither limited nor specific, nor is it really a problem you're solving.
  2. "I want to keep track of how much time I spend actually working on homework vs. procrastinating" is better, but still not quite problem-driven.
  3. A good problem statement is "I want to prevent myself from visiting Facebook and other specific websites while I'm working on homework." At this point you have a clear sense of what software you're building.

Why a specific and limited problem?

  • The problem statement will tell you whether you're making progress: are you any closer to solving the problem? Is the work you're doing actually related to the problem at all?
  • By limiting the problem you increase your chances of successfully building something usable. If you finish it and want to keep going, great, add another problem to expand its scope. But start with something small.

Approach #2: artificial limits

How do you choose a side project if you don't have any specific problems in mind? The key is to still have constraints and limits so that your project is small, achievable and has a clear goal.

One great way to do that is to set a time limit. I'm not a fan of hackathons, since they promote the idea that sleeplessness and working crazy hours is a reasonable way to write software. But with a longer time frame building something specific with a time limit can be a great way to create a side project.

The PyWeek project for example has you build a game in one week, using a theme chosen by the organizers. Building a game isn't solving a problem, but it can still be fun and educational. And the one week limit will ensure you focus your efforts and achieve something concrete.

Software has no value

Whether you decide to solve a problem or to set artificial time limits on your side project, the key is having constraints and a clear goal. Software is just a tool: there is no inherent value in producing more of it. Value comes from solving problems, or from the entertainment a game provides. A half-solved problem or a half-finished game is valueless, so you want your initial goal to be small and constrained.

I've learned this the hard way, focusing on the value of my code instead of on the problems it solved. If you want to avoid that and other mistakes I've made over 20 years of writing software check out my career as a Software Clown.

September 09, 2016 04:00 AM

August 28, 2016

Twisted Matrix Laboratories

Twisted 16.4.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.4.0.

The highlights of this release are:
  • twist, a new command line tool for running Twisted plugins, similar to twistd but with a simpler, cleaner interface.
  • A new interface for Protocols, IHandshakeListener, which tells Twisted to tell the Protocol when the TLS handshake has been completed.
  • async/await support for Deferreds, allowing you to write Python 3.5+ coroutines using Twisted.
  • Trial can be invoked with "python -m twisted.trial".
  • All Twisted executables (trial, twistd, etc.) are now Setuptools console scripts, meaning they will work much better on Windows.
  • 35+ more modules ported to Python 3, and many many cleanups on the way to Python 3 on Windows support.
  • All the security fixes of Twisted 16.3.1 + 16.3.2 (httpoxy, HTTP session identifier strengthening, HTTP+TLS consuming sockets)
  • 240+ closed tickets overall.
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

PS: Twisted 16.4.1 will be coming soon after this with a patch mitigating SWEET32, by updating the acceptable cipher list.

by HawkOwl at August 28, 2016 01:48 AM

August 25, 2016

Itamar Turner-Trauring

From 10x programmer to 0.1x programmer: creating more with less

You've heard of the mythical 10x programmers, programmers who can produce ten times as much as us normal humans. If you want to become a better programmer this myth is demoralizing, but it's also not useful: how can you write ten times as much code? On the other hand, consider the 0.1x programmer, a much more useful concept: anyone can choose to write only 10% as much code as a normal programmer would. As they say in the business world, becoming a 0.1x programmer is actionable.

Of course writing less code might seem problematic, so let's refine our goal a little. Can you write 10% as much code as you do now and still do just as well at your job, still fixing the same amount of bugs, still implementing the same amount of features? The answer may still be "no", but at least this is a goal you can more easily work towards incrementally.

Doing more with less code

How do you achieve just as much while writing less code?

1. Use a higher level programming language

As it turns out many of us are 0.1x programmers without even trying, compared to previous generations of programmers that were stuck with lower-level programming languages. If you don't have to worry about manual memory management or creating a data structure from scratch you can write much less code to achieve the same goal.
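For instance, here is a quick sketch of mine (not from the post): counting word frequencies takes a few lines in a high-level language, where a lower-level one would require a hand-rolled hash table and manual memory management.

```python
# High-level languages ship the data structures; counting word
# frequencies is a two-liner instead of a hundred lines of C.
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
counts = Counter(text.split())

print(counts["the"])          # 3
print(counts.most_common(1))  # [('the', 3)]
```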

2. Use existing code

Instead of coding from scratch, use an existing library that achieves the same thing. For example, earlier this week I was looking at the problem of incrementing version numbers in source code and documentation as part of a release. A little searching and I found an open source tool that did exactly what I needed. Because it's been used by many people and improved over time chances are it's better designed, better tested, and less buggy than my first attempt would have been.

3. Spend some time thinking

Surprisingly, spending more time planning up front can save you time in the long run. If you have 2 days to fix a bug it's worth spending 10% of that time, about an hour and a half, to think about how to solve it. Chances are the first solution you come up with in the first 5 minutes won't be the best solution, especially if it's a hard problem. Spend an hour more thinking and you might come up with a solution that takes two hours instead of two days.

4. Good enough features

Most feature requests have three parts:

  1. The stuff the customer must have.
  2. The stuff that is nice to have but not strictly necessary.
  3. The stuff the customer is willing to admit is not necessary.

The last category is usually dropped in advance, but you're usually still asked to implement the middle category of things that the customer and product manager really really want but aren't actually strictly necessary. So figure out the real minimum path to implement a feature, deliver it, and much of the time it'll turn out that no one will miss those nice-to-have additions.

5. Drop the feature altogether

Some features don't need to be done at all. Some features are better done a completely different way than requested.

Instead of saying "yes, I'll do that" to every feature request, make sure you understand why someone needs the feature, and always consider alternatives. If you come up with a faster, superior idea the customer or product manager will usually be happy to go along with your suggestion.

6. Drop the product altogether

Sometimes your whole product is not worth doing: it will have no customers, will garner no interest. Spending months and months on a product no one will ever use is a waste of time, not to mention depressing.

Lean Startup is one methodology for dealing with this: before you spend any time developing the product you do the minimal work possible to figure out if it's worth doing in the first place.


Your goal as a programmer is not to write code, your goal is to solve problems. From low-level programming decisions to high-level business decisions there are many ways you can solve problems with less code. So don't start with "how do I write this code?", start with "how do I solve this problem?" Sometimes you'll do better not solving the problem at all, or redefining it. As you get better at solving problems with less code you will find yourself becoming more productive, especially if you start looking at the big picture.

Being productive is a great help if you're tired of working crazy hours. Want a shorter workweek? Check out The Programmer's Guide to a Sane Workweek.

August 25, 2016 04:00 AM

Moshe Zadka

Time Series Data

When operating computers, we are often exposed to so-called “time series”. Whether it is database latency, page fault rate or total memory used, these are all exposed as numbers that are usually sampled at frequent intervals.

However, not only computer engineers are exposed to such data. It is worthwhile to know what other disciplines are exposed to such data, and what they do with it. “Earth sciences” (geology, climate, etc.) have a lot of numbers, and often need to analyze trends and make predictions. Sometimes these predictions have, literally, billions of dollars’ worth of decisions hinging on them. It is worthwhile to read some of the textbooks for students of those disciplines to see how to approach those series.

Another discipline that needs to visually inspect time series data is medicine. EKG data is often vital to analyzing patients’ health — especially when compared to their historical records. For that, the data needs to be saved. A lot of EKG research has been done on how to compress numerical data while keeping it “visually the same”. While the research on that is not as rigorous, or as settled, as the trend analysis in geology, it is still useful to look into. Indeed, even the basics are already better than so-called “roll-ups”, which preserve none of the visual distinction of the data, flattening peaks and filling valleys while keeping a “standard deviation” score that is not as helpful as is usually hoped.
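A tiny sketch (with made-up numbers) of the roll-up problem Moshe describes: downsample a flat latency series containing one spike with a plain mean, versus keeping per-bucket minima and maxima. The mean hides the spike; the min/max pair preserves it.

```python
# Illustrative sketch: why mean-based roll-ups hide visual features.
# A flat series with a single spike, downsampled into buckets of 10 samples.
series = [10.0] * 100
series[57] = 500.0  # one latency spike

def rollup_mean(data, bucket):
    # The usual roll-up: one average per bucket.
    return [sum(data[i:i + bucket]) / bucket
            for i in range(0, len(data), bucket)]

def rollup_minmax(data, bucket):
    # Keeping min and max per bucket preserves the peak visually.
    return [(min(data[i:i + bucket]), max(data[i:i + bucket]))
            for i in range(0, len(data), bucket)]

means = rollup_mean(series, 10)
minmax = rollup_minmax(series, 10)

# The mean flattens the 500 spike into a modest bump...
print(max(means))  # 59.0
# ...while the min/max roll-up still shows the full peak.
print(max(high for low, high in minmax))  # 500.0
```

A plot of `means` would look almost flat; a plot of the min/max envelope would still show the event that mattered.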

by moshez at August 25, 2016 03:50 AM

August 24, 2016

Hynek Schlawack

Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server’s TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today’s needs and scores a straight “A” on Qualys’s SSL Server Test.

by Hynek Schlawack at August 24, 2016 03:40 PM

August 22, 2016

Hynek Schlawack

Better Python Object Serialization

The Python standard library is full of underappreciated gems. One of them allows for simple and elegant function dispatching based on argument types. This makes it perfect for serialization of arbitrary objects – for example to JSON in web APIs and structured logs.
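The gem Hynek is alluding to is, presumably, functools.singledispatch (in the standard library since Python 3.4). A minimal sketch of type-based serialization with it:

```python
import json
from datetime import datetime
from functools import singledispatch

@singledispatch
def to_serializable(val):
    # Fallback for any type we don't know better about.
    return str(val)

@to_serializable.register(datetime)
def _datetime_to_serializable(val):
    # datetimes become ISO 8601 strings.
    return val.isoformat()

print(json.dumps({"ts": to_serializable(datetime(2016, 8, 22, 12, 30))}))
# {"ts": "2016-08-22T12:30:00"}
```

Passing `default=to_serializable` to `json.dumps` then handles arbitrary objects automatically, and new types are supported by registering one small function rather than touching a central serializer.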

by Hynek Schlawack at August 22, 2016 12:30 PM

August 20, 2016

Moshe Zadka

Extension API: An exercise in a negative case study

I was idly contemplating implementing a new Jupyter kernel. Luckily, they try to provide facility to make it easier. Unfortunately, they made a number of suboptimal choices in their API. Fortunately, those mistakes are both common and easily avoidable.

Subclassing as API

They suggest subclassing IPython.kernel.zmq.kernelbase.Kernel. Errr… not “suggest”. It is a “required step”. The reason is probably that this class already implements 21 methods. When you subclass, make sure not to use any of those names, or things will break randomly. And if you do not want to subclass, good luck figuring out what assumptions the system makes about those 21 methods, because there is no interface or even prose documentation.

The return statement in their example is particularly illuminating:

        return {'status': 'ok',
                # The base class increments the execution count
                'execution_count': self.execution_count,
                'payload': [],
                'user_expressions': {},
               }

Note the comment “base class increments the execution count”. This is a classic code smell: it seems like something every single overrider would need, which means it really belongs in the helper class, not in every kernel.
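A hedged sketch of the alternative Moshe is pointing at (class and method names here are invented, not the actual IPython API): the helper owns the counter and merges it into whatever the kernel-specific hook returns, so overriders never mention it.

```python
class KernelBase(object):
    """Hypothetical helper: bookkeeping lives here, not in overriders."""

    def __init__(self):
        self.execution_count = 0

    def execute_request(self, code):
        # The framework increments the count and fills in the reply,
        # so do_execute implementations never have to remember to.
        self.execution_count += 1
        reply = self.do_execute(code)
        reply['execution_count'] = self.execution_count
        return reply

    def do_execute(self, code):
        raise NotImplementedError


class EchoKernel(KernelBase):
    def do_execute(self, code):
        # Nothing about execution counts here.
        return {'status': 'ok', 'payload': [], 'user_expressions': {}}


k = EchoKernel()
print(k.execute_request("1 + 1")['execution_count'])  # 1
```

With this shape, the comment in the example above becomes unnecessary, because no kernel author can get the counter wrong.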


The signature for the example do_execute is:

    def do_execute(self, code, silent, store_history=True,
                   user_expressions=None, allow_stdin=False):

Of course, this means that user_expressions will sometimes be a dictionary and sometimes None. It is likely that the code will be written to anticipate one or the other, and will fail in interesting ways if None is actually sent.
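A minimal sketch of that failure mode (standalone functions, not the real kernel API): code written assuming the dict case breaks the first time the None default is actually used, and every implementer ends up writing the same normalization line.

```python
# Sketch of the failure mode: code written for one case breaks on the other.
def do_execute(code, user_expressions=None):
    # Written assuming a dict -- raises AttributeError if None arrives.
    return {name: len(expr) for name, expr in user_expressions.items()}

try:
    do_execute("1 + 1")  # caller relied on the None default
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'items'

# The defensive fix every implementer ends up duplicating:
def do_execute_fixed(code, user_expressions=None):
    user_expressions = user_expressions or {}
    return {name: len(expr) for name, expr in user_expressions.items()}

print(do_execute_fixed("1 + 1"))  # {}
```

If the framework always sent a dict (possibly empty), neither the crash nor the boilerplate normalization would exist.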

Optional Overrides

As described in this section, there are also ways to make the kernel better with optional overrides. The convention used, which is explained nowhere, is that do_ methods are the ones you should override to make a better kernel. Nowhere is it explained why there is no default history implementation, or where to get one, or why a simple, stupid implementation would be wrong.



All overrides return dictionaries, which get serialized directly into the underlying communication platform. This is a poor abstraction, especially when the documentation is direct links to the underlying protocol. When wrapping a protocol, it is much nicer to use an Interface as the documentation of what is assumed — and define an attr.s-based class to allow returning something which is automatically the correct type, and will fail in nice ways if a parameter is forgotten.
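A sketch of the attr.s-based alternative Moshe describes (the class and field names here are invented for illustration): the kernel returns a typed object, forgetting a required field fails loudly at construction time, and the framework serializes it at the boundary in exactly one place.

```python
import attr

@attr.s
class ExecuteReply(object):
    """Hypothetical typed reply; omitting execution_count fails loudly."""
    execution_count = attr.ib()
    status = attr.ib(default='ok')
    payload = attr.ib(default=attr.Factory(list))
    user_expressions = attr.ib(default=attr.Factory(dict))


def serialize(reply):
    # One place, owned by the framework, turns the object into wire format.
    return attr.asdict(reply)


reply = ExecuteReply(execution_count=1)
print(serialize(reply))
```

`ExecuteReply()` with no arguments raises TypeError immediately, instead of a half-formed dictionary traveling over the wire and failing somewhere far away.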


If you are providing an API, here are a few positive lessons based on the issues above:

  • You should expect interfaces, not subclasses. Use composition, not subclassing. If you want to provide a default implementation with composition, just check for a return of NotImplemented, and use the default.
  • Do the work of abstracting your customers from the need to use dictionaries and unwrap automatically. Use attr.s to avoid customer boilerplate.
  • Send all arguments. Isolate your customers from the need to come up with sane defaults.
  • As much as possible, try to have your interfaces be side-effect free. Instead of asking the customer to directly make a change, allow the customer to make the “needed change” be part of the return type. This will let the customers test their class much more easily.
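Pulling the first lesson into code (a sketch with invented names, not any real kernel API): the framework composes a user-supplied backend rather than being subclassed, and falls back to its own default whenever a hook returns NotImplemented.

```python
class DefaultHistory(object):
    def get_history(self):
        # A simple, stupid implementation is fine as the default.
        return []


class Framework(object):
    """Composes a user-supplied backend instead of being subclassed."""

    def __init__(self, backend):
        self.backend = backend
        self._default_history = DefaultHistory()

    def history(self):
        result = self.backend.do_history()
        if result is NotImplemented:
            # The backend opted out; use the framework's default.
            return self._default_history.get_history()
        return result


class MyBackend(object):
    def do_history(self):
        return NotImplemented  # "use your default, please"


print(Framework(MyBackend()).history())  # []
```

Because the backend is any object with the right methods, there are no 21 reserved names to collide with, and the framework's assumptions are exactly the methods it calls.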

by moshez at August 20, 2016 06:56 PM

August 19, 2016

Twisted Matrix Laboratories

Twisted 16.3.2 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.3.2.

This is a bug fix & security fix release, and is recommended for all users of Twisted. The fixes are:
  • A bugfix for a HTTP/2 edge case, (included in 16.3.1)
  • Fix for CVE-2008-7317 (generating potentially guessable HTTP session identifiers) (included in 16.3.1)
  • Fix for CVE-2008-7318 (sending secure session cookies over insecured connections) (included in 16.3.1)
  • Fix for CVE-2016-1000111 (included in 16.3.1)
  • Twisted's HTTP server, when operating over TLS, would not cleanly close sockets, causing it to build up CLOSE_WAIT sockets until it would eventually run out of file descriptors.
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by HawkOwl at August 19, 2016 09:45 AM

August 18, 2016

Jonathan Lange

Patterns are half-formed code

If “technology is stuff that doesn’t work yet”[1], then patterns are code we don’t know how to write yet.

In the Go Programming Language, the authors show how to iterate over elements in a map, sorted by keys:

To enumerate the key/value pairs in order, we must sort the keys explicitly, for instance, using the Strings function from the sort package if the keys are strings. This is a common pattern.

—Go Programming Language, Alan A. A. Donovan & Brian W. Kernighan, p94

The pattern is illustrated by the following code:

import "sort"

var names []string
for name := range ages {
    names = append(names, name)
}
sort.Strings(names)
for _, name := range names {
    fmt.Printf("%s\t%d\n", name, ages[name])
}

Peter Norvig calls this an informal design pattern: something referred to by name (“iterate through items in a map in order of keys”) and re-implemented from scratch each time it’s needed.

Informal patterns have their place but they are a larval form of knowledge, stuck halfway between intuition and formal understanding. When we recognize a pattern, our next step should always be to ask, “can we make it go away?”

Patterns are one way of expressing “how to” knowledge [2] but we have another, better way: code. Source code is a formal expression of “how to” knowledge that we can execute, test, manipulate, verify, compose, and re-use. Encoding “how to” knowledge is largely what programming is [3]. We talk about replacing people with programs precisely because we take the knowledge about how to do their job and encode it such that even a machine can understand it.

So how can we encode the knowledge of iterating through the items in a map in order of keys? How can we replace this pattern with code?

We can start by following Peter Norvig’s example and reach for a dynamic programming language, such as Python:

names = []
for name in ages:
    names.append(name)
names.sort()
for name in names:
    print("{}\t{}".format(name, ages[name]))

This is a very literal translation of the first snippet. A more idiomatic approach would look like:

names = sorted(ages.keys())
for name in names:
    print("{}\t{}".format(name, ages[name]))

To turn this into a formal pattern, we need to extract a function that takes a map and returns a list of pairs of (key, value) in sorted order, like so:

def sorted_items(d):
    result = []
    sorted_keys = sorted(d.keys())
    for k in sorted_keys:
        result.append((k, d[k]))
    return result

for name, age in sorted_items(ages):
    print("{}\t{}".format(name, age))

The pattern has become a function. Instead of a name or a description, it has an identifier, a True Name that gives us power over the thing. When we invoke it we don’t need to comment our code to indicate that we are using a pattern because the name sorted_items makes it clear. If we choose, we can test it, optimize it, or perhaps even prove its correctness.

If we figure out a better way of doing it, such as:

def sorted_items(d):
    return [(k, d[k]) for k in sorted(d.keys())]

Then we only have to change one place.

And if we are willing to tolerate a slight change in behavior,

def sorted_items(d):
    return sorted(d.items())

Then we might not need the function at all.

It was being able to write code like this that drew me towards Python and away from Java, way back in 2001. It wasn’t just that I could get more done in fewer lines—although that helped—it was that I could write what I meant.

Of course, these days I’d much rather write:

import Data.List (sort)
import qualified Data.HashMap as Map

sortedItems :: (Ord k, Ord v) => Map.Map k v -> [(k, v)]
sortedItems d = sort (Map.toList d)

But that’s another story.

[1]Bran Ferren, via Douglas Adams
[2]Patterns can also contain “when to”, “why to”, “why not to”, and “how much” knowledge, but they _always_ contain “how to” knowledge.
[3]The excellent SICP lectures open with the insight that what we call “computer science” might be the very beginning of a science of “how to” knowledge.

by Jonathan Lange at August 18, 2016 05:00 PM

Itamar Turner-Trauring

Less stress, more productivity: why working fewer hours is better for you and your employer

Update: This post got to #1 on Hacker News and the /r/programming subreddit, and had over 40,000 views. Given that level of interest in the subject I've decided to write The Programmer's Guide to a Sane Workweek.

There's always too much work to be done on software projects, too many features to implement, too many bugs to fix. Some days you're just not going through the backlog fast enough, you're not producing enough code, and it's taking too long to fix a seemingly-impossible bug. And to make things worse you're wasting time in pointless meetings instead of getting work done.

Once it gets bad enough you can find yourself always scrambling, working overtime just to keep up. Pretty soon it's just expected, and you need to be available to answer emails at all hours even when there are no emergencies. You're tired and burnt out and there's still just as much work as before.

The real solution is not working even harder or even longer, but rather the complete opposite: working fewer hours.

Some caveats first:

  • The more experienced you are the better this will work. If this is your first year working after school you may need to just deal with it until you can find a better job, which you should do ASAP.
  • Working fewer hours is effectively a new deal you are negotiating with your employer. If you're living from paycheck to paycheck you have no negotiating leverage, so the first thing you need to do is make sure you have some savings in the bank.

Fewer hours, more productivity

Why does working longer hours not improve the situation? Because working longer makes you less productive at the same time that it encourages bad practices by your boss. Working fewer hours does the opposite.

1. A shorter work-week improves your ability to focus

As I've discussed before, working while tired is counter-productive. It takes longer and longer to solve problems, and you very quickly hit the point of diminishing returns. And working consistently for long hours is even worse for your mental focus, since you will quickly burn out.

Long hours: "It's 5 o'clock and I should be done with work, but I just need to finish this problem, just one more try," you tell yourself. But being tired it actually takes you another three hours to solve. The next day you go to work tired and unfocused.

Shorter hours: "It's 5 o'clock and I wish I had this fixed, but I guess I'll try tomorrow morning." The next morning, refreshed, you solve the problem in 10 minutes.

2. A shorter work-week promotes smarter solutions

Working longer hours encourages bad programming habits: you start thinking that the way to solve problems is just forcing yourself to get through the work. But programming is all about automation, about building abstractions to reduce work. Often you can get huge reductions in effort by figuring out a better way to implement an API, or that a particular piece of functionality is not actually necessary.

Let's imagine your boss hands you a task that must ship to your customer in 2 weeks. And you estimate that optimistically it will take you 3 weeks to implement.

Long hours: "This needs to ship in two weeks, but I think it's 120 hours to complete... so I guess I'm working evenings and weekends again." You end up even more burnt out, and probably the feature will still ship late.

Shorter hours: "I've got two weeks, but this is way too much work. What can I do to reduce the scope? Guess I'll spend a couple hours thinking about it."

And soon: "Oh, if I do this restructuring I can get 80% of the feature done in one week, and that'll probably keep the customer happy until I finish the rest. And even if I underestimated I've still got the second week to get that part done."

3. A shorter work-week discourages bad management practices

If your response to any issue is to work longer hours you are encouraging bad management practices. You are effectively telling your manager that your time is not valuable, and that they need not prioritize accordingly.

Long hours: If your manager isn't sure whether you should go to a meeting, they might tell themselves that "it might waste an hour of time, but they'll just work an extra hour in the evening to make it up." If your manager can't decide between two features, they'll just hand you both instead of making a hard decision.

Shorter hours: With shorter hours your time becomes more scarce and valuable. If your manager is at all reasonable less important meetings will get skipped and more important features will be prioritized.

Getting to fewer hours

A short work-week means different things to different people. One programmer I know made clear when she started a job at a startup that she worked 40-45 hours a week and that's it. Everyone else worked much longer hours, but that was her personal limit. Personally I have negotiated a 35-hour work week.

Whatever the number that makes sense to you, the key is to clearly explain your limits and then stick to them. Tell your manager "I am going to be working a 40-hour work week, unless it's a real emergency." Once you've explained your limits you need to stick to them: no answering emails after hours, no agreeing to do just one little thing on the weekend.

And then you need to prove yourself by still being productive, and making sure that when you are working you are working. Spending a couple hours a day at work watching cat videos probably won't go well with shorter hours.

There are companies where this won't fly, of course, where management is so bad or norms are so out of whack that even a 40-hour work week by a productive team member won't be acceptable. In those cases you need to look for a new job, and as part of the interview figure out the work culture and project management practices of prospective employers. Do people work short hours or long hours? Is everything always on fire or do projects get delivered on time?

Whether you're negotiating your hours at your existing job or at a new job, you'll do better the more experienced and skilled a programmer you are. If you want to learn how to get there check out The Programmer's Guide to a Sane Workweek.

August 18, 2016 04:00 AM

August 17, 2016

Glyph Lefkowitz

Probably best to get this out of the way before this weekend:

If I meet you at a technical conference, you’ll probably see me extend my elbow in your direction, rather than my hand. This is because I won’t shake your hand at a conference.

People sometimes joke about “con crud”, but the amount of lost productivity and human misery generated by conference-transmitted sickness is not funny. Personally, by the time the year is out, I will most likely have attended 5 conferences. This means that if I get sick at each one, I will spend more than a month out of the year out of commission being sick.

When I tell people this, they think I’m a germophobe. But, in all likelihood, I won’t be the one getting sick. I already have 10 years of built-up herd immunity to the set of minor ailments that afflict the international Python-conference-attending community. It’s true that I don’t particularly want to get sick myself, but I happily shake people’s hands in more moderately-sized social gatherings. I’ve had a cold before and I’ll have one again; I have no illusion that ritually dousing myself in Purell every day will make me immune to all disease.

I’m not shaking your hand because I don’t want you to get sick. Please don’t be weird about it!

by Glyph at August 17, 2016 06:42 PM

August 14, 2016

Glyph Lefkowitz

A Container Is A Function Call

It seems to me that the prevailing mental model among users of container technology [1] right now is that a container is a tiny little virtual machine. It’s like a machine in the sense that it is provisioned and deprovisioned by explicit decisions, and we talk about “booting” containers. We configure it sort of like we configure a machine; dropping a bunch of files into a volume, setting some environment variables.

In my mind though, a container is something fundamentally different than a VM. Rather than coming from the perspective of “let’s take a VM and make it smaller so we can do cool stuff” - get rid of the kernel, get rid of fixed memory allocations, get rid of emulated memory access and instructions, so we can provision more of them at higher density... I’m coming at it from the opposite direction.

For me, containers are “let’s take a program and make it bigger so we can do cool stuff”. Let’s add in the whole user-space filesystem so it’s got all the same bits every time, so we don’t need to worry about library management, so we can ship it around from computer to computer as a self-contained unit. Awesome!

Of course, there are other ecosystems that figured this out a really long time ago, but having it as a commodity within the most popular server deployment environment has changed things.

Of course, an individual container isn’t a whole program. That’s why we need tools like compose to put containers together into a functioning whole. This makes a container not just a program, but rather, a part of a program. And of course, we all know what the smaller parts of a program are called: functions. [2]


A container of course is not the function itself; the image is the function. A container itself is a function call.

Perceived through this lens, it becomes apparent that Docker is missing some pretty important information. As a tiny VM, it has all the parts you need: it has an operating system (in the docker build), the ability to boot and reboot (docker run), instrumentation (docker inspect), debugging (docker exec), etc. As a really big function, it’s strangely anemic.

Specifically: in every programming language worth its salt, we have a type system; some mechanism to identify what parameters a function will take, and what return value it will have.

You might find this weird coming from a Python person, a language where

def foo(a, b, c):
    return a.x(c.d(b))

is considered an acceptable level of type documentation by some [3]; there’s no requirement to say what a, b, and c are. However, just because the type system is implicit, that doesn’t mean it’s not there, even in the text of the program. Let’s consider, from reading this tiny example, what we can discover:

  • foo takes 3 arguments, their names are “a”, “b”, and “c”, and it returns a value.
  • Somewhere else in the codebase there’s an object with an x method, which takes a single argument and also returns a value.
  • The type of <unknown>.x’s argument is the same as the return type of another method somewhere in the codebase, <unknown-2>.d

And so on, and so on. At runtime each of these types takes on a specific, concrete value, with a type, and if you set a breakpoint and single-step into it with a debugger, you can see each of those types very easily. Also at runtime you will get TypeError exceptions telling you exactly what was wrong with what you tried to do at a number of points, if you make a mistake.

The analogy to containers isn’t exact; inputs and outputs aren’t obviously in the shape of “arguments” and “return values”, especially since containers tend to be long-running; but nevertheless, a container does have inputs and outputs in the form of env vars, network services, and volumes.

Let’s consider the “foo” of docker, which would be the middle tier of a 3-tier web application (cribbed from a real live example):

FROM pypy:2
RUN apt-get update -ym
RUN apt-get upgrade -ym
RUN apt-get install -ym libssl-dev libffi-dev
RUN pip install virtualenv
RUN mkdir -p /code/env
RUN virtualenv /code/env
RUN pwd

COPY requirements.txt /code/requirements.txt
RUN /code/env/bin/pip install -r /code/requirements.txt
COPY main /code/main
RUN chmod a+x /code/main

VOLUME /clf
VOLUME /site
VOLUME /etc/ssl/private

ENTRYPOINT ["/code/main"]

In this file, we can only see three inputs, which are filesystem locations: /clf, /site, and /etc/ssl/private. How is this different than our Python example, a language with supposedly “no type information”?

  • The image has no metadata explaining what might go in those locations, or what roles they serve. We have no way to annotate them within the Dockerfile.
  • What services does this container need to connect to in order to get its job done? What hostnames will it connect to, what ports, and what will it expect to find there? We have no way of knowing. It doesn’t say. Any errors about the failed connections will come in a custom format, possibly in logs, from the application itself, and not from docker.
  • What services does this container export? It could have used an EXPOSE line to give us a hint, but it doesn’t need to; and even if it did, all we’d have is a port number.
  • What environment variables does its code require? What format do they need to be in?
  • We do know that we could look in requirements.txt to figure out what libraries are going to be used, but in order to figure out what the service dependencies are, we’re going to need to read all of the code to all of them.

Of course, the one way that this example is unrealistic is that I deleted all the comments explaining all of those things. Indeed, best practice these days would be to include comments in your Dockerfiles, and include example compose files in your repository, to give users some hint as to how these things all wire together.

This sort of state isn’t entirely uncommon in programming languages. In fact, in this popular GitHub project you can see that large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.

That is the current state of the container ecosystem. We are at the “late ’60s assembly language” stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.

When you’re building an image, you’re building it for a particular purpose. You already pretty much know what you’re trying to do and what you’re going to need to do it.

  1. When instantiated, the image is going to consume network services. This is not just a matter of hostnames and TCP ports; those services need to be providing a specific service, over a specific protocol. A generic reverse proxy might be able to handle an arbitrary HTTP endpoint, but an API client needs that specific API. A database admin tool might be OK with just “it’s a database” but an application needs a particular schema.
  2. It’s going to consume environment variables. But not just any variables; the variables have to be in a particular format.
  3. It’s going to consume volumes. The volumes need to contain data in a particular format, readable and writable by a particular UID.
  4. It’s also going to produce all of these things; it may listen on a network service port, provision a database schema, or emit some text that needs to be fed back into an environment variable elsewhere.

Here’s a brief sketch of what I want to see in a Dockerfile to allow me to express this sort of thing:

FROM ...
RUN ...

LISTENS ON: TCP:80 FOR: org.ietf.http/
CONNECTS TO: pgwritemaster.internal ON: TCP:5432 FOR: org.postgresql.db/
CONNECTS TO: {{ETCD_HOST}} ON: TCP:{{ETCD_PORT}} FOR: com.coreos.etcd/client-communication
ENVIRONMENT NEEDS: ETCD_HOST FORMAT: HOST(com.coreos.etcd/client-communication)
ENVIRONMENT NEEDS: ETCD_PORT FORMAT: PORT(com.coreos.etcd/client-communication)
VOLUME AT: /logs FORMAT: org.w3.clf REQUIRES: WRITE UID: 4321

An image thusly built would refuse to run unless:

  • Somewhere else on its network, there was an etcd host/port known to it, its host and port supplied via environment variables.
  • Somewhere else on its network, there was a postgres host, listening on port 5432, with a name-resolution entry of “pgwritemaster.internal”.
  • An environment variable for the etcd configuration was supplied
  • A writable volume for /logs was supplied, owned by user-ID 4321 where it could write common log format logs.
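The checks themselves are mundane enough that a container entrypoint can approximate them by hand today. A hedged sketch of what “refuse to run, loudly” might look like, reusing the environment variable and host names from the example above:

```python
import os
import socket
import sys

def require_env(name):
    # Refuse to run with a specific, actionable message.
    value = os.environ.get(name)
    if not value:
        sys.exit("missing required environment variable: %s" % name)
    return value

def require_service(host, port):
    # Refuse to run unless the dependency is actually reachable.
    try:
        socket.create_connection((host, int(port)), timeout=5).close()
    except OSError:
        sys.exit("cannot reach required service %s:%s" % (host, port))

def require_writable_volume(path):
    if not os.access(path, os.W_OK):
        sys.exit("volume %s is not mounted writable" % path)

def preflight():
    # Run before the real entrypoint; any failure is specific and fast.
    etcd_host = require_env("ETCD_HOST")
    etcd_port = require_env("ETCD_PORT")
    require_service(etcd_host, etcd_port)
    require_service("pgwritemaster.internal", 5432)
    require_writable_volume("/logs")
```

The point of the proposed metadata is precisely that nobody should have to hand-write this preflight for every image; the runtime could derive it from the declarations and report failures uniformly.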

There are probably a lot of flaws in the specific syntax here, but I hope you can see past that, to the broader point that the software inside a container has precise expectations of its environment, and that we presently have no way of communicating those expectations beyond writing a Melvilleian essay in each Dockerfile comments, beseeching those who would run the image to give it what it needs.

Why bother with this sort of work, if all the image can do with it is “refuse to run”?

First and foremost, today, the image effectively won’t run. Oh, it’ll start up, and it’ll consume some resources, but it will break when you try to do anything with it. What this metadata will allow the container runtime to do is to tell you why the image didn’t run, and give you specific, actionable, fast feedback about what you need to do in order to fix the problem. You won’t have to go groveling through logs; which is always especially hard if the back-end service you forgot to properly connect to was the log aggregation service. So this will be an order of magnitude speed improvement on initial deployments and development-environment setups for utility containers. Whole applications typically already come with a compose file, of course, but ideally applications would be built out of functioning self-contained pieces and not assembled one custom container at a time.

Secondly, if there were a strong tooling standard for providing this metadata within the image itself, it might become possible for infrastructure service providers (like, ahem, my employer) to automatically detect and satisfy service dependencies. Right now, if you have a database as a service that lives outside the container system in production, but within the container system in development and test, there’s no way for the orchestration layer to say “good news, everyone! you can find the database you need here: ...”.

My main interest is in allowing open source software developers to give service operators exactly what they need, so the upstream developers can get useful bug reports. There’s a constant tension where volunteer software developers find themselves fielding bug reports where someone deployed their code in a weird way, hacked it up to support some strange environment, built a derived container that had all kinds of extra junk in it to support service discovery or logging or somesuch, and so they don’t want to deal with the support load that that generates. Both people in that exchange are behaving reasonably. The developers gave the ops folks a container that runs their software to the best of their abilities. The service vendors made the minimal modifications they needed to have the container become a part of their service fabric. Yet we arrive at a scenario where nobody feels responsible for the resulting artifact.

If we could just say what it is that the container needs in order to really work, in a way which was precise and machine-readable, then it would be clear where the responsibility lies. Service providers could just run the container unmodified, and they’d know very clearly whether or not they’d satisfied its runtime requirements. Open source developers - or even commercial service vendors! - could say very clearly what they expected to be passed in, and when they got bug reports, they’d know exactly how their service should have behaved.

  1. which mostly but not entirely just means “docker”; it’s weird, of course, because there are pieces that docker depends on and tools that build upon docker which are part of this, but docker remains the nexus. 

  2. Yes yes, I know that they’re not really functions Tristan, they’re subroutines, but that’s the word people use for “subroutines” nowadays. 

  3. Just to be clear: no it isn’t. Write a damn docstring, or at least some type annotations

by Glyph at August 14, 2016 10:22 PM

Python Packaging Is Good Now

Okay folks. Time’s up. It’s too late to say that Python’s packaging ecosystem is terrible any more. I’m calling it.

Python packaging is not bad any more. If you’re a developer, and you’re trying to create or consume Python libraries, it can be a tractable, even pleasant experience.

I need to say this, because for a long time, Python’s packaging toolchain was … problematic. It isn’t any more, but a lot of people still seem to think that it is, so it’s time to set the record straight.

If you’re not familiar with the history it went something like this:

The Dawn

Python first shipped in an era when adding a dependency meant a veritable Odyssey into cyberspace. First, you’d wait until nobody in your whole family was using the phone line. Then you’d dial your ISP. Once you’d finished fighting your SLIP or PPP client, you’d ask a netnews group if anyone knew of a good gopher site to find a library that could solve your problem. Once you were done with that task, you’d sign off the Internet for the night, and wait about 48 hours to see if anyone responded. If you were lucky enough to get a reply, you’d set up a download at the end of your night’s web-surfing.

pip search it wasn’t.

For the time, Python’s approach to dependency-handling was incredibly forward-looking. The import statement, and the pluggable module import system, made it easy to get dependencies from wherever made sense.

In Python 2.01, Distutils was introduced. This let Python developers describe their collections of modules abstractly, and added tool support for producing redistributable collections of modules and packages. Again, this was tremendously forward-looking, if somewhat primitive; there was very little to compare it to at the time.

Fast forwarding to 2004; setuptools was created to address some of the increasingly-common tasks that open source software maintainers were facing with distributing their modules over the internet. In 2005, it added easy_install, in order to provide a tool to automate resolving dependencies and downloading them into the right locations.

The Dark Age

Unfortunately, in addition to providing basic utilities for expressing dependencies, setuptools also dragged in a tremendous amount of complexity. Its author felt that import should do something slightly different than what it does, so installing setuptools changed it. The main difference between normal import and setuptools import was that it facilitated having multiple different versions of the same library in the same program at the same time. It turns out that that’s a dumb idea, but in fairness, it wasn’t entirely clear at the time, and it is certainly useful (and necessary!) to be able to have multiple versions of a library installed onto a computer at the same time.

In addition to these idiosyncratic departures from standard Python semantics, setuptools suffered from being unmaintained. It became a critical part of the Python ecosystem at the same time as the author was moving on to other projects entirely outside of programming. No-one could agree on who the new maintainers should be for a long period of time. The project was forked, and many operating systems’ packaging toolchains calcified around a buggy, ancient version.

From 2008 to 2012 or so, Python packaging was a total mess. It was painful to use. It was not clear which libraries or tools to use, which ones were worth investing in or learning. Doing things the simple way was too tedious, and doing things the automated way involved lots of poorly-documented workarounds and inscrutable failure modes.

This is to say nothing of the fact that there were critical security flaws in various parts of this toolchain. There was no practical way to package and upload Python packages in such a way that users didn’t need a full compiler toolchain for their platform.

To make matters worse for the popular perception of Python’s packaging prowess2, at this same time, newer languages and environments were getting a lot of buzz, ones that had packaging built in at the very beginning and had a much better binary distribution story. These environments learned lessons from the screw-ups of Python and Perl, and really got a lot of things right from the start.

Finally, the Python Package Index, the site which hosts all the open source packages uploaded by the Python community, was basically a proof-of-concept that went live way too early, had almost no operational resources, and was offline all the dang time.

Things were looking pretty bad for Python.


Here is where we get to the point of this post - this is where popular opinion about Python packaging is stuck. Outdated information from this period abounds. Blog posts complaining about problems score high in web searches. Those who used Python during this time, but have now moved on to some other language, frequently scoff and dismiss Python as impossible to package, its packaging ecosystem as broken, PyPI as down all the time, and so on. Worst of all, bad advice for workarounds which are no longer necessary is still easy to find, which causes users to pre-emptively break their environments where they really don’t need to.

From The Ashes

In the midst of all this brokenness, there were some who were heroically, quietly, slowly fixing the mess, one gnarly bug-report at a time. pip was started, and its various maintainers fixed much of easy_install’s overcomplexity and many of its flaws. Donald Stufft stepped in both on Pip and PyPI and improved the availability of the systems it depended upon, as well as some pretty serious vulnerabilities in the tool itself. Daniel Holth wrote a PEP for the wheel format, which allows for binary redistribution of libraries. In other words, it lets authors of packages which need a C compiler to build give their users a way to not have one.

In 2013, setuptools and distribute un-forked, providing a path forward for operating system vendors to start updating their installations and allowing users to use something modern.

Python Core started distributing the ensurepip module along with both Python 2.7 and 3.3, allowing any user with a recent Python installed to quickly bootstrap into a sensible Python development environment with a one-liner.

A New Renaissance

I won’t give you a full run-down of the state of the packaging art. There’s already a website for that. I will, however, give you a précis of how much easier it is to get started nowadays. Today, if you want to get a sensible, up-to-date python development environment, without administrative privileges, all you have to do is:

$ python -m ensurepip --user
$ python -m pip install --user --upgrade pip
$ python -m pip install --user --upgrade virtualenv

Then, for each project you want to do, make a new virtualenv:

$ python -m virtualenv lets-go
$ . ./lets-go/bin/activate
(lets-go) $ _

From here on out, now the world is your oyster; you can pip install to your heart’s content, and you probably won’t even need to compile any C for most packages. These instructions don’t depend on Python version, either: as long as it’s up-to-date, the same steps work on Python 2, Python 3, PyPy and even Jython. In fact, often the ensurepip step isn’t even necessary since pip comes preinstalled. Running it if it’s unnecessary is harmless, even!

Other, more advanced packaging operations are much simpler than they used to be, too.

  • Need a C compiler? OS vendors have been working with the open source community to make this easier across the board:
    $ apt install build-essential python-dev # ubuntu
    $ xcode-select --install # macOS
    $ dnf install @development-tools python-devel # fedora
    C:\> REM windows
    C:\> start

Okay that last one’s not as obvious as it ought to be but they did at least make it freely available!

  • Want to upload some stuff to PyPI? This should do it for almost any project:

    $ pip install twine
    $ python setup.py sdist bdist_wheel
    $ twine upload dist/*
  • Want to build wheels for the wild and wooly world of Linux? There’s an app4 for that.

Importantly, PyPI will almost certainly be online. Not only that, but a new, revamped site will be “launching” any day now3.

Again, this isn’t a comprehensive resource; I just want to give you an idea of what’s possible. But, as a deeply experienced Python expert, I used to swear at these tools six times a day for years; the most serious Python packaging issue I’ve had this year to date was fixed by cleaning up my git repo to delete a cache file.

Work Still To Do

While the current situation is good, it’s still not great.

Here are just a few of my desiderata:

  • We still need better and more universally agreed-upon tooling for end-user deployments.
  • Pip should have a GUI frontend so that users can write Python stuff without learning as much command-line arcana.
  • There should be tools that help you write and update a setup.py. Or a setup.python.json or something, so you don’t actually need to write code just to ship some metadata.
  • The error messages that you get when you try to build something that needs a C compiler and it doesn’t work should be clearer and more actionable for users who don’t already know what they mean.
  • PyPI should automatically build wheels for all platforms by default when you upload sdists; this is a huge project, of course, but it would be super awesome default behavior.

I could go on. There are lots of ways that Python packaging could be better.

The Bottom Line

The real takeaway here though, is that although it’s still not perfect, other languages are no longer doing appreciably better. Go is still working through a number of different options regarding dependency management and vendoring, and, like Python extensions that require C dependencies, CGo is sometimes necessary and always a problem. Node has had its own well-publicized problems with their dependency management culture and package manager. Hackage is cool and all but everything takes a literal geological epoch to compile.

As always, I’m sure none of this applies to Rust and Cargo is basically perfect, but that doesn’t matter, because nobody reading this is actually using Rust.

My point is not that packaging in any of these languages is particularly bad. They’re all actually doing pretty well, especially compared to the state of the general programming ecosystem a few years ago; many of them are making regular progress towards user-facing improvements.

My point is that any commentary suggesting they’re meaningfully better than Python at this point is probably just out of date. Working with Python packaging is more or less fine right now. It could be better, but lots of people are working on improving it, and the structural problems that prevented those improvements from being adopted by the community in a timely manner have almost all been addressed.

Go! Make some virtualenvs! Hack some setup.pys! If it’s been a while and your last experience was really miserable, I promise, it’s better now.

Am I wrong? Did I screw up a detail of your favorite language? Did I forget to mention the one language environment that has a completely perfect, flawless packaging story? Do you feel the need to just yell at a stranger on the Internet about picayune details? Feel free to get in touch!

  1. released in October, 2000 

  2. say that five times fast. 

  3. although I’m not sure what it means to “launch” when the site is online, and running against the production data-store, and you can use it for pretty much everything... 

  4. “app” meaning of course “docker container” 

by Glyph at August 14, 2016 09:17 AM

What’s In A Name

Amber’s excellent lightning talk on identity yesterday made me feel many feels, and reminded me of this excellent post by Patrick McKenzie about false assumptions regarding names.

While that list is helpful, it’s very light on positively-framed advice, i.e. “you should” rather than “you shouldn’t”. So I feel like I want to give a little bit of specific, prescriptive advice to programmers who might need to deal with names.

First and foremost: stop asking for unnecessary information. If I’m just authenticating to your system to download a comic book, you do not need to know my name. Your payment provider might need a billing address, but you absolutely do not need to store my name.

Okay, okay. I understand that may make your system seem a little impersonal, and you want to be able to greet me, or maybe have a name to show to other users beyond my login ID or email address that has to be unique on the site. Fine. Here’s what a good “name” field looks like:

You don’t need to break my name down into parts. If you just need a way to refer to me, then let me tell you whatever the heck I want. Honorific? Maybe I have more than one; maybe I don’t want you to use any.

And this brings me to “first name / last name”.

In most cases, you should not use these terms. They are oversimplifications of how names work, appropriate only for children in English-speaking countries who might not understand the subtleties involved and only need to know that one name comes before the other.

The terms you’re looking for are given name and surname, or perhaps family name. (“Middle name” might still be an appropriate term because that fills a more specific role.) But by using these more semantically useful terms, you include orders of magnitude more names in your normalization scheme. More importantly, by acknowledging the roles of the different parts of a name, you’ll come to realize that there are many other types of name, such as:

If your application does have a legitimate need to normalize names, for example, to interoperate with third-party databases, or to fulfill some regulatory requirement:

  • When you refer to a user of the system, always allow them to customize how their name is presented. Give them the benefit of the doubt. If you’re concerned about users abusing this display-name system to insult other users, it's understandable that you may need to moderate that a little. But there's no reason to ever moderate or regulate how a user's name is displayed to themselves. You can start to address offensive names by allowing other users to set nicknames for them. Only as a last resort, allow other users to report their name as not-actually-their-name, abusive or rude; if you do that, you have to investigate those reports. Let users affirm other users’ names, too, and verify reports: if someone attracts a million fake troll accounts, but all their friends affirm that their name is correct, you should be able to detect that. Don’t check government IDs in order to do this; they’re not relevant.
  • Allow the user to enter their normalized name as a series of names with classifiers attached to each one. In other words, like this:
  • Keep in mind that spaces are valid in any of these names. Many people have multi-word first names, middle names, or last names, and it can matter how you classify them. For one example that should resonate with readers of this blog, it’s “Guido” “van Rossum”, not “Guido” “Van” “Rossum”. It is definitely not “Guido” “Vanrossum”.
  • So is other punctuation. Even dashes. Even apostrophes. Especially apostrophes, you insensitive clod. Literally ten billion people whose surnames start with “O’” live in Ireland and they do not care about your broken database input security practices.
  • Allow for the user to have multiple names with classifiers attached to each one: “legal name in China”, “stage name”, “name on passport”, “maiden name”, etc. Keep in mind that more than one name for a given person may simultaneously be accurate for a certain audience and legally valid. They can even be legally valid in the same context: many people have social security cards, birth certificates, driver’s licenses and passports with different names on them; sometimes due to a clerical error, sometimes due to the way different systems work. If your goal is to match up with those systems, especially more than one of them, you need to account for that possibility.
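To make that last idea concrete - a list of names, each built from classified parts - here’s a minimal sketch. Every field name and classifier string below is my own invention for illustration, not any kind of standard:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NamePart:
    text: str  # spaces and punctuation are valid: "van Rossum", "O'Brien"
    role: str  # e.g. "given", "middle", "surname"

@dataclass
class Name:
    parts: List[NamePart]
    classifier: str  # e.g. "legal name in China", "stage name", "name on passport"

@dataclass
class User:
    display_name: str  # opaque text, fully controlled by the user
    names: List[Name] = field(default_factory=list)

guido = User(
    display_name="Guido van Rossum",
    names=[Name(parts=[NamePart("Guido", "given"),
                       NamePart("van Rossum", "surname")],
                classifier="preferred name")],
)
```

Note that “van Rossum” stays a single surname part, space and all, and the display name remains completely opaque.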

If you’re a programmer and you’re balking at this complexity: good. Remember that for most systems, sticking with the first option - treating users’ names as totally opaque text - is probably your best bet. You probably don’t need to know the structure of the user’s name for most purposes.

by Glyph at August 14, 2016 12:48 AM

August 12, 2016

Glyph Lefkowitz

The One Python Library Everyone Needs

Do you write programs in Python? You should be using attrs.

Why, you ask? Don’t ask. Just use it.

Okay, fine. Let me back up.

I love Python; it’s been my primary programming language for 10+ years and despite a number of interesting developments in the interim I have no plans to switch to anything else.

But Python is not without its problems. In some cases it encourages you to do the wrong thing. Particularly, there is a deeply unfortunate proliferation of class inheritance and the God-object anti-pattern in many libraries.

One cause for this might be that Python is a highly accessible language, so less experienced programmers make mistakes that they then have to live with forever.

But I think that perhaps a more significant reason is the fact that Python sometimes punishes you for trying to do the right thing.

The “right thing” in the context of object design is to make lots of small, self-contained classes that do one thing and do it well. For example, if you notice your object is starting to accrue a lot of private methods, perhaps you should be making those “public”1 methods of a private attribute. But if it’s tedious to do that, you probably won’t bother.

Another place you probably should be defining an object is when you have a bag of related data that needs its relationships, invariants, and behavior explained. Python makes it soooo easy to just define a tuple or a list. The first couple of times you type host, port = ... instead of address = ... it doesn’t seem like a big deal, but then soon enough you’re typing [(family, socktype, proto, canonname, sockaddr)] = ... everywhere and your life is filled with regret. That is, if you’re lucky. If you’re not lucky, you’re just maintaining code that does something like values[0][7][4][HOSTNAME][“canonical”] and your life is filled with garden-variety pain rather than the more complex and nuanced emotion of regret.

This raises the question: is it tedious to make a class in Python? Let’s look at a simple data structure: a 3-dimensional cartesian coordinate. It starts off simply enough:

class Point3D(object):

So far so good. We’ve got a 3 dimensional point. What next?

class Point3D(object):
    def __init__(self, x, y, z):

Well, that’s a bit unfortunate. I just want a holder for a little bit of data, and I’ve already had to override a special method from the Python runtime with an internal naming convention? Not too bad, I suppose; all programming is weird symbols after a fashion.

At least I see my attribute names in there, that makes sense.

class Point3D(object):
    def __init__(self, x, y, z):

I already said I wanted an x, but now I have to assign it as an attribute...

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x

... to x? Uh, obviously ...

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

... and now I have to do that once for every attribute, so this actually scales poorly? I have to type every attribute name 3 times?!?

Oh well. At least I’m done now.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):

Wait what do you mean I’m not done.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))

Oh come on. So I have to type every attribute name 5 times, if I want to be able to see what the heck this thing is when I’m debugging, which a tuple would have given me for free?!?!?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

7 times?!?!?!?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

9 times?!?!?!?!?

from functools import total_ordering
@total_ordering
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

Okay, whew - 2 more lines of code isn’t great, but now at least we don’t have to define all the other comparison methods. But now we’re done, right?
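(If total_ordering is new to you: it derives the remaining rich comparison methods from __eq__ plus one ordering method. A tiny self-contained check, using a throwaway class of my own rather than Point3D:)

```python
from functools import total_ordering

@total_ordering
class Version(object):
    """A throwaway example class, just to show what total_ordering derives."""
    def __init__(self, n):
        self.n = n
    def __eq__(self, other):
        return self.n == other.n
    def __lt__(self, other):
        return self.n < other.n

# Only __eq__ and __lt__ were written; <=, >, and >= come for free:
assert Version(1) <= Version(2)
assert Version(3) > Version(2)
assert Version(2) >= Version(2)
```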

from unittest import TestCase
class Point3DTests(TestCase):

You know what? I’m done. 20 lines of code so far and we don’t even have a class that does anything; the hard part of this problem was supposed to be the quaternion solver, not “make a data structure which can be printed and compared”. I’m all in on piles of undocumented garbage: tuples, lists, and dictionaries it is; defining proper data structures well is way too hard in Python.2

namedtuple to the (not really) rescue

The standard library’s answer to this conundrum is namedtuple. While a valiant first draft (it bears many similarities to my own somewhat embarrassing and antiquated entry in this genre), namedtuple is unfortunately unsalvageable. It exports a huge amount of undesirable public functionality which would be a huge compatibility nightmare to maintain, and it doesn’t address half the problems that one runs into. A full enumeration of its shortcomings would be tedious, but a few of the highlights:

  • Its fields are accessible as numbered indexes whether you want them to be or not. Among other things, this means you can’t have private attributes, because they’re exposed via the apparently public __getitem__ interface.
  • It compares equal to a raw tuple of the same values, so it’s easy to get into bizarre type confusion, especially if you’re trying to use it to migrate away from using tuples and lists.
  • It’s a tuple, so it’s always immutable. Sort of.
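The first two points are easy to demonstrate:

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', ['x', 'y', 'z'])
p = Point3D(1, 2, 3)

# Fields leak out as numeric indexes, whether you want them to or not:
assert p[0] == 1

# And it compares equal to a plain tuple of the same values:
assert p == (1, 2, 3)
```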

As to that last point, either you can use it like this:

Point3D = namedtuple('Point3D', ['x', 'y', 'z'])

in which case it doesn’t look like a type in your code; simple syntax-analysis tools without special cases won’t recognize it as one. You can’t give it any other behaviors this way, since there’s nowhere to put a method. Not to mention the fact that you had to type the class’s name twice.

Alternately you can use inheritance and do this:

class Point3D(namedtuple('_Point3DBase', 'x y z'.split())):

This gives you a place you can put methods, and a docstring, and generally have it look like a class, which it is... but in return you now have a weird internal name (which, by the way, is what shows up in the repr, not the class’s actual name). However, you’ve also silently made the attributes not listed here mutable, a strange side-effect of adding the class declaration; that is, unless you add __slots__ = 'x y z'.split() to the class body, and then we’re just back to typing every attribute name twice.

And this doesn’t even mention the fact that science has proven that you shouldn’t use inheritance.

So, namedtuple can be an improvement if it’s all you’ve got, but only in some cases, and it has its own weird baggage.

Enter The attr

So here’s where my favorite mandatory Python library comes in.

Let’s re-examine the problem above. How do I make Point3D with attrs?

import attr

Since this isn’t built into the language, we do have to have 2 lines of boilerplate to get us started: the import and the decorator saying we’re about to use it.

import attr
@attr.s
class Point3D(object):

Look, no inheritance! By using a class decorator, Point3D remains a Plain Old Python Class (albeit with some helpful double-underscore methods tacked on, as we’ll see momentarily).

import attr
@attr.s
class Point3D(object):
    x = attr.ib()

It has an attribute called x.

import attr
@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()

And one called y and one called z and we’re done.

We’re done? Wait. What about a nice string representation?

>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)


>>> Point3D(1, 2, 3) == Point3D(1, 2, 3)
True
>>> Point3D(3, 2, 1) == Point3D(1, 2, 3)
False
>>> Point3D(3, 2, 3) > Point3D(1, 2, 3)
True

Okay sure but what if I want to extract the data defined in explicit attributes in a format appropriate for JSON serialization?

>>> attr.asdict(Point3D(1, 2, 3))
{'y': 2, 'x': 1, 'z': 3}

Maybe that last one was a little on the nose. But nevertheless, it’s one of many things that becomes easier because attrs lets you declare the fields on your class, along with lots of potentially interesting metadata about them, and then get that metadata back out.

>>> import pprint
>>> pprint.pprint(attr.fields(Point3D))
(Attribute(name='x', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='y', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='z', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))

I am not going to dive into every interesting feature of attrs here; you can read the documentation for that. Plus, it’s well-maintained, so new goodies show up every so often and I might miss something important. But attrs does a few key things that, once you have them, you realize that Python was sorely missing before:

  1. It lets you define types concisely, as opposed to the normally quite verbose manual def __init__.... Types without typing.
  2. It lets you say what you mean directly with a declaration rather than expressing it in a roundabout imperative recipe. Instead of “I have a type, it’s called MyType, it has a constructor, in the constructor I assign the property ‘A’ to the parameter ‘A’ (and so on)”, you say “I have a type, it’s called MyType, it has an attribute called a”, and behavior is derived from that fact, rather than having to later guess about the fact by reverse engineering it from behavior (for example, running dir on an instance, or looking at self.__class__.__dict__).
  3. It provides useful default behavior, as opposed to Python’s sometimes-useful but often-backwards defaults.
  4. It adds a place for you to put a more rigorous implementation later, while starting out simple.

Let’s explore that last point.

Progressive Enhancement

While I’m not going to talk about every feature, I’d be remiss if I didn’t mention a few of them. As you can see from those mile-long repr()s for Attribute above, there are a number of interesting ones.

For example: you can validate attributes when they are passed into an @attr.s-ified class. Our Point3D, for example, should probably contain numbers. For simplicity’s sake, we could say that that means instances of float, like so:

import attr
from attr.validators import instance_of
@attr.s
class Point3D(object):
    x = attr.ib(validator=instance_of(float))
    y = attr.ib(validator=instance_of(float))
    z = attr.ib(validator=instance_of(float))

The fact that we were using attrs means we have a place to put this extra validation: we can just add type information to each attribute as we need it. Some of these facilities let us avoid other common mistakes. For example, this is a popular “spot the bug” Python interview question:

class Bag:
    def __init__(self, contents=[]):
        self._contents = contents
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]
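If you don’t see the bug yet, here it is in action with a bare function - the default list is created once and then shared across calls:

```python
def buggy_append(item, contents=[]):
    # The default list is built once, at definition time, then shared
    # by every call that doesn't pass its own list.
    contents.append(item)
    return contents

first = buggy_append(1)
second = buggy_append(2)

assert first is second   # both calls mutated the very same list
assert second == [1, 2]
```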

Fixing it, of course, becomes this:

class Bag:
    def __init__(self, contents=None):
        if contents is None:
            contents = []
        self._contents = contents

adding two extra lines of code.

The default contents list here is created just once, when the function is defined, so it inadvertently becomes a global variable, making all Bag objects not provided with a different list share the same list. With attrs this instead becomes:

import attr

@attr.s
class Bag:
    _contents = attr.ib(default=attr.Factory(list))
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]

There are several other features with which attrs provides you opportunities to make your classes both more convenient and more correct. Another great example? If you want to be strict about extraneous attributes on your objects (or more memory-efficient on CPython), you can just pass slots=True at the class level - e.g. @attr.s(slots=True) - to automatically turn your existing attrs declarations into a matching __slots__ attribute. All of these handy features allow you to make better and more powerful use of your attr.ib() declarations.
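(If you haven’t met __slots__ before, here’s the plain-Python behavior that slots=True buys you - no attrs needed for the demonstration:)

```python
class Slotted(object):
    __slots__ = ('x', 'y')  # only these attributes may exist on instances

    def __init__(self, x, y):
        self.x = x
        self.y = y

s = Slotted(1, 2)
try:
    s.z = 3  # an extraneous attribute is rejected...
    leaked = True
except AttributeError:
    leaked = False

assert leaked is False  # ...with an AttributeError
```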

The Python Of The Future

Some people are excited about eventually being able to program in Python 3 everywhere. What I’m looking forward to is being able to program in Python-with-attrs everywhere. It exerts a subtle, but positive, design influence in all the codebases I’ve seen it used in.

Give it a try: you may find yourself surprised at the places where you’ll now use a tidily explained class, where previously you might have used a sparsely-documented tuple, list, or dict, and endured the occasional confusion from co-maintainers. Now that it’s so easy to have structured types that clearly point in the direction of their purpose (in their __repr__, in their __doc__, or even just in the names of their attributes), you might find you’ll use a lot more of them. Your code will be better for it; I know mine has been.

  1. Scare quotes here because the attributes aren’t meaningfully exposed to the caller, they’re just named publicly. This pattern, getting rid of private methods entirely and having only private attributes, probably deserves its own post... 

  2. And we hadn’t even gotten to the really exciting stuff yet: type validation on construction, default mutable values... 

by Glyph at August 12, 2016 09:47 AM

Remember that thing I said in my pycon talk about native packaging being the main thing to worry about, and single-file binaries being at best a stepping stone to that and at worst a bit of a red herring? You don’t have to take it from me. From the authors of a widely-distributed command-line application that was rewritten from Python into Go specifically for easier distribution, and then rewritten in Python:

... [the] majority of people prefer native packages so distributing precompiled binaries wasn’t a big win for this type of project1 ...

I don’t want to say “I told you so”, but... no. Wait a second. That is exactly what I want to do. That is what I am doing.

I told you so.

by Glyph at August 12, 2016 03:37 AM

August 11, 2016

Glyph Lefkowitz

Hello lazyweb,

I want to run some “legacy” software (Trac, specifically) on a Swarm cluster. The files that it needs to store are mostly effectively write-once (it’s the attachments database) but may need to be deleted (spammers and purveyors of malware occasionally try to upload things for spamming or C&C) so while mutability is necessary, there’s a very low risk of any write contention.

I can’t use a networked filesystem, or any volume drivers, so no easy-mode solutions. Basically I want to be able to deploy this on any swarm cluster, so no cheating and fiddling with the host.

Is there any software that I can easily run as a daemon that runs in the background, synchronizing the directory of a data volume between N containers where N is the number of hosts in my cluster?

I found this but it strikes me as ... somehow wrong ... to use that as a critical part of my data-center infrastructure. Maybe it would actually be a good start? But in addition to not being really designed for this task, it’s also not open source, which makes me a little nervous. This list, or even this smaller one is huge and bewildering. So I was hoping you could drop me a line if you’ve got an idea what I could use for this.

by Glyph at August 11, 2016 05:28 AM

August 08, 2016

Glyph Lefkowitz

I like keeping a comprehensive and accurate address book that includes all past email addresses for my contacts, including those which are no longer valid. I do this because I want to be able to see conversations stretching back over the years as originating from that person.

Unfortunately this causes problems when sending mail sometimes. On macOS, at least as of El Capitan, neither the Mail application nor the Contacts application have any mechanism for indicating preference-order of email addresses that I’ve been able to find. Compounding this annoyance, when completing a recipient’s address based on their name, it displays all email addresses for a contact without showing their label, which means even if I label one “preferred” or “USE THIS ONE NOW”, or “zzz don’t use this hasn’t worked since 2005”, I can’t tell when I’m sending a message.

But it seems as though it defaults to sending messages to the most recent outgoing address for that contact that it can see in an email. For people I send email to regularly this is easy enough. For people who I’m aware have changed their email address, but where I don’t actually want to send them a message, I think I figured out a little hack that makes it work: make a new folder called “Preferred Addresses Hack” (or something suitable), compose a new message addressed to the correct address, then drag the message out of drafts into the folder; since it has a recent date and is addressed to the right person, Mail will index it and auto-complete the correct address in the future.

However, since the previous behavior appeared somewhat non-deterministic, I might be tricking myself into believing that this hack worked. If you can confirm it, I’d appreciate it if you would let me know.

by Glyph at August 08, 2016 11:24 PM

Itamar Turner-Trauring

Why living below your means can help you find a better job

Sooner or later you're going to have to find a new job. Maybe you'll decide you can't take another day of the crap your boss is putting you through. Maybe you'll get laid off, or maybe you'll just decide to move to a new city.

Whatever the reason, when you're looking for a new job one of the keys to success is the ability to be choosy. If you need a job right now to pay your bills that means you've got no negotiating leverage. And there's no reason to think the first job offer you get will be the best job for you; maybe it'll be the second offer, or the tenth.

It's true that having in-demand skills or being great at interviews will help you get more and better offers. But interviewing for jobs always takes time... and if you can't afford to go without income for long then you won't be able to pick and choose between offers.

How do you make sure you don't need to find a new job immediately and that you can be choosy about which job you take? By living below your means.

Saving for a rainy day

You're going to have to find a new job someday but you should start preparing right now. By the time you realize you need a new job it may be too late. How do you prepare? By saving money in an easy to access way, an emergency stash that will pay for your expenses while you have no income. Cash in the bank is a fine way to do this since the goal is not to maximize returns, the goal is to have money available when you need it.

Let's say you need to spend 100 Gold Pieces a month to sustain your current lifestyle. And you decide you want 4 months of expenses saved just in case you lose your job during a recession, when jobs will take longer to find. That means you need 400GP savings in the bank.

If your expenses are 100GP and you have no savings that suggests your take-home pay is also 100GP (if you're spending more than you make better fix that first!). Increasing pay is hard to do quickly, so you need to reduce your spending temporarily until you have those 400GP saved. At that point you can go back to spending your full paycheck with the knowledge that you have money saved for a rainy day.

But you can do better.

Living below your means

If you're always spending less than your income you get a double benefit: you're saving more and it takes you longer to exhaust your savings. To see that we can look at two scenarios, one where you permanently reduce your spending to 80GP a month and another where you permanently reduce it to 50GP a month.

Scenario 1: You have 100GP take-home pay, 80GP expenses. You're saving 20GP a month, so it will take you 20 months to save 400GP. Since your monthly expenses are 80GP this means you can now go 400/80 = 5 months without any pay.

Scenario 2: You have 100GP take-home pay, 50GP expenses. You're saving 50GP a month, so it will take you 8 months to save 400GP. Since your monthly expenses are 50GP this means you can now go 400/50 = 8 months without any pay.

As you can see, the lower your expenses the better off you are. At 80GP/month expenses it takes you 20 months to save 5 months' worth of expenses. At 50GP/month expenses it only takes you 8 months to save 8 months' worth of expenses. Reducing your expenses allows you to save faster and makes your money last longer!
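The arithmetic above generalizes to any income and spending level; here's a quick sketch in Python (the function names are just for illustration):

```python
def months_to_save(income, expenses, target):
    """Months needed to save `target` at (income - expenses) saved per month."""
    return target / float(income - expenses)

def runway(savings, expenses):
    """Months you can live off `savings` while spending `expenses` per month."""
    return savings / float(expenses)

# Scenario 1: 100GP take-home pay, 80GP expenses, 400GP target.
print(months_to_save(100, 80, 400))  # 20.0 months to save it...
print(runway(400, 80))               # ...for 5.0 months of runway

# Scenario 2: 100GP take-home pay, 50GP expenses.
print(months_to_save(100, 50, 400))  # 8.0 months to save it...
print(runway(400, 50))               # ...for 8.0 months of runway
```

Cutting expenses improves both numbers at once, which is why the second scenario saves faster *and* lasts longer.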

The longer your money lasts the more leverage you have during a job search: you can walk away from a bad job offer and have an easier time negotiating a better offer. And you also have the option of taking a lower-paid job if that job is attractive enough in other ways. Finally, a permanently reduced cost of living also means that over time you are less and less reliant on your job as a source of income.

Reduce your expenses today!

To prepare for the day when you need to look for a job you should reduce expenses temporarily until you've saved enough to pay for a few months' living expenses. Once you've done that you'll have a sense of whether that lower level of expenses works for you; there's a pretty good chance you'll be just as happy at 90GP or 80GP as at 100GP.

In that case you should permanently reduce your expenses rather than going back to 100GP a month. You'll have more money in the bank, you'll have money you can invest for the long term, and the money you have saved will last you longer when you eventually have to look for a new job.

One reason you might want to look for a job is to find one with better hours than your current one. And having money in the bank is also useful when you're negotiating for a shorter workweek at your current job. If you want to learn more check out The Programmer's Guide to a Sane Workweek.

August 08, 2016 04:00 AM

August 03, 2016

Itamar Turner-Trauring

Why lack of confidence can make you a better programmer

What if you're not a superstar programmer? What if other people work longer than you, or harder than you, or have built better projects than you? Can you succeed without self-confidence? I believe you can, and moreover I feel that lack of confidence can actually make you a better programmer.

The problem with self-confidence

It's easy to think that self-confidence is far more useful than lack of confidence. Self-confidence will get you to try new things and can convince others of your worth. Self-confidence seems self-evidently worthwhile: if it isn't worthwhile, why is that self-confident person so confident?

But in fact unrealistic self-confidence can be quite harmful. Software is usually far more complex than we believe it to be, far harder to get right than we think. And if we're self-confident we may think our software works even when it doesn't.

When I was younger I suffered from the problem of too much self-confidence. I wrote software that didn't work, or was built badly... and I only realized its flaws after the fact, when it was too late. Software that crashed at 4AM, patches to open source software that never worked in the first place, failed projects I learned nothing from (at the time)... it's a long list.

I finally became a better programmer when I learned to doubt myself, to doubt the software that I wrote.

Why good programmers lack confidence in themselves

Why does doubting yourself, lacking confidence in yourself, make you a better programmer?

When we write software we're pretty much always operating beyond the point of complexity where we can fit the whole program in our mind. That means you always have to doubt your ability to catch all the problems, to come up with the best solutions.

And so we get peer feedback, write tests, and get code reviews, all in the hopes of overcoming our inevitable failures:

  1. "My design isn't great," you think. So you talk it over with a colleague and together you come up with an even better idea.
  2. "My code might have bugs," you say. So you write tests, and catch any bugs and prevent future bugs.
  3. "My code might still have UI problems," you imagine. So you manually test your web application and check all the edge cases.
  4. "I might have forgotten something," you consider. So you get a code review, and get suggestions for improving your code.

These techniques also have the great benefit of teaching you to be a better programmer, increasing the complexity you can work with. You'll still need tests and code reviews and all the rest; our capacity for understanding is always finite and always stretched.

You can suffer from too much lack of confidence: you should not judge yourself harshly. But we're all human, and therefore all fallible. If you want your creations to succeed you should embrace lack of confidence: test your code, look for things you've missed, get help from others. And if you head over to Software Clown you can discover the many mistakes caused by my younger self's over-confidence, and the lessons you can learn from those mistakes.

August 03, 2016 04:00 AM


August 01, 2016

Hynek Schlawack

Please Fix Your Decorators

If your Python decorator unintentionally changes the signatures of my callables or doesn’t work with class methods, it’s broken and should be fixed. Sadly most decorators are broken because the web is full of bad advice.
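A minimal sketch of a well-behaved decorator (the names are invented for illustration): `functools.wraps` keeps the wrapped callable's metadata intact, and because you decorate the plain function before it becomes a bound method, it works on methods too:

```python
import functools

def logged(func):
    """Log calls without clobbering the wrapped callable's metadata."""
    @functools.wraps(func)  # copies __name__, __doc__, etc., and sets __wrapped__
    def wrapper(*args, **kwargs):
        print("calling %s" % func.__name__)
        return func(*args, **kwargs)
    return wrapper

class Greeter(object):
    @logged  # decorates the plain function, so it still binds as a method
    def greet(self, name):
        return "hello, %s" % name

print(Greeter().greet("world"))  # prints "calling greet", then "hello, world"
print(Greeter.greet.__name__)    # "greet", not "wrapper"
```

Without `functools.wraps`, introspection tools see `wrapper`'s name and (missing) docstring instead of the original's, which is one of the breakages complained about above.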

by Hynek Schlawack at August 01, 2016 12:00 PM

July 31, 2016

Hynek Schlawack

Testing & Packaging

How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!).

by Hynek Schlawack at July 31, 2016 07:00 PM

Itamar Turner-Trauring

Write test doubles you can trust using verified fakes

When you're writing tests for your code you often encounter some complex object that impedes your testing. Let's say some of your code uses a client library to talk to Twitter's API. You don't want your tests to have to talk to Twitter's servers: your tests would be slower, flakier, harder to setup, and require a working network connection.

The usual solution is to write a test double of some sort, an object that pretends to be the Twitter client but is easier to use in a test. Terminology varies slightly across programming communities, but you're probably going to make a fake, mock or stub Twitter client for use in your tests.

As you can tell from names like "fake" and "mock", there's a problem here: if your tests are running against a fake object, how do you know they will actually work against the real thing? You will often find the fake is insufficiently realistic, or just plain wrong, which means your tests using it were a waste of time. I've personally had Python mock objects cause completely broken code to pass its tests, because the mock was a bit too much of a mockup.

There's a better kind of test double, though, known as a "verified fake". The key point about a verified fake is that, unlike regular fakes, you've actually proven it has the same behavior as the real thing. That means that when you use the verified fake for a Twitter client in a test you can trust that the fake will behave the same as real Twitter client. And that means you can trust your tests are not being misled by a broken fake.
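One common way to verify a fake is to run the real implementation and the fake against a single shared suite of "contract" tests. Here's a minimal sketch of the pattern using a hypothetical key/value store rather than a Twitter client (and with the "real" side reduced to a toy so the example is self-contained):

```python
import unittest

class RealStore(object):
    """Stand-in for the real client (imagine this talks to a server)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

class FakeStore(object):
    """In-memory fake for use in fast, network-free tests."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

class StoreContract(object):
    """Shared behavioral tests; subclasses supply make_store()."""
    def test_put_then_get(self):
        store = self.make_store()
        store.put("key", "value")
        self.assertEqual(store.get("key"), "value")
    def test_get_missing_key_raises(self):
        store = self.make_store()
        self.assertRaises(KeyError, store.get, "nope")

class RealStoreTests(StoreContract, unittest.TestCase):
    def make_store(self):
        return RealStore()

class FakeStoreTests(StoreContract, unittest.TestCase):
    def make_store(self):
        return FakeStore()
```

Because every contract test runs against both implementations, any behavioral drift in the fake shows up as a test failure, instead of silently misleading the tests that depend on it.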


July 31, 2016 04:00 AM

July 29, 2016

Glyph Lefkowitz

Don’t Trust Sourceforge, Ever

If you use a computer and you use the Internet, chances are you’ll eventually find some software that, for whatever reason, is still hosted on Sourceforge. In case you’re not familiar with it, Sourceforge is a publicly-available malware vector that also sometimes contains useful open source binary downloads, especially for Windows.

In addition to injecting malware into their downloads (a practice they claim, hopefully truthfully, to have stopped), Sourceforge also presents an initial download page over HTTPS, then redirects the user to HTTP for the download itself, snatching defeat from the jaws of victory. This is fantastically irresponsible, especially for a site offering un-sandboxed binaries for download, especially in the era of Let’s Encrypt where getting a TLS certificate takes approximately thirty seconds and exactly zero dollars.

So: if you can possibly find your downloads anywhere else, go there.

But, rarely, you will find yourself at the mercy of whatever responsible stewards1 are still operating Sourceforge if you want to get access to some useful software. As it happens, there is a loophole that will let you authenticate the binaries that you download from them so you won’t be left vulnerable to an evil barista: their “file release system”, the thing you use to upload your projects, will allow you to download other projects as well.

To use it, first, make yourself a sourceforge account. You may need to create a dummy project as well. Sourceforge maintains an HTTPS-accessible list of key fingerprints for all the SSH servers that they operate, so you can verify the public key below.

Then you’ll need to connect to their upload server over SFTP, and go to the path /home/frs/project/<the project’s name>/.../ to get the file.

I have written a little Python script2 that automates the translation of a Sourceforge file-browser download URL, one that you can get if you right-click on a download in the “files” section of a project’s website, and runs the relevant scp command to retrieve the file for you. This isn’t on PyPI or anything, and I’m not putting any effort into polishing it further; the best possible outcome of this blog post is that it immediately stops being necessary.

  1. Are you one of those people? I would prefer to be lauding your legacy of decades of valuable contributions to the open source community instead of ridiculing your dangerous incompetence, but repeated bug reports and support emails have gone unanswered. Please get in touch so we can discuss this. 

  2. Code:

    #!/usr/bin/env python2
    import os
    import re
    import sys
    sfuri = sys.argv[1]
    # for example,
    matched = re.match(
        r"https?://sourceforge\.net/projects/([^/]+)/files/(.*)/download",
        sfuri,
    )
    if not matched:
        sys.stderr.write("Not a SourceForge download link.\n")
        sys.exit(1)
    project, path = matched.groups()
    sftppath = "/home/frs/project/{project}/{path}".format(project=project, path=path)
    def knows_about_web_sf_net():
        # Check whether already has a known_hosts entry.
        with open(
                os.path.expanduser("~/.ssh/known_hosts"), "rb"
        ) as read_known_hosts:
            for line in"\n"):
                if line and '' in line.split()[0]:
                    return True
        return False
    sfkey = """ ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2uifHZbNexw6cXbyg1JnzDitL5VhYs0E65Hk/tLAPmcmm5GuiGeUoI/B0eUSNFsbqzwgwrttjnzKMKiGLN5CWVmlN1IXGGAfLYsQwK6wAu7kYFzkqP4jcwc5Jr9UPRpJdYIK733tSEmzab4qc5Oq8izKQKIaxXNe7FgmL15HjSpatFt9w/ot/CHS78FUAr3j3RwekHCm/jhPeqhlMAgC+jUgNJbFt3DlhDaRMa0NYamVzmX8D47rtmBbEDU3ld6AezWBPUR5Lh7ODOwlfVI58NAf/aYNlmvl2TZiauBCTa7OPYSyXJnIPbQXg6YQlDknNCr0K769EjeIlAfY87Z4tw==
    """
    if not knows_about_web_sf_net():
        with open(
                os.path.expanduser("~/.ssh/known_hosts"), "ab"
        ) as append_known_hosts:
    cmd = "scp{sftppath} .".format(sftppath=sftppath)
    os.system(cmd)

by Glyph at July 29, 2016 06:06 AM

July 06, 2016

Twisted Matrix Laboratories

Twisted 16.3 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.3.0.

The highlights of this release are:
  • The Git migration has happened, so we've updated our development documentation to match. We're now trialling accepting pull requests on GitHub, so if you've ever wanted an excuse to contribute, now's the chance!
  • In our steady shedding of baggage, twisted.spread.ui, twisted.manhole (not to be confused with twisted.conch.manhole!), and a bunch of old and deprecated stuff from twisted.python.reflect and twisted.protocols.sip have been removed.
  • twisted.web's HTTP server now handles pipelined requests better -- it used to try and process them in parallel, but this was fraught with problems and now it processes them in series, which is less surprising to code that expects the Request's transport to not be buffered (e.g. WebSockets). There is also a bugfix for HTTP timeouts not working in 16.2.
  • Twisted now has HTTP/2 support in its web server! This is currently not available by default -- you will need to install hyper-h2, which is available in the [h2] setuptools extras. If you want to play around with it, "pip install twisted[h2]" (on Python 2; a bugfix release will make it available on Python 3).
  • 53 tickets closed overall, including cleanups that move us closer to a total Python 3 port.
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by HawkOwl at July 06, 2016 12:50 PM

June 26, 2016

Itamar Turner-Trauring

Code faster by typing less

Sometimes it's disappointing to see how much code you've written in a given day, so few lines for so much effort. There are big picture techniques you can use to do better, like planning ahead so you write the most useful lines of code. But in the end you still want to be producing as much code as possible given the situation. One great way to do that: type less.

If you're only producing a few lines of code in a day, where did all the rest of your time at the keyboard go? As a programmer you spend a lot of time typing, after all. Here's some of what I spend time on:

  • Opening and closing files.
  • Checking in code, merging code, reverting code.
  • Searching and browsing code to figure out how things work and where a bug might be coming from.
  • Stepping through code in a debugger.

If you pay attention you will find you spend a significant amount of time on these sorts of activities. And there's a pretty good chance you're also wasting time doing them. Consider the following transcript:

$ git add
$ git add
$ git add --interactive
$ git diff | less
$ git commit -m "I wrote some code, hurrah."
$ git push

You're typing the same thing over and over again! Every time you open a file: lots of typing. Every time you find a class definition manually: more typing.

What do programmers do when they encounter manual, repetitive work? Automate! Pretty much every IDE or text editor for programmers has built-in features or 3rd party packages to automate and reduce the amount of time you waste on these manual tasks. That means Emacs, Vim, Sublime, Eclipse or whatever... as long as it's aimed at programmers, is extendable and has a large user base you're probably going to find everything you need.

As an example, on a day to day basis I use the following Emacs add-ons:

  • Elpy for Python editing, which lets me jump to the definition of method or class with a single keystroke and then jump right back with another single keystroke. It also highlights (some) errors in the code.
  • Magit, which changes using git from painful and repetitive to pleasant and fast. The example above would involve typing a a TAB <arrow key down> Ctrl-space <arrow key down> a c I wrote some code, hurrah. Ctrl-C Ctrl-C. That's a little obscure if you've never used it, but trust me: it's a really good UI.
  • Projectile, which lets me jump to any file in a git or other VCS repository in three keystrokes plus a handful of characters from the filename.
  • undo-tree-mode, which makes my undo history easily, quickly and visually accessible.

Each of these tools saves just a little bit of time... but I use them over and over again every single day. Small savings add up to big ones.

Here's what you should do: every month or two allocate a few hours to investigating and learning new features or tools for your text editor. Whatever your editor there are tools available, and sometimes even just keyboard shortcuts, that will speed up your development. You want to limit how much time you spend on this because too much time spent will counteract any savings. And you want to do this on an ongoing basis because there's always new tools and shortcuts to discover. Over time you will find yourself spending less time typing useless, repetitive text... and with more time available to write code.

June 26, 2016 04:00 AM

June 15, 2016

Itamar Turner-Trauring

Writing for software engineers: here's the best book I've found

Where some software engineers can only see the immediate path ahead, more experienced software engineers keep the destination in mind and navigate accordingly. Sometimes the narrow road is the path to righteousness, and sometimes the broad road is the path to wickedness, or at least to broken deadlines and unimplemented requirements. How do you learn to see further, beyond the immediate next steps, beyond the stated requirements? One method you can use is writing.

Writing as thinking can help you find better solutions for easy problems, and solve problems that seem impossible. You write a "design document", where "design" is a verb not a noun, an action not a summary. You write down your assumptions, your requirements, your ideas, the presumed tradeoffs and try to nail down the best solution. Unlike vague ideas in your head, written text can be re-read and sharpened. By putting words to paper you force yourself to clarify your ideas, focus your definitions, make your assumptions more explicit. You can also share written documents with others for feedback and evaluation; since software development is a group effort that often means a group of people will be doing the thinking.

Once you've found a better way forward you need to convince others that your chosen path makes sense. That means an additional kind of writing, writing for persuasion and action: explaining your design, convincing the stakeholders.

How do you learn how to write? If you're just starting your journey as a writer you need to find the right guide; there are many kinds of writing with different goals and different styles. Some books obsess over grammar, others focus on academic writing or popular non-fiction. Writing as a programmer is writing in the context of an organization, writing that needs to propel you forward, not merely educate or entertain. This is where Flower and Ackerman's "Writers at Work" will provide a wise and knowledgeable guide. (There are a dozen other books with the same name; make sure you get the right one!)

Linda Flower is one of the originators of the cognitive process theory of writing, and so is well suited to writing a book that covers not just surface style, but the process of adapting writing to one's situation. This book won't teach you how to write luminous prose or craft a brilliant academic argument. The first example scenario the book covers is of someone who has joined the Request For Proposal (RFP) group at a software company. The ten-page scenario talks about the RFP process in general, how RFPs are usually constructed within the organization, the team in charge of creating RFPs and their various skills and motivations, the fact RFPs may end up in competitors' hands... What this book will teach you is how to do the writing you do in your job: complex, action oriented, utilitarian.

"Writers at Work" focuses on process, context, and writing as a form of organizational action, and helps you understand how to approach your task. It will help you answer these questions and more:

  • What context are you writing in? The book guides you through the process of discovering the audience, rhetorical situation, discourse community, goals, problems and agendas. Software development always has a stated reason, but often there are deeper reasons why you're working on a particular project, specific ways to communicate with different people (engineers, management, customers), differing agendas and multiple audiences. The book will help you define the situation you are writing in, which will help you figure out what you're trying to build and then convince others when you've found the solution.
  • How do you define the problem you are trying to solve? The book points out the helpful technique of operationalization, defining the problem in a way that implies an action to solve it.
  • Structuring your writing: if you have a design, how do you communicate it in a convincing way? How do you explain it to different audiences with different levels of knowledge?
  • How do you test your writing? Writing can require testing just as much as software does.

For reasons I don't really understand this practical, immensely useful textbook is out of print, and (for now) you can get copies for cheap. Buy a copy, read it, and start writing!

June 15, 2016 04:00 AM

June 07, 2016

Moshe Zadka

__name__ == __main__ considered harmful

Every single Python tutorial shows the pattern of

# define functions, classes,
# etc.

if __name__ == '__main__':
    main()

This is not a good pattern. If your code is not going to be in a Python module, there is no reason not to unconditionally call ‘main()’ at the bottom. So this code will only be used in modules — where it leads to unpredictable effects. If this module is imported as ‘foo’, then the identity of ‘foo.something’ and ‘__main__.something’ will be different, even though they share code.

This leads to hilarious effects like @cache decorators not doing what they are supposed to, parallel registry lists and all kinds of other issues. Hilarious unless you spend a couple of hours debugging why ‘isinstance()’ is giving incorrect results.

If you want to write a main module, make sure it cannot be imported. In this case, reversed stupidity is intelligence — just reverse the idiom:

# at the top
if __name__ != '__main__':
    raise ImportError("this module cannot be imported")

This, of course, will mean that this module cannot be unit tested: therefore, any non-trivial code should go in a different module that this one imports. Because of this, it is easy to gravitate towards a package. In that case, put the code above in a module called ‘’. This will lead to the following layout for a simple package:

    PACKAGE_NAME/
            # Empty
            if __name__ != '__main__':
                raise ImportError("this module cannot be imported")
            from PACKAGE_NAME import api
                  # Actual code
            import unittest
            # Testing code

And then, when executing:

$ python -m PACKAGE_NAME arg1 arg2 arg3

This will work in any environment where the package is on the sys.path: in particular, in any virtualenv where it was pip-installed. Unless a short command-line is important, it allows skipping over creating a console script in entirely, and letting “python -m” be the official CLI. Since pex supports setting a module as an entry point, if this tool needs to be deployed in other environments, it is easy to package into a tool that will execute the script:

$ pex . --entry-point SOME_PACKAGE --output-file toolname

by moshez at June 07, 2016 05:40 AM