Planet Twisted

May 19, 2016

Hynek Schlawack

Conditional Python Dependencies

Since the inception of wheels that install Python packages without executing arbitrary code, we need a static way to encode conditional dependencies for our packages. Thanks to PEP 508 we do have a blessed way but sadly the prevalence of old setuptools versions makes it a minefield to use.

by Hynek Schlawack (hs@ox.cx) at May 19, 2016 12:00 AM

May 18, 2016

Twisted Matrix Laboratories

Twisted 16.2 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.2!

Just in time for PyCon US, this release brings a few headlining features (like the haproxy endpoint) and the continuation of the modernisation of the codebase. More Python 3, less deprecated code, what's not to like?
  • twisted.protocols.haproxy.proxyEndpoint, a wrapper endpoint that gives some extra information to the wrapped protocols passed by haproxy;
  • Migration of twistd and other twisted.application.app users to the new logging system (twisted.logger);
  • Porting of parts of Twisted Names' server to Python 3;
  • The removal of the very old MSN client code and the deprecation of the unmaintained ICQ/OSCAR client code;
  • More cleanups in Conch in preparation for a Python 3 port and cleanups in HTTP code in preparation for HTTP/2 support;
  • Over thirty tickets overall closed since 16.1.
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by HawkOwl (noreply@blogger.com) at May 18, 2016 05:10 PM

May 06, 2016

Moshe Zadka

Forking Skip-level Dependencies

I have recently found myself explaining this concept over and over to people, so I want to have a reference.

Most modern languages come with a “dependency manager” of sorts that helps manage the 3rd party libraries a given project uses. Rust has Cargo, Node.js has npm, Python has pip and so on. All of these do some things well and some things poorly. But one thing that can be done (well or poorly) is “support forking skip-level dependencies”.

In order to explain what I mean, here is an example: our project is PlanetLocator, a program to tell the user which direction they should face to see a planet. It depends on a library called Astronomy. Astronomy depends on Physics. Physics depends on Math.

  • PlanetLocator
    • Astronomy
      • Physics
        • Math

PlanetLocator is a SaaS, running on our servers. One day, we find Math has a critical bug, leading to a remote execution vulnerability. This is pretty bad, because it can be triggered via our application by simply asking PlanetLocator for the location of Mars at a specific date in the future. Luckily, the bug is simple — in Math’s definition of Pi, we need to add a couple of significant digits.

How easy is it to fix?

Well, assume PlanetLocator is written in Go, and not using any package manager. A typical import statement in PlanetLocator is

import “github.com/astronomy/astronomy”

A typical import statement in Astronomy is

import “github.com/physics/physics”

…and so on.

We fork Math over to “github.com/planetlocator/math” and fix the vulnerability. Now we have to fork over physics to use the forked math, and astronomy to use the forked physics and finally, change all of our imports to import the forked astronomy — and Physics, Astronomy and PlanetLocator had no bugs!

Now assume, instead, we had used Python. In our requirements.txt file, we could put

git+https://github.com/planetlocator/math#egg=math

and voilà! Even though Physics’ “setup.py” said “install_requires=[‘math’]”, it will get our forked math.
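
To make this concrete, PlanetLocator’s requirements.txt might look something like the following sketch (the package names are the hypothetical ones from the example above):

# requirements.txt for PlanetLocator (hypothetical package names)
astronomy
git+https://github.com/planetlocator/math#egg=math

Because pip gives the explicitly listed requirement precedence over the version Physics would otherwise pull in through its own “install_requires”, Math comes from the fork while Astronomy and Physics are installed unmodified from their usual source.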

When starting to use a new language/dependency manager, the first question to ask is: will it support me forking skip-level dependencies? Because every upstream maintainer is, effectively, an absent-maintainer if rapid response is at stake (for any reason — I chose security above, but it might be beating the competition to a deadline, or fulfilling contractual obligations).

by moshez at May 06, 2016 04:17 AM

May 03, 2016

Glyph Lefkowitz

Letters To The Editor: Re: Email

Since I removed comments from this blog, I’ve been asking y’all to email me when you have feedback, with the promise that I’d publish the good bits. Today I’m making good on that for the first time, with this lovely missive from Adam Doherty:


I just wanted to say thank you. As someone who is never able to say no, your article on email struck a chord with me. I have had Gmail since the beginning, since the days of hoping for an invitation. And the day I received my invitation was the last day my inbox was ever empty.

Prior to reading your article I had over 40,000 unread messages. It used to be a sort of running joke; I never delete anything. Realistically though, was I ever going to do anything with them?

With 40,000 unread messages in your inbox, you start to miss messages that are actually important. Messages that must become tasks, tasks that must be completed.

Last night I took your advice; and that is saying something - most of the things I read via HN are just noise. This however spoke to me directly.

I archived everything older than two weeks, was down to 477 messages and kept pruning. So much of the email we get on a daily basis is also noise. Those messages took me half a second to hit archive and move on.

I went to bed with zero messages in my inbox, woke up with 21, archived 19, actioned 2 and then archived those.

Seriously, thank you so very much. I am unburdened.


First, I’d like to thank Adam for writing in. I really do appreciate the feedback.

Second, I wanted to post this here not in service of showcasing my awesomeness1, but rather to demonstrate that getting to the bottom of your email can have a profound effect on your state of mind. Even if it’s a running joke, even if you don’t think it’s stressing you out, there’s a good chance that, somewhere in the back of your mind, it is. After all, if you really don’t care, what’s stopping you from hitting select all / archive right now?

At the very least, if you did that, your mail app would load faster.


  1. although, let there be no doubt, I am awesome 

by Glyph at May 03, 2016 06:06 AM

April 27, 2016

Itamar Turner-Trauring

How you should choose which technology to learn next

Keeping up with the growing software ecosystem — new databases, new programming languages, new web frameworks — becomes harder and harder every year as more and more software is written. It is impossible to learn all existing technologies, let alone the new ones being released every day. If you want to learn another programming language you can choose from Dart, Swift, Go, Idris, Futhark, Ceylon, Zimbu, Elm, Elixir, Vala, OCaml, LiveScript, Oz, R, TypeScript, PureScript, Haskell, F#, Scala, Dylan, Squeak, Julia, CoffeeScript... and about a thousand more, if you're still awake. This stream of new technologies can be overwhelming, a constant worry that your skills are getting rusty and out of date.

Luckily you don't need to learn all technologies, and you are likely to use only a small subset during your tenure as a programmer. Instead your goal should be to maximize your return on investment: learn the most useful tools, with the least amount of effort. How then should you choose which technologies to learn?

Don't spend too much time on technologies which are either too close or too far from your current set of knowledge. If you are an expert on PostgreSQL then learning another relational database like MySQL won't teach you much. Your existing knowledge is transferable for the most part, and you'd have no trouble applying for a job requiring MySQL knowledge. On the other hand a technology that is too far from your current tools will be much more difficult to learn, e.g. switching from web development to real-time embedded devices.

Focus on technologies that can build on your existing knowledge while still being different enough to teach you something new. Learning these technologies provides multiple benefits:

  • Since you have some pre-existing knowledge you can learn them faster.
  • They can help you with your current job by giving you a broader but still relevant set of tools.
  • They can make it easier to expand the scope of a job search because they relate to your existing experience.

There are three ways you can build on your existing knowledge of tools and technologies:

  1. Branch out to nearby technologies: If you're a backend web developer you are interacting with a database, with networking via the HTTP protocol, with a browser running Javascript. You will end up knowing at least a little bit about these technologies, and you have some sense of how they interact with the technology you already know well. These are all great candidates for a new technology to learn next.
  2. Alternative solutions for a problem you understand: If you are an expert on the PostgreSQL database you might want to learn MongoDB. It's still a database, solving a problem whose parameters you already understand: how to store and search structured data. But the way MongoDB solves this problem is fundamentally different than PostgreSQL, which means you will learn a lot.
  3. Enhance your usage of existing tools: Tools for testing your existing technology stack can make you a better programmer by providing faster feedback and a broader view of software quality and defects. Learning how to better use a sophisticated text editor like Emacs/Vim or an IDE like Eclipse with your programming language of choice can make you a more productive programmer.

Neither you nor any other programmer will ever be able to learn all the technologies in use today: there are just too many. What you can and should do is learn those that will help with your current projects, and those that you can learn more easily. The more technologies you know, the broader the range of technologies you have at least partial access to, and the easier it will be to learn new ones.

April 27, 2016 04:00 AM

April 24, 2016

Glyph Lefkowitz

Email Isn’t The Thing You’re Bad At

I’ve been using the Internet for a good 25 years now, and I’ve been lucky enough to have some perspective dating back farther than that. The common refrain for my entire tenure here:

We all get too much email.

A New, New, New, New Hope

Luckily, something is always on the cusp of replacing email. AOL instant messenger will totally replace it. Then it was blogging. RSS. MySpace. Then it was FriendFeed. Then Twitter. Then Facebook.

Today, it’s in vogue to talk about how Slack is going to replace email. As someone who has seen this play out a dozen times now, let me give you a little spoiler:

Slack is not going to replace email.

But Slack isn’t the problem here, either. It’s just another communication tool.

The problem of email overload is both ancient and persistent. If the problem were really with “email”, then, presumably, one of the nine million email apps that dot the app-stores like mushrooms sprouting from a globe-spanning mycelium would have just solved it by now, and we could all move on with our lives. Instead, it is permanently in vogue1 to talk about how overloaded we all are.

If not email, then what?

If you have twenty-four thousand unread emails in your Inbox, like some kind of goddamn animal, what you’re bad at is not email, it’s transactional interactions.

Different communication media have different characteristics, but the defining characteristic of email is that it is the primary mode of communication that we use, both professionally and personally, when we are asking someone else to perform a task.

Of course you might use any form of communication to communicate tasks to another person. But other forms - especially the currently popular real-time methods - appear as a bi-directional communication, and are largely immutable. Email’s distinguishing characteristic is that it is discrete; each message is its own entity with its own ID. Emails may also be annotated, whether with flags, replied-to markers, labels, placement in folders, archiving, or deleting. Contrast this with a group chat in IRC, iMessage, or Slack, where the log is mostly2 unchangeable, and the only available annotation is “did your scrollbar ever move down past this point”; each individual message has only one bit of associated information. Unless you have catlike reflexes and an unbelievably obsessive-compulsive personality, it is highly unlikely that you will carefully set the “read” flag on each and every message in an extended conversation.

All this makes email much more suitable for communicating a task, because the recipient can file it according to their system for tracking tasks, come back to it later, and generally treat the message itself as an artifact. By contrast if I were to just walk up to you on the street and say “hey can you do this for me”, you will almost certainly just forget.

The word “task” might seem heavy-weight for some of the things that email is used for, but tasks come in all sizes. One task might be “click this link to confirm your sign-up on this website”. Another might be “choose a time to get together for coffee”. Or “please pass along my resume to your hiring department”. Yet another might be “send me the final draft of the Henderson report”.

Email is also used for conveying information: here are the minutes from that meeting we were just in. Here is transcription of the whiteboard from that design session. Here are some photos from our family vacation. But even in these cases, a task is implied: read these minutes and see if they’re accurate; inspect this diagram and use it to inform your design; look at these photos and just enjoy them.

So here’s the thing that you’re bad at, which is why none of the fifty different email apps you’ve bought for your phone have fixed the problem: when you get these messages, you aren’t making a conscious decision about:

  1. how important the message is to you
  2. whether you want to act on them at all
  3. when you want to act on them
  4. what exact action you want to take
  5. what the consequences of taking or not taking that action will be

This means that when someone asks you to do a thing, you probably aren’t going to do it. You’re going to pretend to commit to it, and then you’re going to flake out when push comes to shove. You’re going to keep context-switching until all the deadlines have passed.

In other words:

The thing you are bad at is saying ‘no’ to people.

Sometimes it’s not obvious that what you’re doing is saying ‘no’. For many of us — and I certainly fall into this category — a lot of the messages we get are vaguely informational. They’re from random project mailing lists, perhaps they’re discussions between other people, and it’s unclear what we should do about them (or if we should do anything at all). We hang on to them (piling up in our Inboxes) because they might be relevant in the future. I am not advocating that you have to reply to every dumb mailing list email with a 5-part action plan and a Scrum meeting invite: that would be a disaster. You don’t have time for that. You really shouldn’t have time for that.

The trick about getting to Inbox Zero3 is not in somehow becoming an email-reading machine, but in realizing that most email is worthless, and that’s OK. If you’re not going to do anything with it, just archive it and forget about it. If you’re subscribed to a mailing list where only 1 out of 1000 messages actually represents something you should do about it, archive all the rest after only answering the question “is this the one I should do something about?”. You can answer that question after just glancing at the subject; there are times when checking my email I will be hitting “archive” with a 1-second frequency. If you are on a list where zero messages are ever interesting enough to read in their entirety or do anything about, then of course you should unsubscribe.

Once you’ve dug yourself into a hole with thousands of “I don’t know what I should do with this” messages, it’s time to declare email bankruptcy. If you have 24,000 messages in your Inbox, let me be real with you: you are never, ever going to answer all those messages. You do not need a smartwatch to tell you exactly how many messages you are never going to reply to.

We’re In This Together, Me Especially

A lot of guidance about what to do with your email addresses email overload as a personal problem. Over the years of developing my tips and tricks for dealing with it, I certainly saw it that way. But lately, I’m starting to see that it has pernicious social effects.

If you have 24,000 messages in your Inbox, that means you aren’t keeping track or setting priorities on which tasks you want to complete. But just because you’re not setting those priorities, that doesn’t mean nobody is. It means you are letting the availability heuristic - whatever is “latest and loudest” - govern access to your attention, and therefore your time. By doing this, you are rewarding people (or #brands) who contact you repeatedly, over inappropriate channels, and generally try to flood your attention with their priorities instead of your own. This, in turn, creates a culture where it is considered reasonable and appropriate to assume that you need to do that in order to get someone’s attention.

Since we live in the era of subtext and implication, I should explicitly say that I’m not describing any specific work environment or community. I used to have an email startup, and so I thought about this stuff very heavily for almost a decade. I have seen email habits at dozens of companies, and I help people in the open source community with their email on a regular basis. So I’m not throwing shade: almost everybody is terrible at this.

And that is the one way that email, in the sense of the tools and programs we use to process it, is at fault: technology has made it easier and easier to ask people to do more and more things, without giving us better tools or training to deal with the increasingly huge array of demands on our time. It’s easier than ever to say “hey could you do this for me” and harder than ever to just say “no, too busy”.

Mostly, though, I want you to know that this isn’t just about you any more. It’s about someone much more important than you: me. I’m tired of sending reply after reply to people asking to “just circle back” or asking if I’ve seen their email. Yes, I’ve seen your email. I have a long backlog of tasks, and, like anyone, I have trouble managing them and getting them all done4, and I frequently have to decide that certain things are just not important enough to do. Sometimes it takes me a couple of weeks to get to a message. Sometimes I never do. But, it’s impossible to be mad at somebody for “just checking in” for the fourth time when this is probably the only possible way they ever manage to get anyone else to do anything.

I don’t want to end on a downer here, though. And I don’t have a book to sell you which will solve all your productivity problems. I know that if I lay out some incredibly elaborate system all at once, it’ll seem overwhelming. I know that if I point you at some amazing gadget that helps you keep track of what you want to do, you’ll either balk at the price or get lost fiddling with all its knobs and buttons and not getting a lot of benefit out of it. So if I’m describing a problem that you have here, here’s what I want you to do.

Step zero is setting aside some time. This will probably take you a few hours, but trust me; they will be well-spent.

Email Bankruptcy

First, you need to declare email bankruptcy. Select every message in your Inbox older than 2 weeks. Archive them all, right now. In the past, you might have had to worry about deleting those messages, but modern email systems pretty much universally have more storage than you’ll ever need. So rest assured that if you actually need to do anything with these messages, they’ll all be in your archive. But anything in your Inbox right now older than a couple of weeks is just never going to get dealt with, and it’s time to accept that fact. Again, this part of the process is not about making a decision yet, it’s just about accepting a reality.

Mailbox Three

One extra tweak I would suggest here is to get rid of all of your email folders and filters. It seems like many folks with big email problems have tried to address this by ever-finer-grained classification of messages, ever more byzantine email rules. At least in my experience, when I look over someone’s shoulder and see 24,000 messages, it’s common to also see 50 folders. Probably these aren’t helping you very much.

In older email systems, it was necessary to construct elaborate header-based filtering systems so that you could later identify those messages in certain specific ways, like “message X went to this mailing list”. However, this was an incomplete hack, a workaround for a missing feature. Almost all modern email clients (and if yours doesn’t do this, switch) allow you to locate messages like this via search.

Your mail system ought to have 3 folders:

  1. Inbox, which you process to discover tasks,
  2. Drafts, which you use to save progress on replies, and
  3. Archive, the folder which you access only by searching for information you need when performing a task.

Getting rid of unnecessary folders and queries and filter rules will remove things that you can fiddle with.

Moving individual units of trash between different heaps of trash is not being productive, and by removing all the different folders you can shuffle your messages into before actually acting upon them you will make better use of your time spent looking at your email client.

There’s one exception to this rule, which is filters that do nothing but cause a message to skip your Inbox and go straight to the archive. The reason that this type of filter is different is that there are certain sources or patterns of message which are not actionable, but rather, a useful source of reference material that is only available as a stream of emails. Messages like that should, indeed, not show up in your Inbox. But, there’s no reason to file them into a specific folder or set of folders; you can always find them with a search.

Make A Place For Tasks

Next, you need to get a task list. Your email is not a task list; tasks are things that you decided you’re going to do, not things that other people have asked you to do5. Critically, you are going to need to parse e-mails into tasks. To explain why, let’s have a little arithmetic aside.

Let’s say it takes you 45 seconds to go from reading a message to deciding what it really means you should do, and that even once you’ve made that decision, it takes 20 seconds just to go from looking at the message to remembering what you need to do about it. At 20 seconds each, 180 messages is 3,600 seconds: an hour. This means that by the time you get to 180 un-processed messages that you need to do something about in your Inbox, you’ll be spending an hour a day doing nothing but remembering what those messages mean, before you do anything related to actually living your life, even including checking for new messages.

What should you use for the task list? On some level, this doesn’t really matter. It only needs one really important property: you need to trust that if you put something onto it, you’ll see it at the appropriate time. How exactly that works depends heavily on your own personal relationship with your computers and devices; it might just be a physical piece of paper. But for most of us living in a multi-device world, something that synchronizes to some kind of cloud service is important, so Wunderlist or Remember the Milk are good places to start, with free accounts.

Turn Messages Into Tasks

The next step - and this is really the first day of the rest of your life - start at the oldest message in your Inbox, and work forward in time. Look at only one message at a time. Decide whether this message is a meaningful task that you should accomplish.

If you decide a message represents a task, then make a new task on your task list. Decide what the task actually is, and describe it in words; don’t create tasks like “answer this message”. Why do you need to answer it? Do you need to gather any information first?

If you need to access information from the message in order to accomplish the task, then be sure to note in your task how to get back to the email. Depending on what your mail client is, it may be easier or harder to do this6, but in the worst case, following the guidelines above about eliminating unnecessary folders and filing in your email client, just put a hint into your task list about how to search for the message in question unambiguously.

Once you’ve done that:

Archive the message immediately.

The record that you need to do something about the message now lives in your task list, not your email client. You’ve processed it, and so it should no longer remain in your inbox.

If you decide a message doesn’t represent a task, then:

Archive the message immediately.

Do not move on to the next message until you have archived this message. Do not look ahead7. The presence of a message in your Inbox means you need to make a decision about it. Follow the touch-move rule with your email. If you skip over messages habitually and decide you’ll “just get back to it in a minute”, that minute will turn into 4 months and you’ll be right back where you were before.

Circling back to the subject of this post; once again, this isn’t really specific to email. You should follow roughly the same workflow when someone asks you to do a task in a meeting, or in Slack, or on your Discourse board, or wherever, if you think that the task is actually important enough to do. Note the slack timestamp and a snippet of the message so you can search for it again, if there is a relevant attachment. The thing that makes email different is really just the presence of an email box.

Banish The Blue Dot

Almost all email clients have a way of tracking “unread” messages; they cheerfully display counters of them. Ignore this information; it is useless. Messages have two states: in your inbox (unprocessed) and in your archive (processed). “Read” vs. “Unread” can be, at best, of minimal utility when resuming an interrupted scanning session. But, you are always only ever looking at the oldest message first, right? So none of the messages below it should be unread anyway...

Be Ruthless

As you try to start translating your flood of inbound communications into an actionable set of tasks you can actually accomplish, you are going to notice that your task list is going to grow and grow just as your Inbox was before. This is the hardest step:

Decide you are not going to do those tasks, and simply delete them. Sometimes, a task’s entire life-cycle is to be created from an email, exist for ten minutes, and then have you come back to look at it and then delete it. This might feel pointless, but in going through that process, you are learning something extremely valuable: you are learning what sorts of things are not actually important enough to do.

If every single message you get from some automated system provokes this kind of reaction, that will give you a clue that said system is wasting your time, and just making you feel anxious about work you’re never really going to get to, which can then lead to you un-subscribing or filtering messages from that system.

Tasks Before Messages

To thine own self, not thy Inbox, be true.

Try to start your day by looking at the things you’ve consciously decided to do. Don’t look at your email, don’t look at Slack; look at your calendar, and look at your task list.

One of those tasks, probably, is a daily reminder to “check your email”, but that reminder is there more to remind you to only do it once than to prevent you from forgetting.

I say “try” because this part is always going to be a challenge; while I mentioned earlier that you don’t want to unthinkingly give in to the availability heuristic, you also have to acknowledge that the reason it’s called a “cognitive bias” is because it’s part of human cognition. There will always be a constant anxious temptation to just check for new stuff; those of us who have a predisposition towards excessive scanning behavior feel it more than others.

Why Email?

We all need to make commitments in our daily lives. We need to do things for other people. And when we make a commitment, we want to be telling the truth. I want you to try to do all these things so you can be better at that. It’s impossible to truthfully make a commitment to spend some time to perform some task in the future if, realistically, you know that all your time in the future will be consumed by whatever the top 3 highest-priority angry voicemails you have on that day are.

Email is a challenging social problem, but I am tired of email, especially the user interface of email applications, getting the blame for what is, at its heart, a problem of interpersonal relations. It’s like noticing that you get a lot of bills through the mail, and then blaming the state of your finances on the colors of the paint in your apartment building’s mail room. Of course, the UI of an email app can encourage good or bad habits, but Gmail gave us a prominent “Archive” button a decade ago, and we still have all the same terrible habits that were plaguing Outlook users in the 90s.

Of course, there’s a lot more to “productivity” than just making a list of the things you’re going to do. Some tools can really help you manage that list a lot better. But all they can help you to do is to stop working on the wrong things, and start working on the right ones. Actually being more productive, in the sense of getting more units of work out of a day, is something you get from keeping yourself healthy, happy, and well-rested, not from an email filing system.

You can’t violate causality to put more hours into the day, and as a frail and finite human being, there’s only so much work you can reasonably squeeze in before you die.

The reason I care a lot about salvaging email specifically is that it remains the best medium for communication that allows you to be in control of your own time, and by extension, the best medium for allowing people to do creative work.

Asking someone to do something via SMS doesn’t scale; if you have hundreds of unread texts there’s no way to put them in order, no way to classify them as “finished” and “not finished”, so you need to keep it to the number of things you can fit in short term memory. Not to mention the fact that text messaging is almost by definition an interruption - by default, it causes a device in someone’s pocket to buzz. Asking someone to do something in group chat, such as IRC or Slack, is similarly time-dependent; if they are around, it becomes an interruption, and if they’re not around, you have to keep asking and asking over and over again, which makes it really inefficient for the asker (or the asker can use a @highlight, and assume that Slack will send the recipient, guess what, an email).

Social media often comes up as another possible replacement for email, but its sort order is even worse than “only the most recent and most frequently repeated”. Messages are instead sorted by value to advertisers or likeliness to increase ‘engagement’, i.e. most likely to keep you looking at this social media site rather than doing any real work.

For those of us who require long stretches of uninterrupted time to produce something good – “creatives”, or whatever today’s awkward buzzword for the intersection of writers, programmers, graphic designers, illustrators, and so on, is – we need an inbound task queue that we can have some level of control over. Something that we can check at a time of our choosing, something that we can apply filtering to in order to protect access to our attention, something that maintains the chain of request/reply for reference when we have to pick up a thread we’ve had to let go of for a while. Some way to be in touch with our customers, our users, and our fans, without being constantly interrupted. Because if we don’t give those who need to communicate with us such a tool, they’ll just blast @everyone messages into our Slack channels, fire @mentions onto Twitter, and text us “Hey, got a minute?” until we have to quit everything to try and get some work done.

Questions about this post?

Go ahead and send me an email.


Acknowledgements

As always, any errors or bad ideas are certainly my own.

First of all, Merlin Mann, whose writing and podcasting were the inspiration, direct or indirect, for many of my thoughts on this subject; and who sets a good example because he won’t answer your email.

Thanks also to David Reid for introducing me to Merlin's work, as well as Alex Gaynor, Tristan Seligmann, Donald Stufft, Cory Benfield, Piët Delport, Amber Brown, and Ashwini Oruganti for feedback on drafts.


  1. Email is so culturally pervasive that it is literally in Vogue, although in fairness this is not a reference to the overflowing-Inbox problem that I’m discussing here. 

  2. I find the “edit” function in Slack maddening; although I appreciate why it was added, it’s easy to retroactively completely change the meaning of an entire conversation in ways that make it very confusing for those reading later. You don’t even have to do this intentionally; sometimes you make a legitimate mistake, like forgetting the word “not”, and the next 5 or 6 messages are about resolving that confusion; then, you go back and edit, and it looks like your colleagues correcting you are a pedantic version of Mr. Magoo, unable to see that you were correct the first time. 

  3. There, I said it. Are you happy now? 

  4. Just to clarify: nothing in this post should be construed as me berating you for not getting more work done, or for ever failing to meet any commitment no matter how casual. Quite the opposite: what I’m saying you need to do is acknowledge that you’re going to screw up and rather than hold a thousand emails in your inbox in the vain hope that you won’t, just send a quick apology and move on. 

  5. Maybe you decided to do the thing because your boss asked you to do it and failing to do it would cost you your job, but nevertheless, that is a conscious decision that you are making; not everybody gets to have “boss” priority, and unless your job is a true Orwellian nightmare, not everything your boss says in email is an instant career-ending catastrophe. 

  6. In Gmail, you can usually just copy a link to the message itself. If you’re using OS X’s Mail.app, you can use this Python script to generate links that, when clicked, will open the Mail app:

    from __future__ import (print_function, unicode_literals,
                            absolute_import, division)
    
    from ScriptingBridge import SBApplication
    import urllib
    
    mail = SBApplication.applicationWithBundleIdentifier_("com.apple.mail")
    
    for viewer in mail.messageViewers():
        for message in viewer.selectedMessages():
            for header in message.headers():
                name = header.name()
                if name.lower() == "message-id":
                    content = header.content()
                    print("message:" + urllib.quote(content))
    

    You can then paste these links into just about any task tracker; if they don’t become clickable, you can paste them into Safari’s URL bar or pass them to the open command-line tool. 

  7. The one exception here is that you can look ahead in the same thread to see if someone has already replied. 

by Glyph at April 24, 2016 11:54 PM

Moshe Zadka

Use virtualenv

In a conversation recently with a friend, we agreed that “sometimes the instructions tell you to do ‘sudo pip install’…which is good, because then you know to ignore them”.

There is never a need for “sudo pip install”, and doing it is an anti-pattern. Instead, all installation of packages should go into a virtualenv. The only exception is, of course, virtualenv itself (and arguably, pip and wheel). I got enough questions about this that I wanted to write up an explanation of the how, the why, and why the counter-arguments are wrong.

What is virtualenv?

The documentation says:

virtualenv is a tool to create isolated Python environments.

The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.

Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.

The tl;dr is:

  • virtualenv allows not needing administrator privileges
  • virtualenv allows installing different versions of the same library
  • virtualenv allows installing an application and never accidentally updating a dependency

The first problem is the one the “sudo” comment addresses — but the real issues stem from the second and third: not using a virtual environment leads to the potential of conflicts and dependency hell.

How to use virtualenv?

Creating a virtual environment is easy:

$ virtualenv dirname

will create the directory, if it does not exist, and then create a virtual environment in it. It is possible to use it either activated or unactivated. Activating a virtual environment is done by

$ . dirname/bin/activate
(dirname)$

This will put python, as well as any script installed using setuptools’ “console_scripts” option in the virtual environment, on the command-execution path. The most important of those is pip, and so using pip will install into the virtual environment.

It is also possible to use a virtual environment without activating it, by directly calling dirname/bin/python or any other console script. Again, pip is an example of those, and used for installing into the virtual environment.
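
For example, with the virtual environment created in dirname as above, a package can be installed and used without ever activating anything (the requests package here is only an illustration):

$ dirname/bin/pip install requests
$ dirname/bin/python -c "import requests; print(requests.__version__)"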

Installing tools for “general use”

I have seen a couple of times the argument that when installing tools for general use it makes sense to install them into the system install. I do not think that this is a reasonable exception for two reasons:

  • It still forces you to use root to install/upgrade those tools
  • It still runs into the dependency/conflict hell problems

There are a few good alternatives for this:

  • Create a (handful of) virtual environments, and add them to users’ path.
  • Use “pex” to install Python tools in a way that isolates them even further from system dependencies.

Exploratory programming

People often use Python for exploratory programming. That’s great! Note that since pip 7, pip is building and caching wheels by default. This means that creating virtual environments is even cheaper: tearing down an environment and building a new one will not require recompilation. Because of that, it is easy to treat virtual environments as disposable except for configuration: activate a virtual environment, explore — and whenever needing to move things into production, ‘pip freeze’ will allow easy recreation of the environment.
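
A minimal sketch of that workflow, assuming a throwaway environment called scratch and a fresh one called prod (both directory names are placeholders):

(scratch)$ pip freeze > requirements.txt
(scratch)$ deactivate
$ virtualenv prod
$ prod/bin/pip install -r requirements.txt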

by moshez at April 24, 2016 04:54 AM

April 20, 2016

Glyph Lefkowitz

Far too many things can stop the BLOB

It occurs to me that the lack of a standard, well-supported, memory-efficient interface for BLOBs in multiple programming languages is one of the primary driving factors of poor scalability characteristics of open source SaaS applications.

Applications like Gitlab, Redmine, Trac, Wordpress, and so on, all need to store potentially large files (“attachments”). Frequently, they elect to store these attachments (at least by default) in a dedicated filesystem directory. This leads to a number of tricky concurrency issues, as the filesystem has different (and divorced) concurrency semantics from the backend database, and resides only on the individual API nodes, rather than in the shared namespace of the attached database.

Some databases do support writing to BLOBs like files. Postgres, SQLite, and Oracle do, although it seems MySQL lags behind in this area (although I’d love to be corrected on this front). But many higher-level API bindings for these databases don’t expose support for BLOBs in an efficient way.

Directly using the filesystem, as opposed to a backing service, breaks the “expected” scaling behavior of the front-end portion of a web application. Using an object store, like Cloud Files or S3, is a good option to achieve high scalability for public-facing applications, but that creates additional deployment complexity.

So, as both a plea to others and a note to myself: if you’re writing a database-backed application that needs to store some data, please consider making “store it in the database as BLOBs” an option. And if your particular database client library doesn’t support it, consider filing a bug.

by Glyph at April 20, 2016 01:01 AM

April 15, 2016

Itamar Turner-Trauring

Improving your skills as a 9 to 5 programmer

Do you only code 9 to 5, but wonder if that's good enough? Do you see other programmers working on personal projects or open source projects, going to hackathons, and spending all their spare time writing software? You might think that as someone who only writes software at their job, who only works 9-5, you will never be as good. You might believe that only someone who eats, sleeps and breathes code can excel. But actually it's possible to stick to a 40-hour week and still be a valuable, skilled programmer.

Working on personal or open source software projects doesn't automatically make you a better programmer. Hackathons might even be a net negative if they give you the impression that building software to arbitrary deadlines while exhausted is a reasonable way to produce anything of value. There are inherent limits to your productive working hours. If you don't feel like spending more time coding when you get home, then don't: you'll be too tired or unfocused to gain anything.

Spending time on side projects does have some value, but the most useful result is not so much practice as knowledge. Established software projects tend to use older technology and techniques, simply because they've been in existence for a while. The main value you get from working on other software projects and interacting with developers outside of work is knowledge of:

  1. A broader range of technologies and tools.
  2. New techniques and processes. Perhaps your company doesn't do much testing, but you can learn about test-driven development elsewhere.

Having a broad range of tools and techniques to reach for is a valuable skill both at your job and when looking for a new job. But actual coding is not an efficient way to gain this knowledge. You don't actually need to use new tools and techniques, and you'll never really have the time to learn all tools and all techniques in detail anyway. You get the most value just from having some sense of what tools and techniques are out there, what they do and when they're useful. If a new tool you discover is immediately relevant to your job you can just learn it during working hours, and if it's not you should just file it away in your brain for later.

Learning about new tools can also help you find a new job, even when you don't actually use them. I was once asked at an interview about the difference between NoSQL and traditional databases. At the time I'd never used MongoDB or any other NoSQL database, but I knew enough to answer satisfactorily. Being able to answer that question told the interviewer I'd be able to use that tool, if necessary, even if I hadn't done it before.

Instead of coding in your spare time you can get similar benefits, and more efficiently, by directly focusing on acquiring knowledge of new tools and techniques. And since this knowledge will benefit your employer and you don't need to spend significant time on it, you can acquire it during working hours. You're never actually working every single minute of your day, you always have some time when you're slacking off on the Internet. Perhaps you're doing so right now! You can use that time to expand your knowledge.

Each week you should allocate one hour of your time at work to learning about new tools and techniques. Choosing a particular time will help you do this on a regular basis. Personally I'd choose Friday afternoons, since by that point in the week I'm not achieving much anyway. Don't skip this hour just because of deadlines or tiredness. You'll do better at deadlines, and be less tired, if you know of the right tools and techniques to efficiently solve the problems you encounter at your job.

April 15, 2016 04:00 AM

April 13, 2016

Glyph Lefkowitz

I think I’m using GitHub wrong.

I use a hodgepodge of https: and ssh-style (git@github.com:…) URLs for my local clones; sometimes I have a remote called “github” and sometimes I have one called “origin”. Sometimes I clone from a fork I made and sometimes I clone from the upstream.

I think the right way to use GitHub would instead be to always fork first, make my remote always be “origin”, and consistently name the upstream remote “upstream”. The problem with this, though, is that forks rapidly fall out of date, and I often want to automatically synchronize all the upstream branches.

Is there a script or a GitHub option or something to synchronize a fork with upstream automatically, including all its branches and tags? I know there’s no comment field, but you can email me or reply on Twitter.

by Glyph at April 13, 2016 09:11 PM

April 04, 2016

Twisted Matrix Laboratories

Twisted 16.1 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.1!

This release is hot on the heels of 16.0, released last month, and includes some nice little tidbits. The highlights include:
  • twisted.application.internet.ClientService, a service that maintains a persistent outgoing endpoint-based connection -- a replacement for ReconnectingClientFactory that uses modern APIs;
  • A large (77% on one benchmark) performance improvement when using twisted.web's client on PyPy;
  • A few conch modules have been ported to Python 3, in preparation for further porting of the SSH functionality;
  • Full support for OpenSSL 1.0.2f and above;
  • t.web.http.Request.addCookie now accepts Unicode and bytes keys/values;
  • twistd manhole no longer uses a hard-coded SSH host key, and will generate one for you on the fly (this adds an 'appdirs' PyPI dependency; installing with [conch] will add it automatically);
  • Over eighteen tickets overall closed since 16.0.
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by HawkOwl (noreply@blogger.com) at April 04, 2016 05:14 PM

Itamar Turner-Trauring

Vanquish whiteboard interview puzzles with test-driven development

Whiteboard coding puzzles are of course utterly terrifying and totally unrealistic, which does not recommend them as an interview procedure. Yet they are still commonly used, which means the next time you interview for a programming job you might find yourself asked to solve an algorithmic problem with a 30-minute deadline, no text editor and a stranger who will decide your future employment staring at you expectantly. Personally this is where I start to panic, just a little: am I a fraud? Can I really write software? I can, in fact, and so can you, and one of the techniques you can use to develop software will also allow you to go from a blank whiteboard to an impressed interviewer: test-driven development.

The most important thing you need to understand is this: the puzzle is a distraction. You want to spend the next 30 minutes impressing the interviewer. Ideally you will also solve the puzzle, but that is not your goal; your goal is to impress the interviewer so you can get a job offer. And if you do it right, sometimes you can even fail to solve the puzzle and still succeed at your goal.

Simply trying to solve the puzzle may fail to impress your interviewer:

Interviewer: ... and so you need to find the optimal route for the salesman between all the cities.
You: OK, so... <stare at the whiteboard>
Interviewer: I'm getting bored. Here's a hint!
You: <write some code>... OK, maybe that works?
Interviewer: Foolish interviewee, you have missed a bug that is obvious to me since I have given this puzzle to dozens of other candidates, most of whom were more impressive than you.
You: Let me fix that.
Interviewer: You are out of time. Next up, lunch interview!

Impressing the interviewer requires more: you need to show off your thinking and process, and take control of the conversation. The details are important, but at a high level you must follow a four-step process:

  1. Explain the steps of your process to the interviewer.
  2. Write some test cases on the whiteboard.
  3. Try to implement a solution.
  4. Run all the test cases against the implementation. If the implementation fails the tests go back to step 3.

Step 1: Explain what you're about to do

You don't want the interviewer interrupting you mid-thought, or deciding you're floundering. The first thing you do then is explain to the interviewer the process you are about to follow: writing tests, then doing a first pass implementation, then validating the implementation against the tests, then fixing the bugs you find.

Step 2: Write some tests

Write a series of test cases on the whiteboard: this input gives that output. Start with simple edge cases, and try to find some more interesting cases as well. Thinking through the transformations will help you get a better sense of how to solve the problem, and writing them on the whiteboard will demonstrate progress to the interviewer. Make sure to narrate your thought process.

Step 3: Initial implementation

Start solving the puzzle by writing code. To ensure you don't get interrupted too soon, remind the interviewer that next you will be testing the code with your pre-written tests. If you get stuck, return to your test cases and see how you can change the code to do what they suggest. Make sure to narrate your thought process.

Step 4: Testing and bug fixing

For each test case, compare the output you expect to what your current code does. This will help you catch bugs on your own and again demonstrate progress to your interviewer. If you find a bug then rewrite the code to fix it, and then start testing again.
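
To make steps 2 through 4 concrete, suppose the puzzle were a (hypothetical) “reverse the words in a sentence” question; on the whiteboard, the tests, a first pass, and the checking loop might look roughly like this:

# Step 2: test cases, written before any implementation.
TESTS = [
    ("", ""),                        # edge case: empty string
    ("hello", "hello"),              # edge case: single word
    ("hello world", "world hello"),  # simple case
    ("a b c", "c b a"),              # slightly more interesting case
]

# Step 3: first-pass implementation.
def reverse_words(sentence):
    return " ".join(reversed(sentence.split(" ")))

# Step 4: walk every test case against the implementation;
# any mismatch sends us back to step 3.
for given, expected in TESTS:
    actual = reverse_words(given)
    print(given, "->", actual, "OK" if actual == expected else "BUG")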

Let's listen to an interview that follows these four steps; your code may be the same, but you sound more confident and professional:

Interviewer: ... and so you need to find the optimal route for the salesman between all the cities.
You: Let me explain how I will go about solving this. First, I will write down some test cases. Second, I will write a first pass implementation; it will probably be buggy, so third, I will run the test cases against my implementation to find the bugs and fix them.
Interviewer: OK, sounds good.
You: <write some test cases>... OK, next I will write a first pass of code. Remember that when I'm done I won't be finished yet, I still will need to run my tests.
Interviewer: <This candidate will write high-quality code!>
You: <write some code>... and now to run my test cases. Feed this input, x is multiplied, run through the loop... this one gives expected result! OK, next test case... hm, that doesn't match. Let me see... ah! OK, that's fixed, let's run the tests again...
Interviewer: Unfortunately we are out of time, but I can see where you're going with this.

Next time you're interviewing for a programming job remember these four steps: explain, test, code, debug with tests. You will face your next puzzle armed with a process, helping you overcome your anxiety. By writing tests you are more likely to solve the puzzle correctly. And most importantly you will impress your interviewer, and maybe even get a job offer.

April 04, 2016 04:00 AM

March 26, 2016

Moshe Zadka

Weak references, caches and immutable objects

Consider the following situation:

  • We have a lot of immutable objects. For our purposes, an “immutable” object is one where “hash(…)” is defined.
  • We have a (pure) function that is fairly expensive to compute.
  • The objects get created and destroyed regularly.

We often would like to cache the function. As an example, consider a function to serialize an object — if the same object is serialized several times, we would like to avoid recomputing the serialization.

One naive solution would be to implement a cache:

cache = {}
def serialize(obj):
    if obj not in cache:
        cache[obj] = _really_serialize(obj)
    return cache[obj]

The problem with that is that the cache would keep references to our objects long after they should have died. We can try and use an LRU (for example, repoze.lru) so that only a certain number of objects would extend their lifetimes in that way, but the size of the LRU becomes a trade-off between space overhead and time overhead.

An alternative is to use weak references. Weak references are references that do not keep objects from being collected. There are several ways to use weak references, but here one is ideally suited:

import weakref
cache = weakref.WeakKeyDictionary()
def serialize(obj):
    if obj not in cache:
        cache[obj] = _really_serialize(obj)
    return cache[obj]

Note that this is the same code as before — except that the cache is a weak key dictionary. A weak key dictionary keeps weak references to the keys, but strong references to the value. When a key is garbage collected, the entry in the dictionary disappears.

>>> import weakref
>>> a=weakref.WeakKeyDictionary()
>>> fs = frozenset([1,2,3])
>>> a[fs] = "three objects"
>>> print a[fs]
three objects
>>> len(a)
1
>>> fs = None
>>> len(a)
0

by moshez at March 26, 2016 04:36 AM

March 22, 2016

Itamar Turner-Trauring

Learning from the programmers who built your tools

The software you use was written by developers just like you. They made mistakes just like you do, learned and improved just like you do. But you have a huge advantage: you can learn from the mistakes and discoveries they have already made. I've discussed comparing and contrasting a single task across multiple alternative technologies, using resource cleanup idioms of C++, Go and Python as an example. Another technique for gaining technical depth is examining a single technology and seeing how its support for a single task evolved over time. As an example, let's consider resource cleanup in Python.

To recap the previous post, resource cleanup is a problem any programming language needs to solve: you've opened a file, and eventually you will need to close it. However, in the interim your code might return, or throw an exception. You want to have that file cleaned up no matter what, so you don't leak resources:

def write():
    f = open("myfile", "w")
    # If this throws an exception, e.g. when disk is full,
    # then f.close() will never be run:
    f.write("hello")
    f.close()

In Python the clean up idiom started as an extension to the exception handling syntax:

def write():
    f = open("myfile", "w")
    try:
        f.write("hello")
    finally:
        f.close()

Whether the code in the try block returns, throws an exception or continues, the finally block will always be called.

What problems can we spot in this idiom? Why would the Python developers try to improve on it? Let's make a list; notice that these are all focused on solving problems with humans, not with computers:

  1. You still have to remember the particular function name to clean up each kind of resource, e.g. close() for files and release() for locks.
  2. It's repetitive: every single time you open a file you have to call close() on it, meaning more code to write and more code to read.
  3. The cleanup code happens long after the resource is initialized, interrupting your flow of reading the code.

The Python developers eventually came up with a new, improved language feature that solves these problems:

def write():
    with open("myfile", "w") as f:
        f.write("hello")

This solves all three problems:

  1. Each resource knows how to clean itself up.
  2. Clean up is done automatically, no need to explicitly call the method.
  3. Clean up is done at the right time, but without extra code to read.

Can this be improved? I think so. Consider the following example:

>>> with open("/tmp/file", "w") as f:
...     f.write("hello")
... 
5
>>> print(f)
<_io.TextIOWrapper name='/tmp/file' mode='w' encoding='UTF-8'>

Even though the file has been closed and the resource has been cleaned up, the variable referring to the object persists outside the with block. This is "namespace pollution": extra, unnecessary variables that can potentially introduce bugs. It would be better if f only existed inside the with block.
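
One way to get that effect today, sketched as plain code rather than a language feature (the helper name and path are made up), is to wrap the block in a small function so the name never escapes into the surrounding namespace:

def write_greeting(path):
    with open(path, "w") as f:
        f.write("hello")

write_greeting("/tmp/file")
# 'f' was local to write_greeting, so it does not linger out here
# where it could be used by mistake.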

By examining the improvements to a particular feature you can take advantage of all the hard work the developers put into coming up with alternatives: instead of just learning from the latest version you can see how their thinking evolved over time. You can then try to come up with improvements on your own, to exercise your new understanding. While I've examined a language feature in this post you can apply the same technique to any form of technology. Pick your favorite database, for example, and a feature that has evolved over time: why did it change, and what could be improved?

The developers who wrote the software you use had to find their solutions the hard way. You should respect their work by building on what they have learned.

March 22, 2016 04:00 AM

March 19, 2016

Itamar Turner-Trauring

Book Review: Become a better learner by discovering "How Learning Works"

Learning is never easy, but it is even harder when you are learning on your own. When in school you have teachers to rely on, when you are new to the profession you have experienced programmers to guide you. But eventually you reach a point where you have to learn on your own, without help or support. You must then know how to teach yourself, which is to say you must first learn how to teach. To acquire such a skill you would need an expert on human learning, and one who is also an able teacher. And if such a teacher were not available the next best thing would be a book that they wrote, a book summarizing what they know and how you can apply it.

"How Learning Works" is the unexpected child of the two somewhat conflicting goals of academia: research and teaching. Research universities value research far more than teaching, even as teaching is required of faculty. But some faculty do care about teaching, and some scientists study learning and teaching. This book is the result of intersection of those two interests: practical principles for teaching based on what scientists have discovered about learning. And these principles can also be applied by learners themselves, as the book's conclusion points out, learners like me and you.

The book is organized around seven principles, reviewing the relevant academic research and then providing practical teaching advice based on the findings. The evidence for the principles is quickly summarized and for the most part is fairly plausible, with the usual caveats about the difficulty of such research. And since the book is driven by scientific evidence its lessons often go far beyond naive common sense. While discussing how students organize their knowledge the book discusses research regarding "expert blind spot." While experts are much better at organizing knowledge, their very different ways of understanding can actually make it more difficult for them to teach novices. This is one reason why you will hear so much seemingly contradictory advice on subjects like testing: experts often assume the scope and limitations of their advice are obvious, even when they're not.

While the book is organized around high-level principles, the explanations and advice it gives are detailed, specific and very hands-on. When discussing the principle of targeted feedback, for example, the book reviews research suggesting:

  1. "Feedback is more effective when it identifies particular aspects [students need to improve]."
  2. "Too much feedback tends to overwhelm students."
  3. "Even minimal feedback can lead to better results."
  4. Immediate feedback can be less helpful than delayed feedback.

The chapter then follows up with at least eight different ways to improve the relevance, frequency and timeliness of feedback. E.g. the book suggests asking students to explain how they applied feedback in later work, a suggestion well worth following even without a teacher's requirement. The detailed breakdown and suggestions are just one part of the overall principle the chapter covers; an additional section covers more research and corresponding advice. The advice itself quite obviously comes from practiced teachers, although it is somewhat over-focused on academic teaching involving a large class and a semester schedule. Even so, I have found the book full of advice and ideas relevant to far more than just academic teaching.

Both the principles covered and the resulting advice show up continuously in the previous blog posts I've written. For example, I previously talked about how providing the solutions you've come up with is important when asking for help. We can see how some of the principles discussed in the book apply in this situation. First, "prior knowledge can help or hinder learning." By providing your solutions you give your respondent a much more detailed understanding of what you know and how it affected your approach to solving the problem. When they provide feedback by helping you solve the problem they will be able to tailor it to your particular understanding, as in the principle mentioned earlier which stresses the importance of targeted feedback.

"How Learning Works" is a wonderful way to understand how you learn and how to improve your learning. It has helped me immensely in writing this blog, by making me a better teacher and better learner. I hope it will also help you take the next step in learning on your own. Grab a copy from your local library or buy it from Amazon.

March 19, 2016 04:00 AM

An apology

I previously promised I would post a review of a book on writing, but I've had to reject my originally planned recommendation. Meanwhile I've written a review of another book that is well worth your while, How Learning Works. Enjoy!

March 19, 2016 04:00 AM

March 15, 2016

Twisted Matrix Laboratories

Twisted 16.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.0!

Twisted 16.0 brings some important changes, and some nice-to-haves as well. The major things are:
  • TLS endpoints have arrived! They're like the old `ssl:` endpoints, but support faster IPv4/IPv6 connections (using HostnameEndpoint) and always do hostname verification; see the sketch after this list.
  • Conch now uses Cryptography instead of PyCrypto for underlying cryptographic operations. This means it'll work much better on PyPy!
  • Headers objects (notably used by t.web.server.Request) now support Unicode for the vast majority of cases, encoding keys to ISO-8859-1 and values to UTF-8.
  • WSGI support and AMP have been ported to Python 3, along with a handful of other modules.
  • More shedding of the past, with the GTK+ 1 reactor being removed.
  • Over 45 tickets have been closed since 15.5.
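
As a rough sketch of what the new TLS endpoints look like from client code (the description string below follows the usual endpoint string syntax and is an assumption, not a quote from the release notes):

from twisted.internet import reactor
from twisted.internet.endpoints import clientFromString

# A "tls:" client endpoint; hostname verification is performed for you.
endpoint = clientFromString(reactor, "tls:example.com:443")
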
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively on our website). The NEWS file is also available.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,

Amber Brown (HawkOwl)
Twisted Release Manager

by HawkOwl (noreply@blogger.com) at March 15, 2016 06:16 AM

Itamar Turner-Trauring

Stagnating at your programming job? Don't quit just yet!

Are you bored at work? Do you feel like you're stagnating, not learning anything new, like you're not growing as a programmer anymore? If you love learning, and love solving problems, having a job where you're bored is no fun. Your first thought may be to start looking for a new job, within your company or elsewhere. But this may be a mistake, a missed opportunity to take your skills to the next level. Sometimes the new opportunity is hidden exactly where you're stagnating.

If you're bored at work, if you feel like you're solving the same problems over and over again, you probably have become quite good at your job. You've become an expert. You understand the problem domain, the tools you're using and their limitations... you can do your job without thinking much. Now, consider what it takes to build a new large piece of software: you need both to acquire expertise on the problem domain and to design and implement the new system. Doing both at once can be difficult if designing the software is near the edge of your competencies.

But since you're an expert in your job's problem domain you are ideally suited to design or redesign a relevant new system. As an expert you need only focus on the design aspect, you don't need to learn the problem domain at the same time. If you have supportive management you may be able to either automate your repetitive work, leading to new and more interesting challenges, or rewrite a legacy system you are currently maintaining.

Automating repetitive work with a framework:

The Django web framework originates from automation of repetitive work. The programmers at the Lawrence Journal-World newspaper realized they were building the same features over and over and decided to share them within a single framework, a framework that is now used by thousands of developers. But I've no doubt one of the short term benefits was a much more interesting job.

When you encounter repetitive work your first instinct as a programmer should be automation. If you're solving the same problem over and over your work may benefit from a software framework or library to automate shared functionality. Instead of rewriting the same piece of code five times you write the framework once and then only need to fill in the differing details for each particular situation. Building a framework will be a new technical challenge, and can sometimes allow you to introduce new technologies or best practices to your organization. And less repetition means you'll be less bored.

Rewriting a legacy system:

A different reason you might be bored is that you're spending all of your time fixing bugs in an old and creaky piece of software. Maintaining legacy software can be tedious: you have to deal with old technologies, hard-coded assumptions that are no longer applicable, years of patches and retrofits. But if rebuilding the legacy system provides value to your organization you may have an opportunity to write something new and do it the right way. Rewriting software is always risky, but if you can manage it you will be able to introduce new technologies and best practices to your organization, and build something new and hopefully better.

If you're stagnating or bored it might be time to move on to a new position or job. But before you do that try to take advantage of your current situation as an expert. Guided by your boredom, you may be able to apply your expertise in new and interesting ways: automating yourself out of your current job and into a new one of framework maintainer, or rebuilding the legacy system everyone loves to hate.

March 15, 2016 04:00 AM

March 10, 2016

Itamar Turner-Trauring

Gaining technical depth via compare and contrast

(Update 2016/03/11: expanded on the difference between programming languages with and without exceptions.)

As a programmer you are expected to learn new technologies regularly. Even when the documentation is excellent, there will typically be underlying assumptions that go unstated because they are so obvious to the writer. And documentation is not always good.

But if you have relevant technical depth you will be able to recognize the commonalities and differences within a category of technologies, e.g. programming languages or databases. This means a new programming language will be easier to learn: you will recognize familiar features, different trade-offs, and some of the motivations of design choices. You will also be better able to judge the usefulness of the new technology. One way to improve your technical depth is to compare a single task across multiple technologies.

Let's consider a particular task: cleaning up a resource. If your code wants to write to a file you will open the file, write to it, and eventually close it. Forgetting to close the file might mean writes don't get written to disk until much later than you expected, or that certain resources get leaked. On Unix systems if you don't close file descriptors your process will eventually run out and not be able to open any new files.

Most programming languages allow returning from a function at multiple points, so cleanup ends up being repetitive. This makes it easier for you to forget to cleanup a resource you acquired or created within the function.

def write():
    f = open("myfile", "w")
    if something():
       f.close()  # Repetitive resource cleanup
       return
    
    f.write("hello")
    f.close()  # Repetitive resource cleanup

Many languages allow leaving a function in more than one way, e.g. with both returns and exceptions. Once you have exceptions in your language any part of your code might result in leaving the function due to a thrown exception, making resource cleanup even harder to get right:

def write():
    f = open("myfile", "w")
    # If this throws an exception, e.g. when disk is full,
    # then f.close() will never be run:
    f.write("hello")
    f.close()

As a result most languages provide an idiom or feature for automatically cleaning up resources, regardless of how or when you return from a function. Let's compare the idioms for C++, Go and Python and see what we can learn.

Python functions can return via returned result, or via a raised exception. One way to cleanup a resource is via try/finally clause based on the exception handling syntax of try/except:

def write():
    f = open("myfile", "w")
    try:
        f.write("hello")
    finally:
        f.close()

The code in the finally block will always be called, whether the try block returns, raises an exception, or runs to completion. (Python also has a more modern with idiom that I'm going to ignore for brevity's sake.)

Go lacks exceptions, so there is no exception syntax to build on. Instead, Go provides a defer statement that schedules a cleanup function to be run when the main function returns.

func write() {
    // os.Create opens the file for writing; os.Open would be read-only.
    f, err := os.Create("myfile")
    if err != nil {
        return
    }
    defer f.Close()
    f.WriteString("hello")
}

Python has a similar facility implemented as library code in the unittest.TestCase class, where you can register cleanup functions for a test:

class MyTest(TestCase):
    def test_files(self):
        f = open("/tmp/myfile")
        # f.close() will be called after test finishes:
        self.addCleanup(f.close)
        # etc.

While try/finally could be used, failed tests are indicated by raising an AssertionError exception. This means any test that wants to clean up multiple resources will be forced to have many nested try/finally clauses, which is the likely motivation for having the TestCase.addCleanup API.
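
For comparison, here is roughly what a test cleaning up two resources looks like with nested try/finally blocks; the file names are invented for illustration:

class MyTest(TestCase):
    def test_files(self):
        f = open("/tmp/one")
        try:
            g = open("/tmp/two")
            try:
                pass  # the actual assertions go here
            finally:
                g.close()
        finally:
            f.close()

Each additional resource adds another level of nesting; addCleanup keeps the test body flat.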

The C++ idiom is very different, relying on class destructors: we construct a File class whose destructor closes the file, and then allocate the File object on the stack when we use it. When the function returns the File instance on the stack is destroyed, and therefore its destructor is called and the underlying file is closed.

class File {
public:
    File(const char* filename):
        m_file(std::fopen(filename, "w")) {
    }

    ~File() {
        std::fclose(m_file);
    }
// etc.
private:
    std::FILE* m_file;
// etc.
};

void write() {
  File my_file("myfile");
  my_file.write("hello");
}

Notice that this relies on deterministic deallocation of my_file: since it's on the stack, it will always be deallocated when the function ends. This mechanism cannot be used in Python or Go because they are garbage collected, and so there is no guarantee an object will be cleared from memory immediately. Python will close a file when it is garbage collected, but warns you that you should have closed it yourself:

$ python3 -Wall
>>> open("/etc/passwd", "rb")
<_io.BufferedReader name='/etc/passwd'>
>>> 1 + 2  # There's a decent chance the file will get GC'd now, and indeed:
__main__:1: ResourceWarning: unclosed file <_io.BufferedReader name='/etc/passwd'>
3

What have we learned from all this?

  • Languages without garbage collection can potentially rely on deterministic object destruction to cleanup resources.
  • Languages with exceptions can always support resource cleanup via exception handlers plus success case cleanup, or perhaps more simply via related syntax as in Python or Java.
  • Scheduled clean up functions can be a language feature as in Go, or a library feature available in any programming language.

You can now apply this knowledge to the next new programming language you learn.

Comparing a specific task can help you gain technical depth in other areas as well. And if you're learning a new technology, comparing tasks with technologies you already know will help you learn the new technology that much faster. A good task is easy but not completely trivial: adding integers doesn't differ much between programming languages, so comparing it won't teach you anything interesting. For databases you might compare "how would I allocate a unique id to a newly created record?" or "how can I safely increment a counter from multiple clients?" And ideally you should compare more than two technologies, since there are almost always more than two solutions to any problem.

Got any questions on increasing technical depth? Send me an email and let me know.

March 10, 2016 05:00 AM

March 07, 2016

Itamar Turner-Trauring

The most important thing to do when asking for help

Asking for help is an admission you don't know everything, and an imposition on the person you're asking. It's also key to solving problems and learning to be a better developer. How should you ask for help in the most effective way? In order to come up with a good answer I asked my wife, a much more experienced and skilled engineering manager than I am. And having asked for help, I learned something new: the most important thing to do when asking for help.

Why is asking for help worth doing right? Whether you're a novice or an expert, asking for help is an opportunity for feedback that should not be wasted. You will learn more and faster with good feedback than you could ever learn on your own. And asking for help means interrupting someone else's work: if you're going to do that, best to do it in a way that helps both you and them.

The most important thing you should do when you ask for help: present some potential solutions to the problem. They don't have to be good solutions, nor do they have to be complete. They should just be whatever you can manage: big or small, vague or specific, likely or unlikely.

Providing solutions along with the question helps you by encouraging you to practice problem solving. Instead of saying "I'm stuck, help" you're forced to see how far you can get solving things on your own. You might only get a little bit further down the path to a solution, but every little bit helps improve your skills. If you get a compilation error talking about a missing file, you could just go ask someone "why?" But the need to provide a minimal solution or two suggests further avenues of investigation: "I need to install something", "the configuration is broken." Following up on those ideas will probably teach you something new, even if it doesn't always solve the problem.

Providing potential solutions also helps the person answering the question. They will learn what you do and do not know, since your solutions will demonstrate your level of understanding. Your solutions will also allow them to correct any misconceptions you have or provide additional information that you're missing. And in cases where they don't immediately know the answer your potential solutions will give them more context to think about and collaboratively solve the problem with you.

Next time you're asking for help make sure you provide some ideas for solving the problem. You will learn more, and the person answering your question will be able to teach you more and help you more effectively. Think there's something more important to do when asking for help? Send me an email and let me know.

March 07, 2016 05:00 AM

March 03, 2016

Itamar Turner-Trauring

From rules to skills

Becoming a better software engineer means you need to go beyond just following rules. That means learning principles, like the fact you only have a limited ability to keep details in your head. But you also need to learn new skills. Some of these skills are tied to particular technologies: applying new tools, learning new languages. But there are other, less obvious skills you will need to learn.

One that I've already talked about is knowing when to ask for help, part of the larger skill of learning. And if you read that post closely you'll notice the reliance on other skills, like estimation and planning.

Another critical skill for any programmer is writing: taking a vague notion and turning it into a reasoned argument. Writing, as William Zinsser said, is thinking on paper, though these days it's much more likely to be on a computer. There are many books on writing, so to kick off what I hope to be a series of book reviews I'm going to be reviewing a particularly excellent one next week. I will be posting book reviews to the mailing list first and only later to the blog, so if you want to read the review without delay you can subscribe now.

March 03, 2016 05:00 AM

March 02, 2016

Itamar Turner-Trauring

Stuck on a problem? Here's when to ask for help

When you're stuck and can't solve a problem you need to decide when to ask for help. Part of becoming a better programmer is learning how to solve problems on your own, but at the same time you also want to learn from others. Wait too long and you've wasted your time; wait too little and you won't learn as much. The key is knowing how to steer between these two extremes.

It's worth emphasizing that when you get help does actually impact your learning. "How Learning Works" is an excellent book covering its title subject which I intend to review at some point. The book points to research that suggests getting immediate feedback impedes learning; a summary is available in this paper. So if you're trying to maximize learning you should not ask for help immediately.

On the other hand, merely spending a long time trying to solve a problem won't necessarily teach you much either. In many cases the information you need lives only in someone's head. Even if you are telepathic, reading someone's mind is merely an impolite method of asking for someone's help: better to just ask directly. And when you could conceivably solve something on your own it is often much faster to get help. Another set of eyes will quickly catch a typo your brain is filtering out.

When then should you ask? Depends how much progress you've made. In order to answer that question reasonably you therefore need to do some advance preparation, gathering information before you begin your task:

  1. You will need a vague estimate of how long the task ought to take. If you're a novice in the area it's best to get an estimate from an expert; ask them to estimate how long it would take them, then quadruple the estimate. If the expert says one hour then assume it will take you four. If you know what you're doing you can come up with an estimate on your own.
  2. You should also plan out your task: the necessary steps to achieve it, what order to do them in, and which steps you expect to be easier or harder. If you don't know how to do that then again ask for feedback from an expert. But don't just ask for a plan: also ask them to explain how they came up with the plan.

Given an estimate and a plan, if you eventually get stuck you will have some sense of how much progress you have made. The more progress you've made against the estimate and the planned steps, the more time you should give yourself to solve the problem on your own. If you're on the second hour of your four-hour estimate and you're still stuck on the first out of five steps then you should probably ask for help. If you're stuck on the last step of the plan and you have plenty of time to spare then it's worth trying to solve it on your own. Go for a walk, go home if it's the end of the day, try a different debugging method, etc.

How you ask for help is also important. Asking "how might you go about solving this problem" will help you learn much more than merely asking "what should I do next?" And it's often better to sit down together with a peer and go over the problem and how to solve it than to just ask a question and go back to your desk.

In short: estimate your work, make a plan, measure your progress, and ask for help when being stuck is noticeably impeding your progress. Stuck right now? Send me an email and tell me about it.

March 02, 2016 05:00 AM

March 01, 2016

Itamar Turner-Trauring

Don't memorize, automate!

Software projects are full of rules that you must memorize:

  • "Do not modify this file, as it is automatically generated."
  • "If you modify the SchemaVersion field in this module you must also edit config/schema_version.ini."
  • "All classes and functions must be documented."

But human beings are very bad at remembering things, especially if they're sleep-deprived parents like me. A moment of distraction and you've forgotten to write an upgrade script for the database schema... and you might only find out after the production upgrade fails. What to do? Automate!

Most of the time we think of testing as testing of the product we're developing. E.g. you might want to make sure your website operates as expected. But sometimes we want to test the software development process. When the instructions in the source code are addressed to you, the developer, that suggests that those are process requirements. The user doesn't care how well-documented your code is, they just want the website to work.

When you see process instructions there are three approaches you can take, from best to worst:

  1. Automate out of existence. If you have a coding standard that requires certain formatting, use a tool that automatically formats your code appropriately. The Go programming language ships with gofmt, Python has third-party utilities like autopep8, etc..
  2. Test for problems and notify the developer when they are found. If your coding standard requires docstrings (or Javadoc comments, etc.) you can't have software create them: a human must do so. You can however test for their absence using lint tools, which can be run as part of your test suite. When you add a missing docstring the lint tool will automatically stop complaining.
  3. Automatically remind yourself of a necessary step. After you've taken the necessary actions you manually indicate you have done so.

Let's look in detail at the third category of automation since it's less obvious when or how to implement it. Imagine you have some source file which needs to be reviewed for legal compliance every time it is changed. In cases like this the necessary action you have to take is very hard to check automatically: how would your software know whether you've talked to the company lawyer or not? You can however remind yourself or whoever is making changes that they need to do so:

$ py.test test_importantfile_changed.py
1 passed in 0.00 seconds

$ echo "I CHANGED THIS" >> importantfile.txt

$ py.test test_importantfile_changed.py
========================= FAILURES =========================
E           AssertionError: 
E           'importantfile.txt' has changed!
E           Please have Legal Compliance approve these
E           changes, then run 'python compliance.py'.
1 failed in 0.00 seconds

$ python compliance.py
Hash updated.

$ py.test test_importantfile_changed.py
1 passed in 0.00 seconds

The implementation works by storing a hash of the file alongside it. Whenever the tests are run the stored hash is compared to a newly calculated hash; if they differ an error is raised. Here's what compliance.py looks like:

from hashlib import md5

SCHEMA_FILE = "importantfile.txt"
HASH_FILE = SCHEMA_FILE + ".md5"

def stored_hash():
    with open(HASH_FILE) as f:
        return f.read()

def calculate_hash():
    hasher = md5()
    # Read in binary mode so hashing the bytes works on Python 2 and 3 alike:
    with open(SCHEMA_FILE, "rb") as f:
        for line in f:
            hasher.update(line)
    return hasher.hexdigest()

if __name__ == '__main__':
    # Update the stored hash if called as script:
    with open(HASH_FILE, "w") as f:
        f.write(calculate_hash())
    print("Hash updated.")

And here's what the test looks like:

from compliance import stored_hash, calculate_hash

def test_schema_changed():
    if stored_hash() != calculate_hash():
        raise AssertionError("""
'importantfile.txt' has changed!
Please have Legal Compliance approve these
changes, then run 'python compliance.py'.
""")

Instead of trying to remember all the little requirements of your development process, you just need to remember one core principle: automation. So automate your process completely, or automate catching problems, or just automate reminders if you must. Automation is what software is for after all.

March 01, 2016 05:00 AM

February 27, 2016

Itamar Turner-Trauring

Confused by testing terminology?

Have you ever been confused by the term "unit testing," or heard it used in a way you didn't expect? Or wondered what exactly "functional testing" means? Have you come up with an excellent way to test your software, only to be disdainfully told that's not real unit testing? I believe testing is suffering from a case of confusing terminology, and I'd like to suggest a cure.

Consider a Reddit post I recently read where a programmer asked about "unit testing a web application." What they very clearly meant was "automated testing": how could they automate testing their application? Since they were using Python this usage might have been encouraged by the fact the Python standard library has a unittest module intended for generic automated testing. One of the answers to this question used unit testing in a different sense: automated tests for a self-contained unit of code, which only interact with in-memory objects. The answer therefore talked about details relevant to that particular kind of testing (e.g. mocking) without any regard to whether this was relevant to the use case in the original post.

Now you could argue that one or the other of these definitions is the correct one; our goal should be to educate programmers about the correct definition. But a similar confusion appears with many other forms of testing, suggesting a deeper problem. "Functional testing" might mean black box testing of the specification of the system, as per Wikipedia. At my day job we use the term differently: testing of interactions with external systems outside the control of our own code. "Regression testing" might mean verifying software continues to perform correctly, again as per Wikipedia. But at a previous company regression testing meant "tests that interact with the external API."

Why is it so hard to have a consistent meaning for these terms? I believe it's because we are often ideologically committed to testing as a magic formula for software quality. When a particular formula proves not quite relevant to our particular project our practical side kicks in and we tweak the formula until it actually does what we need. The terminology stays the same, however, even as the technique changes.

Imagine a web developer who is trying to test a HTTP-based interaction with very simple underlying logic. Their thought process might go like this:

  1. "Unit testing is very important, I must unit test this code."
  2. "But, oh, it's quite difficult to test each function individually... I'd have to simulate a whole web framework! Not to mention the logic is either framework logic or pretty trivial, and I really want to be testing the external HTTP interaction."
  3. "Oh, I know, I'll just write a test that sends an HTTP request and make assertions about the HTTP response."
  4. "Hooray! I have unit tested my application."

When they post about this on Reddit they are then scolded about how this is not really unit testing. But whether or not it's unit testing is irrelevant: it's useful testing, and that's what really matters! How then can we change our terminology to actually allow for meaningful conversations?

We must abandon our belief that particular techniques will magically result in High Quality Code™. There is no universal criterion for code quality; it can only be judged in the context of a particular project's goals. Those goals should be our starting point, rather than our particular favorite testing technique. Testing can then be explained as a function of those goals.

For example, imagine you are trying to implement realistic looking ocean waves for a video game. What is the goal of your testing? "My testing should ensure the waves look real." How would you do that? Not with automated tests. You're going to have to look at the rendered graphics, and then ask some other humans to look at it. If you're going to name this form of testing you might call it "looks-good-to-a-human testing."

Or consider that simple web application discussed above. They might call what they're doing "external HTTP contract testing." It's more cumbersome than "unit testing," "end-to-end testing," "automated testing", or "acceptance testing"... but so much more informative. There might eventually also be a need to test the public API of the library code the HTTP API relies on, if it grows complex enough or is used by other applications. That testing would then be "library's public API contract testing." And if they told you or me about it we would know why they were testing, and we'd have a pretty good idea of how they were doing the testing.

So next time you're thinking or talking about testing don't talk about "unit testing" or "end-to-end testing." Instead, talk about your goals: what the testing is validating or verifying. Eventually you might reach the point of talking about particular testing techniques. But if you start with your goals you are much more likely both to be understood and to reach for the appropriate tools for your task.

Agree? Disagree? Send me an email with your thoughts.

February 27, 2016 05:00 AM

Want to learn more? Subscribe to my mailing list

If you've enjoyed or learned from my posts over the past week, you may wish to subscribe to my mailing list. Tomorrow I'll be sending subscribers some additional related articles and videos, including:

  • A classic lecture on how to solve hard problems,
  • An example of why following the rules is a good idea, even as you learn to break them,
  • A musical interlude, and more!

February 27, 2016 05:00 AM

February 24, 2016

Itamar Turner-Trauring

Still stuck at the end of the day?

Have you ever found yourself staring at your monitor late in the day, with no idea what's going on? Your code is broken, your tests won't pass, you don't understand how to implement the feature... there's always something. And there's always a deadline, too, and you may be tired but if you just work one more hour maybe you'll figure it out. You can probably guess what I think of that.

Go home. Shut off your computer, stop thinking about it, and just go home.

If you've read my previous post you'll recognize the relevant principle: pay attention to your body. At the end of the day you're tired, having spent a whole day working and thinking. For simple problems what would take you two hours to solve in the evening will take you only ten minutes the next morning. As for hard problems, you probably won't be able to solve those at all without a good night's sleep.

What happens if you keep at it when tired? Two hours later you've finally solved the problem but you are now even more tired, and you've used up half your evening. Chances are you're going to be tired the next day, and therefore less productive.

"But," you might argue, "my boss! They said this had to ship today!" To which I say: if a cat laid eggs it would be a hen. Just because they want something to happen doesn't mean it's possible. If it really is an emergency you can try to push through and pay the cost in efficiency. The fact that it's important does not guarantee success; the odds are still against you. And, honestly, it's almost never a real emergency.

There's almost a century of evidence suggesting that on average worker output is maximized with a 40-hour work week. Your personal limits might be higher or lower, but if you're going to do your job effectively you need to stay below that limit. So go home, take a break, and come back refreshed the next day. And if your employer is deliberately reducing your efficiency by requiring long hours? Then maybe it's time to look for a new job.

Got any comments? Send me an email!

February 24, 2016 05:00 AM

February 23, 2016

Itamar Turner-Trauring

Some principles for better coding

In my previous post I showed you how a good rule ("always write tests") can be wrong in certain cases, and for each individual case I gave some alternative suggestions. How did I go about coming up with this exception to the rules? And more importantly, how can you do so?

Underlying the rules you have been taught are principles. A principle is a general guideline, whereas a rule will tell you exactly what to do. Knowing when and how to apply a principle requires more thought than simply following a rule. I plan to discuss this at some point in the future. At the moment however I'll limit myself to listing some of the principles underlying my last post:

  • You are more than just a rational thinker. Pay attention to your body: sometimes coding tasks are hard because you need a snack, or some sleep. Don't ignore your emotions. Instead use them as a guide for more structured and logical forms of thought.
  • You have a limited ability to keep details in your head. Computers do not suffer from this problem, so use them to compensate for your limitations.
  • Whatever problem you're having has already been solved in some form or another by someone else. We can all stand on the shoulders of giants, we just need to take the time to find the right one to stand on.
  • The goals of your project are the only way to measure what software quality means. If you're throwing away your code in an hour you probably have a different measure of quality than for code that will be used for a decade.

There are many more principles, of course, but even internalizing just these four will make you a better programmer. In my next few posts I will discuss other applications of these principles so you can see them used in other contexts.

Meanwhile, if you re-read my last post can you see how I applied these principles? Are there any other principles you can deduce? Send me an email with any ideas or questions you have on the subject.

February 23, 2016 05:00 AM

February 21, 2016

Itamar Turner-Trauring

What should you do when testing is a pain?

Have you ever sat down to test your code and realized it's going to be an utter pain? It may be too difficult, boring, or repetitive, or it may just be too confusing... whatever the problem is, you just can't get yourself motivated to write those tests. What should you do? The easy answer: stop procrastinating and just follow the rules. Testing is good for you, like eating boiled brussels sprouts. But the easy answer is often the wrong answer. Boiled brussels sprouts are inedible; roasted brussels sprouts are delicious. Maybe there's another way to achieve your goals.

If you're unhappy it's worth examining why you feel that way. The answer may help you solve the problem. Perhaps you just need to take a break, or have a snack, or even take a vacation. Other times it's the nature of the task itself that is the problem. Let's look at some reasons why testing can be painful and see what solutions they suggest.

"This is going to be soooooo boring."

If writing the tests is boring that is likely because you're writing many repetitive, similar tests. You might be trying to write tests to cover all the possible edge cases in an algorithm. Luckily there is a tool that is good at repetitive tasks: software. Testing tools that automatically generate test inputs can simplify your tests and find edge cases you wouldn't have thought of yourself. QuickCheck derivatives are available for many programming languages. Personally I've had great success using Hypothesis, a QuickCheck-inspired library for Python.
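
Here is a minimal Hypothesis sketch; the property below is an invented example, not one from this post. The library generates the inputs, including edge cases, so you don't write dozens of near-identical tests by hand:

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_sorting_twice_changes_nothing(lst):
    once = sorted(lst)
    assert sorted(once) == once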

"I'm not sure what the code should do."

This is a common problem when doing test-driven or test-first development, where tests are written before code. But if you're not sure what the code should look like you shouldn't be writing the tests yet. Instead you should code a test-less prototype to get a better sense of the API and structure of the code. The prototype doesn't have to be correct or even work at all: its purpose is to help you see the shape of the API. Once you've got a better understanding of the API writing the tests will prove easier.

"I've found a bug but it's too difficult to reproduce with a test."

If you find a race condition in threaded code it's often impossible to prove you've fixed the problem with a reproducible test. And one race condition suggests there may be more. The solution: stress testing, running your code under high load in hopes of catching problems. Stress testing won't prove that your code is bug-free, but at least it will catch some problems.

"This is going to take far too much work to test."

First, ask someone for help: perhaps a colleague will have an idea. Next, look for a tool that will help you with your particular testing needs. Testing a web application? Lots of tools for automating that. Testing a Unix terminal application? Some people have at least thought about it.

What to do if you're still stuck? If the code is important enough, if getting it wrong will get someone killed or destroy your company, you should probably just pay the cost for testing. But usually that is not the case and other constraints apply: you need to ship your software and don't have enough time to figure out or build a reliable test. Some options:

  1. Check if the code is covered some other way. An end-to-end test may exercise large amounts of glue code that is quite painful to test on its own. If the end-to-end test fails then debugging the cause may be painful, but at least you have some coverage.
  2. Manually test the change. This doesn't guarantee it will continue to work in the future, but at least you know it works now.
  3. Give up. Sometimes testing your code is just too expensive given your project's goals.

Paying attention to your pain and thinking about it can save you a whole lot of unnecessary work. What do you think? Have I missed any other cases where unpleasant testing can be avoided? Send me an email and let me know.

February 21, 2016 05:00 AM

Do you want to code without rules?

When you first learned to code you followed the rules you were taught. Hopefully they were good rules: always write tests, follow the coding standard, don't trust unvalidated input. But as your skills grew the rules started to feel wrong sometimes; you started noticing their limitations and caveats. What comes next?

After following the rules the next step is breaking the rules. You'll begin to sense the underlying patterns and principles: why the rules exist and when it's OK to ignore them. Sometimes it's OK to cross the street during a red light if there are no cars coming. Sometimes you don't have to write tests.

Eventually you will go beyond rules: you will have internalized the principles and understand tradeoffs and options automatically. You won't think about what to do, you'll just know. Then it's time for a new set of rules as you discover a higher-level set of problems and solutions.

Learning when to break the rules is hard on your own, let alone going beyond rules: you have to discover the unstated underlying principles. Learning from experts can also be difficult. They often assume the principles that you can't yet see, offering seemingly contradictory advice that you don't know how to apply.

Do you want to code without rules? Then keep reading, as I try to show you when to break the rules, and more importantly why.

February 21, 2016 05:00 AM

February 17, 2016

Hynek Schlawack

Python 3 in 2016

My completely anecdotal view on the state of Python 3 in 2016. Based on my own recent experience, observations, and exchanges with other members of the Python community.

by Hynek Schlawack (hs@ox.cx) at February 17, 2016 12:00 AM

February 16, 2016

Glyph Lefkowitz

Monads are simple to understand.

You can just think of them like a fleet of mysterious inverted pyramids ominously hovering over a landscape dotted with the tombs of ancient and terrible gods. Tombs from which they may awake at any moment if they are “evaluated”.

The IO loop is then the malevolent personification of the force of entropy, causing every action we take to push the universe further into the depths of uncontrolled chaos.

Simple!

by Glyph at February 16, 2016 12:36 AM

February 11, 2016

Moshe Zadka

Learn Public Speaking: The Heart-Attack Way

I know there are all kinds of organizations, and classes, that teach public speaking. But if you want to learn how to do public speaking the way I did, it’s pretty easy. Get a bunch of friends together, who are all interested in improving their skills. Meet once a week.

The rules are this: one person gets to give a talk with no more props than a marker and a whiteboard. If it’s boring, or they hesitate, another person can butt in, say “boring” and give a talk about something else. Talks are to be prepared somewhat in advance, and to be about five minutes long.

This is scary. You stand there, at the board, never knowing when someone will decide you’re boring. You try to entertain the audience. You try to keep their interest. You lose your point for 10 seconds…and you’re out. It’s the fight club of public speaking. After a few weeks of this, you’ll give flowing talks, and nothing will ever scare you again about public speaking — you’ll laugh as you realize that nobody is going to kick you out.

Yeah, Israel was a good training ground for speaking.

by moshez at February 11, 2016 04:52 PM

February 10, 2016

Glyph Lefkowitz

This is an experiment with a subtly different format.

Right now when I want to say something quickly, I pop open the Twitter app and just type it. But I realized that the only reason I'm doing this rather than publishing on my own site is a UI affordance: Twitter lets me hit two keys to start composing, ⌘N and then two keys to finish, ⌘Return. Also, to tweet something, I don't need to come up with a title.

So I added an Emacs minor-mode that lets me hit a comparable number of keys; translated into the Emacs keyboard shortcut idiom it is of course Meta-Shift-Control-Hyper-C Backflip-LeftPedal-N to create a post and Bucky-Reverse-Erase-x Omega-Shift-Epsilon-j to publish it. Such posts will be distinguished by the presence of the "microblog" tag and the empty title.

(Also, the sky's the limit in terms of character-count.)

Feel free to let me know if you think the format works or not.

by Glyph at February 10, 2016 04:29 AM

Hynek Schlawack

hasattr() – A Dangerous Misnomer

Don’t use Python’s hasattr() unless you’re writing Python 3-only code and understand how it works.

by Hynek Schlawack (hs@ox.cx) at February 10, 2016 12:00 AM

February 05, 2016

Twisted Matrix Laboratories

Jan '16 SFC Sponsored Development (HawkOwl)

Hi everyone!

This round of the Twisted Fellowship has come to an end, and so here is my final report.

Tickets reviewed/merged:
  • #8140 (merged): DummyRequest is out of sync with the real Request
  • #7943 (reviewed, merged): Remove usage of microdom
  • #8132 (reviewed, merged): Port twisted.web.vhost to Python 3
  • #7993 (reviewed, merged): Port twisted.web.wsgi to Python 3
  • #8173 (merged): Twisted does not have a code of conduct

Tickets triaged:
  • (braid) #66 - Drop the Fedora 17 and Fedora 18 builders
  • (braid) #22 - Properly install twisted's trac plugins
  • #7813 (closed, already done) - twisted.trial.test.test_doctest should be ported to python3

Tickets worked on:
  • (braid) #168 (done on branch) - Install Trac GitHub plugin
  • (braid) #169 (done on branch) - In 'staging' GitHub repo add the webhook to poke the 'staging' Trac
  • (braid) #167 (done on branch) - Create staging Trac
  • (braid) #139 (done on branch) - Manage git mirror repo used by Trac using braid
  • (braid) #1 (done on branch) - Use virtualenvs in deployment
  • (braid) #164 (work in progress) - Migrate Twisted SVN accounts to GitHub twisted/twisted
  • (braid) #178 (work in progress) - Migrate to BuildBot Nine
  • (braid) #142 (work in progress) - Migrate IRC announcements from Kenaan
  • (braid) #185 (merged) - Add dns entry for staging.twistedmatrix.com
Even though we didn't get the Git migration done in time for the end of the fellowship, I am happy to report that it is much closer to completion and in a much better understood state than before. If you would like to assist in getting some of the work listed above reviewed and merged, drop by https://github.com/twisted-infra/braid/pulls !

- Amber

by HawkOwl (noreply@blogger.com) at February 05, 2016 04:37 PM

February 02, 2016

Twisted Matrix Laboratories

January 2016 - SFC Sponsored Development

This is my report for the work done in January 2016 as part of the Twisted Maintainer Fellowship program.
It is my last report of the Twisted Maintainer Fellowship 2015 program.

With this fellowship the review queue size was reduced and review round-trips were completed much more quickly.
This fellowship has produced the Git/GitHub migration plan but has failed to finalize its execution.

Tickets reviewed and merged

* #7671 - It is way too hard to specify a trust root combining multiple certificates, especially to HTTP
* #7993 - Port twisted.web.wsgi to Python 3
* #8140 - twisted.web.test.requesthelper.DummyRequest is out of sync with the real Request
* #8148 - Deprecate twisted.protocols.mice
* #8173 - Twisted does not have a code of conduct
* #8180 - Conch integration tests fail because DSA is deprecated in OpenSSH 7.
* #8187 - Use a less ancient OpenSSL method in twisted.test.test_sslverify

Tickets reviewed and not merged yet

* #7889 - replace win32api.OpenProcess and win32api.FormatMessage with cffi
* #8150 - twisted.internet.ssl.KeyPair should provide loadPEM
* #8159 - twisted.internet._win32serialport incompatible with pyserial 3.x
* #8169 - t.w.static.addSlash does not work on Python 3
* #8188 - Advertise H2 via ALPN/NPN when available.

Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at February 02, 2016 01:18 PM

February 01, 2016

Moshe Zadka

Docker: Are we there yet?

Obviously, we know the answer. This post is intended to give me an easy place to point people when they ask me “so what’s wrong with Docker?”

[To clarify, I use Docker myself, and it is pretty neat. All the more reason missing features annoy me.]

Docker itself:

  • User namespaces — slated to land in February 2016, so pretty close.
  • Temporary adds/squashing — currently “closed” and suggests people use work-arounds.
  • Dockerfile syntax is limited — this is related to the issue above, but there are a lot of missing features in Dockerfile (for example, a simple form of “reuse” other than chaining). No clear idea when it will be possible to actually implement the build in terms of an API, because there is no link to an issue or PR.

Tooling:

  • Image size — Minimal versions of Debian, Ubuntu or CentOS are all unreasonably big. Alpine does a lot better. People really should move to Alpine. I am disappointed there is no competition on being a “minimal container-oriented distribution”.
  • Build determinism — Currently, almost all Dockerfiles in the wild call out to the network to grab some files while building. This is really bad — it assumes networking, depends on servers being up and assumes files on servers never change. The alternative seems to be checking big files into one’s own repo.
    • The first thing to do would be to have an easy way to disable networking while the container is being built.
    • The next thing would be a “download and compare hash” operation in a build-prep step, so that all dependencies can be downloaded and verified, while the hashes would be checked into the source (a rough sketch follows this list).
    • Sadly, Alpine Linux specifically makes it non-trivial to “just download the package” from outside of Alpine.
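
A rough sketch of such a build-prep step in Python; the URL, file name and hash are placeholders, not real values:

import hashlib
import sys

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

URL = "https://example.com/some-dependency.tar.gz"
EXPECTED_SHA256 = "0" * 64  # the real hash would be checked into the source

data = urlopen(URL).read()
if hashlib.sha256(data).hexdigest() != EXPECTED_SHA256:
    sys.exit("hash mismatch for %s" % (URL,))
with open("some-dependency.tar.gz", "wb") as f:
    f.write(data)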

 

by moshez at February 01, 2016 05:24 AM

January 27, 2016

Moshe Zadka

Learning Python: The ecosystem

When first learning Python, the tutorial is a pretty good resource to get acquainted with the language and, to some extent, the standard library. I have written before about how to release open source libraries — but it is quite possible that one’s first foray into Python will not be to write a reusable library, but an application to accomplish something — maybe a web application with Django or a tool to send commands to your TV. Much of what I said there will not apply — no need for a README.rst if you are writing a Django app for your personal website!

However, it probably is useful to learn a few tools that the Python eco-system engineered to make life more pleasant. In a perfect world, those would be built-in to Python: the “cargo” to Python’s “Rust”. However, in the world we live in, we must cobble together a good tool-chain from various open source projects. So strap in, and let’s begin!

The first three are cribbed from my “open source” link above, because good code hygiene is always important.

Testing

There are several reasonably good test runners. If there is no clear reason to choose one, py.test is a good default. “Using Twisted” is a good reason to choose trial. Using coverage is a no-brainer. It is good to run some functional tests as well; test runners should be able to help with this, but even writing a Python program that fails if things are not working can be useful.
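
For instance, a py.test test is just a function whose name starts with test_ in a file whose name starts with test_; this tiny invented example is picked up automatically when you run py.test:

# test_math.py -- invented example; run "py.test" in the same directory.
def add(a, b):
    return a + b

def test_add_commutes():
    assert add(2, 3) == add(3, 2)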

Static checking

There are a lot of tools for static checking of Python programs — pylint, flake8 and more. Use at least one. Using more is not completely free (more ways to have to say “ignore this, this is ok”) but can be useful to catch more style and static issues. At worst, if there are local conventions that are not easily plugged into these checkers, write a Python program that will check for them and fail if those are violated.

Meta testing

Use tox. Put tox.ini at the root of your project, and make sure that “tox” (with no arguments) works and runs your entire test-suite. All unit tests, functional tests and static checks should be run using tox.

Set tox to put all build artifacts in a build/ top-level directory.
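
A minimal tox.ini sketch along these lines (the environment names, the package name mypackage, and the exact commands are illustrative assumptions, not a prescribed layout):

[tox]
envlist = py27, py35, lint
# Keep tox's own artifacts under build/ as suggested above:
toxworkdir = {toxinidir}/build/tox

[testenv]
deps =
    pytest
    coverage
commands = coverage run -m pytest

[testenv:lint]
deps = flake8
commands = flake8 mypackage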

Pex

A tox test-environment of “pex” should result in a Python EXecutable created and put somewhere under “build/”. Running your Python application to actually serve web pages should be as simple as taking that pex and running it without arguments. BoredBot shows an example of how to create such a pex that includes a web application, a Twisted application and a simple loop/sleep-based application. This pex build can take a requirements.txt file with exact dependencies, though if it is built by tox, you can inline those dependencies directly in the tox file.
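
And a sketch of a “pex” environment to go with it; the package name boredbot, the entry point, and the exact flags are assumptions, so check the pex documentation for the version you use:

[testenv:pex]
deps = pex
commands = pex -r requirements.txt -e boredbot.main -o {toxinidir}/build/boredbot.pex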

Collaboration

If you do collaborate with others on the project, whether it is open source or not, it is best if the collaboration instructions are as easy as possible. Ideally, collaboration instructions should be no more complicated than “clone this, make changes, run tox, possibly do whatever manual verification using ‘./build/my-thing.pex’ and submit a pull request”.

If they are more complicated than that, consider investing some effort into changing the code to be more self-sufficient and to make fewer assumptions about its environment. For example, default to a local SQLite-based database if no “--database” option is specified, and initialize it with whatever your code needs. This will also make it easier to practice the “infinite environment” methodology, since if one file is all it takes to “bring up” an environment, it should be easy enough to run it on a cloud server and allow people to look at it.
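For instance, here is a minimal sketch of that “default to SQLite” idea; the option name and schema are illustrative assumptions, not taken from any particular project:

    # db.py -- connect to the configured database, defaulting to local SQLite
    import argparse
    import sqlite3

    def get_connection(argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument("--database", default="local.sqlite",
                            help="database to use (defaults to a local SQLite file)")
        args = parser.parse_args(argv)
        connection = sqlite3.connect(args.database)
        # Create whatever schema the code needs if it is not there yet.
        connection.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
        return connection

    if __name__ == "__main__":
        print(get_connection())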

by moshez at January 27, 2016 05:45 AM

January 17, 2016

Glyph Lefkowitz

Stop Working So Hard

Recently, I saw this tweet where John Carmack posted to a thread on Hacker News about working hours. As this post propagated a good many bad ideas about working hours, particularly in the software industry, I of course had to reply. After some further back-and-forth on Twitter, Carmack followed up.

First off, thanks to Mr. Carmack for writing such a thorough reply in good faith. I suppose internet arguments have made me a bit cynical in that I didn’t expect that. I still definitely don’t agree, but I think there’s a legitimate analysis of the available evidence there now, at least.

When trying to post this reply to HN, I was told that the comment was too long, and I suppose it is a bit long for a comment. So, without further ado, here are my further thoughts on working hours.

... if only the workers in Greece would ease up a bit, they would get the productivity of Germany. Would you make that statement?

Not as such, no. This is a hugely complex situation mixing together finance, culture, management, international politics, monetary policy, and a bunch of other things. That study, and most of the others I linked to, is interesting in that it confirms the general model of ability-to-work (i.e. “concentration” or “willpower”) as a finite resource that you exhaust throughout the day; not in that “reduction in working hours” is a panacea solution. Average productivity-per-hour-worked would definitely go up.

However, I do believe (and now we are firmly off into interpretation-of-results territory, I have nothing empirical to offer you here) that if the average Greek worker were less stressed to the degree of the average German one, combining issues like both overwork and the presence of a constant catastrophic financial crisis in the news, yes; they’d achieve equivalent productivity.

Total net productivity per worker, discounting for any increases in errors and negative side effects, continues increasing well past 40 hours per week. ... Only when you are so broken down that even when you come back the following day your productivity per hour is significantly impaired, do you open up the possibility of actually reducing your net output.

The trouble here is that you really cannot discount for errors and negative side effects, especially in the long term.

First of all, the effects of overwork (and attendant problems, like sleep deprivation) are cumulative. While productivity on a given day increases past 40 hours per week, if you continue to work more, your productivity will continue to degrade. So, the case where “you come back the following day ... impaired” is pretty common... eventually.

Since none of this epidemiological work tracks individual performance longitudinally, there are few conclusive demonstrations of this fact, but lots of compelling indications; in the past, I’ve collected quantitative data on myself (and my reports, back when I used to be a manager) that strongly corroborates this hypothesis. So encouraging someone to work one sixty-hour week might be a completely reasonable trade-off to address a deadline; but building a culture where people are asked to work nights and weekends as a matter of course is inherently destructive. Once you get into the area where people are losing sleep (and for people with other responsibilities, it’s not hard to get to that point), overwork starts impacting things like the ability to form long-term memories, which means that not only do you do less work, you also consistently improve less.

Furthermore, errors and negative side effects can have a disproportionate impact.

Let me narrow the field here to two professions that I know a bit about and that are germane to this discussion: one, health care, which the original article here starts off by referencing, and two, software development, with which we are both familiar (since you already raised the Mythical Man Month).

In medicine, you can do a lot of valuable life-saving work in a continuous 100-hour shift. And in fact residents are often required to do so as a sort of professional hazing ritual. However, you can also make catastrophic mistakes that would cost a person their life; this happens routinely. Study after study confirms this, and minor reforms happen, but residents are still routinely abused and made to work in inhumane conditions that have catastrophic outcomes for their patients.

In software, defects can be extremely expensive. Not only are they hard to fix, they can also be hard to detect. The phenomenon of the Net Negative Producing Programmer also indicates that not only can productivity drop to zero, it can drop below zero. On the anecdotal side, anyone who has had the unfortunate experience of cleaning up after a burnt-out co-worker can attest to this.

There are a great many tasks where inefficiency grows significantly with additional workers involved; the Mythical Man Month problem is real. In cases like these, you are better off with a smaller team of harder working people, even if their productivity-per-hour is somewhat lower.

The specific observation from the Mythical Man Month was that the number of communication links on a fully connected graph of employees increases geometrically whereas additional productivity (in the form of additional workers) increases linearly. If you have a well-designed organization, you can add people without requiring that your communication graph be fully connected.

But of course, you can’t always do that. And specifically you can’t do that when a project is already late: you already figured out how the work is going to be divided. Brooks’ Law is formulated as: “Adding manpower to a late software project makes it later.” This is indubitable. But one of the other famous quotes from this book is “The bearing of a child takes nine months, no matter how many women are assigned.”

The bearing of a child also takes nine months no matter how many hours a day the woman is assigned to work on it. So “in cases like these” my contention is that you are not “better off with ... harder working people”: you’re just screwed. Some projects are impossible and you are better off acknowledging the fact that you made unrealistic estimates and you are going to fail.

You called my post “so wrong, and so potentially destructive”, which leads me to believe that you hold an ideological position that the world would be better if people didn’t work as long. I don’t actually have a particularly strong position there; my point is purely about the effective output of an individual.

I do, in fact, hold such an ideological position, but I’d like to think that said position is strongly justified by the data available to me.

But, I suppose calling it “so potentially destructive” might have seemed glib, if you are really just looking at the microcosm of what one individual might do on one given week at work, and not at the broader cultural implications of this commentary. After all, as this discussion shows, if you are really restricting your commentary to a single person on a single work-week, the case is substantially more ambiguous. So let me explain why I believe it’s harmful, as opposed to merely being incorrect.

First of all, the problem is that you can’t actually ignore the broader cultural implications. This is Hacker News, and you are John Carmack; you are practically a cultural institution yourself, and by using this site you are posting directly into the broader cultural implications of the software industry.

Software development, especially in the USA, suffers from a long-standing culture of chronic overwork. Startup developers in their metaphorical (and sometimes literal) garages are lionized and then eventually mythologized for spending so many hours on their programs. Anywhere that it is celebrated, this mythology rapidly metastasizes into a severe problem: the Death March.

Note that although the term “death march” is technically general to any project management, it applies “originally and especially in software development”, because this problem is worse in the software industry (although it has been improving in recent years) than almost anywhere else.

So when John Carmack says on Hacker News that “the effective output of an individual” will tend to increase with hours worked, that sends a message to many young and impressionable software developers. This is the exact same phenomenon that makes pop-sci writing terrible: your statement may be, in some limited context, and under some tight constraints, empirically correct, but it doesn’t matter because when you expand the parameters to the full spectrum of these people’s careers, it’s both totally false and also a reinforcement of an existing cognitive bias and cultural trope.

I can’t remember the name of this cognitive bias (and my Google-fu is failing me), but I know it exists. Let me call it the “I’m fine” bias. I know it exists because I have a friend who had the opportunity to go on a flight with NASA (on the Vomit Comet), and one of the more memorable parts of this experience that he related to me was the hypoxia test. The test involved basic math and spatial reasoning skills, but that test wasn’t the point: the real test was whether they would notice when the oxygen levels were dropping and indicate that to the proctor. Concentrating on the test, many people failed the first few times, because the “I’m fine” bias makes it very hard to notice that you are impaired.

This is true of people who are drunk, or people who are sleep deprived, too. Their abilities are quantifiably impaired, but they have to reach a pretty severe level of impairment before they notice.

So people who are overworked might feel generally bad but they don’t notice their productivity dropping until they’re way over the red line.

Combine this with the fact that most people, especially those already employed as developers, are actually quite hard-working and earnest (laziness is much more common as a rhetorical device than as an actual personality flaw), and you end up in a scenario where a good software development manager is responsible much more for telling people to slow down, to take breaks, and to be more realistic in their estimates, than for telling them to speed up, work harder, and put in more hours.

The trouble is that this goes against the manager’s instincts as well. When you’re a manager you tend to think of things in terms of resources: hours worked, money to hire people, and so on. So there’s a constant nagging temptation for a manager to encourage people to work more hours in a day, so you can get more output (hours worked) out of your input (hiring budget). The problem here is that while all hours are equal, some hours are more equal than others. Managers have to fight against their own sense that a few more worked hours will be fine, and their employees’ tendency to overwork because they’re not noticing their own burnout, and upper management’s tendency to demand more.

It is into this roiling stew of the relentless impulse to “work, work, work” that we are throwing our commentary about whether it’s a good idea or not to work more hours in the week. The scales are weighted very heavily on one side already - which happens to be the wrong side in the first place - and while we’ve come back from the unethical and illegal brink we were at as an industry in the days of ea_spouse, software developers still generally work far too much.

If we were fighting an existential threat, say an asteroid that would hit the earth in a year, would you really tell everyone involved in the project that they should go home after 35 hours a week, because they are harming the project if they work longer?

Going back to my earlier explanation in this post about the cumulative impact of stress and sleep deprivation - if we were really fighting an existential threat, the equation changes somewhat. Specifically, the part of the equation where people can have meaningful downtime.

In such a situation, I would still want to make sure that people are as well-rested and as reasonably able to focus as they possibly can be. As you’ve already acknowledged, there are “increases in errors” when people are working too much, and we REALLY don’t want the asteroid-targeting program that is going to blow apart the asteroid that will wipe out all life on earth to have “increased errors”.

But there’s also the problem that, faced with such an existential crisis, nobody is really going to be able to go home and enjoy a fine craft beer and spend some time playing with their kids and come back refreshed at 100% the next morning. They’re going to be freaking out constantly about the asteroid, they’re going to be losing sleep over it whether they’re working or not. So, in such a situation, people should have the option to go home and relax if they’re psychologically capable of doing so, but if the option for spending their time that makes them feel the most sane is working constantly and sleeping under their desk, well, that’s the best one can do in that situation.

This metaphor is itself also misleading and out of place, though. There is also a strong cultural trend in software, especially in the startup ecosystem, to over-inflate the importance of what the company is doing - it is not “changing the world” to create a website for people to order room-service for their dogs - and thereby to catastrophize any threat to that goal. The vast majority of the time, it is inappropriate either to sacrifice -- or to ask someone else to sacrifice -- health and well-being for short-term gains. Remember, given the cumulative effects of overwork, that’s all you can even get: short-term gains. This sacrifice often has a huge opportunity cost in other areas, as you can’t focus on more important things that might come along later.

In other words, while the overtime situation is complex and delicate in the case of an impending asteroid impact, there’s also the question of whether, at the beginning of Project Blow Up The Asteroid, I want everyone to be burnt out and overworked from their pet-hotel startup website. And in that case, I can say, unequivocally, no. I want them bright-eyed and bushy-tailed for what is sure to be a grueling project that absolutely needs to happen, no matter what the overtime policy is. I want to make sure they didn’t waste their youth and health on somebody else’s stock valuation.

by Glyph at January 17, 2016 04:06 AM