Planet Twisted

December 10, 2016

Moshe Zadka

On Raising Exceptions in Python

There is a lot of Python code in the wild which does something like:

raise SomeException("Could not fraz the buzz: "
                    "{} is less than {}".format(foo, quux))

This is, in general, a bad idea. Exceptions are not program panics. While they sometimes do terminate the program, or the execution thread with a traceback, they are different in that they can be caught. The code that catches the exception will sometimes have a way to recover: for example, maybe it’s not that important for the application to fraz the buzz if foo is 0. In that case, the code would look like:

try:
    some_function()
except SomeException as e:
    if ???

Oh, right. We do not have direct access to foo. If we had formatted better, using repr, at least we could tell the difference between 0 and “0”: but we would still have to parse the representation out of the string.

Because of this, in general, it is better to raise exceptions like this:

raise SomeException("Could not fraz the buzz: foo is too small", foo, quux)

This way exception handling has a lot of power: it can introspect foo, introspect quux, and introspect the string. If the same exception class is raised for more than one reason and we want to verify which one we caught, checking string equality, while not ideal, is still better than matching string fragments or applying regular expressions.
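
With the structured version, a handler might look like this (a minimal sketch; the recovery logic is hypothetical):

try:
    some_function()
except SomeException as e:
    message, foo, quux = e.args
    if foo == 0:
        pass  # not that important to fraz the buzz when foo is 0
    else:
        raise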

When the exception is presented to the user interface it will not look as nice. But exceptions should reach the UI only in extreme circumstances, and in those cases having as much information as possible available is useful for root cause analysis.

by moshez at December 10, 2016 04:42 AM

December 08, 2016

Itamar Turner-Trauring

Don't Get Stuck: 6 ways to get unstuck and code faster

A lot of my time as a programmer, and maybe yours as well, is spent being stuck. My day often goes like this:

  1. Write some code.
  2. Run the tests.
  3. "It failed."
  4. "Why did it fail?"
  5. "I don't know."
  6. "That makes no sense."
  7. "Seriously, what?"
  8. "That's impossible."
  9. "Let's add a print statement here."
  10. "And maybe try poking around with the debugger."
  11. "Oh! That's it!"
  12. "No wait, it isn't."
  13. "Ohhhhhhhh there we go."
  14. Run the tests.
  15. Tests pass.
  16. "Time for snacks!"

Given how much time you can end up wasting in this mode, Kaitlin Duck Sherwood points out that not getting stuck is one of the keys to programmer productivity. Imagine how much time I could save if I skipped steps 5 to 13 in the transcript above.

Not getting stuck will make you far more productive. Here are six ways to keep you from getting stuck:

Recognize when you're stuck

If you don't know you're stuck then you can't do anything about it, so the first step is having some way of measuring progress. Whenever you start a coding task you should have a time estimate in mind.

The time estimates should be short, no more than a few hours or a day, so a bigger project should be broken up into smaller tasks. The estimates don't have to be particularly accurate, they just have to be in the right range: a task you estimate at a few hours should not require days of work.

Given an estimate of 4 hours, say, you can then recognize whether you're actually stuck:

  • If it's 10 minutes in and you have no idea what to do, then that's fine, there's plenty more time.
  • If you're 2 hours in and you haven't produced anything, then it's pretty clear you're stuck and need to make some changes.

Comparing actual time spent to the initial estimate tells you if you're making progress, and working in small chunks ensures you recognize problems quickly.

Ask for help

Now that you've recognized you're stuck, the next thing to do is find a solution. The easiest thing to do is talk to a colleague.

This is helpful in multiple ways:

  • You're forced to restate the problem in a way someone else can understand. Sometimes this is sufficient to help you solve the problem, even before they get to answering you.

In fact, this is so useful that sometimes you don't need a person, and talking to a rubber duck will do. I like debugging out loud for this reason, so occasionally I use a #rubberduck Slack channel so I don't distract my coworkers.

  • Your colleague may have an idea you don't, especially if they're experienced. For example, recently I was utterly confused why Java thought that assertThat(0.0, isEqual(-0.0)) was a failing test; it claimed 0.0 wasn't the same as -0.0.

Eventually I shared my confusion, and someone noted that the expression relies on Double.equals(), and then went and found the Double.equals() API documentation. And indeed, the documentation notes that new Double(0.0).equals(new Double(-0.0)) is false even though in Java 0.0 == -0.0 is true, because reasons.

Use or copy an existing solution

If you or your colleague can't find a solution on your own, you can try using someone else's solution. This can range from the copy/paste-from-StackOverflow fallback (but be careful, sometimes StackOverflow code is wrong or insecure) to copying whole designs.

For example, I built a multicast data distribution protocol. This is not a trivial thing to do, so I found a research paper and copied their design and algorithm. Designing such an algorithm would have been far beyond my abilities, but I didn't have to: someone smarter and more knowledgeable had done the hard work.

Find a workaround

Sometimes you're faced with an important bug you can't fix. Working around it may suffice, however, as you can see in this story shared by reader James (Jason) Harrison:

Several years ago, I was working many late nights on a new Wii game that was going to have gesture recognition. The first part of the game activities went as smoothly as could be expected and then we came to a new level where the player was supposed to bring the Wiimote up and then down quickly. This must have tripped on a bug in the system because this gesture could not be reliably recognized.

Replaying recorded motions demonstrated that the problem wasn’t “just” in the data from the Wiimote or in how the player made the motion but in the system. Instead of being deterministic, the system would work then not work. We looked for variables that were not being initialized, data buffers that were not being cleared, and any state that could possibly leak from previous inputs.

Unfortunately, all of the searching didn’t find the problem in time. So it was decided to reset the recognition system between inputs. While wasteful, it was the only thing that did fix the system and let us ship the milestone to the publisher. Left a comment in to find the problem later. Never did find it. Game was shipped with this fix.

Drop the feature

If you're working on a feature and it's taking forever, maybe it's time to consider dropping it. Can it wait until the next release? Do you actually need it?

A feature usually lands on the requirements list for a reason, it's true. But a surprising number of features are in the nice-to-have category, especially when it's taking far too long to implement them. If other approaches have failed to get you unstuck, find the person in charge and give them the choice of shipping late or dropping the feature.

Redefine the problem

Finally, another approach you can take is redefining the problem. The way you're approaching the problem may be making it needlessly complicated to solve, so revisiting the problem statement can help get you unstuck.

You can redefine the problem by relaxing some of the constraints. For example, maybe you're having a hard time finding a date picker that matches the website design. If the problem statement is "add a usable, good looking, date picker that matches our website style" then you might spend a while looking and not finding any that are quite right.

Often you can redefine the problem, though, to "find a minimal date picker so we can demo the feature and see if anyone cares." With that more limited scope you can probably use one of the options you initially rejected.

You can also redefine the problem by figuring out that the real problem lies elsewhere. Havoc Pennington has a wonderful story about the dangerous "UI team": they will feel their mandate is to build UIs. But software that doesn't have a UI and "just works" is an even better user experience, if you can manage it.

In short, to keep from getting stuck you should:

  1. Break all your work up into small chunks.
  2. Estimate each chunk in advance, and pay attention to your progress against the estimate.
  3. When you recognize you are stuck: ask for help, copy an existing solution, find a workaround, drop the feature or redefine the problem.

I've learned most of this the hard way, over the course of 20 years of being stuck while writing software. If you'd like to avoid the many mistakes I've made as a software engineer during that time, both coding and in my career, sign up for my Software Clown newsletter. You'll get the story of one of my mistakes in your inbox every week and how you can avoid making it.


December 08, 2016 05:00 AM

Moshe Zadka

Moshe’z Messaging Preferences

The assumption here is that you have my phone number. If you don’t have my phone number, and you think that’s an oversight on my part, please send me an e-mail at zadka.moshe@gmail.com and ask for it. If you don’t have my phone number because I don’t know you, I am usually pretty responsive on e-mail.

In order of preference:

by moshez at December 08, 2016 03:03 AM

December 05, 2016

Jp Calderone

Twisted Web in 60 Seconds: HTTP/2

Hello, hello. It's been a long time since the last entry in the "Twisted Web in 60 Seconds" series. If you're new to the series and you like this post, I recommend going back and reading the older posts as well.

In this entry, I'll show you how to enable HTTP/2 for your Twisted Web-based site. HTTP/2 is the latest entry in the HTTP family of protocols. It builds on work from Google and others to address performance (and other) shortcomings of the older HTTP/1.x protocols in widespread use today.

Twisted implements HTTP/2 support by building on the general-purpose H2 Python library. In fact, all you have to do to have HTTP/2 for your Twisted Web-based site (starting in Twisted 16.3.0) is install the dependencies:

$ pip install twisted[http2]

Your TLS-based site is now available via HTTP/2! A future version of Twisted will likely extend this to non-TLS sites (which requires the Upgrade: h2c handshake) with no further effort on your part.
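
If you don't already have a TLS site to upgrade, a minimal one might look like this (a sketch; the static path, certificate, and key files are assumptions, and the extras syntax pulls in TLS support alongside HTTP/2):

$ pip install twisted[tls,http2]
$ twist web --path /srv/www/htdocs \
    --port ssl:443:privateKey=key.pem:certKey=cert.pem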

If you like this post or others in the Twisted Web in 60 Seconds series, let me know with a donation! I'll post another entry in the series when the counter hits zero. Topic suggestions welcome in the comment section.

by Jean-Paul Calderone (noreply@blogger.com) at December 05, 2016 12:00 PM

December 02, 2016

Moshe Zadka

Don’t Mock the UNIX Filesystem

When writing unit tests, it is good to call functions with “mocks” or “fakes” — objects with an equivalent interface but a simple, “fake” implementation. For example, instead of a real socket object, use something that has a recv() method, but one which returns “hello” the first time and an empty string the second time. This is great! Instead of testing the vagaries of the other side of a socket connection, you can focus on testing your code — and force your code to handle corner cases, like recv() returning partial messages, that happen rarely on the same host (but not so rarely in more complex network environments).

There is one OS interface which it is wise not to mock — the venerable UNIX file system. Mocking the file system is the classic case of low-ROI effort:

  • It is easy to isolate: if functions get a parameter of “which directory to work inside”, tests can use a per-suite temporary directory (see the sketch below). Directories are cheap to create and destroy.
  • It is reliable: the file system rarely fails — and if it does, your code is likely to get weird crashes anyway.
  • The surface area is enormous: open(), but also os.open, os.mkdir, os.rename, os.mknod, shutil.copytree and others, plus modules calling out to C functions which call out to C’s fopen().

The first two items decrease the Return, since mocking the file system does not make the tests easier to write or the test run more reproducible, while the last one increases the Investment.
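
Here is what that per-suite temporary directory looks like in practice (a minimal sketch using only the standard library; save_report is a hypothetical function under test that takes “which directory to work inside” as a parameter):

import os
import shutil
import tempfile
import unittest

from mycode import save_report  # hypothetical code under test

class SaveReportTests(unittest.TestCase):
    def setUp(self):
        # Cheap to create...
        self.dirname = tempfile.mkdtemp()
        # ...and cheap to destroy.
        self.addCleanup(shutil.rmtree, self.dirname)

    def test_save(self):
        save_report(self.dirname, "report.txt", "all good")
        path = os.path.join(self.dirname, "report.txt")
        with open(path) as f:
            self.assertEqual(f.read(), "all good")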

Do not mock the file system, or it will mock you back.

by moshez at December 02, 2016 05:34 AM

November 30, 2016

Itamar Turner-Trauring

The Not-So-Passionate Programmer: finding a job when you're just a normal person

When reading programming job postings you'll find many companies that want to hire "passionate programmers". If you're just a normal programmer looking for a normal job this can be pretty discouraging.

What if you're not passionate? What if you don't work on side projects, or code in your spare time?

What if you have a sneaking suspicion that "passionate" is a euphemism for "we want you to work large amounts of unpaid overtime"? Can you really find a job where you don't have to be passionate, where you can just do your job and go home?

The truth is that many companies will hire you even if you don't have "passion". Not to mention that "passion" has very little impact on whether you do your job well.

But since companies do ask for "passion" in job postings and sometimes look for it during interviews, here's what you can do about your lack of "passion" when searching for a job.

Searching for a job

The first thing to do is not worry about it too much. Consider some real job postings for passionate programmers:

  • "[Our company] is looking for Java Engineer who is passionate about solving real world business problems to join our team."
  • "We're looking for a senior developer to play a major role in a team of smart, passionate and driven people."
  • "This role is ideal for somebody who is passionate about building great online apps."

They all say "passionate", yes. But these are all posts from very different kinds of companies, with different customers, different technology stacks, and very different cultures (and they're in two different countries). Whoever wrote the job posting at each company probably didn't think very hard about their choice of words, and if pressed each would probably explain "passionate" differently.

It might be a euphemism for working long hours, but it might also just mean they want to hire competent engineers. If the job looks good otherwise, don't think about it too hard: apply and see how it goes.

Interviewing for a job

Eventually you'll get a job interview at a company that wants "passionate" programmers. A job interview has two aspects: the company is interviewing you, and you are interviewing the company.

When the company is interviewing you they want to find out if you're going to do your job. You need to make a good impression... even if insurance premiums, or content management systems, or internal training or whatever the company does won't be putting beating cartoon hearts in your eyes.

  • First, that means you need to take an interest in the company. Before your interview do some research about the company, and then ask questions about the product during the interview.
  • Second, since you can't muster that crazy love for insurance premiums, focus on what you can provide: emphasize your professional pride in your work, your willingness to get things done and do them right.

At the same time that you're trying to sell yourself to the company you should also be trying to figure out if you want to work for them. Among other things, you want to figure out if the word "passionate" is just a codeword for unpaid overtime.

Ask what a typical workday is like, and what a typical workweek is like. Ask how they do project planning, and how they ensure code ships on time.

Finally, you will sometimes discover that the employees who work at the company are passionate about what they do. If this rubs you the wrong way, you might want to find a different company to work for.

If you're OK with it you'll want to make sure you'll be able to fit in. So try to figure out if they're open to other ways of thinking: how they handle conflicts, how they handle diversity of opinion.

On the job

Eventually you will have a job. Mostly you'll just have a normal job, with normal co-workers who are just doing their job too.

But sometimes you will end up somewhere where everyone else is passionate and you are not. So long as your coworkers and management value a diversity of opinion, your lack of passion can actually be a positive.

For example, startups are often full of passion for what they're building. Most startups fail, of course, and so every startup has a story about why they are different, why they won't fail. Given the odds that story will usually turn out to be wrong, but passionate employees will keep on believing, or at least won't be willing to contradict the story in public.

As someone who isn't passionate you can provide the necessary sanity checks: "Sure, it's a great product... but we're not getting customers fast enough. Maybe we should figure out what we can change?"

Similarly, passionate programmers often love playing with new technology. But technology can be a distraction, and writing code is often the wrong solution. As someone who isn't passionate you can ensure the company's goals are actually being met... even if that means using existing, boring software instead of writing something new and exciting.

There's nothing wrong with wanting to go home at the end of the day and stop thinking about work. There are many successful software developers who don't work crazy hours and who don't spend their spare time coding.


If you would like a job that doesn't overwhelm your life, join my free 6-part email course to learn how you can get to a sane workweek.

November 30, 2016 05:00 AM

November 25, 2016

Twisted Matrix Laboratories

Twisted 16.6.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.6!

The highlights of this release are:
  • The ability to use "python -m twisted" to call the new twist runner,
  • More reliable tests from a more reliable implementation of some things, like IOCP,
  • Fixes for async/await & twisted.internet.defer.ensureDeferred, meaning it's getting closer to prime time!
  • ECDSA support in Conch & ckeygen (which has also been ported to Python 3),
  • Python 3 support for Words' IRC support and twisted.protocols.sip among some smaller modules,
  • Some HTTP/2 server optimisations,
  • and a few bugfixes to boot!
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

by Amber Brown (noreply@blogger.com) at November 25, 2016 08:06 PM

November 21, 2016

Jack Moffitt

Servo Interview on The Changelog

The Changelog has just published an episode about Servo. It covers the motivations and goals of the project, some aspects of Servo performance and use of the Rust language, and even has a bit about our wonderful community. If you're curious about why Servo exists, how we plan to ship it to real users, or what it was like to use Rust before it was stable, I recommend giving it a listen.

by Jack Moffitt (jack@metajack.im) at November 21, 2016 12:00 AM

November 19, 2016

Moshe Zadka

Belt & Suspenders: Why put a .pex file inside a Docker container?

Recently I have been talking about deploying Python, and some people had the reasonable question: if a .pex file is used for isolating dependencies, and a Docker container is used for isolating dependencies, why use both? Isn’t it redundant?

Why use containers?

I really like glyph’s explanation for containers: they isolate not just the filesystem stack but the processes and the network, giving a lot of the power that UNIX was supposed to give but missed out on. Containers isolate the file system, making it easier for code to write/read files from known locations. For example, its log files will be carefully segregated, and can be moved to arbitrary places by the operator without touching the code.

The other part is that none of the reasonable options packages Python itself. This means that a pex file would still have to be tested with multiple Pythons, and perhaps do some checking at start-up that it is using the right interpreter. If PyPy is the right choice, it is a choice the operator would have to make and implement.

Why use pex?

Containers are an easy sell. They are right on the hype train. But if we use containers, what use is pex?

In order to explain, it is worthwhile comparing a correctly built runtime container that is not using pex with one that is (parts that are not relevant have been removed). Without pex:

ADD wheelhouse /wheelhouse
RUN . /appenv/bin/activate; \
    pip install --no-index -f wheelhouse DeployMe

With pex:

COPY twist.pex /

Note that in the first option, we are left with extra gunk in the /wheelhouse directory. Note also that we still have to have pip and virtualenv installed in the runtime container. Pex files bring the double-dutch philosophy to its logical conclusion: do even more of the build on the builder side, do even less of it on the runtime side.

by moshez at November 19, 2016 05:11 AM

November 18, 2016

Itamar Turner-Trauring

How I stopped the RSI pain that almost destroyed my programming career

If it hurts to type you'll have a much harder time working as a programmer. Yes, there's voice recognition, but it's just not the same. So when my wrist and arm pain returned soon after starting a new job I was starting to get a little scared.

The last two times this happened I'd had to take months and then years off from programming before the pain went away. Was my career as a programmer going to take another hit?

And then, while biking to work one day, I realized what was going on. I came up with a way to test my theory, tried it out... and the pain went away. It's quite possible the same solution would have worked all those years ago, too: instead of unhappily working as a product manager for a few years I could have been programming.

But before I tell you what I figured out, here's what I tried first.

Failed solution #1: better hardware, better ergonomics, more breaks

When I first got wrist pain bad enough that I couldn't type I started by getting a better keyboard, the Kinesis Advantage. It's expensive, built like a tank and amazingly well designed: because Ctrl, Alt, Space and Enter are in the thumb area, you don't end up stretching your hands as much.

As an Emacs user this is important; I basically can't use regular keyboards for anything more than a few minutes these days. I own multiple Kinesis keyboards and would be very sad without them. They've definitely solved one particular kind of pain I used to have due to overstretching.

I reconfigured my desk setup to be more ergonomic (these days I do this via a standing desk). And I also started taking typing breaks: half a minute every few minutes, 10 minutes once an hour. That might have helped, or not.

The pain came and went, and eventually it came and stayed.

Failed solution #2: doctor's orders

I went to see a doctor, and she suggested it was some sort of "-itis", a fancy Latin word saying I was in pain and she wasn't quite sure why. She prescribed a non-steroidal anti-inflammatory (high doses of ibuprofen will do the trick) and occupational therapy.

That didn't help either, though the meds dulled the pain when I took them.

Failed solution #3: alternative physical therapy

Next I tried massage, Yoga, Alexander Technique, and Rolfing. I learned that my muscles were tight and sore, and ended up understanding some ways I could improve my posture. A couple of times during the Alexander Technique classes my whole back suddenly relaxed, an amazing experience: I was obviously doing something wrong with my muscles.

What I learned was useful. My hands are often cold, and all those classes helped me eventually discover that if I relaxed my muscles the right way my hands would warm up. Tense muscles were hurting my circulation.

At the time, however, none of it helped.

Giving up

After six months at home not typing I was no better: I was still in pain.

So I went back to work and got a new role, as a Product Analyst, where I needed to type less and could use voice recognition for dictation. I did this for 2 or 3 years, but I was not happy: I missed programming.

Working part time

At some point during this period I read one of Dr. Sarno's books. His theory is that long periods of pain are not due to actual injury, but rather an emotional problem causing e.g. muscles to tense up or reduced blood flow. There are quite a few people who have had their pain go away by reading one of his books and doing some mental exercises.

I decided to give it a try: release emotional stress, go back to programming, and not worry about pain anymore. Since I wasn't sure I could work full time I took on consulting, and later a part time programming job.

It worked! I was able to type again, with no pain for four years.

The pain comes back

Earlier this year I started another job, with more hours but still slightly less than full time. And then the pain returned.

Why was I in pain again? I wasn't working that many more hours, I was still using a standing desk as I had for the past four years. What was going on?

An epiphany: environmental causes

Biking to work one day the epiphany hit me: Dr. Sarno's theory was that suppressed emotional stress caused the pain by tensing muscles or reducing blood flow. And that seemed to be the case for me at least. But emotional stress wasn't the only way I could end up with tense muscles or reduced blood flow.

The new office I was working in was crazy cold, and a couple of weeks earlier I'd moved my desk right under the air conditioning vent. Cold would definitely reduce blood flow. For that matter, caffeine shrinks blood vessels. And during the four years I'd worked part time and pain free I'd been working in a hackerspace with basically no air conditioning.

I started wearing a sweatshirt and hand warmers at work, and I avoided caffeine on the days I went to the office. The pain went away, and so far hasn't come back.

I spent three years unable to work as a programmer, and there's a good chance I could have solved the problem just by wearing warmer clothing.

Lessons learned

If you're suffering from wrist or arm pain:

  1. Start by putting a sweatshirt on: getting warmer may be all you need to solve the problem.
  2. If Emacs key combos are bad for your wrist, consider vi, god-mode, Spacemacs... or the expensive option, a Kinesis Advantage keyboard.
  3. Next, consider improving your posture (standing desks are good for that).
  4. Finally, if you're still in pain after a month or two go read Dr. Sarno's book. (Update: After posting this blog I got a number of emails from people saying "I read that book and my pain quickly went away.")

This may not work for everyone, but I do believe most so-called repetitive strain injury is not actually an injury. If you're in pain, don't give up: you will be able to get back to typing.

By the way, taking so long to figure out why my arms were hurting isn't the only thing I've gotten wrong during my career. So if you want to become a better software engineer, learn how you can avoid my many mistakes as a programmer.

November 18, 2016 05:00 AM

November 16, 2016

Itamar Turner-Trauring

Debugging Software, Sherlock Holmes-style

How many times have you seen software exhibiting completely impossible results? In theory software is completely deterministic, but in practice it often seems capriciously demonic. But all is not lost: the detection methods of Sherlock Holmes can help you discover the hidden order beneath the chaos.

Sherlock Holmes famously stated that "once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." And what is true for (fictional) crimes is also true for software. The basic process you follow to find a problem is:

  1. Come up with a list of potential causes.
  2. Test each potential cause in isolation, ruling them out one by one.
  3. Whatever cause you can't rule out is the likely cause, even if it seems improbable.

To see how this might work in practice, here's a bug my colleague Phil and I encountered over at my day job, where we're building a microservices architecture.

The Case of the Missing Stats

I was working on a client library, Phil was working on the server. Phil was testing out a new feature where the client would send messages to the server, containing certain statistics. When he ran the client the server did get messages, but the messages only ever had empty lists of stats.

Someone had kidnapped the stats, and we had to find them.

Phil was using the following components, each of which was a potential suspect:

  1. A local server with in-development code.
  2. Python 3.4.
  3. The latest version of the client.
  4. The latest version of the test script.

Eliminating the impossible

Our next step was to isolate each possible cause and falsify it.

Theory #1: the client was broken

The client code had never been used with a real server; perhaps it was buggy? I checked to see if there were unit tests, and there were some checking for existence of stats. Maybe the unit tests were broken though.

We ran the client with Python 3.5 on my computer using the same test script Phil had used and recorded traffic to the production server. Python 3.5 and 3.4 are similar enough that it seemed OK to change that variable at the same time.

The messages sent to the server did include the expected stats. The client was apparently not the problem, nor was the test script.

Theory #2: Python version

We tried Python 2.7, just for kicks; stats were still there.

Theory #3: Phil's computer

Maybe Phil's computer was cursed? Phil gave me an SSH login to his computer, I set up a new environment and ran the client against the production server using the Python 3.4 on his computer.

Once again we saw stats being sent.

Theory #4: the server was broken

You may have noticed that so far we were testing against the production server, and Phil had been testing against his in-development server. The server seemed an unlikely cause, however: the client unilaterally sent messages to the server, so the server version shouldn't have mattered.

However, having eliminated all other causes, that was the next thing to check. We ran the client against Phil's in-development server... and suddenly the stats were missing from the client transmission logs.

We had found the kidnapper. Now we needed to figure out how the crime had been committed.

Recreating the crime

So far we'd assumed that when the client talked to the dev server the messages did not include stats. Now that we could reproduce the problem we noticed that it wasn't that the messages didn't include stats; rather, we were sending fewer messages.

Messages with stats were failing to be sent. A quick check of the logs indicated an encoding error: we were failing to encode messages that had stats, so they were never sent. (We should have checked the logs much much earlier in the process, as it turns out.)

Reading the code suggested the problem: the in-development server was feeding the client bogus data earlier on. When the client tried to send a message to the server that included stats it needed to use some of that bogus data, and it failed to encode and the message got dropped. If the client sent a message to the server with an empty list of stats the bogus data was not needed, so encoding and sending succeeded.

The server turned out to be the culprit after all, even though it seemed to be the most improbable cause at the outset. Or at least, the first order culprit; a root-cause analysis suggested that some problems in our protocol design were the real cause.

You too can be a scientific software detective

Our debugging process could have been better: we didn't really check only one change at a time, and we neglected the obvious step of checking the logs. But the basic process worked:

  1. Isolate a possible cause.
  2. Falsify it, demonstrating it can't be the real cause.
  3. Repeat until only one cause is left.

Got an impossible bug? Put on your imaginary detective hat, stick an imaginary detective pipe in your mouth, and catch that culprit.

November 16, 2016 05:00 AM

November 13, 2016

Moshe Zadka

Deploying with Twisted: Saying Hello

Too Long: Didn’t Read

The build script builds a Docker container, moshez/sayhello:&lt;version&gt;.

$ ./build MY_VERSION
$ docker run --rm -it --publish 8080:8080 \
  moshez/sayhello:MY_VERSION --port 8080

There will be a simple application running on port 8080.

If you own the domain name hello.example.com, you can point it at a machine you control and then run:

$ docker run --rm -it --publish 443:443 \
  moshez/sayhello:MY_VERSION --port le:/srv/www/certs:tcp:443 \
  --empty-file /srv/www/certs/hello.example.com.pem

It will result in the same application running on a secure web site: https://hello.example.com.


All source code is available on GitHub.

Introduction

WSGI has been a successful standard. Very successful. It allows people to write Python applications using many frameworks (Django, Pyramid, Flask and Bottle, to name but a few) and deploy using many different servers (uwsgi, gunicorn and Apache).

Twisted makes a good WSGI container. Like Gunicorn, it is pure Python, simplifying deployment. Like Apache, it sports a production-grade web server that does not need a front end.

Modern web applications tend to be complex beasts. In order to be trusted by users, they need to have TLS support, signed by a trusted CA. They also need to transmit a lot of static resources — images, CSS and JavaScript files, even if all HTML is dynamically generated. Deploying them often requires complicated set-ups.

Containers

Container images allow us to package an application with all of its dependencies. They often cause a temptation to use them as the configuration management system. However, Dockerfile is a challenging language in which to write big parts of an application. People writing WSGI applications probably think Python is a good programming language, and the more of the application logic is in Python, the easier it is for a WSGI-based team to master it.

PEX

Pex is a way to package several Python “distributions” (sometimes informally called “packages”, the things that are hosted by PyPI) into one file, optionally with an entry point so that running the file will call a pre-defined function. It can take an explicit list of wheels, but it can also, as in our example here, take arguments compatible with the ones pip takes. The best practice is to give it a list of wheels, and to build the wheels with pip wheel.
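
The build might look something like this (a sketch; the sayhello distribution name and its entry point are placeholders):

$ pip wheel --wheel-dir wheelhouse sayhello
$ pex sayhello --no-index --find-links wheelhouse \
      --output-file sayhello.pex --entry-point sayhello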

pkg_resources

The pkg_resources module allows access to files packaged in a distribution in a way that is agnostic to how the distribution was deployed. Specifically, it is possible to install a distribution as a zipped directory, instead of unpacking it into site-packages. The pex format relies on this feature of Python, so adhering to pkg_resources for access to data files is important in order not to break pex compatibility.
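
For example, reading a data file shipped inside a distribution (a minimal sketch, using the sayhello layout described below):

from pkg_resources import resource_string

# Works whether the distribution is unpacked into site-packages
# or zipped inside a pex file.
index_html = resource_string("sayhello", "data/index.html")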

Let’s Encrypt

Let’s Encrypt is a free, automated, and open Certificate Authority. It invented the ACME protocol in order to make getting secure certificates a simple operation. txacme is an implementation of an ACME client, i.e., something that asks for certificates, for Twisted applications. It uses the server endpoint plugin mechanism in order to allow any application that builds a listening endpoint to support ACME.

Twist

The twist command-line tool allows running any Twisted service plugin. Service plugins allow us to configure a service using Python, a pretty nifty language, while still allowing specific customizations at the point of use via command-line parameters.
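
For example, once the sayhello plugin described below is installed, running it locally might look like this (a sketch; it mirrors the arguments the docker run invocations above pass through to the container):

$ twist sayhello --port tcp:8080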

Putting it all together

Our setup.py file defines a distribution called sayhello. In it, we have three parts:

  • src/sayhello/wsgi.py: A simple Flask-based WSGI application
  • src/sayhello/data/index.html: an HTML file meant to serve as the root
  • src/twisted/plugins/sayhello.py: A Twist plugin

There is also some build infrastructure:

  • build is a Python script to run the build.
  • build.docker is a Dockerfile designed to build pex files, but not to run as a production server.
  • run.docker is a Dockerfile designed for the production container.

Note that build does not push the resulting container to DockerHub.

Credits

Glyph Lefkowitz has inspired me with his blog posts about how to build efficient containers. He has also spoken about how deploying applications should be no more than one file copy.

Tristan Seligmann has written txacme.

Amber “Hawkowl” Brown has written “twist”, which is much better at running Twisted-based services than the older “twistd”.

Of course, all mistakes and problems here are completely my responsibility.

by moshez at November 13, 2016 03:38 PM

November 12, 2016

Glyph Lefkowitz

What are we afraid of?

I’m crying as I write this, and I want you to understand why.

Politics is the mind-killer. I hate talking about it; I hate driving a wedge between myself and someone I might be able to participate in a coalition with, however narrow. But, when you ignore politics for long enough, it doesn't just kill the mind; it goes on to kill the rest of the body, as well as anyone standing nearby. So, sometimes one is really obligated to talk about it.

Today, I am in despair. Donald Trump is an unprecedented catastrophe for American politics, in many ways. I find it likely that I will get into some nasty political arguments with his supporters in the years to come. But hopefully, this post is not one of those arguments. This post is for you, hypothetical Trump supporter. I want you to understand why we1 are not just sad, that we are not just defeated, but that we are in more emotional distress than any election has ever provoked for us. I want you to understand that we are afraid for our safety, and for good reason.

I do not believe I can change your views; don’t @ me to argue, because you certainly can’t change mine. My hope is simply that you can read this and at least understand why a higher level of care and compassion in political discourse than you are used to may now be required. At least soften your tone, and blunt your rhetoric. You already won, and if you rub it in too much, you may be driving people to literally kill themselves.


First let me list the arguments that I’m not making, so you can’t write off my concerns as a repeat of some rhetoric you’ve heard before.

I won’t tell you about how Trump has the support of the American Nazi Party and the Ku Klux Klan; I know that you’ll tell me that he “can’t control who supports him”, and that he denounced2 their support. I won’t tell you about the very real campaign of violence that has been carried out by his supporters in the mere days since his victory; a campaign that has even affected the behavior of children. I know you don’t believe there’s a connection there.

I think these are very real points to be made. But even if I agreed with you completely, that none of this was his fault, that none of this could have been prevented by his campaign, and that in his heart he’s not a hateful racist, I would still be just as scared.


Bear Stearns estimates that there are approximately 20 million illegal immigrants in the United States. Donald Trump’s official position on how to handle this population is mass deportation. He has promised that this will be done “warmly and humanely”, which betrays his total ignorance of how mass resettlements have happened in the past.

By contrast, the total combined number of active and reserve personnel in the United States Armed Forces is a little over 2 million people.

What do you imagine happens when a person is deported? A person who, as an illegal immigrant, very likely gave up everything they have in their home country, and wants to be where they are so badly that they risk arrest every day, just by living where they live? What do you think happens when millions of them return to countries where they have no home, no food, and quite likely no money or access to the resources or support that they had while in the United States?

They die. They die of exposure because they are in poverty and all their possessions were just stripped away and they can no longer feed themselves, or because they were already refugees from political violence in their home country, or because their home country kills them at the border because it is a hostile action to suddenly burden an economy with the shock of millions of displaced (and therefore suddenly poor and unemployed, whether they were before or not) people.

A conflict between 20 million people on one side and 2 million (heavily armed) people on the other is not a “police action”. It cannot be done “warmly and humanely”. At best, such an action could be called a massacre. At worst (and more likely) it would be called a civil war. Individual deportees can be sent home without incident, and many have been, but the victims of a mass deportation will know what is waiting for them on the other side of that train ride. At least some of them won’t go quietly.

It doesn’t matter if this is technically enforcing “existing laws”. It doesn’t matter whether you think these people deserve to be in the country or not. This is just a reality of very, very large numbers.

Let’s say, just for the sake of argument, that the population of immigrants has assimilated so poorly that each one knows only one citizen who will stand up to defend them, once it’s obvious that they will be sent to their deaths. That’s a hypothetical resistance army of 40 million people. Let’s say they are so thoroughly overpowered by the military and police that there are zero casualties on the other side of this. Generously, let’s say that the police and military are incredibly restrained, and do not use unnecessary overwhelming force, and the casualty rate is just 20%; 4 out of 5 people are captured without lethal force, and miraculously nobody else dies in the remaining 16 million who are sent back to their home countries.

That’s 8 million casualties.

6 million Jews died in the Holocaust.


This is why we are afraid. Forget all the troubling things about Trump’s character. Forget the coded racist language, the support of hate groups, and every detail and gaffe that we could quibble over as the usual chum of left/right political struggle in the USA. Forget his deeply concerning relationship with African-Americans, even.

We are afraid because of things that others have said about him, yes. But mainly, we are afraid because, in his own campaign, Trump promised to be 33% worse than Hitler.

I know that there are mechanisms in our democracy to prevent such an atrocity from occurring. But there are also mechanisms to prevent the kind of madman who would propose such a policy from becoming the President, and thus far they’ve all failed.

I’m not all that afraid for myself. I’m not a Muslim. I am a Jew, but despite all the swastikas painted on walls next to Trump’s name and slogans, I don’t think he’s particularly anti-Semitic. Perhaps he will even make a show of punishing anti-Semites, since he has some Jews in his family3.

I don’t even think he’s trying to engineer a massacre; I just know that what he wants to do will cause one. Perhaps, when he sees what is happening as a result of his orders, he will stop. But his character has been so erratic, I honestly have no idea.

I’m not an immigrant, but many in my family are. One of those immigrants is intimately familiar with the use of the word “deportation” as a euphemism for extermination; there’s even a museum about it where she comes from.

Her mother’s name is written in a book there.


In closing, I’d like to share a quote.

The last thing that my great-grandmother said to my grandmother, before she was dragged off to be killed by the Nazis, was this:

Pleure pas, les gens sont bons.

or, in English:

Don’t cry, people are good.

As it turns out, she was right, in a sense; thanks in large part to the help of anonymous strangers, my grandmother managed to escape, and, here I am.


My greatest hope for this upcoming regime change is that I am dramatically catastrophizing; that none of these plans will come to fruition, that the strange story4 I have been told by Trump supporters is in fact true.

But if my fears, if our fears, should come to pass – and the violence already in the streets already is showing that at least some of those fears will – you, my dear conservative, may find yourself at a crossroads. You may see something happening in your state, or your city, or even in your own home. Your children might use a racial slur, or even just tell a joke that you find troubling. You may see someone, even a policeman, beating a Muslim to death. In that moment, you will have a choice: to say something, or not. To be one of the good people, or not.

Please, be one of the good ones.

In the meanwhile, I’m going to try to take great-grandma’s advice.


  1. When I say “we”, I mean, the people that you would call “liberals”, although our politics are often much more complicated than that; the people from “blue states” even though most states are closer to purple than pure blue or pure red; people of color, and immigrants, and yes, Jews. 

  2. Eventually. 

  3. While tacitly allowing continued violence against Muslims, of course. 

  4. “His campaign is really about campaign finance”, “he just said that stuff to get votes, of course he won’t do it”, “they’ll be better off in their home countries”, and a million other justifications. 

by Glyph at November 12, 2016 02:33 AM

November 10, 2016

Itamar Turner-Trauring

Work/Life Balance Will Make You a Better Software Engineer

It's tempting to believe that taking your work home will make you a better software engineer, and that work/life balance will limit your learning.

  • For some software developers programming isn't just a job: it's something to do for fun, sometimes even a reason for being. If you love coding and coding is your job, why not keep working over the weekend? It's more practice of the skills you need.
  • When you don't have the motivation or ability to take work home on the weekends you might feel you're never going to be as good a software engineer as those who do.

But the truth is that if you want to be a good software engineer you shouldn't take your work home.

What makes a good software engineer? The ability to build solutions for hard, complex problems. Here's why spending extra hours on your normal job won't help you do that.

New problems, new solutions

If you have the time and motivation to write software in your free time you could write more software for your job. But that restricts you to a particular kind of problem and limits the solution space you can consider.

If you take your work home you will end up solving the same kinds of problems that you work on during your normal workweek. You'll need to use technologies that meet your employer's business goals, and you'll need to use the same standards of quality your employer expects. But if you take on a personal programming project you'll have no such constraints.

  • If your company has low quality standards, you can learn how to test really well.
  • Or you can write complete hacks just to learn something new.
  • You can use and learn completely different areas of technology.

I once wrote a Python Global Interpreter Lock profiler, using LD_PRELOAD to override the Python process' interactions with operating system locks and the gdb debugger to look at the live program's C stack. It never worked well enough to be truly useful... but building it was very educational.

The additional learning you'll get from working on different projects will make you a better software engineer. But even if you don't have the time or motivation to code at home, fear not: work/life balance can still make you a better software engineer.

Learning other skills

Being a good software engineer isn't just about churning out code. There are many other skills you need, and time spent doing things other than coding can still improve your abilities.

When I was younger and had more free time I spent my evenings studying at a university for a liberal arts degree. Among other things I learned how to write: how to find abstractions that mattered, how to marshal evidence, how to explain complex ideas, how to extract nuances from text I read. This has been immensely useful when working on harder problems, where good abstractions are critical and design documents are a must.

These days I'm spending more of my time with my child, and as a side-effect I'm learning other things. For example, explaining the world to a 4-year-old requires the ability to take complex concepts and simplify them to their essential core.

You need a hammock to solve hard problems

Though additional learning will help you, much of the benefit of work/life balance is that you're not working. Hard problems require downtime, time when you're explicitly not thinking about solutions, time for your brain to sort things out in the background. Rich Hickey, the creator of Clojure, has a great talk on the subject called Hammock Driven Development.

The gist is that hard problems require a lot of research, of alternatives and existing solutions and the problem definition, and then a lot of time letting your intuition sort things out on its own. And that takes time, time when you're not actively thinking about the problem.

At one point I was my kid's primary caregiver when I wasn't at work. I'm not up to Hickey's standard of hard problems, and taking care of an infant and toddler wasn't as restful as a hammock. But I still found that time spent not thinking about work was helpful in solving the hard problems I went back to the next day.

Learning to do more with less

The final benefit of work/life balance is attitude: the way you think about your job. If you work extra hours on your normal job you are training yourself to do your work with more time than necessary. To improve as a software engineer you want to learn how to do your work in less time, which is important if you want to take on bigger, harder projects.

Working a reasonable, limited work week will help focus you on becoming a more productive programmer rather than trying to solve problems the hard, slow way.

Given the choice you shouldn't take your work home with you. If you want to keep coding you should have no trouble finding interesting projects to work on, untrammeled by the requirements of your job. If you can't or won't code in your free time, that's fine too.

But what if that isn't a choice you can make? What if you don't have work/life balance as a software engineer because of pressure from your boss, or constant emergencies at work? In that case you should sign up for my free 6-part email course, which will show you how to get a to a saner, shorter workweek.

November 10, 2016 05:00 AM

October 30, 2016

Itamar Turner-Trauring

Maintainable Python Applications: a Guide for Skeptical Java Developers

When you've been writing Java for a while switching to Python can make you a little anxious. Not only are you learning a new language with new idioms and tools, you're also dealing with a language with far less built-in safety. No more type checks, no more clear separation between public and private.

It's much easier to learn Python than Java, it's true, but it's also much easier to write unmaintainable code. Can you really build large scale, robust and maintainable applications in Python? I think you can, if you do it right.

The suggestions below will help get you started on a new Python project, or improve an existing project that you're joining. You'll need to keep up the best practices you've used in Java, and learn new tools that will help you write better Python.

Tools and Best Practices

Python 2 and 3

Before you start a new Python project you have to choose which version of the language to support: Python 3 is not backwards-compatible with Python 2. Python 2 is only barely being maintained, and will be end-of-lifed in 2020, so that leaves you with only two options with long term viability:

  1. A hybrid language, the intersection of Python 2 and Python 3. This requires you to understand the subtleties of the differences between the two languages. The best guide I've seen to writing this hybrid language is on the Python Future website.
  2. Python 3 only.

Most popular Python libraries now support Python 3, as do most runtime environments. Unless you need to write a library that will be used by both new and legacy applications it's best to stick to Python 3 only.

However, on OS X you'll need to use Homebrew to install Python 3 (though using Homebrew's Python 2 is also recommended over using the system Python 2). And on Google App Engine you'll need to use the beta Flexible Environment to get Python 3 support.

Static typing

Java enforces types on method parameters, on object attributes, and on variables. To get the equivalent in Python you can use a combination of runtime type checking and static analysis tools.

  • To ensure your classes have the correct types on attributes you can use the attrs library, though it's very useful even if you don't care about type enforcement. This will only do runtime type checking, so you'll need to have decent test coverage.
  • For method parameters and variables, the mypy static type checker, combined with the new Python 3 type annotation syntax, will catch many problems. For Python 2 there is a comment-based syntax as well. The clever folks at Zulip have a nice introductory article about mypy.
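
Here is what the two approaches look like together (a minimal sketch; the Point class and scale function are hypothetical):

import attr

@attr.s
class Point(object):
    x = attr.ib(validator=attr.validators.instance_of(int))
    y = attr.ib(validator=attr.validators.instance_of(int))

def scale(p: Point, factor: int) -> Point:
    # mypy checks the annotations statically; attrs validates at runtime.
    return Point(x=p.x * factor, y=p.y * factor)

Point(x=1, y=2)    # fine
Point(x=1, y="2")  # attrs raises TypeError at runtime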

Public, private and interfaces

Python lets you do many things Java wouldn't, everything from metaclasses to replacing a method at runtime. But while these more dynamic capabilities can be quite useful, there's nothing wrong with using them sparingly. For example, while Python allows you to set random attributes on a passed in object, usually you shouldn't.

  • As with Java, you typically want to interact with objects using a method-based interface (explicit or implicit), not by randomly mucking with its internals.
  • As with Java code, you want to have a clear separation between public and private parts of your API.
  • And as with Java, you want to be coding to an interface, not to implementation details.

Where Java has explicit and compiler enforced public/private separation, in Python you do this by convention:

  • Private methods and attributes on a class are typically prefixed with an "_".
  • The public interface of a module is declared using __all__, e.g. __all__ = ["MyClass", "AnotherClass"]. __all__ also controls what gets imported when you do from module import *, but wildcard imports are a bad idea. For more details see the relevant Python documentation.
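
In practice the conventions look like this (a minimal sketch; mymodule and its contents are hypothetical):

# mymodule.py
__all__ = ["MyClass"]

class MyClass(object):
    def do_work(self):
        # Public method: part of the supported interface.
        return self._helper()

    def _helper(self):
        # Leading underscore: private by convention, subject to change.
        return 42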

As for interfaces, if you want to explicitly declare them you can use Python's built-in abstract base classes; not quite the same, but they can be used as pseudo-interfaces. Alternatively, the zope.interface package is more powerful and flexible (and the attrs library mentioned above understands it).

Tests

Automated tests are important if you want some assurance your code works. Python has a built-in unittest library that is similar to JUnit, but at a minimum you'll want a more powerful test runner.

  • nose is a test runner for the built-in unittest, with many plugins.
  • pytest is a test runner and framework, supporting the built-in unittest library as well as a more succinct style of testing. It also has numerous plugins.

Other useful tools:

  • Hypothesis lets you write a single function that generates hundreds or thousands of test cases for maximal test coverage; there's a small example after this list.
  • To set up isolated test environments tox is useful; it builds on Python's built-in virtualenv.
  • coverage lets you measure code coverage on your test runs. If you have multiple tox environments, here's a tutorial on combining the resulting code coverage.
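
Here's a minimal sketch of a pytest-style test file using Hypothesis (the test subject is deliberately trivial):

    # test_sorting.py -- run with "pytest"; requires the hypothesis package
    from hypothesis import given
    from hypothesis import strategies as st

    def test_sorted_small_example():
        assert sorted([3, 1, 2]) == [1, 2, 3]

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(lst):
        # Hypothesis generates hundreds of input lists for this one test:
        once = sorted(lst)
        assert sorted(once) == once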

More static analysis

In addition to mypy, two other lint tools may prove useful:

  • flake8 is quick, catches a few important bugs, and checks for some standard coding style violations.
  • pylint is much more powerful, slower, and generates massive numbers of false positives. As a result far fewer Python projects use it than flake8. I still recommend using it, but see my blog post on the subject for details on making it usable.

Documentation

You should document your classes and public methods using docstrings. Unless you're using the new type signature syntax you should also document the types of function parameters and results.

Typically Python docstrings are written in reStructuredText format. It's surprisingly difficult to find an example of the standard style, but here's one.
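
To give a concrete picture, here's a sketch of that style (frobnicate is a made-up function):

    def frobnicate(value, retries=3):
        """
        Attempt to frobnicate the given value.

        :param value: The value to frobnicate.
        :type value: bytes
        :param int retries: How many times to retry on failure.
        :return: The frobnicated value.
        :rtype: bytes
        :raises ValueError: If the value cannot be frobnicated.
        """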

Sphinx is the standard documentation tool for Python, for both prose and generated API docs. It supports reStructuredText API documentation, but also Google-style docstrings.

Editors

A good Python editor or IDE won't be as powerful as the equivalent Java IDE, but it will make your life easier. All of these will do syntax highlighting, code completion, error highlighting, etc.:

  • If you're used to IntelliJ you can use PyCharm.
  • If you're used to Eclipse you can use PyDev.
  • Elpy is a great Emacs mode for Python.
  • Not certain what your best bet is for vim, but python-mode looks plausible.

Writing maintainable Python

In the end, writing maintainable Python is very much like writing maintainable Java. Python has more flexibility, but also more potential for abuse, so Python expects you to be a responsible adult.

You can choose to write bad code, but if you follow the best practices you learned from Java you won't have to. And the tools I've described above will help catch any mistakes you make along the way.

October 30, 2016 04:00 AM

October 29, 2016

Moshe Zadka

Twisted as Your WSGI Container

Introduction

WSGI is a great standard. It has been amazingly successful. In order to describe how successful it is, let me describe life before WSGI. In the beginning, CGI existed. CGI was just a standard for how a web server can run a process — what environment variables to pass, and so forth. In order to write a web-based application, people would write programs that complied with CGI. At that time, Apache’s only competition was commercial web servers, and CGI allowed you to write applications that ran on both. However, starting a process for each request was slow and wasteful.

For Python applications, people wrote mod_python for Apache. It allowed people to write Python programs that ran inside the Apache process, and directly used Apache’s API to access the HTTP request details. Since Apache was the only server that mattered, that was fine. However, as more servers arrived, a standard was needed. mod_wsgi was originally a way to run the same Django application on many servers. However, as a side effect, it also allowed the second wave of Python web application frameworks — Paste, Flask and more — to have something to run on. In order to make life easier, Python included wsgiref, a module that implemented a single-thread single-process blocking web server with the WSGI protocol.

Development

Some web frameworks come with their own development web servers that will run their WSGI apps. Some use wsgiref. Almost always those options are carefully documented as “just for development use, do not use in production.” Wouldn’t it be nice to use the same WSGI container in both development and production, eliminating one potential source of bugs that only show up in production?

For ease of use, it should probably be written in Python. Luckily, “twist web --wsgi” is just such a server. To showcase how easy it is to use, twist-wsgi shows commands to run Django, Flask, Pyramid and Bottle apps as easily as running the frameworks’ built-in web servers.
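
As an illustration, here's a minimal sketch (the module and application names are my own invention):

    # app.py -- a trivial WSGI application
    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from Twisted's WSGI container\n"]

    # Serve it, in development and in production alike, with:
    #   twist web --wsgi app.application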

Production

In production, using the Twisted WSGI container comes with several advantages. Production-grade SSL support using pyOpenSSL and cryptography allows elimination of “SSL terminators”, removing one moving piece from the equation. With third-party extensions like txsni and txacme, it allows modern support for “easy SSL”. The built-in HTTP/2 support, starting with Twisted 16.3, allows better support for parallel requests from modern browsers.

The Twisted web server also has a built-in static file server, allowing the elimination of a “front-end” web server that deals with static files by itself and passes dynamic requests on to the application server.

Twisted is also not limited to web serving. As a full-stack networking framework, it has support for scheduling repeated tasks, running processes and supporting other protocols (for example, a side-channel for online control). Last but not least, the language used to integrate all of that is Python. As an example of an integrated solution, the Frankensteinian monster plugin showcases a web application combining four frameworks, a static file server and a scheduled task updating a file.

While the goal is not to encourage using four web frameworks and a couple of side services in order to greet the user and tell them what time it is, it is nice that if the need strikes this can all be integrated into one process in one language, without the need to remember how to spell “every 4 seconds” in cron or how to quote a string in the nginx configuration file.

by moshez at October 29, 2016 03:03 PM

Twisted Matrix Laboratories

Twisted 16.5.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.5!

The highlights of this release are:

  • Deferred.addTimeout, for timing out your Deferreds! (contributed by cyli, reviews by adiroiban, theisencouple, manishtomar, markrwilliams) A small sketch of it follows this list.
  • yield from support for Deferreds, in functions wrapped with twisted.internet.defer.ensureDeferred. This will work in Python 3.4, unlike async/await which is 3.5+ (contributed by hawkowl, reviews by markrwilliams, lukasa).
  • The new asyncio interop reactor, which allows Twisted to run on top of the asyncio event loop. This doesn't include any Deferred-Future interop, but stay tuned! (contributed by itamar and hawkowl, reviews by rodrigc, markrwilliams)
  • twisted.internet.cfreactor is now supported on Python 2.7 and Python 3.5+! This is useful for writing pyobjc or Toga applications. (contributed by hawkowl, reviews by glyph, markrwilliams)
  • twisted.python.constants has been split out into constantly on PyPI, and likewise with twisted.python.versions going into the PyPI package incremental. Twisted now uses these external packages, which will be shared with other projects (like Klein). (contributed by hawkowl, reviews by glyph, markrwilliams)
  • Many new Python 3 modules, including twisted.pair, twisted.python.zippath, twisted.spread.pb, and more parts of Conch! (contributed by rodrigc, hawkowl, glyph, berdario, & others, reviews by acabhishek942, rodrigc, & others)
  • Many bug fixes and cleanups!
  • 260+ closed tickets overall.
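
For a taste of the new Deferred.addTimeout API, here's a minimal sketch (mine, not from the release notes; Python 3 syntax):

    from twisted.internet import reactor
    from twisted.internet.task import deferLater

    # A Deferred that would normally fire after 10 seconds...
    d = deferLater(reactor, 10.0, lambda: "result")
    # ...but is cancelled with a TimeoutError after 2 seconds:
    d.addTimeout(2.0, reactor)
    d.addErrback(lambda failure: print("timed out:", failure.value))
    d.addBoth(lambda _: reactor.stop())
    reactor.run()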

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

PS: I wrote a blog post about Twisted's progress in 2016! https://atleastfornow.net/blog/marching-ever-forward/

by Amber Brown at October 29, 2016 07:11 AM

    October 27, 2016

    Glyph Lefkowitz

    What Am Container

    Perhaps you are a software developer.

    Perhaps, as a developer, you have recently become familiar with the term "containers".

    Perhaps you have heard containers described as something like "LXC, but better", "an application-level interface to cgroups" or "like virtual machines, but lightweight", or perhaps (even less usefully) as a function call. You've probably heard of "docker"; do you wonder whether a container is the same as, different from, or part of a Docker?

    Are you bewildered by the blisteringly fast-paced world of "containers"? Maybe you have no trouble understanding what they are - in fact you might be familiar with half a dozen orchestration systems and container runtimes already - but are frustrated because this seems like a whole lot of work and you just don't see what the point of it all is?

    If so, this article is for you.

    I'd like to lay out what exactly the point of "containers" is, why people are so excited about them, and what makes the ecosystem around them so confusing. Unlike my previous writing on the topic, I'm not going to assume you know anything about the ecosystem in general; just that you have a basic understanding of how UNIX-like operating systems separate processes, files, and networks.1


    At the dawn of time, a computer was a single-tasking machine. Somehow, you'd load your program into main memory, and then you'd turn it on; it would run the program, and (if you're lucky) spit out some output onto paper tape.

    When a program running on such a computer looked around itself, it could "see" the core memory of the computer it was running on, and any attached devices, including consoles, printers, teletypes, or (later) networking equipment. This was of course very powerful - the program had full control of everything attached to the computer - but also somewhat limiting.

    This mode of addressing hardware was limiting because it meant that programs would break the instant you moved them to a new computer. They had to be re-written to accommodate new amounts and types of memory, new sizes and brands of storage, and new types of networks. If the program had to contain within itself the full knowledge of every piece of hardware that it might ever interact with, it would be very expensive indeed.

    Also, if all the resources of a computer were dedicated to one program, then you couldn't run a second program without stomping all over the first one - crashing it by mangling its structures in memory, or destroying its data by overwriting its files on disk.

    So, programmers cleverly devised a way of indirecting, or "virtualizing", access to hardware resources. Instead of a program simply addressing all the memory in the whole computer, it got its own little space where it could address its own memory - an address space, if you will. If a program wanted more memory, it would ask a supervising program - what we today call a "kernel" - to give it some more memory. This made programs much simpler: instead of memorizing the address offsets where a particular machine kept its memory, a program would simply begin by saying "hey operating system, give me some memory", and then it would access the memory in its own little virtual area.

    In other words: address spaces are just virtual RAM.

    Virtualizing memory - i.e. ephemeral storage - wasn't enough; in order to save and transfer data, programs also had to virtualize disk - i.e. persistent storage. Whereas a whole-computer program would just seek to position 0 on the disk and start writing data to it however it pleased, a program writing to a virtualized disk - or, as we might call it today, a "file" - first needed to request a file from the operating system.

    In other words: file systems are just virtual disks.

    Networking was treated in a similar way. Rather than addressing the entire network connection at once, each program could allocate a little slice of the network - a "port". That way a program could, instead of consuming all network traffic destined for the entire machine, ask the operating system to just deliver it all the traffic for, say, port number seven.

    In other words: listening ports are just virtual network cards.


    Getting bored by all this obvious stuff yet? Good. One of the things that frustrates me the most about containers is that they are an incredibly obvious idea that is just a logical continuation of a trend that all programmers are intimately familiar with.


    All of these different virtual resources exist for the same reason: as I said earlier, if two programs need the same resource to function properly, and they both try to use it without coordinating, they'll both break horribly.2

    UNIX-like operating systems more or less virtualize RAM correctly. When one program grabs some RAM, nobody else - modulo super-powered administrative debugging tools - gets to use it without talking to that program. It's extremely clear which memory belongs to which process. If programs want to use shared memory, there is a very specific, opt-in protocol for doing so; it is basically impossible for it to happen by accident.

    However, the abstractions we use for disks (filesystems) and network cards (listening ports and addresses) are significantly more limited. Every program on the computer sees the same file-system. The program itself, and the data the program stores, both live on the same file-system. Every program on the computer can see the same network information, can query everything about it, and can receive arbitrary connections. Permissions can remove certain parts of the filesystem from view (i.e. programs can opt-out) but it is far less clear which program "owns" certain parts of the filesystem; access must be carefully controlled, and sometimes mediated by administrators.

    In particular, the way that UNIX manages filesystems creates an environment where "installing" a program requires manipulating state in the same place (the filesystem) where other programs might require different state. Popular package managers on UNIX-like systems (APT, RPM, and so on) rarely have a way to separate program installation even by convention, let alone by strict enforcement. If you want to do that, you have to re-compile the software with ./configure --prefix to hard-code a new location. And, fundamentally, this is why the package managers don't support installing to a different place: if the program can tell the difference between different installation locations, then it will, because its developers thought it should go in one place on the file system, and why not hard code it? It works on their machine.


    In order to address this shortcoming of the UNIX process model, the concept of "virtualization" became popular. The idea of virtualization is simple: you write a program which emulates an entire computer, with its own storage media and network devices, and then you install an operating system on it. This completely resolves the over-sharing of resources: a process inside a virtual machine is in a very real sense running on a different computer than programs running on a different virtual machine on the same physical device.

    However, virtualization is also an extremely heavy-weight blunt instrument. Since virtual machines are running operating systems designed for physical machines, they have tons of redundant hardware-management code; enormous amounts of operating system data which could be shared with the host, but since it's in the form of a disk image totally managed by the virtual machine's operating system, the host can't really peek inside to optimize anything. It also makes other kinds of intentional resource sharing very hard: any software to manage the host needs to be installed on the host, since if it is installed on the guest it won't have full access to the host's hardware.

    I hate using the term "heavy-weight" when I'm talking about software - it's often bandied about as a content-free criticism - but the difference in overhead between running a virtual machine and a process is the difference between gigabytes and kilobytes; somewhere between 4-6 orders of magnitude. That's a huge difference.

    This means that you need to treat virtual machines as multi-purpose, since one VM is too big to run just a single small program. Which means you often have to manage them almost as if they were physical hardware.


    When we run a program on a UNIX-like operating system, and by so running it, grant it its very own address space, we call the entity that we just created a "process".

    This is how to understand a "container".

    A "container" is what we get when we run a program and give it not just its own memory, but its own whole virtual filesystem and its own whole virtual network card.

    The metaphor to processes isn't perfect, because a container can contain multiple processes with different memory spaces that share a single filesystem. But this is also where some of the "container ecosystem" fervor begins to creep in - this is why people interested in containers will religiously exhort you to treat a container as a single application, not to run multiple things inside it, not to SSH into it, and so on. This is because the whole point of containers is that they are lightweight - far closer in overhead to the size of a process than that of a virtual machine.

    A process inside a container, if it queries the operating system, will see a computer where only it is running, where it owns the entire filesystem, and where any mounted disks were explicitly put there by the administrator who ran the container. In other words, if it wants to share data with another application, it has to be given the shared data; opt-in, not opt-out, the same way that memory-sharing is opt-in in a UNIX-like system.


    So why is this so exciting?

    In a sense, it really is just a lower-overhead way to run a virtual machine, as long as it shares the same kernel. That's not super exciting, by itself.

    The reason that containers are more exciting than processes is the same reason that using a filesystem is more exciting than having to use a whole disk: sharing state always, inevitably, leads to brokenness. Opt-in is better than opt-out.

    When you give a program a whole filesystem to itself, sharing any data explicitly, you eliminate even the possibility that some other program scribbling on a shared area of the filesystem might break it. You don't need package managers any more, only package installers; by removing the other functions of package managers (inventory, removal) they can be radically simplified, and less complexity means less brokenness.

    When you give a program an entire network address to itself, exposing any ports explicitly, you eliminate even the possibility that some rogue program will expose a security hole by listening on a port you weren't expecting. You eliminate the possibility that it might clash with other programs on the same host, hard-coding the same port numbers or auto-discovering the same addresses.


    In addition to the exciting things on the run-time side, containers - or rather, the things you run to get containers, "images"3, present some compelling improvements to the build-time side.

    On Linux and Windows, building a software artifact for distribution to end-users can be quite challenging. It's challenging because it's not clear how to specify that you depend on certain other software being installed; it's not clear what to do if you have conflicting versions of that software that may not be the same as the versions already available on the user's computer. It's not clear where to put things on the filesystem. On Linux, this often just means getting all of your software from your operating system distributor.

    You'll notice I said "Linux and Windows"; not the usual (linux, windows, mac) big-3 desktop platforms, and I didn't say anything about mobile OSes. That's because on macOS, Android, iOS, and Windows Metro, applications already run in their own containers. The rules of macOS containers are a bit weird, and very different from Docker containers, but if you have a Mac you can check out ~/Library/Containers to see the view of the world that the applications you're running can see. iOS looks much the same.

    This is something that doesn't get discussed a lot in the container ecosystem, partially because everyone is developing technology at such a breakneck pace, but in many ways Linux server-side containerization is just a continuation of a trend that started on mainframe operating systems in the 1970s and has already been picked up in full force by mobile operating systems.

    When one builds an image, one is building a picture of the entire filesystem that the container will see, so an image is a complete artifact. By contrast, a package for a Linux package manager is just a fragment of a program, leaving out all of its dependencies, to be integrated later. If an image runs on your machine, it will (except in some extremely unusual circumstances) run on the target machine, because everything it needs to run is fully included.

    Because you build all the software an image requires into the image itself, there are some implications for server management. You no longer need to apply security updates to a machine - they get applied to one application at a time, and they get applied as a normal process of deploying new code. Since there's only one update process, which is "delete the old container, run a new one with a new image", updates can roll out much faster, because you can build an image, run tests for the image with the security updates applied, and be confident that it won't break anything. No more scheduling maintenance windows, or managing reboots (at least for security updates to applications and libraries; kernel updates are a different kettle of fish).


    That's why it's exciting. So why's it all so confusing?5

    Fundamentally the confusion is caused by there just being way too many tools. Why so many tools? Once you've accepted that your software should live in images, none of the old tools work any more. Almost every administrative, monitoring, or management tool for UNIX-like OSes depends intimately upon the ability to promiscuously share the entire filesystem with every other program running on it. Containers break these assumptions, and so new tools need to be built. Nobody really agrees on how those tools should work, and a wide variety of forces ranging from competitive pressure to personality conflicts make it difficult for the panoply of container vendors to collaborate perfectly4.

    Many companies whose core business has nothing to do with infrastructure have gone through this reasoning process:

    1. Containers are so much better than processes, we need to start using them right away, even if there's some tooling pain in adopting them.
    2. The old tools don't work.
    3. The new tools from the tool vendors aren't ready.
    4. The new tools from the community don't work for our use-case.
    5. Time to write our own tool, just for our use-case and nobody else's! (Which causes problem #3 for somebody else, of course...)

    A less fundamental reason is too much focus on scale. If you're running a small-scale web application which has a stable user-base that you don't expect a lot of growth in, there are many great reasons to adopt containers as opposed to automating your operations; and in fact, if you keep things simple, the very fact that your software runs in a container might obviate the need for a system-management solution like Chef, Ansible, Puppet, or Salt. You should totally adopt them and try to ignore the more complex and involved parts of running an orchestration system.

    However, containers are even more useful at significant scale, which means that companies which have significant scaling problems invest in containers heavily and write about them prolifically. Many guides and tutorials on containers assume that you expect to be running a multi-million-node cluster with fully automated continuous deployment, blue-green zero-downtime deploys, and a 1000-person operations team. It's great if you've got all that stuff, but building each of those components is a non-trivial investment.


    So, where does that leave you, my dear reader?

    You should absolutely be adopting "container technology", which is to say, you should probably at least be using Docker to build your software. But there are other, radically different container systems - like Sandstorm - which might make sense for you, depending on what kind of services you create. And of course there's a huge ecosystem of other tools you might want to use; too many to mention, although I will shout out to my own employer's docker-as-a-service Carina, which delivered this blog post, among other things, to you.

    You shouldn't feel as though you need to do containers absolutely "the right way", or that the value of containerization is derived from adopting every single tool that you can all at once. The value of containers comes from four very simple things:

    1. It reduces the overhead and increases the performance of co-locating multiple applications on the same hardware,
    2. It forces you to explicitly call out any shared state or required resources,
    3. It creates a complete build pipeline that results in a software artifact that can be run without special installation or set-up instructions (at least, on the "software installation" side; you still might require configuration, of course), and
    4. It gives you a way to test exactly what you're deploying.

    These benefits can combine and interact in surprising and interesting ways, and can be enhanced with a wide and growing variety of tools. But underneath all the hype and the buzz, the very real benefit of containerization is basically just that it is fixing a very old design flaw in UNIX.

    Containers let you share less state, and shared mutable state is the root of all evil.


    1. If you have a more sophisticated understanding of memory, disks, and networks, you'll notice that everything I'm saying here is patently false, and betrays an overly simplistic understanding of the development of UNIX and the complexities of physical hardware and driver software. Please believe that I know this; this is an alternate history of the version of UNIX that was developed on platonically ideal hardware. The messy co-evolution of UNIX, preemptive multitasking, hardware offload for networks, magnetic secondary storage, and so on, is far too large to fit into the margins of this post. 

    2. When programs break horribly like this, it's called "multithreading". I have written some software to help you avoid it. 

    3. One runs an "executable" to get a process; one runs an "image" to get a container. 

    4. Although the container ecosystem is famously acrimonious, companies in it do actually collaborate better than the tech press sometimes give them credit for; the Open Container Project is a significant extraction of common technology from multiple vendors, many of whom are also competitors, to facilitate a technical substrate that is best for the community. 

    5. If it doesn't seem confusing to you, consider this absolute gem from the hilarious folks over at CircleCI. 

    by Glyph at October 27, 2016 09:23 AM

    October 22, 2016

    Glyph Lefkowitz

    docker run glyph/rproxy

    Want to TLS-protect your co-located stack of vanity websites with Twisted and Let's Encrypt using HawkOwl's rproxy, but can't tolerate the bone-grinding tedium of a pip install? I built a docker image for you now, so it's now as simple as:

    $ mkdir -p conf/certificates;
    $ cat > conf/rproxy.ini << EOF;
    > [rproxy]
    > certificates=certificates
    > http_ports=80
    > https_ports=443
    > [hosts]
    > mysite.com_host=<other container host>
    > mysite.com_port=8080
    > EOF
    $ docker run --restart=always -v "$(pwd)"/conf:/conf \
        -p 80:80 -p 443:443 \
        glyph/rproxy;
    

    There are no docs to speak of, so if you're interested in the details, see the tree on github I built it from.

    Modulo some handwaving about docker networking to get that <other container host> IP, that's pretty much it. Go forth and do likewise!

    by Glyph at October 22, 2016 08:12 PM

    October 19, 2016

    Itamar Turner-Trauring

    Why Pylint is both useful and unusable, and how you can actually use it

    This is a story about a tool that caught a production-impacting bug the day before we released the code. This is also the story of a tool no one uses, and for good reason. By the time you're done reading you'll see why this tool is useful, why it's unusable, and how you can actually use it with your Python project.

    (Not a Python programmer? The same problems and solutions are likely to apply to tools in your ecosystem as well.)

    Pylint saves the day

    If you're coding in Haskell the compiler's got your back. If you're coding in Java the compiler will usually lend a helping hand. But if you're coding in a dynamic language like Python or Ruby you're on your own: you don't have a compiler to catch bugs for you.

    The next best thing is a lint tool that uses heuristics to catch bugs in your code. One such tool is Pylint, and here's how I started using it.

    One day at work we realized our builds had been consistently failing for a few days, and it wasn't the usual intermittent failures. After a few days of investigating, my colleague Tom Prince discovered the problem. It was Python code that looked something like this:

    for volume in get_volumes():
        do_something(volume)
    
    for volme in get_other_volumes():
        do_something_else(volume)
    

    Notice the typo in the second for loop. Combined with the fact that Python leaks variables from blocks, the last value of volume from the first for loop was used for every iteration of the second loop.

    To see if we could prevent these problems in the future I tried Pylint, re-introduced the bug... and indeed it caught the problem. I then looked at the rest of the output to see what else it had found.

    What it had found was a serious bug. It was in code I had written a few days earlier, and the bug completely broke an important feature we were going to ship to users the very next day. Here's a heavily simplified minimal reproducer for the bug:

    list_of_printers = []
    for i in [1, 2, 3]:
        def printer():
            print(i)
        list_of_printers.append(printer)
    
    for func in list_of_printers:
        func()
    

    The intended result of this reproducer is to print:

    1
    2
    3
    

    But what will actually get printed with this code is:

    3
    3
    3
    

    When you define a nested function in Python that refers to a variable in the outside scope it binds not the value of a variable but the variable itself. In this case that means the i inside printer() ended up always getting the last value of the variable i in the for loop.
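
    (The standard workaround, for what it's worth, is to bind the current value explicitly, for example with a default argument:)

    list_of_printers = []
    for i in [1, 2, 3]:
        def printer(i=i):  # the default argument captures i's current value
            print(i)
        list_of_printers.append(printer)

    for func in list_of_printers:
        func()  # now prints 1, 2, 3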

    And luckily Pylint caught that bug before it shipped; pretty great, right?

    Why no one uses Pylint

    Pylint is useful, but many projects don't use it. For example, I went and checked just now, and neither Twisted nor Django nor Flask nor Sphinx seem to use Pylint. Why wouldn't these large, sophisticated Python projects use a tool that would automatically catch bugs for them?

    One problem is that it's slow, but that's not the real problem; you can always just run it on the CI system with the other slow tests. The real problem is the amount of output.

    Here's what I mean: I ran pylint on a checkout of Twisted and got 28,000 lines of output (at which point pylint crashed, but I'll assume that's fixed in newer releases). Let me say that again: 28,000 errors or warnings.

    That's insane.

    And to be fair Twisted has a coding standard that doesn't match the Python mainstream, but massive amounts of noise have been my experience with other projects as well. Pylint has a lot of useful errors... but also a whole lot of utterly useless garbage assumptions about how your code should look. And fundamentally it treats them all the same: there's nominally a distinction between warnings and errors, but in practice both useful and useless checks end up in the warning category.

    For example:

    W:675, 0: Class has no __init__ method (no-init)

    That's not a useful warning. Now imagine a few thousand of those.

    How you should use Pylint

    So here we have a tool that is potentially useful, but unusable in practice. What to do? Luckily Pylint has some functionality that can help: you can configure it with a whitelist of lint checks.

    First, setup Pylint to do nothing:

    1. Make a list of all the features you plausibly want to enable from the Pylint docs and configure .pylintrc to whitelist them.
    2. Comment them all out.

    At this point Pylint will do no checks. Next:

    1. Uncomment a small batch of checks, and run pylint.
    2. If the resulting errors are real problems, fix them. If the errors are utter garbage, delete those checks from the configuration.

    At this point you have a small number of probably useful checks that are passing: you can run pylint and you only will be told about new problems. In other words, you have a useful tool.

    Repeat this process a few times, or once a week, enabling a new batch of checks each time until you run out of patience or you run out of Pylint checks to enable.
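
    Here's a minimal sketch of what such a whitelisting .pylintrc might look like; the enabled checks are just examples (the last two are the ones that catch the loop bugs described above):

    [MESSAGES CONTROL]
    disable=all
    enable=unused-variable,
           undefined-loop-variable,
           cell-var-from-loop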

    The end result will be something like this configuration or this configuration; both projects are open source under the Apache 2.0 license, so you can use those as a starting point.

    Go forth and lint

    Here's my challenge to you: if you're a Python programmer, go setup Pylint on a project today. It'll take an hour to get some minimal checks going, and one day it will save you from a production-impacting bug. If you're not a Python programmer you can probably find some equivalent tool for your language; go set that up.

    And if you're the author of a lint tool, please, try to come up with better defaults. It's better to catch 60% of bugs and have 10,000 software projects using your tool than to catch 70% of bugs and have almost no one use it.

    October 19, 2016 04:00 AM


    October 18, 2016

    Glyph Lefkowitz

    As some of you may have guessed from the unintentional recent flurry of activity on my Twitter account, Twitterfeed, the service I used to use to post blog links automatically, is getting end-of-lifed. I've switched to dlvr.it for the time being, unless they send another unsolicited tweetstorm out on my behalf...

    Sorry about the noise! In the interests of putting some actual content here, maybe you would be interested to know that I was recently interviewed for PyDev of the Week?

    by Glyph at October 18, 2016 08:37 PM

    October 15, 2016

    Jonathan Lange

    servant-template: production-ready Haskell web services in 5 minutes

    If you want to write a web API in Haskell, then you should start by using my new cookiecutter template at https://github.com/jml/servant-template. It’ll get you a production-ready web service in 5 minutes or less.

    Whenever you start any new web service and you actually care about getting it working and available to users, it’s very useful to have:

    • logging
    • monitoring
    • continuous integration
    • tests
    • deployment
    • command-line parsing

    These are largely boring, but nearly essential. Logs and monitoring give you visibility into the code’s behaviour in production, tests and continuous integration help you make sure you don’t break it, and, of course, you need some way of actually shipping code to users. As an engineer who cares deeply about running code in production, I see these as pretty much the bare minimum for deploying something to my users.

    The cookiecutter template at gh:jml/servant-template creates a simple Haskell web API service that does all of these things.

    As the name suggests, all of this enables writing a servant server. Servant lets you declare web APIs at the type level and then use those API specifications to write servers. It’s hard to overstate just how useful it is for writing RESTful APIs.

    Get started with:

    $ cookiecutter gh:jml/servant-template
    project_name [awesome-service]: awesome-service
    ...
    $ cd awesome-service
    $ stack test
    $ make image
    ...
    sha256:30e4c9a5f29a2c4caa44e226859dd094c6ac9d297de0d1d2024e8a981a7c8f86
    awesome-service:unversioned
    $ docker run awesome-service:latest --help
    awesome-service - TODO fill this in
    
    Usage: awesome-service --port PORT [--access-logs ARG] [--log-level ARG]
                           [--ghc-metrics]
      One line description of project
    
    Available options:
      -h,--help                Show this help text
      --port PORT              Port to listen on
      --access-logs ARG        How to log HTTP access
      --log-level ARG          Minimum severity for log messages
      --ghc-metrics            Export GHC metrics. Requires running with +RTS.
    $ docker run -p 8080:80 awesome-service --port 80
    [2016-10-16T20:50:07.983292987000] [Informational] Listening on :80
    

    For this to work, you’ll need to have Docker installed on your system. I’ve tested it on my Mac with Docker Machine, but haven’t yet with Linux.

    You might have to run stack docker pull before make image, if you haven’t already used stack to build things from within Docker.

    Once it’s up and running, you can browse to http://localhost:8080/ (or http://$(docker-machine ip):8080/) if you’re on a Mac, and you’ll see a simple HTML page describing the API and giving you a link to the /metrics page, which is where all the Prometheus metrics are exported.

    There you have it, a production-ready web service. At least for some values of “production-ready”.

    Of course, the API it offers is really simple. You can make it your own by editing the API definition and the server implementation. Note these two are in separate libraries to make it easier to generate client code.

    The template comes with a test suite that uses servant-quickcheck to guarantee that none of your endpoints return 500s, take longer than 100ms to serve, and that all the 201s include Location headers.

    If you’re so inclined, you could push the created Docker image to a repository somewhere—it’s around 25MB when built. Then, people could use it and no one would have to know that it’s Haskell, they’d just notice a fast web service that works.

    As the README says, I’ve made a few questionable decisions when building this. If you disagree, or think I could have done anything better I’d love to know. If you use this to build something cool, or even something silly, please let me know on Twitter.

    by jml at October 15, 2016 11:00 PM

    October 14, 2016

    Itamar Turner-Trauring

    How to find a programming job you won't hate

    Somewhere out there is a company that wants to hire you as a software engineer. Working for that company is a salesperson whose incentives were set by an incompetent yet highly compensated upper management. The salesperson has just made a sale, and in return for a large commission has promised the new customer twice the features in half the time.

    The team that wants to hire you will spend the next three months working evenings and weekends. And then, with a job badly done, they'll move on to the next doomed project.

    You don't want to work for this company, and you shouldn't waste your time applying there.

    When you're looking for a new programming job you want to find it quickly:

    • If your current job sucks you want to find a new place before you hit the unfortunate gotta-quit-today moment.
    • If you're not working you don't want your savings to run out. You have been saving money, right?
    • Either way, looking for a job is no fun.

    Assuming you can afford to be choosy, you'll want to speed up the process by filtering out as many companies as possible in advance. There are many useful ways to filter your list down: your technical interests, the kinds of company you want to work for, location.

    In this post, however, I'd like to talk about ways to filter out companies you'd hate. That is, companies with terrible work conditions.

    Talk to your friends

    Some companies have a bad reputation, some have a great reputation. But once a company is big enough, different teams can end up with very different work environments.

    Talking to someone who actually works at a company will give you much better insight about how things work more locally. They can tell you which groups to avoid, and which groups have great leadership.

    For example, Amazon does not have a very good reputation as a workplace, but I know someone who enjoys his job there and his very reasonable working hours.

    Glassdoor

    For companies where you don't have contacts Glassdoor can be a great resource. Glassdoor is a site that lets employees post anonymous salaries and reviews of their company.

    The information is anonymous, so you have to be a little skeptical, especially when there's only a few reviews. And you need to pay attention to the reviewer's role, location, and the year it was posted. Once you take all that into account the reviews can often be very informative.

    During my last job search I found one company in the healthcare area with many complaints of long working hours. One of Glassdoor's features is a way for a company to reply to reviews. In this case the CEO himself answered, explaining that they work hard because "sick patients can't wait."

    Personally I'd rather not work for someone who confuses working long hours with increased output or productivity.

    Read company materials

    After you've checked out Glassdoor the next thing to look at is the job posting itself, along with the company's website. These are often written by people other than the engineering team, but you can still learn a lot from them.

    Sometimes you'll get the sense the company is actually a great place to work for. For example, Memrise has this to say in their Software Engineering postings:

    If you aren’t completely confident that you fit our exact criteria, please get in touch immediately. Humility is a wonderful thing and we’re not interested in hiring ‘rockstars’ or ‘ninjas’.

    On the other hand, consider a job post I found for an Automation Test Engineer. First we learn:

    Must be able to execute scripts during off hours if required.

    This is peculiar; if they're automated why does a person need to run them manually? Later on we read:

    This isn’t the job for someone looking for a traditional 8-5 position, but it’s a great role for someone who is hungry for a terrific opportunity in a fast-paced, state of the art environment.

    Apparently they consider working 8-5 traditional, they will work their employees much longer hours, and they think they're "state of the art" even though they haven't heard of cron.

    Notice, by the way, that it's worth reading all of a company's job postings. Other job postings from the same company are less informative about working conditions than the one I just quoted.

    Interviews

    Finally, if a company has passed the previous filters and you've gotten an interview, make sure you ask about working conditions. Tactfully, of course, and once you've demonstrated your value, but if you don't ask you won't know until it's too late. Here are some sample questions to get you started:

    • What's your typical work day like?
    • How many hours do you end up working?
    • How do you manage project deadlines?

    Depending on the question you might want to ask individual contributors rather than managers. But I've had managers tell me outright they want employees to work really long hours.

    --

    There are many bad software jobs out there. But you don't need to work evenings or weekends to succeed as a programmer.

    If you want to find a programming job with a sane workweek, a job you'll actually enjoy, sign up for the free email course below for more tips and tricks.

    October 14, 2016 04:00 AM

    October 09, 2016

    Thomas Vander Stichele

    Puppet/puppetdb/storeconfigs validation issues

    Over the past year I’ve chipped away at setting up new servers for apestaart and managing the deployment in puppet as opposed to a by now years old manual single server configuration that would be hard to replicate if the drives fail (one of which did recently, making this more urgent).

    It’s been a while since I felt like I was good enough at puppet to love and hate it in equal parts, but mostly manage to control a deployment of around ten servers at a previous job.

    Things were progressing an hour or two here and there at a time, and accelerated when a friend in our collective was launching a new business for which I wanted to make sure he had a decent redundancy setup.

    I was saving the hardest part for last – setting up Nagios monitoring with Matthias Saou’s puppet-nagios module, which needs External Resources and storeconfigs working.

    Even on the previous server setup based on CentOS 6, that was a pain to set up – needing MySQL and ruby’s ActiveRecord. But it sorta worked.

    It seems that for newer puppet setups, you’re now supposed to use something called PuppetDB, which is not in fact a database on its own as the name suggests, but requires another database. Of course, it chose to need a different one – Postgres. Oh, and PuppetDB itself is in Java – now you get the cost of two runtimes when you use puppet!

    So, to add useful Nagios monitoring to my puppet deploys, which without it are quite happy to be simple puppet apply runs from a local git checkout on each server, I now need storeconfigs, which needs puppetdb, which pulls in Java and Postgres. And that's just so a system that handles distributed configuration can actually be told about the results of that distributed configuration and create a useful feedback cycle allowing it to do useful things with the observed result.

    Since I test these deployments on local vagrant/VirtualBox machines, I had to double their RAM because of this – even just the puppetdb java server by default starts with 192MB reserved out of the box.

    But enough complaining about these expensive changes – at least there was a working puppetdb module that managed to set things up well enough.

    It was easy enough to get the first host monitored, and apart from some minor changes (like updating the default Nagios config template from 3.x to 4.x), I had a familiar Nagios view working showing results from the server running Nagios itself. Success!

    But all runs from the other vm’s did not trigger adding any exported resources, and I couldn’t find anything wrong in the logs. In fact, I could not find /var/log/puppetdb/puppetdb.log at all…

    fun with utf-8

    After a long night of experimenting and head scratching, I chased down a first clue in /var/log/messages saying puppet-master[17702]: Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB

    I traced that down to puppetdb/char_encoding.rb, and with my limited ruby skills, I got a dump of the offending byte sequence by adding this code:

    Puppet.warning "Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB"
    File.open('/tmp/ruby', 'w') { |file| file.write(str) }
    Puppet.warning "THOMAS: is here"

    (I tend to use my name in debugging to have something easy to grep for, and I wanted some verification that the File dump wasn't triggering any errors.)

    It took a little time at 3AM to remember where these /tmp files end up thanks to systemd, but once found, I saw it was a json blob with a command to "replace catalog". That could explain why my puppetdb didn't have any catalogs for other hosts. But file told me this was a plain ASCII file, so that didn't help me narrow it down.

    I brute forced it by just checking my whole puppet tree:


    find . -type f -exec file {} \; > /tmp/puppetfile
    grep -v ASCII /tmp/puppetfile | grep -v git

    This turned up a few UTF-8 candidates. Googling around, I was reminded about how terrible utf-8 handling was in ruby 1.8, and saw information that puppet recommended using ASCII only in most of the manifests and files to avoid issues.

    It turned out to be a config from a webalizer module:

    webalizer/templates/webalizer.conf.erb: UTF-8 Unicode text

    While it was written by a Jesús with a unicode name, the file itself didn’t have his name in it, and I couldn’t obviously find where the UTF-8 chars were hiding. One StackOverflow post later, I had nailed it down – UTF-8 spaces!

    00004ba0 2e 0a 23 c2 a0 4e 6f 74 65 20 66 6f 72 20 74 68 |..#..Note for th|
    00004bb0 69 73 20 74 6f 20 77 6f 72 6b 20 79 6f 75 20 6e |is to work you n|

    The offending character is c2 a0 – the non-breaking space.

    I have no idea how that slipped into a comment in a config file, but I changed the spaces and got rid of the error.
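
    (In hindsight, GNU grep can find such bytes directly; a one-liner like this sketch would have located the offending file immediately:)

    grep -rlP '\xc2\xa0' .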

    Puppet’s error was vague, did not provide any context whatsoever (Where do the bytes come from? Dump the part that is parseable? Dump the hex representation? Tell me the position in it where the problem is?), did not give any indication of the potential impact, and in a sea of spurious puppet warnings that you simply have to live with, is easy to miss. One down.

    However, still no catalogs on the server, so still only one host being monitored. What next?

    users, groups, and permissions

    Chasing my next lead turned out to be my own fault. After turning off SELinux temporarily, checking all permissions on all puppetdb files to make sure that they were group-owned by puppetdb and writable for puppet, I took the last step of switching to that user role and trying to write the log file myself. And it failed. Huh? And then id told me why – while /var/log/puppetdb/ was group-writeable and owned by puppetdb group, my puppetdb user was actually in the www-data group.

    It turns out that I had tried to move some uids and gids around after the automatic assignment puppet does gave different results on two hosts (a problem I still don’t have a satisfying answer for, as I don’t want to hard-code uids/gids for system accounts in other people’s modules), and clearly I did one of them wrong.

    I think a server that for whatever reason cannot log should simply not start, as this is a critical error if you want a defensive system.

    After fixing that properly, I now had a puppetdb log file.

    resource titles

    Now I was staring at an actual exception:

    2016-10-09 14:39:33,957 ERROR [c.p.p.command] [85bae55f-671c-43cf-9a54-c149cede
    c659] [replace catalog] Fatal error on attempt 0
    java.lang.IllegalArgumentException: Resource '{:type "File", :title "/var/lib/p
    uppet/concat/thomas_vimrc/fragments/75_thomas_vimrc-\" allow adding additional
    config through .vimrc.local_if filereadable(glob(\"~_.vimrc.local\"))_\tsource
    ~_.vimrc.local_endif_"}' has an invalid tag 'thomas:vimrc-" allow adding additi
    onal config through .vimrc.local
    if filereadable(glob("~/.vimrc.local"))
    source ~/.vimrc.local
    endif
    '. Tags must match the pattern /\A[a-z0-9_][a-z0-9_:\-.]*\Z/.
    at com.puppetlabs.puppetdb.catalogs$validate_resources.invoke(catalogs.
    clj:331) ~[na:na]

    Given the name of the command (replace catalog), I felt certain this was going to be the problem standing between me and multiple hosts being monitored.

    The problem was a few levels deep, but essentially I had code creating fragments of vimrc files using the concat module, and was naming the resources with file content as part of the title. That’s not a great idea, admittedly, but no other part of puppet had ever complained about it before. Even the files on my file system that store the fragments, which get their filename from these titles, happily stored with a double quote in its name.

    So yet again, puppet’s lax approach to specifying types of variables at any of its layers (hiera, puppet code, ruby code, ruby templates, puppetdb) in any of its data formats (yaml, json, bytes for strings without encoding information) triggers errors somewhere in the stack without informing whatever triggered that error (ie, the agent run on the client didn’t complain or fail).

    Once again, puppet has given me plenty of reasons to hate it with a passion, tipping the balance.

    I couldn’t imagine doing server management without a tool like puppet. But you love it when you don’t have to tweak it much, and you hate it when you’re actually making extensive changes. Hopefully after today I can get back to the loving it part.


    by Thomas at October 09, 2016 08:31 PM

    October 07, 2016

    Itamar Turner-Trauring

    More learning, less time: how to quickly gather new tools and techniques

    Update: Added newsletters to the list.

    Have you ever worked hard to solve a problem, only to discover a few weeks later an existing design pattern that was even better than your solution? Or built an internal tool, only to discover an existing tool that already solved the problem?

    To be a good software engineer you need a good toolbox. That means software tools you can use when appropriate, design patterns so you don't have to reinvent the wheel, testing techniques... the list goes on. Learning all existing tools and techniques is impossible, and just keeping up with every newly announced library would be a full time job.

    How do you learn what you need to know to succeed at your work? And how can you do so without spending a huge amount of your free time reading and programming just to keep up?

    A broad toolbox, the easy way

    To understand how you can build your toolbox, consider the different levels of knowledge you can have. You can be an expert on a subject, or you can have some basic understanding, or you might just have a vague awareness that the subject exists.

    For our purposes building awareness is the most important of the three. You will never be an expert in everything, and even basic understanding takes some time. But broad awareness takes much less effort: you just need to remember small amounts of information about each tool or technique.

    You don't need to be an expert on a tool or technique, or even use it at all. As long as you know a tool exists you'll be able to learn more about it when you need to.

    For example, there is a tool named Logstash that moves server logs around. That's pretty much all you have to remember about it, and it takes just 3 seconds to read that previous sentence. Maybe you'll never use that information... or maybe one day you'll need to get logs from a cluster of machines to a centralized location. At that point you'll remember the name "Logstash", look it up, and have the motivation to actually go read the documentation and play around with it.

    Design patterns and other techniques take a bit more effort to gain useful awareness, but still, awareness is usually all you need. For example, property-based testing is hugely useful. But all it takes is a little reading to gain awareness, even if it will take more work to actually use it.
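
    To give a taste of what that little reading buys you, here is a minimal sketch of a property-based test using the Hypothesis library (my choice for illustration; no specific tool is implied above). Instead of hand-picking examples, you state a property and let the library hunt for counterexamples:

    from hypothesis import given, strategies as st

    # Property: sorting is idempotent. Hypothesis generates many random
    # lists of integers and searches for a counterexample automatically.
    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        assert sorted(sorted(xs)) == sorted(xs)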

    The more tools and techniques you are aware of the more potential solutions you will have to the problems you encounter while programming. Being aware of a broad range of tools and techniques is hugely valuable and easy to achieve.

    Building your toolbox

    How do you build your toolbox? How do you find the tools and techniques you need to be aware of? Here are three ways to do so quickly and efficiently.

    Newsletters

    A great way to learn new tools and techniques is newsletters like Ruby Weekly. There are newsletters on many languages and topics, from DevOps to PostgreSQL.

    Newsletters typically include not just links but also short descriptions, so you can skim them and gain awareness even without reading all the articles. In contrast, sites like Reddit or Hacker News only include links, so you gain less information unless you spend more time reading.

    The downside of newsletters is that they focus on the new. You won't hear about a classic design pattern or a standard tool unless someone happens to write a new blog post about it. You should therefore rely on additional sources as well.

    Conference proceedings

    Another broader source of tools and techniques are conferences. Conference talks are chosen by a committee with some understanding of the conference subject. Often they can be quite competitive: I believe the main US Python conference accepts only a third of proposals. And good conferences will aim for a broad range of talks, within the limits of their target audience. As a result conferences are a great way to discover relevant, useful tools and techniques, both new and old.

    Of course, going to a conference can be expensive and time consuming. Luckily you don't have to go to the conference to benefit.

    Just follow this quick procedure:

    1. Find a conference relevant to your interests. E.g. if you're a Ruby developer find a conference like RubyConf.
    2. Skim the talk descriptions; they're pretty much always online.
    3. If something sounds really interesting, there's a decent chance you can find a recording of the talk, or at least the slides.
    4. Mostly however you just need to see what people are talking about and make a mental note of things that sound useful or interesting.

    For example, skimming the RubyConf 2016 program I see there's something called OpenStruct for dynamic data objects, FactoryGirl, which is apparently a testing-related library, a library for writing video games, an explanation of hooks and so on. I'm not really a Ruby programmer, but if I ever want to write a video game in Ruby I'll go find that talk.

    Meetups and user groups

    Much like conferences, meetups are a great way to learn about a topic. And much like conferences, you don't actually have to go to the meetup to gain awareness.

    For example, the Boston Python Meetup has had talks in recent months about CPython internals, microservices, BeeKeeper, which is something for REST APIs, the Plone content management system, etc.

    I've never heard of BeeKeeper before, but now I know its name and subject. That's very little information, gained very quickly... but next time I'm building a REST API with Python I can go look it up and see if it's useful.

    If you don't know what a "REST API" is, well, that's another opportunity for growing your awareness: do a Google search and read a paragraph or two. If it's relevant to your job, keep reading. Otherwise, make a mental note and move on.

    Book catalogs

    Since your goal is awareness, not in-depth knowledge, you don't need to read a book to gain something: the title and description may be enough. Technical book publishers are in the business of publishing relevant books, so browsing their catalog can be very educational.

    For example, the Packt book catalog will give you awareness of a long list of tools you might find useful one day. You can see that "Unity" is something you use for game development, "Spark" is something you use for data science, etc. Spend 20 seconds reading the Spark book description and you'll learn Spark does "statistical data analysis, data visualization, predictive modeling" for "Big Data". If you ever need to do that you now have a starting point for further reading.

    Using your new toolbox

    There are only so many hours in the day, so many days in a year. That means you need to work efficiently, spending your limited time in ways that have the most impact.

    The techniques you've just read do exactly that: you can learn more in less time by spending the minimum time necessary to gain awareness. You only need to spend the additional time to gain basic understanding or expertise for those tools and techniques you actually end up using. And having a broad range of tools and techniques means you can get more done at work, without reinventing the wheel every time.

    You don't need to work evenings or weekends to be a successful programmer! This post covers just some of the techniques you can use to be more productive within the limits of a normal working week. To help you get there I'm working on a book, The Programmer's Guide to a Sane Workweek.

    Sign up in the email subscription form below to learn more about the book, and to get notified as I post more tips and tricks on how you can become a better software engineer.

    October 07, 2016 04:00 AM

    September 24, 2016

    Hynek Schlawack

    Sharing Your Labor of Love: PyPI Quick and Dirty

    A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

    by Hynek Schlawack (hs@ox.cx) at September 24, 2016 12:00 PM

    September 17, 2016

    Glyph Lefkowitz

    Hitting The Wall

    I’m an introvert.

    I say that with a full-on appreciation of just how awful thinkpieces on “introverts” are.

    However, I feel compelled to write about this today because of a certain type of social pressure that a certain type of introvert faces. Specifically, I am a high-energy introvert.

    Cementing this piece’s place in the hallowed halls of just awful thinkpieces, allow me to compare my mild cognitive fatigue with the plight of those suffering from chronic illness and disability [1]. There’s a social phenomenon associated with many chronic illnesses, “but you don’t LOOK sick”, where well-meaning people will look at someone who is suffering, with no obvious symptoms, and imply that they really ought to be able to “be normal”.

    As a high-energy introvert, I frequently participate in social events. I go to meet-ups and conferences and I engage in plenty of public speaking. I am, in a sense, comfortable extemporizing in front of large groups of strangers.

    This all sounds like extroverted behavior, I know. But there’s a key difference.

    Let me posit two axes for personality type: on the X axis, “introvert” to “extrovert”, and on the Y, “low energy” up to “high energy”.

    The X axis describes what kinds of activities give you energy, and the Y axis describes how large your energy reserves are for the other type.

    Notice that I didn’t say which type of activity you enjoy.

    Most people who would self-describe as “introverts” are in the low-energy/introvert quadrant. They have a small amount of energy available for social activities, which they need to frequently re-charge by doing solitary activities. As a result of frequently running out of energy for social activities, they don’t enjoy social activities.

    Most people who would self-describe as “extroverts” are also on the “low-energy” end of the spectrum. They have low levels of patience for solitary activity, and need to re-charge by spending time with friends, going to parties, etc, in order to have the mental fortitude to sit still for a while and focus. Since they can endlessly get more energy from the company of others, they tend to enjoy social activities quite a bit.

    Therefore we have certain behaviors we expect to see from “introverts”. We expect them to be shy, and quiet, and withdrawn. When someone who behaves this way has to bail on a social engagement, this is expected. There’s a certain affordance for it. If you spend a few hours with them, they may be initially friendly but will visibly become uncomfortable and withdrawn.

    This “energy” model of personality is of course an oversimplification - it’s my personal belief that everyone needs some balance of privacy and socialization and solitude and eventually overdoing one or the other will be bad for anyone - but it’s a useful one.

    As a high-energy introvert, my behavior often confuses people. I’ll show up at a week’s worth of professional events, be the life of the party, go out to dinner at all of them, and then disappear for a month. I’m not visibly shy - quite the opposite, I’m a gregarious raconteur. In fact, I quite visibly enjoy the company of friends. So, usually, when I try to explain that I am quite introverted, this claim is met with (quite understandable) skepticism.

    In fact, I am quite functionally what society expects of an “extrovert” - until I hit the wall.


    In endurance sports, one is said to “hit the wall” at the point where all the short-term energy reserves in one’s muscles are exhausted, and there is a sudden, dramatic loss of energy. Regardless, many people enjoy endurance sports; part of the challenge of them is properly managing your energy.

    This is true for me and social situations. I do enjoy social situations quite a bit! But they are nevertheless quite taxing for me, and without prolonged intermissions of solitude, eventually I get to the point where I can no longer behave as a normal social creature without an excruciating level of effort and anxiety.

    Several years ago, I attended a prolonged social event [2] where I hit the wall, hard. The event itself was several hours too long for me, involved meeting lots of strangers, and in the lead-up to it I hadn’t had a weekend to myself for a few weeks due to work commitments and family stuff. Towards the end I noticed I was developing a completely flat affect, and had to start very consciously performing even basic body language, like looking at someone while they were talking or smiling. I’d never been so exhausted and numb in my life; at the time I thought I was just stressed from work.

    Afterwards though, I started having a lot of weird nightmares, even during the daytime. This concerned me, since I’d never had such a severe reaction to a social situation, and I didn’t have good language to describe it. It was also a little perplexing that what was effectively a nice party, the first half of which had even been fun for me, would cause such a persistent negative reaction after the fact. After some research, I eventually discovered that such involuntary thoughts are a hallmark of PTSD.

    While I’ve managed to avoid this level of exhaustion before or since, this was a real learning experience for me that the consequences of incorrectly managing my level of social interaction can be quite severe.

    I’d rather not do that again.


    The reason I’m writing this, though [3], is not to avoid future anxiety. My social energy reserves are quite large enough, and I now have enough self-knowledge, that it is extremely unlikely I’d ever find myself in that situation again.

    The reason I’m writing is to help people understand that I’m not blowing them off because I don’t like them. Many times now, I’ve declined or bailed an invitation from someone, and later heard that they felt hurt that I was passive-aggressively refusing to be friendly.

    I certainly understand this reaction. After all, if you see someone at a party and they’re clearly having a great time and chatting with everyone, but then when you invite them to do something, they say “sorry, too much social stuff”, that seems like a pretty passive-aggressive way to respond.

    You might even still be skeptical after reading this. “Glyph, if you were really an introvert, surely, I would have seen you looking a little shy and withdrawn. Surely I’d see some evidence of stage fright before your talks.”

    But that’s exactly the problem here: no, you wouldn’t.

    At a social event, since I have lots of energy to begin with, I’ll build up a head of steam on burning said energy that no low-energy introvert would ever risk. If I run out of social-interaction-juice, I’ll likely be in the middle of a big crowd telling a long and elaborate story when I find myself exhausted. If I hit the wall in that situation, I can’t feel a little awkward and make excuses and leave; I’ll be stuck creepily faking a smile like a sociopath and frantically looking for a way out of the conversation for an hour, as the pressure from a large crowd of people rapidly builds up months’ worth of nightmare fuel from my spiraling energy deficit.

    Given that I know that’s what’s going to happen, you won’t see me when I’m close to that line. You won’t be at my desk when I silently sit and type for a whole day, or on my couch when I quietly read a book for ten hours at a time. My solitary side is, by definition, hidden.

    But, if I don’t show up to your party, I promise: it’s not you, it’s me.


    1. In all seriousness: this is a comparison of kind and not of degree. I absolutely do not have any illusions that my minor mental issues are a serious disability. They are - by definition, since I do not have a diagnosis - subclinical. I am describing a minor annoyance and frequent miscommunication in this post, not a personal tragedy. 

    2. I’ll try to keep this anonymous, so hopefully you can’t guess - I don’t want to make anyone feel bad about this, since it was my poor time-management and not their (lovely!) event which caused the problem. 

    3. ... aside from the hope that maybe someone else has had trouble explaining the same thing, and this will be a useful resource for them ... 

    by Glyph at September 17, 2016 09:18 PM

    September 16, 2016

    Itamar Turner-Trauring

    Introducing the Programmer's Guide to a Sane Workweek

    I'm working on a book: The Programmer's Guide to a Sane Workweek, a guide to how you can achieve a saner, shorter workweek. If you want to get a free course based on the book, sign up in the email subscription form at the end of the post. Meanwhile, here's the first excerpt from the book:

    • Are you tired of working evenings and weekends, of late projects and unrealistic deadlines?
    • Do you have children you want to see for more than just an hour in the evening after work?
    • Or do you want more time for side projects or to improve your programming skills?
    • In short, do you want a sane workweek?

    A sane workweek is achievable: for the past 4 years I've been working less than 40 hours a week.

    Soon after my daughter was born I quit my job as a product manager at Google and became a part-time consultant, writing software for clients. I wrote code for 20-something hours each week while our child was in daycare, and I spent the rest of my time taking care of our kid.

    Later I got a job with one of my clients, a startup, where I worked as an employee but had a 28-hour workweek. These days I work at another startup, with a 35-hour workweek.

    I'm not the only software engineer who has chosen to work a saner, shorter workweek. There are contractors who work part-time, spending the rest of their time starting their own business. There are employees with specialized skills who only work two days a week. There are even entrepreneurs who have deliberately created a business that isn't all-consuming.

    Would you like to join us?

    If you're a software developer working crazy hours then this book can help you get to a saner schedule. Of course what makes a schedule sane or crazy won't be the same for me as it is for you. You should spend some time thinking about what exactly it is that you want.

    How much time do you want to spend working each week?

    • 40 hours?
    • 32 hours?
    • 20 hours?
    • Or do you never want to work again?

    Depending on what you want there are different paths you can pursue.

    Some paths to a saner workweek

    Here are some ways you can reduce your workweek; I'll cover them in far more detail in later chapters of the book:

    Normalizing your workweek

    If you're working a lot more than 40 hours a week you always have the option of unilaterally normalizing your hours. That is, reducing your hours down to 40 hours or 45 hours or whatever you think is fair. Chances are your productivity and output will actually increase. You might face problems, however, if your employer cares more about hours "worked" than about output.

    Reducing overhead

    Chances are that the hours your employer counts as your work are just part of the time you spend on your job. In particular, commuting can take another large bite out of your free time. Cut down on commuting and long lunch breaks and you've gotten some of that time back without any reduction in the hours your boss cares about.

    Negotiating a shorter workweek at your current job

    If you want a shorter-than-normal workweek you can try to negotiate that at your current job. Your manager doesn't want to replace a valued, trained employee: hiring new people is expensive and risky. That means you have an opening to negotiate shorter hours. This is one of the most common ways software engineers I know have reduced their hours.

    Find a shorter workweek at a new job

    If you're looking for a 40-hour workweek this is mostly about screening for a good company culture as part of your interview process. If you want a shorter-than-normal workweek you will need to negotiate a better job offer. That usually means negotiating your salary, but you can sometimes negotiate shorter working hours instead. This path can be tricky; I've managed to do it, but have also been turned down, and I know of other people who have failed. It's easier if you've already worked for the company as a consultant, so they know what they're getting. Alternatively if your previous (ideally, your current) job gave you a shorter workweek you'll have better negotiating leverage.

    Long-term contracts

    Instead of working as an employee you can take on long-term contract work, often through an agency. The contract can specify how many hours you will work, and shorter workweeks are sometimes possible. You can even get paid overtime!

    Consulting

    Instead of taking on long-term work, which is similar in many ways to being an employee, you go out and find project work for yourself. That means you need to spend something like half your time on marketing. By marketing well and providing high value to your clients you can charge high rates, allowing you to work reasonable hours.

    Product business

    All the paths so far involved exchanging money for time, in one form or another. As a software engineer you have another choice: you can make a product once and easily sell that same product multiple times. That means your income is no longer directly tied to how many hours you work. You'll need marketing and other business skills to do so, and you won't just be writing code.

    Early retirement

    Finally, if you don't want to work ever again there is the path of early retirement. That doesn't mean you can't make money afterwards; it means you no longer have to make a living, you've earned enough that your time is your own. To get there you'll need very low living expenses, and a high saving rate while you're still working. Luckily programmers tend to get paid well.

    Which path will you take?

    Each of these paths has its own set of requirements and trade-offs, so it's worth considering which one fits your needs. At different times of your life you might prefer one path, and later you might prefer another. For example, I've worked as both a consultant and a part-time employee.

    What kind of work environment do you want right now?

    • Do you want to work from your spare bedroom?
    • Do you like having co-workers?
    • Do you want to start your own business?
    • Do you want to just code, or do you want to expand your skills beyond that?

    A later chapter will cover choosing your path in more detail. For now, take a little time to think it through and imagine what your ideal job would be like. Combine that with your weekly hours goal and you should get some sense of which path is best for you.

    It won't be easy

    Working a sane workweek is not something corporate culture encourages, at least in the US. That means you won't be following the default, easy path that most workers do: you're going to need to do some work to get to your destination. In later chapters I'll explain how you can acquire the prerequisites for your chosen path, but for now here's a summary:

    • You'll need to get your engineering skills to a place where you're both productive and can work independently. As an employee this will help you negotiate with your employer. As a contractor or consultant it will help get you work.
    • You'll need to reduce your living expenses. You can then afford to work fewer hours, and the larger your savings in the bank the more time you can take to look for a new job. Plus it makes for a better negotiating position.
    • You'll need to be able to negotiate successfully, whether it's with your employer or with clients.
    • Finally, you'll need to have the self-confidence or stubbornness to choose and stick to a path that most people don't take.

    How much do you really want to work a sane workweek? Do you care enough to make the necessary effort?

    It won't be easy, but I think it's worth it.

    Shall we get started? Sign up below to get a free course that will take you through the first steps of your journey.

    September 16, 2016 04:00 AM

    September 15, 2016

    Moshe Zadka

    Post-Object-Oriented Design

    In the beginning, came the so-called “procedural” style. Data was data, and behavior, implemented as procedure, were separate things. Object-oriented design is the idea to bundle data and behavior into a single thing, usually called “classes”. In return for having to tie the two together, the thought went, we would get polymorphism.

    Polymorphism is pretty neat. We send different objects the same message, for example, “turn yourself into a string”, and they respond appropriately — each according to their uniquely defined behavior.

    But what if we could separate the data and behavior, and still get polymorphism? This is the idea behind post-object-oriented design.

    In Python, we achieve this with two external packages. One is the “attr” package. This package provides a useful way to define bundles of data that still exhibit the minimum amount of behavior we do want: initialization, string representation, hashing and more.

    The other is the “singledispatch” package (available as functools.singledispatch in Python 3.4+).

    import attr
    import singledispatch
    

    In order to be specific, we imagine a simple protocol. The low-level details of the protocol do not concern us, but we assume some lower-level parsing allows us to communicate in dictionaries back and forth (perhaps serialized/deserialized using JSON).

    Our protocol is one to send changes to a map. The only two messages are “set”, to set a key to a given value, and “delete”, to delete a key.

    messages = (
    {
        'type': 'set',
        'key': 'language',
        'value': 'python'
    },
    {
        'type': 'delete',
        'key': 'human'
    }
    )
    

    We want to represent those as attr-based classes.

    @attr.s
    class Set(object):
        key = attr.ib()
        value = attr.ib()
    
    @attr.s
    class Delete(object):
        key = attr.ib()
    
    print(Set(key='language', value='python'))
    print(Delete(key='human'))
    
    Set(key='language', value='python')
    Delete(key='human')
    

    When incoming dictionaries arrive, we want to convert them to the logical classes. In this example, the code could not be simpler (mostly because the protocol is simple).

    def from_dict(dct):
        tp = dct.pop('type')
        name_to_klass = dict(set=Set, delete=Delete)
        try:
            klass = name_to_klass[tp]
        except KeyError:
            raise ValueError('unknown type', tp)
        return klass(**dct)
    

    Note how we take advantage of the fact that attr-based classes accept correctly-named keyword arguments.

    from_dict(dict(type='set', key='name', value='myname')), from_dict(dict(type='delete', key='data'))
    
    (Set(key='name', value='myname'), Delete(key='data'))
    

    But this was easy! There was no need for polymorphism: we always get one type in (dictionaries), and we consult a mapping to decide which type to produce.

    However, for serialization, we do need polymorphism. Enter our second tool — the singledispatch package. The default function is equivalent to a method defined on “object”: the ultimate super-class. Since we do not want to serialize generic objects, our default implementation errors out.

    @singledispatch.singledispatch
    def to_dict(obj):
        raise TypeError("cannot serialize", obj)
    

    Now, we implement the actual serializers. The names of the functions are not important. To emphasize they should not be used directly, we make them “private” by prepending an underscore.

    @to_dict.register(Set)
    def _to_dict_set(st):
        return dict(type='set', key=st.key, value=st.value)
    
    @to_dict.register(Delete)
    def _to_dict_delete(dlt):
        return dict(type='delete', key=dlt.key)
    

    Indeed, we do not call them directly.

    print(to_dict(Set(key='k', value='v')))
    print(to_dict(Delete(key='kk')))
    
    {'type': 'set', 'value': 'v', 'key': 'k'}
    {'type': 'delete', 'key': 'kk'}
    

    However, arbitrary objects cannot be serialized.

    try:
        to_dict(object())
    except TypeError as e:
        print(e)
    
    ('cannot serialize', <object object at 0x7fbdb254ac60>)
    

    Now that the structure of adding such an “external method” has been shown, another example can be given: “act on”: applying the changes requested to an in-memory map.

    @singledispatch.singledispatch
    def act_on(command, d):
        raise TypeError("Cannot act on", command)
    
    @act_on.register(Set)
    def act_on_set(st, d):
        d[st.key] = st.value
    
    @act_on.register(Delete)
    def act_on_delete(dlt, d):
        del d[dlt.key]
    
    d = {}
    act_on(Set(key='name', value='woohoo'), d)
    print("After setting")
    print(d)
    act_on(Delete(key='name'), d)
    print("After deleting")
    print(d)
    
    After setting
    {'name': 'woohoo'}
    After deleting
    {}
    

    In this case, we kept the functionality “near” the code. However, note that the functionality could be implemented in a different module: these functions, even though they are polymorphic, follow Python namespace rules. This is useful: several different modules could implement “act_on”: for example, an in-memory map (as we defined above), a module using Redis or a module using a SQL database.
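
    As a minimal sketch of that idea (the names are illustrative; it assumes the redis package and imports the Set and Delete classes defined above), a Redis-backed module could define its own act_on:

    import singledispatch
    # from the module above, wherever it lives: Set, Delete

    @singledispatch.singledispatch
    def act_on(command, client):
        raise TypeError("Cannot act on", command)

    @act_on.register(Set)
    def _act_on_set(st, client):
        # client is assumed to be a redis.StrictRedis instance
        client.set(st.key, st.value)

    @act_on.register(Delete)
    def _act_on_delete(dlt, client):
        client.delete(dlt.key)

    Because this module defines its own act_on generic function, it lives in its own namespace and can coexist with the in-memory version.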

    Actual methods are not completely obsolete. It would still be best to make methods do anything that would require private attribute access. In simple cases, as above, there is no difference between the public interface and the public implementation.

    by moshez at September 15, 2016 06:03 AM

    September 09, 2016

    Itamar Turner-Trauring

    How to choose a side project

    If you're a programmer just starting out you'll often get told to work on side projects, beyond what you do at school or work. But there are so many things you could be doing: what should you be working on? How do you choose a side project you will actually finish? How will you make sure you're learning something?

    Keep in mind that you don't actually have to work on side projects to be a good programmer. I know many successful software engineers who code only at their job and spend their free time on other hobbies. But if you do want to work on software in your spare time there are two different approaches you can take.

    To understand these approaches let's consider a real side project that managed to simultaneously both succeed and fail.

    Long ago, in an Internet far far away

    Back in 2000 my friend Glyph started a small project called Twisted Reality. It was supposed to be a game engine, with the goal of implementing a particularly complex and sophisticated game.

    Since the game had a chat system, a web server, and other means of communication, the game grew a networking engine. Glyph and his friends hung out on the Internet Relay Chat (IRC) Python channel and every time someone asked a networking question they'd tell them "use Twisted Reality!" Over time more people would show up needing a small feature added to the networking engine, so they'd submit a patch. That's how I became a Twisted Reality contributor.

    Eventually the networking engine grew so big that Twisted Reality was split into two projects: the Twisted networking framework and the Reality game engine. These days Twisted is used by companies like Apple, Cisco and Yelp, and is still going strong. The game engine has been through multiple rewrites, but the game it was designed for has never been built.

    Approach #1: solving a problem

    The difference between Twisted, a successful side project, and the game that never got written is that Twisted solved a specific, limited problem. If you need to write some networking code in Python then Twisted will help you get it done quickly and well. The game, however, was so ambitious that it was never started: there was always another simulation feature to be added to the game engine first.

    If you are building a side project choose one that solves a specific, limited problem. For example, let's say you feel you're wasting time playing on Facebook when you should be doing homework.

    1. "Build the best time tracking app ever" is neither limited nor specific, nor is it really a problem you're solving.
    2. "I want to keep track of how much time I spend actually working on homework vs. procrastinating" is better, but still not quite problem-driven.
    3. A good problem statement is "I want to prevent myself from visiting Facebook and other specific websites while I'm working on homework." At this point you have a clear sense of what software you're building.

    Why a specific and limited problem?

    • The problem statement will tell you whether you're making progress: are you any closer to solving the problem? Is the work you're doing actually related to the problem at all?
    • By limiting the problem you increase your chances of successfully building something usable. If you finish it and want to keep going, great, add another problem to expand its scope. But start with something small.

    Approach #2: artificial limits

    How do you choose a side project if you don't have any specific problems in mind? The key is to still have constraints and limits so that your project is small, achievable and has a clear goal.

    One great way to do that is to set a time limit. I'm not a fan of hackathons, since they promote the idea that sleeplessness and working crazy hours is a reasonable way to write software. But with a longer time frame building something specific with a time limit can be a great way to create a side project.

    The PyWeek project for example has you build a game in one week, using a theme chosen by the organizers. Building a game isn't solving a problem, but it can still be fun and educational. And the one week limit will ensure you focus your efforts and achieve something concrete.

    Software has no value

    Whether you decide to solve a problem or to set artificial time limits on your side project, the key is having constraints and a clear goal. Software is just a tool; there is no inherent value in producing more of it. Value is produced by solving problems or by the entertainment value of a game. A half-solved problem or a half-finished game is valueless, so you want your initial goal to be small and constrained.

    I've learned this the hard way, focusing on the value of my code instead of on the problems it solved. If you want to avoid that and other mistakes I've made over 20 years of writing software check out my career as a Software Clown.

    September 09, 2016 04:00 AM

    August 28, 2016

    Twisted Matrix Laboratories

    Twisted 16.4.0 Released

    On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.4.0.

    The highlights of this release are:
    • twist, a new command line tool for running Twisted plugins, similar to twistd but with a simpler, cleaner interface.
    • A new interface for Protocols, IHandshakeListener, which tells Twisted to tell the Protocol when the TLS handshake has been completed.
    • async/await support for Deferreds, allowing you to write Python 3.5+ coroutines using Twisted (see the sketch after this list).
    • Trial can be invoked with "python -m twisted.trial".
    • All Twisted executables (trial, twistd, etc) are now Setuptools console scripts, meaning they will work much better on Windows.
    • 35+ more modules ported to Python 3, and many many cleanups on the way to Python 3 on Windows support.
    • All the security fixes of Twisted 16.3.1 + 16.3.2 (httpoxy, HTTP session identifier strengthening, HTTP+TLS consuming sockets)
    • 240+ closed tickets overall.
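
    As a quick illustration of the async/await support (a minimal sketch of my own, not from the release notes): a coroutine can await Deferreds, and ensureDeferred adapts the coroutine back into a Deferred that Twisted can run.

    from twisted.internet import defer, task

    async def main(reactor):
        # Any Deferred can be awaited from inside a coroutine (Python 3.5+).
        await task.deferLater(reactor, 1.0, lambda: None)
        print("slept one second via the reactor")

    # ensureDeferred wraps the coroutine in a Deferred; task.react runs it.
    task.react(lambda reactor: defer.ensureDeferred(main(reactor)))
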
    For more information, check the NEWS file (link provided below).

    You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

    Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

    Twisted Regards,
    Amber Brown (HawkOwl)

    PS: Twisted 16.4.1 will be coming soon after this with a patch mitigating SWEET32, by updating the acceptable cipher list.

    by HawkOwl (noreply@blogger.com) at August 28, 2016 01:48 AM

    August 25, 2016

    Itamar Turner-Trauring

    From 10x programmer to 0.1x programmer: creating more with less

    You've heard of the mythical 10x programmers, programmers who can produce ten times as much as us normal humans. If you want to become a better programmer this myth is demoralizing, but it's also not useful: how can you write ten times as much code? On the other hand, consider the 0.1x programmer, a much more useful concept: anyone can choose to write only 10% as much code as a normal programmer would. As they say in the business world, becoming a 0.1x programmer is actionable.

    Of course writing less code might seem problematic, so let's refine our goal a little. Can you write 10% as much code as you do now and still do just as well at your job, still fixing the same amount of bugs, still implementing the same amount of features? The answer may still be "no", but at least this is a goal you can more easily work towards incrementally.

    Doing more with less code

    How do you do achieve just as much while writing less code?

    1. Use a higher level programming language

    As it turns out many of us are 0.1x programmers without even trying, compared to previous generations of programmers that were stuck with lower-level programming languages. If you don't have to worry about manual memory management or creating a data structure from scratch you can write much less code to achieve the same goal.
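
    For instance (an illustrative sketch, not from the original post), counting word frequencies takes a few lines of Python, with no manual memory management or hand-rolled hash table:

    from collections import Counter

    def word_counts(text):
        # Counter handles hashing, storage and counting for us; the
        # equivalent C would need a hash table written by hand.
        return Counter(text.lower().split())

    print(word_counts("To be or not to be"))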

    2. Use existing code

    Instead of coding from scratch, use an existing library that achieves the same thing. For example, earlier this week I was looking at the problem of incrementing version numbers in source code and documentation as part of a release. A little searching and I found an open source tool that did exactly what I needed. Because it's been used by many people and improved over time chances are it's better designed, better tested, and less buggy than my first attempt would have been.

    3. Spend some time thinking

    Surprisingly, spending more time planning up front can save you time in the long run. If you have 2 days to fix a bug it's worth spending 10% of that time, an hour and a half, to think about how to solve it. Chances are the first solution you come up with in the first 5 minutes won't be the best solution, especially if it's a hard problem. Spend an hour more thinking and you might come up with a solution that takes two hours instead of two days.

    4. Good enough features

    Most feature requests have three parts:

    1. The stuff the customer must have.
    2. The stuff that is nice to have but not strictly necessary.
    3. The stuff the customer is willing to admit is not necessary.

    The last category is usually dropped in advance, but you're usually still asked to implement the middle category of things that the customer and product manager really really want but aren't actually strictly necessary. So figure out the real minimum path to implement a feature, deliver it, and much of the time it'll turn out that no one will miss those nice-to-have additions.

    5. Drop the feature altogether

    Some features don't need to be done at all. Some features are better done a completely different way than requested.

    Instead of saying "yes, I'll do that" to every feature request, make sure you understand why someone needs the feature, and always consider alternatives. If you come up with a faster, superior idea the customer or product manager will usually be happy to go along with your suggestion.

    6. Drop the product altogether

    Sometimes your whole product is not worth doing: it will have no customers, will garner no interest. Spending months and months on a product no one will ever use is a waste of time, not to mention depressing.

    Lean Startup is one methodology for dealing with this: before you spend any time developing the product you do the minimal work possible to figure out if it's worth doing in the first place.

    Conclusion

    Your goal as programmer is not to write code, your goal is to solve problems. From low-level programming decisions to high-level business decisions there are many ways you can solve problems with less code. So don't start with "how do I write this code?", start with "how do I solve this problem?" Sometimes you'll do better not solving the problem at all, or redefining it. As you get better at solving problems with less code you will find yourself becoming more productive, especially if you start looking at the big picture.

    Being productive is a great help if you're tired of working crazy hours. Want a shorter workweek? Check out The Programmer's Guide to a Sane Workweek.

    August 25, 2016 04:00 AM

    Moshe Zadka

    Time Series Data

    When operating computers, we are often exposed to so-called “time series”. Whether it is database latency, page fault rate or total memory used, these are all exposed as numbers that are usually sampled at frequent intervals.

    However, computer engineers are not the only ones exposed to such data. It is worthwhile to know what other disciplines are exposed to such data, and what they do with it. “Earth sciences” (geology, climate, etc.) have a lot of numbers, and often need to analyze trends and make predictions. Sometimes these predictions have, literally, billions of dollars’ worth of decisions hinging on them. It is worthwhile to read some of the textbooks for students of those disciplines to see how they approach those series.

    Another discipline that relies on visually inspecting time series data is medicine. EKG data is often vital to analyzing patients’ health — especially when compared to their historical records. For that, the data needs to be saved. A lot of EKG research has been done on how to compress numerical data while still keeping it “visually the same”. While the research on that is not as rigorous, or as settled, as the trend analysis in geology, it is still useful to look into. Indeed, even the basics are already better than so-called “roll-ups”, which preserve none of the visual distinction of the data, flattening peaks and filling valleys while keeping a “standard deviation” score that is not as helpful as is usually hoped.

    by moshez at August 25, 2016 03:50 AM

    August 24, 2016

    Hynek Schlawack

    Hardening Your Web Server’s SSL Ciphers

    There are many wordy articles on configuring your web server’s TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today’s needs and scores a straight “A” on Qualys’s SSL Server Test.

    by Hynek Schlawack (hs@ox.cx) at August 24, 2016 03:40 PM

    August 22, 2016

    Hynek Schlawack

    Better Python Object Serialization

    The Python standard library is full of underappreciated gems. One of them allows for simple and elegant function dispatching based on argument types. This makes it perfect for serialization of arbitrary objects – for example to JSON in web APIs and structured logs.

    by Hynek Schlawack (hs@ox.cx) at August 22, 2016 12:30 PM

    August 20, 2016

    Moshe Zadka

    Extension API: An exercise in a negative case study

    I was idly contemplating implementing a new Jupyter kernel. Luckily, they try to provide facilities to make it easier. Unfortunately, they made a number of suboptimal choices in their API. Fortunately, those mistakes are both common and easily avoidable.

    Subclassing as API

    They suggest subclassing IPython.kernel.zmq.kernelbase.Kernel. Errr…not “suggest”. It is a “required step”. The reason is probably that this class already implements 21 methods. When you subclass, make sure not to use any of these names, or things will break randomly. If you do not want to subclass, good luck figuring out what assumptions the system makes about these 21 methods, because there is no interface or even prose documentation.

    The return statement in their example is particularly illuminating:

            return {'status': 'ok',
                    # The base class increments the execution count
                    'execution_count': self.execution_count,
                    'payload': [],
                    'user_expressions': {},
                   }
    

    Note the comment “base class increments the execution count”. This is a classic code smell: it seems like every single overrider would need this, which means it really belongs in the helper class, not in every kernel.

    None

    The signature for the example do_execute is:

        def do_execute(self, code, silent, store_history=True, 
                       user_expressions=None,
                       allow_stdin=False):
    

    Of course, this means that user_expressions will sometimes be a dictionary and sometimes None. It is likely that the code will be written to anticipate one or the other, and will fail in interesting ways if None is actually sent.

    Optional Overrides

    As described in this section, there are also ways to make the kernel better with optional overrides. The convention used, which is nowhere explained, is that do_ methods are the ones you should override to make a better kernel. Nowhere is it explained why there is no default history implementation, or where to get one, or why a simple stupid implementation is wrong.

    Dictionaries

     

    All overrides return dictionaries, which get serialized directly into the underlying communication platform. This is a poor abstraction, especially when the documentation is direct links to the underlying protocol. When wrapping a protocol, it is much nicer to use an Interface as the documentation of what is assumed — and define an attr.s-based class to allow returning something which is automatically the correct type, and will fail in nice ways if a parameter is forgotten.
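
    As a minimal sketch of that suggestion (the class and field names here are mine, mirroring the example return value above, not an actual Jupyter API):

    import attr

    @attr.s
    class ExecuteReply(object):
        # Forgetting 'status' or 'execution_count' now fails loudly at
        # construction time, instead of producing a malformed dictionary
        # somewhere deep in the messaging layer.
        status = attr.ib()
        execution_count = attr.ib()
        payload = attr.ib(default=attr.Factory(list))
        user_expressions = attr.ib(default=attr.Factory(dict))

    reply = ExecuteReply(status='ok', execution_count=1)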

    Summary

    If you are providing an API, here are a few positive lessons based on the issues above:

    • You should expect interfaces, not subclasses. Use composition, not subclassing. If you want to provide a default implementation in composition, just check for a return of NotImplemented, and use the default.
    • Do the work of abstracting your customers from the need to use dictionaries, and unwrap automatically. Use attr.s to avoid customer boilerplate.
    • Send all arguments. Isolate your customers from the need to come up with sane defaults.
    • As much as possible, try to have your interfaces be side-effect free. Instead of asking the customer to directly make a change, allow the “needed change” to be part of the return type. This will let your customers test their classes much more easily.

    by moshez at August 20, 2016 06:56 PM

    August 19, 2016

    Twisted Matrix Laboratories

    Twisted 16.3.2 Released

    On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.3.2.

    This is a bug fix & security fix release, and is recommended for all users of Twisted. The fixes are:
    • A bugfix for an HTTP/2 edge case (included in 16.3.1)
    • Fix for CVE-2008-7317 (generating potentially guessable HTTP session identifiers) (included in 16.3.1)
    • Fix for CVE-2008-7318 (sending secure session cookies over insecured connections) (included in 16.3.1)
    • Fix for CVE-2016-1000111 (http://httpoxy.org/) (included in 16.3.1)
    • Twisted's HTTP server, when operating over TLS, would not cleanly close sockets, causing it to build up CLOSE_WAIT sockets until it would eventually run out of file descriptors.
    For more information, check the NEWS file (link provided below).

    You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

    Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

    Twisted Regards,
    Amber Brown (HawkOwl)

    by HawkOwl (noreply@blogger.com) at August 19, 2016 09:45 AM

    August 18, 2016

    Jonathan Lange

    Patterns are half-formed code

    If “technology is stuff that doesn’t work yet”[1], then patterns are code we don’t know how to write yet.

    In the Go Programming Language, the authors show how to iterate over elements in a map, sorted by keys:

    To enumerate the key/value pairs in order, we must sort the keys explicitly, for instance, using the Strings function from the sort package if the keys are strings. This is a common pattern.

    —Go Programming Language, Alan A. A. Donovan & Brian W. Kernighan, p94

    The pattern is illustrated by the following code:

    import "sort"
    
    var names []string
    for name := range ages {
        names = append(names, name)
    }
    sort.Strings(names)
    for _, name := range names {
        fmt.Printf("%s\t%d\n", name, ages[name])
    }
    

    Peter Norvig calls this an informal design pattern: something referred to by name (“iterate through items in a map in order of keys”) and re-implemented from scratch each time it’s needed.

    Informal patterns have their place but they are a larval form of knowledge, stuck halfway between intuition and formal understanding. When we recognize a pattern, our next step should always be to ask, “can we make it go away?”

    Patterns are one way of expressing “how to” knowledge [2] but we have another, better way: code. Source code is a formal expression of “how to” knowledge that we can execute, test, manipulate, verify, compose, and re-use. Encoding “how to” knowledge is largely what programming is [3]. We talk about replacing people with programs precisely because we take the knowledge about how to do their job and encode it such that even a machine can understand it.

    So how can we encode the knowledge of iterating through the items in a map in order of keys? How can we replace this pattern with code?

    We can start by following Peter Norvig’s example and reach for a dynamic programming language, such as Python:

    names = []
    for name in ages:
        names.append(name)
    names.sort()
    for name in names:
        print("{}\t{}".format(name, ages[name]))
    

    This is a very literal translation of the first snippet. A more idiomatic approach would look like:

    names = sorted(ages.keys())
    for name in names:
        print("{}\t{}".format(name, ages[name])
    

    To turn this into a formal pattern, we need to extract a function that takes a map and returns a list of pairs of (key, value) in sorted order, like so:

    def sorted_items(d):
        result = []
        sorted_keys = sorted(d.keys())
        for k in sorted_keys:
            result.append((k, d[k]))
        return result
    
    for name, age in sorted_items(ages):
        print("{}\t{}".format(name, age))
    

    The pattern has become a function. Instead of a name or a description, it has an identifier, a True Name that gives us power over the thing. When we invoke it we don’t need to comment our code to indicate that we are using a pattern because the name sorted_items makes it clear. If we choose, we can test it, optimize it, or perhaps even prove its correctness.

    If we figure out a better way of doing it, such as:

    def sorted_items(d):
        return [(k, d[k]) for k in sorted(d.keys())]
    

    Then we only have to change one place.

    And if we are willing to tolerate a slight change in behavior (sorting the (key, value) pairs rather than just the keys),

    def sorted_items(d):
        return sorted(d.items())
    

    Then we might not need the function at all.

    It was being able to write code like this that drew me towards Python and away from Java, way back in 2001. It wasn’t just that I could get more done in fewer lines—although that helped—it was that I could write what I meant.

    Of course, these days I’d much rather write:

    import Data.List (sort)
    import qualified Data.HashMap as Map
    
    sortedItems :: (Ord k, Ord v) => Map.Map k v -> [(k, v)]
    sortedItems d = sort (Map.toList d)
    

    But that’s another story.

    [1]Bran Ferren, via Douglas Adams
    [2]Patterns can also contain “when to”, “why to”, “why not to”, and “how much” knowledge, but they _always_ contain “how to” knowledge.
    [3]The excellent SICP lectures open with the insight that what we call “computer science” might be the very beginning of a science of “how to” knowledge.

    by Jonathan Lange at August 18, 2016 05:00 PM

    Itamar Turner-Trauring

    Less stress, more productivity: why working fewer hours is better for you and your employer

    Update: This post got to #1 on Hacker News and the /r/programming subreddit, and had over 40,000 views. Given that level of interest in the subject I've decided to write The Programmer's Guide to a Sane Workweek.

    There's always too much work to be done on software projects, too many features to implement, too many bugs to fix. Some days you're just not going through the backlog fast enough, you're not producing enough code, and it's taking too long to fix a seemingly-impossible bug. And to make things worse you're wasting time in pointless meetings instead of getting work done.

    Once it gets bad enough you can find yourself always scrambling, working overtime just to keep up. Pretty soon it's just expected, and you need to be available to answer emails at all hours even when there are no emergencies. You're tired and burnt out and there's still just as much work as before.

    The real solution is not working even harder or even longer, but rather the complete opposite: working fewer hours.

    Some caveats first:

    • The more experienced you are the better this will work. If this is your first year working after school you may need to just deal with it until you can find a better job, which you should do ASAP.
    • Working fewer hours is effectively a new deal you are negotiating with your employer. If you're living from paycheck to paycheck you have no negotiating leverage, so the first thing you need to do is make sure you have some savings in the bank.

    Fewer hours, more productivity

    Why does working longer hours not improve the situation? Because working longer makes you less productive at the same time that it encourages bad practices by your boss. Working fewer hours does the opposite.

    1. A shorter work-week improves your ability to focus

    As I've discussed before, working while tired is counter-productive. It takes longer and longer to solve problems, and you very quickly hit the point of diminishing returns. And working consistently for long hours is even worse for your mental focus, since you will quickly burn out.

    Long hours: "It's 5 o'clock and I should be done with work, but I just need to finish this problem, just one more try," you tell yourself. But being tired it actually takes you another three hours to solve. The next day you go to work tired and unfocused.

    Shorter hours: "It's 5 o'clock and I wish I had this fixed, but I guess I'll try tomorrow morning." The next morning, refreshed, you solve the problem in 10 minutes.

    2. A shorter work-week promotes smarter solutions

    Working longer hours encourages bad programming habits: you start thinking that the way to solve problems is just forcing yourself to get through the work. But programming is all about automation, about building abstractions to reduce work. Often you can get huge reductions in effort by figuring out a better way to implement an API, or that a particular piece of functionality is not actually necessary.

    Let's imagine your boss hands you a task that must ship to your customer in 2 weeks. And you estimate that optimistically it will take you 3 weeks to implement.

    Long hours: "This needs to ship in two weeks, but I think it's 120 hours to complete... so I guess I'm working evenings and weekends again." You end up even more burnt out, and probably the feature will still ship late.

    Shorter hours: "I've got two weeks, but this is way too much work. What can I do to reduce the scope? Guess I'll spend a couple hours thinking about it."

    And soon: "Oh, if I do this restructuring I can get 80% of the feature done in one week, and that'll probably keep the customer happy until I finish the rest. And even if I underestimated I've still got the second week to get that part done."

    3. A shorter work-week discourages bad management practices

    If your response to any issue is to work longer hours you are encouraging bad management practices. You are effectively telling your manager that your time is not valuable, and that they need not prioritize accordingly.

    Long hours: If your manager isn't sure whether you should go to a meeting, they might tell themselves that "it might waste an hour of time, but they'll just work an extra hour in the evening to make it up." If your manager can't decide between two features, they'll just hand you both instead of making a hard decision.

    Shorter hours: With shorter hours your time becomes more scarce and valuable. If your manager is at all reasonable less important meetings will get skipped and more important features will be prioritized.

    Getting to fewer hours

    A short work-week means different things to different people. One programmer I know made clear when she started a job at a startup that she worked 40-45 hours a week and that's it. Everyone else worked much longer hours, but that was her personal limit. Personally I have negotiated a 35-hour work week.

    Whatever the number that makes sense to you, the key is to clearly explain your limits and then stick to them. Tell your manager "I am going to be working a 40-hour work week, unless it's a real emergency." Once you've explained your limits you need to stick to them: no answering emails after hours, no agreeing to do just one little thing on the weekend.

    And then you need to prove yourself by still being productive, and making sure that when you are working you are working. Spending a couple hours a day at work watching cat videos probably won't go well with shorter hours.

    There are companies where this won't fly, of course, where management is so bad or norms are so out of whack that even a 40-hour work week by a productive team member won't be acceptable. In those cases you need to look for a new job, and as part of the interview figure out the work culture and project management practices of prospective employers. Do people work short hours or long hours? Is everything always on fire or do projects get delivered on time?

    Whether you're negotiating your hours at your existing job or at a new job, you'll do better the more experienced and skilled a programmer you are. If you want to learn how to get there check out The Programmer's Guide to a Sane Workweek.

    August 18, 2016 04:00 AM