Planet Twisted

February 05, 2016

Twisted Matrix Laboratories

Jan '16 SFC Sponsored Development (HawkOwl)

Hi everyone!

This round of the Twisted Fellowship has come to an end, so here is my final report.

Tickets reviewed/merged:
  • #8140 (merged): DummyRequest is out of sync with the real Request
  • #7943 (reviewed, merged): Remove usage of microdom
  • #8132 (reviewed, merged): Port twisted.web.vhost to Python 3
  • #7993 (reviewed, merged): Port twisted.web.wsgi to Python 3
  • #8173 (merged): Twisted does not have a code of conduct

Tickets triaged:
  • (braid) #66 - Drop the Fedora 17 and Fedora 18 builders
  • (braid) #22 - Properly install twisted's trac plugins
  • #7813 (closed, already done) - twisted.trial.test.test_doctest should be ported to python3

Tickets worked on:
  • (braid) #168 (done on branch) - Install Trac GitHub plugin
  • (braid) #169 (done on branch) - In 'staging' GitHub repo add the webhook to poke the 'staging' Trac
  • (braid) #167 (done on branch) - Create staging Trac
  • (braid) #139 (done on branch) - Manage git mirror repo used by Trac using braid
  • (braid) #1 (done on branch) - Use virtualenvs in deployment
  • (braid) #164 (work in progress) - Migrate Twisted SVN accounts to GitHub twisted/twisted
  • (braid) #178 (work in progress) - Migrate to BuildBot Nine
  • (braid) #142 (work in progress) - Migrate IRC announcements from Kenaan
  • (braid) #185 (merged) - Add dns entry for staging.twistedmatrix.com
Even though we didn't get the Git migration done in time for the end of the fellowship, I am happy to report that it is in a much closer and much better understood state than before. If you would like to help get some of the work above reviewed and merged, drop by https://github.com/twisted-infra/braid/pulls !

- Amber

by HawkOwl (noreply@blogger.com) at February 05, 2016 04:37 PM

February 02, 2016

Twisted Matrix Laboratories

January 2016 - SFC Sponsored Development

This is my report for the work done in January 2016 as part of the Twisted Maintainer Fellowship program.
It is my last report of the Twisted Maintainer Fellowship 2015 program.

During this fellowship the review queue shrank and review round-trips completed much more quickly.
The fellowship also produced the Git/GitHub migration plan, but its execution has not yet been finalized.

Tickets reviewed and merged

* #7671 - It is way too hard to specify a trust root combining multiple certificates, especially to HTTP
* #7993 - Port twisted.web.wsgi to Python 3
* #8140 - twisted.web.test.requesthelper.DummyRequest is out of sync with the real Request
* #8148 - Deprecate twisted.protocols.mice
* #8173 - Twisted does not have a code of conduct
* #8180 - Conch integration tests fail because DSA is deprecated in OpenSSH 7.
* #8187 - Use a less ancient OpenSSL method in twisted.test.test_sslverify

Tickets reviewed and not merged yet

* #7889 - replace win32api.OpenProcess and win32api.FormatMessage with cffi
* #8150 - twisted.internet.ssl.KeyPair should provide loadPEM
* #8159 - twisted.internet._win32serialport incompatible with pyserial 3.x
* #8169 - t.w.static.addSlash does not work on Python 3
* #8188 - Advertise H2 via ALPN/NPN when available.

Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at February 02, 2016 01:18 PM

February 01, 2016

Moshe Zadka

Docker: Are we there yet?

Obviously, we know the answer. This blog is intended to allow me to have an easy place to point people when they ask me “so what’s wrong with Docker”?

[To clarify, I use Docker myself, and it is pretty neat. All the more reason missing features annoy me.]

Docker itself:

  • User namespaces — slated to land in February 2016, so pretty close.
  • Temporary adds/squashing — the issue is currently “closed”, and the maintainers suggest people use work-arounds.
  • Dockerfile syntax is limited — this is related to the issue above, but there are a lot of missing features in Dockerfile (for example, a simple form of “reuse” other than chaining). No clear idea when it will be possible to actually implement the build in terms of an API, because there is no link to an issue or PR.

Tooling:

  • Image size — Minimal versions of Debian, Ubuntu or CentOS are all unreasonably big. Alpine does a lot better. People really should move to Alpine. I am disappointed there is no competition on being a “minimal container-oriented distribution”.
  • Build determinism — Currently, almost all Dockerfiles in the wild call out to the network to grab some files while building. This is really bad — it assumes networking, depends on servers being up, and assumes files on servers never change. The alternative seems to be checking big files into one’s own repo.
    • The first thing to do would be to have an easy way to disable networking while the container is being built.
    • The next thing would be a “download and compare hash” operation in a build-prep step, so that all dependencies can be downloaded and verified, while the hashes would be checked into the source (see the sketch after this list).
    • Sadly, Alpine Linux specifically makes it non-trivial to “just download the package” from outside of Alpine.
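
As an illustration of the “download and compare hash” idea, here is a minimal sketch of a build-prep step (not part of any existing tool; the URL, path, and digest below are hypothetical stand-ins):

import hashlib
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Hypothetical dependency list: (url, destination, expected SHA-256 digest).
# The hashes live in source control; the downloads happen before "docker build".
DEPENDENCIES = [
    ("https://example.com/libfoo-1.2.tar.gz", "deps/libfoo-1.2.tar.gz",
     "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"),
]

def fetch_and_verify(url, destination, expected_sha256):
    """Download url to destination, failing loudly on a hash mismatch."""
    data = urlopen(url).read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError("hash mismatch for %s: got %s" % (url, digest))
    with open(destination, "wb") as f:
        f.write(data)

if __name__ == "__main__":
    for url, destination, expected in DEPENDENCIES:
        fetch_and_verify(url, destination, expected)

The Dockerfile would then only ADD files from deps/, never from the network.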


by moshez at February 01, 2016 05:24 AM

January 27, 2016

Moshe Zadka

Learning Python: The ecosystem

When first learning Python, the tutorial is a pretty good resource to get acquainted with the language and, to some extent, the standard library. I have written before about how to release open source libraries — but it is quite possible that one’s first foray into Python will not be to write a reusable library, but an application to accomplish something — maybe a web application with Django or a tool to send commands to your TV. Much of what I said there will not apply — no need for a README.rst if you are writing a Django app for your personal website!

However, it probably is useful to learn a few tools that the Python ecosystem has engineered to make life more pleasant. In a perfect world, those would be built into Python: the “cargo” to Python’s “Rust”. However, in the world we live in, we must cobble together a good tool-chain from various open source projects. So strap in, and let’s begin!

The first three are cribbed from my “open source” link above, because good code hygiene is always important.

Testing

There are several reasonably good test runners. If there is no clear reason to choose one, py.test is a good default. “Using Twisted” is a good reason to choose trial. Using coverage is a no-brainer. It is good to run some functional tests as well; test runners can help with those too, but even a standalone Python program that fails if things are not working can be useful.
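
For instance, a minimal py.test-style test module might look like this; the mypackage module and parse_version function are hypothetical stand-ins for your own code:

# test_version.py -- run with "py.test", or "py.test --cov=mypackage"
# for coverage if the pytest-cov plugin is installed.
import pytest

from mypackage import parse_version  # hypothetical module under test

def test_parse_version_splits_on_dots():
    assert parse_version("1.2.3") == (1, 2, 3)

def test_parse_version_rejects_garbage():
    with pytest.raises(ValueError):
        parse_version("not-a-version")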

Static checking

There are a lot of tools for static checking of Python programs — pylint, flake8 and more. Use at least one. Using more is not completely free (more ways to have to say “ignore this, this is ok”) but can be useful to catch more style and static issues. At worst, if there are local conventions that are not easily plugged into these checkers, write a Python program that will check for them and fail if they are violated.
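
As a sketch of that last point, a local-convention checker can be a short script; the convention enforced here, banning TODO comments that lack a ticket number, is purely illustrative:

# check_conventions.py -- exits non-zero if the local convention is violated.
import os
import re
import sys

TODO_WITHOUT_TICKET = re.compile(r"#\s*TODO(?!\s*\(#\d+\))")

def violations(root="mypackage"):
    """Yield a message for every TODO comment without a ticket number."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as f:
                for lineno, line in enumerate(f, start=1):
                    if TODO_WITHOUT_TICKET.search(line):
                        yield "%s:%d: TODO without a ticket number" % (path, lineno)

if __name__ == "__main__":
    found = list(violations())
    for message in found:
        print(message)
    sys.exit(1 if found else 0)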

Meta testing

Use tox. Put tox.ini at the root of your project, and make sure that “tox” (with no arguments) works and runs your entire test-suite. All unit tests, functional tests and static checks should be run using tox.

Set tox to put all build artifacts in a build/ top-level directory.
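
A minimal tox.ini along these lines might look like the following sketch; the package name, tool choices, and layout are illustrative assumptions, not a prescription:

# tox.ini at the project root; plain "tox" runs every environment.
[tox]
envlist = tests, lint

[testenv:tests]
deps =
    pytest
    coverage
commands =
    coverage run --source mypackage -m pytest {posargs}
    coverage report

[testenv:lint]
deps = flake8
commands = flake8 mypackage

Pointing build artifacts (coverage data, reports, and so on) at a top-level build/ directory is then a per-tool setting.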

Pex

A tox test-environment of “pex” should result in a Python EXecutable created and put somewhere under “build/”. Running your Python application to actually serve web pages should be as simple as taking that pex and running it without arguments. BoredBot shows an example of how to create such a pex that includes a web application, a Twisted application and a simple loop/sleep-based application. This pex build can take a requirements.txt file with exact dependencies, though if it is built by tox, you can inline those dependencies directly in the tox file.
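
A sketch of such a test-environment, assuming a hypothetical mypackage.main entry point; the exact pex invocation varies between pex versions, so treat this as an outline rather than a recipe:

[testenv:pex]
deps = pex
commands =
    pex . -r requirements.txt --entry-point mypackage.main --output-file {toxinidir}/build/my-thing.pex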

Collaboration

If you do collaborate with others on the project, whether it is open source or not, it is best if the collaboration instructions are as easy as possible. Ideally, collaboration instructions should be no more complicated than “clone this, make changes, run tox, possibly do whatever manual verification using ‘./build/my-thing.pex’ and submit a pull request”.

If they are more complicated, consider investing some effort into changing the code to be more self-sufficient and make fewer assumptions about its environment. For example, default to a local SQLite-based database if no “--database” option is specified, and initialize it with whatever your code needs. This will also make it easier to practice the “infinite environment” methodology, since if one file is all it takes to “bring up” an environment, it should be easy enough to run it on a cloud server and allow people to look at it.
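
A minimal sketch of the database-defaulting idea (the option name mirrors the one above; the filename and schema are hypothetical):

import argparse
import sqlite3

def open_database(database=None):
    """Connect to the configured database, defaulting to local SQLite."""
    if database is None:
        conn = sqlite3.connect("local-dev.sqlite3")
        # Initialize whatever the code needs, so a fresh clone just works.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS notes "
            "(id INTEGER PRIMARY KEY, body TEXT)")
        conn.commit()
        return conn
    raise NotImplementedError("hook up the real database driver here")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--database", default=None,
                        help="database to use; omit for a local SQLite file")
    args = parser.parse_args()
    db = open_database(args.database)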

by moshez at January 27, 2016 05:45 AM

January 17, 2016

Glyph Lefkowitz

Stop Working So Hard

Recently, I saw this tweet where John Carmack posted to a thread on Hacker News about working hours. As this post propagated a good many bad ideas about working hours, particularly in the software industry, I of course had to reply. After some further back-and-forth on Twitter, Carmack followed up.

First off, thanks to Mr. Carmack for writing such a thorough reply in good faith. I suppose internet arguments have made me a bit cynical in that I didn’t expect that. I still definitely don’t agree, but I think there’s a legitimate analysis of the available evidence there now, at least.

When trying to post this reply to HN, I was told that the comment was too long, and I suppose it is a bit long for a comment. So, without further ado, here are my further thoughts on working hours.

... if only the workers in Greece would ease up a bit, they would get the productivity of Germany. Would you make that statement?

Not as such, no. This is a hugely complex situation mixing together finance, culture, management, international politics, monetary policy, and a bunch of other things. That study, and most of the others I linked to, is interesting in that it confirms the general model of ability-to-work (i.e. “concentration” or “willpower”) as a finite resource that you exhaust throughout the day; not in that “reduction in working hours” is a panacea solution. Average productivity-per-hour-worked would definitely go up.

However, I do believe (and now we are firmly off into interpretation-of-results territory, I have nothing empirical to offer you here) that if the average Greek worker were less stressed to the degree of the average German one, combining issues like both overwork and the presence of a constant catastrophic financial crisis in the news, yes; they’d achieve equivalent productivity.

Total net productivity per worker, discounting for any increases in errors and negative side effects, continues increasing well past 40 hours per week. ... Only when you are so broken down that even when you come back the following day your productivity per hour is significantly impaired, do you open up the possibility of actually reducing your net output.

The trouble here is that you really cannot discount for errors and negative side effects, especially in the long term.

First of all, the effects of overwork (and attendant problems, like sleep deprivation) are cumulative. While productivity on a given day increases past 40 hours per week, if you continue to work more, your productivity will continue to degrade. So, the case where “you come back the following day ... impaired” is pretty common... eventually.

Since none of this epidemiological work tracks individual performance longitudinally there are few conclusive demonstrations of this fact, but lots of compelling indications; in the past, I’ve collected quantitative data on myself (and my reports, back when I used to be a manager) that strongly corroborates this hypothesis. So encouraging someone to work one sixty-hour week might be a completely reasonable trade-off to address a deadline; but building a culture where asking someone to work nights and weekends as a matter of course is inherently destructive. Once you get into the area where people are losing sleep (and for people with other responsibilities, it’s not hard to get to that point) overwork starts impacting stuff like the ability to form long-term memories, which means that not only do you do less work, you also consistently improve less.

Furthermore, errors and negative side effects can have a disproportionate impact.

Let me narrow the field here to two professions that I know a bit about and that are germane to this discussion: one, health care, which the original article here starts off by referencing, and two, software development, with which we are both familiar (since you already raised the Mythical Man Month).

In medicine, you can do a lot of valuable life-saving work in a continuous 100-hour shift. And in fact residents are often required to do so as a sort of professional hazing ritual. However, you can also make catastrophic mistakes that would cost a person their life; this happens routinely. Study after study confirms this, and minor reforms happen, but residents are still routinely abused and made to work in inhumane conditions that have catastrophic outcomes for their patients.

In software, defects can be extremely expensive to fix. Not only are they hard to fix, they can also be hard to detect. The phenomenon of the Net Negative Producing Programmer also indicates that not only can productivity drop to zero, it can drop below zero. On the anecdotal side, anyone who has had the unfortunate experience of cleaning up after a burnt-out co-worker can attest to this.

There are a great many tasks where inefficiency grows significantly with additional workers involved; the Mythical Man Month problem is real. In cases like these, you are better off with a smaller team of harder working people, even if their productivity-per-hour is somewhat lower.

The specific observation from the Mythical Man Month was that the number of communication links on a fully connected graph of employees increases geometrically whereas additional productivity (in the form of additional workers) increases linearly. If you have a well-designed organization, you can add people without requiring that your communication graph be fully connected.

But of course, you can’t always do that. And specifically you can’t do that when a project is already late: you already figured out how the work is going to be divided. Brooks’ Law is formulated as: “Adding manpower to a late software project makes it later.” This is indubitable. But one of the other famous quotes from this book is “The bearing of a child takes nine months, no matter how many women are assigned.”

The bearing of a child also takes nine months no matter how many hours a day the woman is assigned to work on it. So “in cases like these” my contention is that you are not “better off with ... harder working people”: you’re just screwed. Some projects are impossible and you are better off acknowledging the fact that you made unrealistic estimates and you are going to fail.

You called my post “so wrong, and so potentially destructive”, which leads me to believe that you hold an ideological position that the world would be better if people didn’t work as long. I don’t actually have a particularly strong position there; my point is purely about the effective output of an individual.

I do, in fact, hold such an ideological position, but I’d like to think that said position is strongly justified by the data available to me.

But, I suppose calling it “so potentially destructive” might have seemed glib, if you are really just looking at the microcosm of what one individual might do on one given week at work, and not at the broader cultural implications of this commentary. After all, as this discussion shows, if you are really restricting your commentary to a single person on a single work-week, the case is substantially more ambiguous. So let me explain why I believe it’s harmful, as opposed to merely being incorrect.

First of all, the problem is that you can’t actually ignore the broader cultural implications. This is Hacker News, and you are John Carmack; you are practically a cultural institution yourself, and by using this site you are posting directly into the broader cultural implications of the software industry.

Software development culture, especially in the USA, suffers from a long-standing culture of chronic overwork. Startup developers in their metaphorical (and sometimes literal) garages are lionized and then eventually mythologized for spending so many hours on their programs. Anywhere that it is celebrated, this mythology rapidly metastasizes into a severe problem: the Death March.

Note that although the term “death march” is technically general to any project management, it applies “originally and especially in software development”, because this problem is worse in the software industry (although it has been improving in recent years) than almost anywhere else.

So when John Carmack says on Hacker News that “the effective output of an individual” will tend to increase with hours worked, that sends a message to many young and impressionable software developers. This is the exact same phenomenon that makes pop-sci writing terrible: your statement may be, in some limited context, and under some tight constraints, empirically correct, but it doesn’t matter because when you expand the parameters to the full spectrum of these people’s careers, it’s both totally false and also a reinforcement of an existing cognitive bias and cultural trope.

I can’t remember the name of this cognitive bias (and my Google-fu is failing me), but I know it exists. Let me call it the “I’m fine” bias. I know it exists because I have a friend who had the opportunity to go on a flight with NASA (on the Vomit Comet), and one of the more memorable parts of this experience that he related to me was the hypoxia test. The test involved basic math and spatial reasoning skills, but that test wasn’t the point: the real test was that they had to notice when the oxygen levels were dropping and indicate that to the proctor. Concentrating on the test, many people failed the first few times, because the “I’m fine” bias makes it very hard to notice that you are impaired.

This is true of people who are drunk, or people who are sleep deprived, too. Their abilities are quantifiably impaired, but they have to reach a pretty severe level of impairment before they notice.

So people who are overworked might feel generally bad but they don’t notice their productivity dropping until they’re way over the red line.

Combine this with the fact that most people, especially those already employed as developers, are actually quite hard-working and earnest (laziness is much more common as a rhetorical device than as an actual personality flaw) and you end up in a scenario where a good software development manager is responsible much more for telling people to slow down, to take breaks, and to be more realistic in their estimates, than to speed up, work harder, and put in more hours.

The trouble is this goes against the manager’s instincts as well. When you’re a manager you tend to think of things in terms of resources: hours worked, money to hire people, and so on. So there’s a constant nagging sensation for a manager to encourage people to work more hours in a day, so you can get more output (hours worked) out of your input (hiring budget). The problem here is that while all hours are equal, some hours are more equal than others. Managers have to fight against their own sense that a few more worked hours will be fine, and their employees’ tendency to overwork because they’re not noticing their own burnout, and upper management’s tendency to demand more.

It is into this roiling stew of the relentless impulse to “work, work, work” that we are throwing our commentary about whether it’s a good idea or not to work more hours in the week. The scales are weighted very heavily on one side already - which happens to be the wrong side in the first place - and while we’ve come back from the unethical and illegal brink we were at as an industry in the days of ea_spouse, software developers still generally work far too much.

If we were fighting an existential threat, say an asteroid that would hit the earth in a year, would you really tell everyone involved in the project that they should go home after 35 hours a week, because they are harming the project if they work longer?

Going back to my earlier explanation in this post about the cumulative impact of stress and sleep deprivation - if we were really fighting an existential threat, the equation changes somewhat. Specifically, the part of the equation where people can have meaningful downtime.

In such a situation, I would still want to make sure that people are as well-rested and as reasonably able to focus as they possibly can be. As you’ve already acknowledged, there are “increases in errors” when people are working too much, and we REALLY don’t want the asteroid-targeting program that is going to blow apart the asteroid that will wipe out all life on earth to have “increased errors”.

But there’s also the problem that, faced with such an existential crisis, nobody is really going to be able to go home and enjoy a fine craft beer and spend some time playing with their kids and come back refreshed at 100% the next morning. They’re going to be freaking out constantly about the comet, they’re going to be losing sleep over that whether they’re working or not. So, in such a situation, people should have the option to go home and relax if they’re psychologically capable of doing so, but if the option for spending their time that makes them feel the most sane is working constantly and sleeping under their desk, well, that’s the best one can do in that situation.

This metaphor is itself also misleading and out of place, though. There is also a strong cultural trend in software, especially in the startup ecosystem, to over-inflate the importance of what the company is doing - it is not “changing the world” to create a website for people to order room-service for their dogs - and thereby to catastrophize any threat to that goal. The vast majority of the time, it is inappropriate either to sacrifice -- or to ask someone else to sacrifice -- health and well-being for short-term gains. Remember, given the cumulative effects of overwork, that’s all you can even get: short-term gains. This sacrifice often has a huge opportunity cost in other areas, as you can’t focus on more important things that might come along later.

In other words, while the overtime situation is complex and delicate in the case of an impending asteroid impact, there’s also the question of whether, at the beginning of Project Blow Up The Asteroid, I want everyone to be burnt out and overworked from their pet-hotel startup website. And in that case, I can say, unequivocally, no. I want them bright-eyed and bushy-tailed for what is sure to be a grueling project, no matter what the overtime policy is, that absolutely needs to happen. I want to make sure they didn’t waste their youth and health on somebody else’s stock valuation.

by Glyph at January 17, 2016 04:06 AM

January 13, 2016

Hynek Schlawack

hasattr() Considered Harmful

Don’t use Python’s hasattr() unless you’re writing Python 3-only code.
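
The linked post boils down to that one line; the pitfall behind it, for the curious, is that Python 2's hasattr() swallows every exception, not just AttributeError. A small illustration (the Flaky class is my own, not from the post):

class Flaky(object):
    @property
    def attr(self):
        raise KeyError("real bug, unrelated to attribute presence")

f = Flaky()

# Python 2: hasattr() catches *every* exception, prints False, and
# silently hides the KeyError. Python 3: only AttributeError is
# caught, so the KeyError propagates and the bug stays visible.
try:
    print(hasattr(f, "attr"))
except KeyError as e:
    print("Python 3 surfaces the bug: %s" % (e,))

# getattr() with a sentinel behaves the same on both versions: only
# AttributeError means "attribute missing"; real errors come through.
_missing = object()
try:
    value = getattr(f, "attr", _missing)
except KeyError as e:
    print("getattr also lets the real error through: %s" % (e,))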

by Hynek Schlawack (hs@ox.cx) at January 13, 2016 12:00 AM

January 09, 2016

Twisted Matrix Laboratories

December 2015 second half - SFC Sponsored Development

This is my report for the work done in the second half of December as part of the 2015 Twisted Maintainer Fellowship program.

Important changes made in these weeks

* The Git migration plan was approved.

Git Migration tasks

* Update and add new tickets in twisted-infra/braid for the Git migration plan
* twisted-infra/braid #164 - Plan SVN account migration

Tickets reviewed and merged

* #8129 - twisted.web.http_headers.Headers should work on bytes and Unicode
* #8138 - twisted.conch.endpoints._NewConnectionHelper leaks agent connection
* #8145 - Remove the deprecated gtkreactor
* #8154 - Update doc references to new logger

Tickets reviewed and not merged yet

* #7671 - it is way too hard to specify a trust root combining multiple certificates, especially to HTTP
* #7993 - 'twisted.web.wsgi' should be ported to Python 3
* #8025 - Make Trial work on Windows+Python3
* #8143 - HTTP/2 Part 1: New IRequest interface
* #8160 - twisted.conch.ssh.agent is missing some request and response types

Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at January 09, 2016 12:44 PM

January 05, 2016

Glyph Lefkowitz

Taking Issue With Paul Graham’s Premises

Paul Graham has recently penned an essay on income inequality. Holly Wood wrote a pretty good critique of this piece, but it is addressing a huge amount of pre-existing context, as well as ignoring large chunks of the essay that nominally agree.

Eevee has already addressed how the “simplified version” didn’t substantively change anything that the longer one was saying, so I’m not going to touch on that much. However, it’s worth noting that the reason Paul Graham says he wrote that is that he thinks that “adventurous interpretations” are to blame for criticism of his “controversial” writing; in other words, that people are misinterpreting his argument because the conclusion is politically unacceptable.

Personally, I am deeply ambivalent about the political implications of his writing. I believe, strongly, in the power of markets to solve social problems that planning cannot. I don’t think “capitalist” is a slur. But, neither do I believe that markets are inherently good; capitalist economic theory assumes an environment of equal initial opportunity which demonstrably does not exist. I am, personally, very open to ideas like the counter-intuitive suggestion that economic inequality might not be such a bad thing, if the case were well-made. I say this because I want to be clear that what bothers me about Paul Graham’s writing is not its “controversial” content.

What bothers me about Paul Graham’s writing is that the reasoning is desperately sloppy. I sometimes mentor students on their writing, and if this mess were handed to me by one of my mentees, I would tell them to rewrite it from scratch. Although the “thanks” section at the end of each post on his blog implies that he gets editing feedback, it must be so uncritical of his assumptions as to be useless.

Initially, my entirely subjective impression is that Paul Graham is not a credible authority on the topic of income inequality. He doesn’t demonstrate any grasp of its causes, or indeed the substance of any proposed remedy. I would say that he is attacking a straw-man, but he doesn’t even bother to assemble the straw-man first, saying only:

... the thing that strikes me most about the conversations I overhear ...

What are these “conversations” he “overhears”? What remedies are they proposing which would target income inequality by eliminating any possible reward for entrepreneurship? Nothing he’s arguing against sounds like anything I’ve ever heard someone propose, and I spend a lot of time in the sort of conversation that he imagines overhearing.

His claim to credentials in this area doesn’t logically follow, either:

I’ve become an expert on how to increase economic inequality, and I’ve spent the past decade working hard to do it. ... In the real world you can create wealth as well as taking it from others.

This whole passage is intended to read logically as: “I increase economic inequality, which you might assume is bad, but it’s not so bad, because it creates wealth in the process!”.

Hopefully what PG has actually been trying to become an expert in is creating wealth (a net positive), not in increasing economic inequality (a negative, or, at best, neutral by-product of that process). If he is focused on creating wealth, as so much of the essay purports he is, then it does not necessarily follow that the startup founders will be getting richer than their customers.

Of course, many goods and services provide purely subjective utility to their consumers. But in a properly functioning market, the whole point of engaging in transactions is to improve efficiency.

To borrow from PG’s woodworker metaphor:

A woodworker creates wealth. He makes a chair, and you willingly give him money in return for it.

I might be buying that chair to simply appreciate its chair-ness and bask in the sublime beauty of its potential for being sat-in. But equally likely, I’m buying that chair for my office, where I will sit in it, and produce some value of my own while thusly seated. If the woodworker hadn’t created that chair for me, I’d have to do it myself, and it (presumably) would have been more expensive in terms of time and resources. Therefore, by producing the chair more efficiently, the woodworker would have increased my wealth as well as his own, by increasing the delta between my expenses (which include the chair) and my revenue (generated by tripping the light pythonic or whatever).

Note that “more efficient” doesn’t necessarily mean “lower cost”. Sitting in a chair is a substantial portion of my professional activity. A higher-quality chair that costs the same amount might improve the quality of my sitting experience, which might improve my own productivity at writing code, allowing me to make more income myself.

Even if the value of the chair is purely subjective, it is still an expense, and making it more efficient to make chairs would still increase my net worth.

Therefore, if startups really generated wealth so reliably, rather than simply providing a vehicle for transferring it, we would expect to see decreases in economic inequality, as everyone was able to make the most efficient use of their own time and resources, and was able to make commensurately more money.

... variation in productivity is accelerating ...

Counterpoint: no it isn’t. It’s not even clear that it’s increasing, let alone that its derivative is increasing. This doesn’t appear to be something that much data is collected on, and in the absence of any citation, I have to assume that it is a restatement of the not only false, but harmful, frequently debunked 10x programmer myth.

Most people who get rich tend to be fairly driven.

This sounds obvious: of course, if you “get” rich, you have to be doing something to “get” that way. First of all, this ignores many people who simply are rich, who get their wealth from inheritance or rent-seeking, which I think is discounting a pretty substantial number of rich people.

But it is implicitly making a bolder claim: that people who get rich are more driven than other people; i.e. those who don’t get rich.

In my personal experience, the opposite is true. People who get rich do work hard, and are determined, but really poor people work a lot harder and are a lot more determined. A startup founder who is eating rice and beans to try to keep their burn rate low and their runway long may indeed be making sacrifices and working hard. They may be experiencing emotional turmoil. But implicitly, such a person always has the safety net of high-value skills they can use to go find another job if their attempt doesn’t work out.

But don’t take my word for it; think about it for yourself. Consider a single mother working three minimum-wage jobs and eating rice and beans because that’s the only way she can feed her children. Would you imagine she is less determined and will work less hard to keep her children alive than our earlier hypothetical startup founder would work to keep their valuation high?

One of the most important principles in Silicon Valley is that “you make what you measure.” It means that if you pick some number to focus on, it will tend to improve, but that you have to choose the right number, because only the one you choose will improve.

A closely-related principle from outside of Silicon Valley is Goodhart’s Law. It states, “When a measure becomes a target, it ceases to be a good measure”. If you pick some number to focus on, the number as measured will improve, but since it’s often cheaper to subvert the mechanisms for measuring than to actually make progress, the improvement will often be meaningless. It is a dire mistake to assume that as long as you select the right metric in a political process that you can really improve it.

The Silicon Valley version - assuming the number will genuinely increase, and all you have to do is choose the right one - really only works when the things producing the numbers are computers, and the people collecting them have clearly circumscribed reasons not to want to cheat. This is why people tend to select numbers like income inequality to optimize: it gives people a reason to want to avoid cheating.

It’s still possible to get rich by buying politicians (though even that is harder than it was in 1880)

The Sunlight Foundation published a report in 2014 indicating that the return on investment of political spending is approximately 76,000%. While the Sunlight Foundation didn't exist in 1880, a similar report from 2009 put the figure at 22,000%, which suggests the number is going up, not down; i.e. over time, it is getting easier, not harder, to get rich by buying politicians.

Meanwhile, the ROI of venture capital, while highly variable, is, on average, at least two orders of magnitude lower than that. While outright “buying” a politician is a silly straw-man, manipulating government remains a far more reliable and lucrative source of income than doing anything productive, with technology or otherwise.

The rate at which individuals can create wealth depends on the technology available to them, and that grows exponentially.

In what sense does technology grow “exponentially”? Let’s look at a concrete example of increasing economic output that’s easy to quantify: wheat yield per acre. What does the report have to say about it?

Winter wheat yields have trended higher since 1960. We find that a linear trend is the best fit to actual average yields over that period and that yields have increased at a rate of 0.4 bushel per acre per year...

(emphasis mine)

In other words, when I go looking for actual, quantifiable evidence of the benefit of improving technology, it is solidly linear, not exponential.

What have we learned?

Paul Graham frequently writes essays in which he makes quantifiable, falsifiable claims (technology growth is “exponential”, “an exponential curve that has been operating for thousands of years”, “there are also a significant number who get rich by creating wealth”) but rarely, if ever, provides any data to back up those claims. When I look for specific examples to test his claims, as with the crop yield example above, it often seems to me that his claims are exaggerated, entirely imagined, or, worse yet, completely backwards from the truth of the matter.

Graham frequently uses the language of rationality, data, science, empiricism, and mathematics. This is a bad habit shared by many others immersed in Silicon Valley culture. However, simply adopting an unemotional tone and co-opting words like “exponential” and “factor”, or almost-quantifiable weasel words like “most” and “significant”, is no substitute for actually doing the research, assembling the numbers, fitting the curves, and trying to understand if the claims are valid.

This continues to strike me as a real shame, because PG’s CV clearly shows he is an intelligent and determined fellow, and he certainly has a fair amount of money, status, and power at this point. More importantly, his other writings clearly indicate he cares a lot about things like “niceness” and fairness. If he took the trouble to more humbly approach socioeconomic problems like income inequality and poverty, really familiarize himself with existing work in the field, he could put his mind to a solution. He might be able to make some real change. Instead, he continues to use misleading language and rhetorical flourishes to justify decisions he’s already made. In doing so, he remains, regrettably, a blowhard.

by Glyph at January 05, 2016 04:29 PM

December 30, 2015

Twisted Matrix Laboratories

SFC Sponsored Development, Oct-Dec by Amber

Hi everyone,

Along with Adi, I'm supposed to be publishing these too :)

Thanks to our generous sponsors, the Software Freedom Conservancy (of which our part is colloquially known as the Twisted Software Foundation) was able to financially support me as one of the Twisted Fellows. My job is to review tickets, and assist with the Git migration -- this report is mostly about the former, but does have some promising news about the latter.

This report groups the tickets into reviewed and triaged -- you can get the full details of my work from Trac, which includes both SFC-funded ticket reviewing and non-SFC-funded work on code.

Tickets reviewed:

- #5388 (merged): IRI implementation
- #5911 (merged): Support the 'HttpOnly' flag when setting cookies
- #6833: Port twisted.protocols.amp to Python 3
- #7037 (merged): Write test for cftp._cbPutTargetAttrs or remove untested / unused code.
- #7993: Port twisted.web.wsgi to Python 3
- #8082: Document deprecation policy and how to implement a deprecation
- #8107: Add a custom pyflakes runner to ignore Twisted excepted files
- #8116: Deprecate LoopingCall.deferred
- #8121: Port twisted.protocols.htb to Python 3
- #8122 (merged): Final pyflakes cleanup
- #8131 (merged): Port twisted.python.roots to Python 3
- #8132: Port twisted.web.vhost to Python 3
- #8138 (merged): twisted.conch.endpoints._NewConnectionHelper leaks agent connection

Tickets triaged:

- #1436 (invalid): add getCookie/setCookie to web2.http_headers.Headers
- #1920 (invalid): dav.resource.py: RedirectResponse takes a full href, not a uri
- #2342 (invalid): scgi client doesn't handle content-length
- #2343 (invalid): createCGIEnvironment should refuse if path segments have a /
- #2544 (invalid): twisted.web2.channel.[s]cgi contain multiple uses of a function that doesn't exist.
- #3329 (invalid): HTTP's #(...) syntax allows null contents
- #4081 (wontfix): tap2rpm should allow setting RPM dependency information
- #4082 (wontfix): tap2rpm should read configuration from the .tac file it packages.
- #4230 (invalid): Correct exception text in twisted.web2.http_headers:HeaderHandler
- #4426 (wontfix): tap2deb should generate LSB headers for DependencyBasedBoot
- #5875 (wontfix): tap2rpm assumes $PREFIX is /usr
- #6081 (wontfix): Reject unicode in http_headers
- #6168 (wontfix): Enforce type (bytes) of header name and values in twisted.web.http_headers
- #6178 (closed): Port twisted.web.util to Python 3
- #6920 (wontfix): twisted.scripts.test.test_tap2deb.TestTap2DEB.test_basicOperation fails (on Gentoo?)
- #7497 (closed): twistd produces confusing behavior under Python3
- #8007 (closed): API docs try to load urchin.js over http
- #8036: Document how the new logging system should be used and tested in new code

Other
- With others, assembled and proposed a Git Migration Plan, which has now been approved.

Expect to see more in the coming weeks about the Git migration!

- Amber Brown
TSF Fellow & Twisted Release Manager

by HawkOwl (noreply@blogger.com) at December 30, 2015 12:47 PM

Twisted 15.5 Released

(This was queued but not actually posted, sorry everybody!)

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.5!

The sixth (!!) release in 2015 has quite a few goodies in it -- incrementalism is the name of the game here, and everything is just a little better than it was before. Some of the highlights of this release are:

  • Python 3.5 support on POSIX was added, and Python 2.6 support was dropped. We also only support x64 Python on Windows 7 now.
  • More than nine additional modules have been ported to Python 3, including Twisted Web's Agent and downloadPage, twisted.python.logfile, and many others, as well as...
  • twistd is ported to Python 3, and its first plugin, web, is ported.
  • twisted.python.url, a new URL/IRI abstraction, has been introduced to answer the question "just what IS a URL" in Twisted, once and for all.
  • NPN and ALPN support has been added to Twisted's TLS implementation, paving the way for HTTP/2.
  • Conch now supports the DH group14-sha1 and group-exchange-sha256 key exchange algorithms, as well as the hmac-sha2-256 and hmac-sha2-512 MAC algorithms. Conch also works more nicely with newer OpenSSH implementations.
  • Twisted's IRC support now has a sendCommand() method, which enables sending messages with tags.
  • 55+ closed tickets overall.
For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively on our website).
The NEWS file is also available at https://github.com/twisted/twisted/blob/twisted-15.5.0/NEWS.

Also worth noting are the two Twisted Fellows -- Adi Roiban and myself -- who have been able to dedicate time to reviewing tickets and generally pushing things along in the process. We're funded by the Software Freedom Conservancy (through what is known as the "Twisted Software Foundation"), which is, in turn, funded by donors and sponsors -- potentially like you! If you would like to know how you can assist in the continued funding of the Fellowship program, see our website: https://twistedmatrix.com/trac/wiki/TwistedSoftwareFoundation#BenefitsofSponsorship

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,

Amber "Hawkie" Brown
Twisted Release Manager, Twisted Fellow

by HawkOwl (noreply@blogger.com) at December 30, 2015 12:47 PM

December 2015 first half - SFC Sponsored Development


This is my report for the work done in the first half of December as part of the 2015 Twisted Maintainer Fellowship program.

Important changes made in these weeks:


* The Git migration plan was sent for approval.
* The AMP protocol was ported to Python 3.

Tickets reviewed and merged


* #4811 - @unittest.expectedFailure decorator breaks trial
* #5704 - mkdir -p for FilePath
* #6833 - Port twisted.protocols.amp to Python 3
* #7998 - twisted.conch.ssh.keys should be ported to Python 3
* #8115 - Some Twisted tests use the deprecated "assertEquals" and similar
* #8119 - Remove gadfly support / tests from ADBAPI
* #8125 - LoopingCall.withCount with interval 0
* #8136 - Remove the deprecated twisted.web.http.Request.headers/request_headers
* #8139 - None of the ADBAPI tests actually run

Tickets reviewed and not merged yet


* #6757 - SSL server endpoints don't provide a way to verify client certificates
* #7460 - Add HTTP 2.0 support to twisted.web
* #7993 - 'twisted.web.wsgi' should be ported to Python 3
* #8107 - Add a custom pyflakes runner to ignore Twisted excepted files
* #8124 - Update @deprecated to work with properties and __init__
* #8129 - twisted.web.http_headers.Headers should work on bytes and Unicode
* #8137 - twisted.web server breaks with non-IP addresses
* #8138 - twisted.conch.endpoints._NewConnectionHelper leaks agent connection


Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at December 30, 2015 10:59 AM

November 2015 week 3 and 4 - SFC Sponsored Development


This is my report for the work done in the last weeks of November as part of the 2015 Twisted Maintainer Fellowship program.

The review queue remains short. In general, the initial review response time was below 24 hours, with delays mainly due to time zones.

Important changes made in these weeks:

* Release of Twisted 15.5.0
* Serial port support was ported to Python 3.
* SSH client and server now support SHA2 HMAC.

Tickets reviewed and merged

#6082 - Make test_updateWithKeywords work on Python 3
#6842 - Dead link in documentation of t.i._dumbwin32proc
#6978 - rst markup bugs in docs
#7668 - Links to Dive into Python in the testing documentation are dead
#7881 - Misleading example in Deferred Intro
#7944 - Typo in docs/core/howto/logger.rst
#7791 - Error in options howto doc
#8050 - Release Twisted 15.5
#8070 - Twisted web client IPV6 DNS lookup failed
#8099 - Port twisted.internet.serial to Python 3.
#8101 - Malformed HTTP Header causes exception in twisted.web.http server
#8102 - Malformed HTTP method causes exception in twisted.web.http server/Resource.render
#8104 - Fix documentation of default thread pool size
#8106 - Add Python 3.5 to our tox config.
#8108 - conch: support hmac-sha2-256 and hmac-sha2-512

Tickets reviewed and not merged yet

#5789 - Remove usage of UserDict
#6757 - SSL server endpoints don't provide a way to verify client certificates
#7050 - Remove long deprecated _bwHack parameter
#7460 - HTTP/2 initial patch
#7879 - documentation incorrectly says that Agent does not do certificate validation
#7926 - TLSMemoryBIOProtocol.loseConnection does not work properly
#7998 - twisted.conch.ssh.keys should be ported to Python 3
#8009 - twisted.web.twcgi should be ported to Python 3
#8077 - Session cookie under python3 never found
#8115 - Some Twisted tests use the deprecated "assertEquals" and similar
#8125 - LoopingCall.withCount with interval 0
#8126 - Improvements to Twisted’s SMTP server
#8127 - Support a user defined facility when using syslog
#8128 - ESMTP class can't be usefully subclassed to add extensions
#8129 - twisted.web.http_headers.Headers should work on bytes and Unicode

Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at December 30, 2015 10:59 AM

November 2015 week 1 and 2 - SFC Sponsored Development

This is my report for the work done in the first 2 weeks of November as part of the 2015 Twisted Maintainer Fellowship program.

Important changes made in these 2 weeks:
  • NPN and ALPN support for the TLS transport.
  • Conch SSH client with diffie-hellman-group-exchange-sha256 support and support for latest release of OpenSSH server.
  • Prepare twisted.web for HTTP/2 support.
  • Start working on replacing PyCrypto usage with cryptography.
  • The Twisted review queue continues to be small, with a review response time of less than 2 days.

Tickets reviewed and merged:

  • #5976 - Remove assertRaises backport from Trial
  • #6980 - doc-build should not build documentation with warnings.
  • #7672 - Add diffie-hellman-group-exchange-sha256 key exchange algorithm to conch
  • #7860 - NPN and ALPN support in Twisted.
  • #8068 - twisted.python.urlpath.URLPath doesn't respect the default path of '/'
  • #8072 - twisted.python.urlpath attributes are not mutable
  • #8091 - detox support
  • #8094 - Remove IStreamClientEndpointStringParser.
  • #8095 - tap2deb/tap2rpm tests should not produce deprecation warnings
  • #8096 - Jelly leaks set deprecation warnings.
  • #8097 - MSN leaked deprecation warnings.
  • #8100 - Use MSG_KEX_DH_GEX_REQUEST in conch SSH client

Tickets reviewed and not merged yet:

  • #6757 - Get client certificates for SSL server endpoints
  • #7413 - Migrate conch.ssh to PyCA cryptography.
  • #7998 - twisted.conch.ssh.keys Python 3 port
  • #8025 - Make Trial work on Windows+Python3
  • #8036 - Document the usage of the new logging system
  • #8054 - Twisted's README and other top-level documents should be in rst format
  • #8086 - eventFlatten not preserving the format
  • #8090 - Thread pool does not create enough workers to handle multiple invocations of callInThread
  • #8099 - Port twisted.internet.serial to Python 3.
  • #8105 - t.protocol.basic.IntNStringReceiver.dataReceived inconsistency
Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at December 30, 2015 10:59 AM

October 2015 - SFC Sponsored Development

This is the first post in a series dedicated to my work as part of the 2015-2016 Twisted Maintainer Fellowship program.

The Twisted project, via the Software Freedom Conservancy, was kind enough to contract me as a consultant to help with the review queue and the Git migration.
I am sharing this Twisted Maintainer Fellowship program with Amber aka Hawkie aka hawkowl.

Below is the report for October 2015.

Tickets reviewed and merged:

  • #7994 - twisted.python.urlpath update to work with bytes
  • #7717 - Support for diffie-hellman-group14-sha1 in conch.ssh
  • #8042 - Python 3.5 support in Ubuntu 15.10
  • #8028 - Improve error handling in twisted.cred.checkers.FilePasswordDB.
  • #6197 - Port twisted.web.client.downloadPage to Python 3
  • #7650 - Support IPv6 address literals in IAgent implementations
Tickets reviewed and not yet merged:
  • #6757 - SSL server endpoints client certificate retrieval
  • #7926 - Improve TLSMemoryBIOProtocol.loseConnection
  • #8036 - Document the usage and testing of the new logging system in Twisted's own code.
  • #7985 - Use sdist to build Twisted distributables.
  • #7860 - Add support for NPN and ALPN
  • #8025 - Have Trial on Windows with Python 3
  • #7826 - Python 2.6 cleanup in Trial.
  • #5976 - Python 2.6 cleanup in Trial.
  • #8068 - twisted.python.urlpath Python 3 regression.
For the Git migration part, I have investigated the current infrastructure and created tickets for each part of the infrastructure that requires Git migration work.

All Git migration work is planned to be deployed and documented using the twisted-infra/braid infrastructure helper tool and developed using a Vagrant VM. All tasks are attached to the dedicated Git migration milestone.

No changes related to the Git migration have been applied to the Twisted infrastructure yet.

I have also tried to engage with Twisted contributors to get some feedback regarding how we can improve the contribution and the review process.

During the last month, and after a very long time, we managed to get the review queue down to zero.

We don't have a way to measure the review response time, but during the last month we also saw better response times, with review requests waiting no more than 3 or 4 days in the queue, and many receiving a reply in less than 34 hours.

Thanks to the Software Freedom Conservancy and all of the sponsors who made this possible, as well as to all the other Twisted developers who helped out by writing or reviewing code.

by Adi Roiban (noreply@blogger.com) at December 30, 2015 10:59 AM

December 06, 2015

Moshe Zadka

Big O for the working programmer

Let’s say you’re writing code, and you plan out one function. You try to figure out the constraints on the algorithmic efficiency of this function: how good does your algorithm need to be? This depends, of course, on many things. But on fewer than you’d think. First, let’s assume you are OK with around a billion operations (conveniently, modern gigahertz processors do about a billion low-level operations per second, so it’s a good first assumption).

If your algorithm is O(n), that means n can’t be much bigger than a billion.

O(n**2) — n should be no more than the square root of a billion — around 30,000.

O(n**3) — n should be no more than the third root of a billion — a thousand.

O(fib(n)) — n should be no more than 43

O(2**n) — a billion is right around 2**30, so n should be no more than 30.

O(n!) — n should be no more than 12

OK, now let’s assume you’re the NSA. You can fund a million cores, and your algorithm parallelizes perfectly. How do these numbers change?

O(n) — trillion

O(n**2) — 30 million

O(n**3) — hundred thousand

O(fib(n)) — 71

O(2**n) — 50

O(n!) — 16

You will notice that the difference between someone with a Raspberry Pi and a nation-state espionage agency is important for O(n)/O(n**2) algorithms, but quickly becomes meaningless for the higher-order ones. You will also notice that log(n) was not really factored in — even at a billion, it would mean the difference between handling a billion and handling a hundred million.

This table is useful to memorize for a quick gut check — “is this algorithm feasible?” If the sizes you plan to attack are way smaller, then yes; if way bigger, then no; and if “right around”, that’s where you might need to go to a lower-level language or micro-optimize to get it to work. It is useful to know these things before starting to implement, regardless of whether the code is going to run on a smartwatch or in an NSA data center.
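
These cutoffs are easy to recompute for any budget; here is a small Python sketch (the exact answers it prints differ by one or two from the rounded figures above, since a billion is not exactly 2**30 and so on):

import math

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def max_n(cost, budget):
    """Largest n with cost(n) <= budget, via doubling plus binary search."""
    hi = 1
    while cost(hi) <= budget:
        hi *= 2
    lo = hi // 2  # invariant: cost(lo) <= budget < cost(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

COSTS = [
    ("O(n)", lambda n: n),
    ("O(n**2)", lambda n: n ** 2),
    ("O(n**3)", lambda n: n ** 3),
    ("O(fib(n))", fib),
    ("O(2**n)", lambda n: 2 ** n),
    ("O(n!)", math.factorial),
]

for budget in (10 ** 9, 10 ** 15):
    print("budget %.0e:" % budget)
    for label, cost in COSTS:
        print("  %-10s n <= %d" % (label, max_n(cost, budget)))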

Conveniently, these numbers are also useful for memory performance — whether you need to support a one GB device or if you have a terabyte of memory.


by moshez at December 06, 2015 05:01 AM

November 12, 2015

Glyph Lefkowitz

Your Text Editor Is Malware

Are you a programmer? Do you use a text editor? Do you install any 3rd-party functionality into that text editor?

If you use Vim, you’ve probably installed a few vimballs from vim.org, a website only available over HTTP. Vimballs are fairly opaque; if you’ve installed one, chances are you didn’t audit the code.

If you use Emacs, you’ve probably installed some packages from ELPA or MELPA using package.el; in Emacs’s default configuration, ELPA is accessed over HTTP, and until recently MELPA’s documentation recommended HTTP as well.

When you install un-signed code into your editor that you downloaded over an unencrypted, unauthenticated transport like HTTP, you might as well be installing malware. This is not a joke or exaggeration: you really might be. You have no assurance that you’re not being exploited by someone on your local network, by someone on your ISP’s network, the NSA, the CIA, or whoever else.

The solution for Vim is relatively simple: use vim-plug, which fetches stuff from GitHub exclusively via HTTPS. I haven’t audited it conclusively but its relatively small codebase includes lots of https:// and no http:// or git:// that I could see.

I’m relatively proud of my track record of being a staunch advocate for improved security in text editor package installation. I’d like to think I contributed a little to the fact that MELPA is now available over HTTPS and instructs you to use HTTPS URLs.

But the situation still isn’t very good in Emacs-land. Even if you manage to get your package sources from an authenticated source over HTTPS, it doesn’t matter, because Emacs won’t verify TLS.

Although package signing is implemented, practically speaking, none of the packages are signed. Therefore, you absolutely cannot trust package signing to save you. Plus, even if the packages were signed, why is it the NSA’s business which packages you’re installing, anyway? TLS is shorthand for The Least Security (that is acceptable); whatever other security mechanisms, like package signing, are employed, you should always at least have HTTPS.

With that, here’s my unfortunately surprise-filled step-by-step guide to actually securing Emacs downloads, on Windows, Mac, and Linux.

Step 1: Make Sure Your Package Sources Are HTTPS Only

By default, Emacs ships with its package-archives list as '(("gnu" . "http://elpa.gnu.org/packages/")), which is obviously no good. You will want to both add MELPA (which you surely have done anyway, since it’s where all the actually useful packages are) and change the ELPA URL itself to be HTTPS. Use M-x customize-variable to change package-archives to:

`(("gnu" . "https://elpa.gnu.org/packages/")
  ("melpa" . "https://melpa.org/packages/"))

Step 2: Turn On TLS Trust Checking

There’s another custom variable in Emacs, tls-checktrust, which checks trust on TLS connections. Go ahead and turn that on, again, via M-x customize-variable tls-checktrust.
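
If you prefer start-up elisp over the customize interface, the equivalent setting is a one-liner (a sketch; customize additionally records the choice in your custom-file):

(setq tls-checktrust t)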

Step 3: Set Your Trust Roots

Now that you’ve told Emacs to check that the peer’s certificate is valid, Emacs can’t successfully fetch HTTPS URLs any more, because Emacs does not distribute trust root certificates. Although the set of cabforum certificates is probably already on your computer in various forms, you still have to acquire them in a format usable by Emacs somehow. There are a variety of ways, but in the interests of brevity and cross-platform compatibility, my preferred mechanism is to get the certifi package from PyPI, with python -m pip install --user certifi or similar. (A tutorial on installing Python packages is a little out of scope for this post, but hopefully my little website about this will help you get started.)

At this point, M-x customize-variable fails us, and we need to start just writing elisp code; we need to set tls-program to a string computed from the output of running a program, and if we want this to work on Windows we can’t use Bourne shell escapes. Instead, do something like this in your .emacs or wherever you like to put your start-up elisp:4

(let ((trustfile
       (replace-regexp-in-string
        "\\\\" "/"
        (replace-regexp-in-string
         "\n" ""
         (shell-command-to-string "python -m certifi")))))
  (setq tls-program
        (list
         (format "gnutls-cli%s --x509cafile %s -p %%p %%h"
                 (if (eq window-system 'w32) ".exe" "") trustfile))))

This will run gnutls-cli on UNIX, and gnutls-cli.exe on Windows.

You’ll need to install the gnutls-cli command line tool, which of course varies per platform:

  • On OS X, of course, Homebrew is the best way to go about this: brew install gnutls will install it.
  • On Windows, the only way I know of to get GnuTLS itself over TLS is to go directly to this mirror. Download one of these binaries and unzip it next to Emacs in its bin directory.
  • On Debian (or derivatives), apt-get install gnutls-bin
  • On Fedora (or derivatives), yum install gnutls-utils

Great! Now we’ve got all the pieces we need: a tool to make TLS connections, certificates to verify against, and Emacs configuration to make it do those things. We’re done, right?

Wrong!

Step 4: TRUST NO ONE

It turns out there are two ways to tell Emacs to really actually really secure the connection (really), but before I tell you the second one or why you need it, let’s first construct a little test to see if the connection is being properly secured. If we make a bad connection, we want it to fail. Let’s make sure it does.

This little snippet of elisp will use the helpful BadSSL.com site to give you some known-bad and known-good certificates (assuming nobody’s snooping on your connection):

(let ((bad-hosts
       (loop for bad
             in `("https://wrong.host.badssl.com/"
                  "https://self-signed.badssl.com/")
             if (condition-case e
                    (url-retrieve
                     bad (lambda (retrieved) t))
                  (error nil))
             collect bad)))
  (if bad-hosts
      (error (format "tls misconfigured; retrieved %s ok"
                     bad-hosts))
    (url-retrieve "https://badssl.com"
                  (lambda (retrieved) t))))

If you evaluate it and you get an error, either your trust roots aren’t set up right and you can’t connect to a valid site, or Emacs is still blithely trusting bad certificates. Why might it do that?

Step 5: Configure the Other TLS Verifier

One of Emacs’s compile-time options is whether to link in GnuTLS or not. If GnuTLS is not linked in, it will use whatever TLS program you give it (which might be gnutls-cli or openssl s_client, but since only the most recent version of openssl s_client can even attempt to verify certificates, I’d recommend against it). That is what’s configured via tls-checktrust and tls-program above.

However, if GnuTLS is compiled in, it will totally ignore those custom variables, and honor a different set: gnutls-verify-error and gnutls-trustfiles. To make matters worse, installing the packages which supply the gnutls-cli program also install the packages which might satisfy Emacs’s dynamic linking against the GnuTLS library, which means this code path could get silently turned on because you tried to activate the other one.

To give these variables the correct values as well, we can re-visit the previous trust setup:

(let ((trustfile
       (replace-regexp-in-string
        "\\\\" "/"
        (replace-regexp-in-string
         "\n" ""
         (shell-command-to-string "python -m certifi")))))
  (setq tls-program
        (list
         (format "gnutls-cli%s --x509cafile %s -p %%p %%h"
                 (if (eq window-system 'w32) ".exe" "") trustfile)))
  (setq gnutls-verify-error t)
  (setq gnutls-trustfiles (list trustfile)))

Now it ought to be set up properly. Try the example again from Step 4 and it ought to work. It probably will. Except, um...

Appendix A: Windows is Weird

Presently, the official Windows builds of Emacs seem to be linked against version 3.3 of GnuTLS rather than the latest 3.4. You might need to download the latest micro-version of 3.3 instead. As far as I can tell, it’s supposed to work with the command-line tools (and maybe it will for you) but for me, for some reason, Emacs could not parse gnutls-cli.exe’s output no matter what I did. This does not appear to be a universal experience, others have reported success; your mileage may vary.

Conclusion

We nerds sometimes mock the “normals” for not being as security-savvy as we are. Even if we’re considerate enough not to voice these reactions, when we hear someone got malware on their Windows machine, we think “should have used a UNIX, not Windows”. Or “should have been up to date on your patches”, or something along those lines.

Yet, nerdy tools that download and execute code - Emacs in particular - are shockingly careless about running arbitrary unverified code from the Internet. And we are often equally shockingly careless to use them, when we should know better.

If you’re an Emacs user and you didn’t fully understand this post, or you couldn’t get parts of it to work, stop using package.el until you can get the hang of it. Get a friend to help you get your environment configured properly. Since a disproportionate number of Emacs users are programmers or sysadmins, you are a high-value target, and you are risking not only your own safety but that of your users if you don’t double-check that your editor packages are coming from at least cursorily authenticated sources.

If you use another programmer’s text editor or nerdy development tool that routinely installs software onto your system, make sure that it’s at least securing those installations with properly verified TLS.


  1. Technically speaking, of course, you might always be installing malware; no defense is perfect. And HTTPS is a fairly weak one at that. But it is significantly stronger than “no defense at all”. 

  2. Never, ever, clone a repository using git:// URLs. As explained in the documentation: “The native transport (i.e. git:// URL) does no authentication and should be used with caution on unsecured networks.”. You might have heard that git uses a “cryptographic hash function” and thought that had something to do with security: it doesn’t. If you want security you need signed commits, and even then you can never really be sure. 

  3. Plus, MELPA accepts packages on the (plain-text-only) Wiki, which may be edited by anyone, and from CVS servers, although they’d like to stop that. You should probably be less worried about this, because that’s a link between two datacenters, than about the link between you and MELPA, which is residential or business internet at best, and coffee-shop WiFi at worst. But still maybe be a bit worried about it and go comment on that bug. 

  4. Yes, that let is a hint that this is about to get more interesting... 

by Glyph at November 12, 2015 08:51 AM

October 21, 2015

Hynek Schlawack

Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

by Hynek Schlawack (hs@ox.cx) at October 21, 2015 12:30 PM

October 19, 2015

Hynek Schlawack

Testing & Packaging

How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!).

by Hynek Schlawack (hs@ox.cx) at October 19, 2015 07:00 PM

October 11, 2015

Moshe Zadka

Deploying Python Applications

Related to: 2L2T: DjangoCon FeedbackDeploying Python Applications with Docker – A SuggestionDeploying With DockerSoftware You Can Use

I have tried to make an example of some of the ways to avoid mistakes with BoredBot. Hopefully the README explains the problem domain and motivation, but here is the tl;dr:

Deploying applications is a place where it is possible to do things badly, amazingly badly and “oh God!”. I hope to slightly improve on this consensus with the current state of 2015 technology and get to “well, ok, this does not suck too much”. BoredBot is basically my attempt to show how combining NaCl, NColony, Pex and Docker can get you to a simple image which can be deployed to Amazon ECS, Google Container Engine or your own Docker servers without too much trouble.

There is still a lot BoredBot doesn’t do that I think any good deployment infrastructure should support — staging vs. production, dry-run mode, etc. Pull requests and issues are happily accepted!

by moshez at October 11, 2015 03:52 AM

September 19, 2015

Moshe Zadka

Personal calling card

I have decided it is time to make a new personal calling card for myself. Please let me know what you think! I am reasonably sure that I’m not the only person, but one of a small number of people, who actually unit-tested their card: yep, the code is valid Python and will actually print the right details (if you “pip install attrs”, of course).

Back:

back

Front:

front

by moshez at September 19, 2015 07:47 AM

September 17, 2015

Glyph Lefkowitz

Python Option Types

NULL has, rightly, been called a “billion dollar mistake”. If that is so, then None is a hundred million dollar mistake, at least.

Forgetting, for the moment, about the numerous pitfalls of a C-style NULL, Python’s None has a very significant problem of its own. Of course, the problem is not None itself; the fact that the default return value of a function is None is (in my humble opinion, at least) fine; it’s just a marker that means “nothing to see here, move along”. The problem arises from values which might be None, or might be some other, useful thing.

APIs present values in a number of ways. A value might be exposed as the return value of a method, an attribute of an object, or an entry in a collection data structure such as a list or dictionary. If a value presented in an API might be None or it might be something else, every single client of that API needs to check the type of the value that it’s calling before doing anything.

Since it is rude to use a “simple suite” (a line of code like if x: y() with no newline after the colon), that means the minimum number of lines of code for interacting with your API is now 4: one for the if statement, one for the then clause, one for the else statement, and one for the else clause.
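
To make the cost concrete, here is the little dance every caller ends up doing; the API and names here are made up for illustration:

# Hypothetical maybe-None API: every caller must repeat this check.
user = api.lookup_user("alice")
if user is not None:
    user.greet()
else:
    handle_missing_user()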

Worse than the code-bloat required here, the default behavior, if you forget to do this checking, is that it works sometimes (like when you’re testing it), and that other times (like when you put it into production), you get an unhelpful exception like this:

Traceback (most recent call last):
  File "<your code>", line 1, in <module>
    value.method()
AttributeError: 'NoneType' object has no attribute 'method'

Of course NoneType doesn’t have an attribute called method, but why is value a NoneType? Science may never know.

In languages with static type declarations, there’s a concept of an Option type. Simply put, in a language with option types, the API declares its result value as “maybe something, maybe null”, and then if the caller fails to account for the “null” case, it is a compile-time error.

Python doesn’t have this kind of ahead-of-time checking though, so what are we to do? In order of my own personal preference, here are three strategies for getting rid of maybe-None-maybe-not data types in your Python code.

1: Just Say No

Some APIs - especially those that require building deeply complex nested trees of data structures - use None as a way to provide a convenient mechanism for leaving a space for a future value to be filled out. Using such an API sometimes looks like this:

value = MyValue()
value.foo = 1
value.bar = 2
value.baz = 3
value.do_something()

In this case, the way to get rid of None is simple: just stop doing that. Instead of .foo having an implicit type of “int or None”, just make it always be int, like this:

value = MyValue(
    foo=1, bar=2, baz=3
)
value.do_something()

Or, if do_something is the only method you’re going to call with this data structure, opt for the even simpler:

do_something_with_value(foo=1, bar=2, baz=3)

If MyValue has dozens of fields that need to be initialized by different subsystems, so that you actually want to pass around a partially-initialized value object, consider the Builder pattern, which would make this code look like the following:

builder = MyValueBuilder()
foo = builder.with_foo(1)
bar = foo.with_bar(2)
baz = bar.with_baz(3)
value = baz.build()
value.do_something()

This acknowledges that the partially-constructed MyValueBuilder is a different type than MyValue, and, crucially, if you look at its API documentation, it does not misleadingly appear to support the do_something operation which in fact requires foo, bar, and baz all be initialized.

Wherever possible, just require values of the appropriate type be passed in in the first place, and don’t ever default to None.
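
(As an aside: if writing constructors with all-mandatory arguments sounds tedious, a library like attrs makes it nearly free. A minimal sketch, assuming the same three fields as above:)

import attr

@attr.s
class MyValue(object):
    # No defaults: forgetting a field is a TypeError at construction time,
    # not an AttributeError at some distant do_something() call.
    foo = attr.ib()
    bar = attr.ib()
    baz = attr.ib()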

2: Make The Library Handle The Different States, Not The Caller

Sometimes, None is a placeholder indicating an implicit state machine, where the states are “initialized” and “not initialized”.

For example, imagine an RPC Client which may or may not be connected. You might have an API you have to use like this:

def ask_question(self):
    message = {"question":
               "what is the air speed velocity of an unladen swallow?"}
    if self.rpc_client.connection is None:
        self.outbound_messages.append(message)
        self.rpc_client.when_connected(self.flush_outbound_messages)
    else:
        self.rpc_client.send_message(message)

By leaking through the connection attribute, rpc_client is providing an incomplete abstraction and foisting off too much work to its callers. Instead, callers should just have to do this:

def ask_question(self):
    message = {"question":
               "what is the air speed velocity of an unladen swallow?"}
    self.rpc_client.send_message(message)

Internally, rpc_client still has to maintain a private _connection attribute which may or may not be present, but by hiding this implementation detail, we centralize the complexity associated with managing that state in one place, rather than polluting every caller with it, which makes for much better API design.

Hopefully you agree that this is a good idea, but this is more what to do rather than how to do it, so here are two strategies for achieving “make the library do it”:

2a: Use Placeholder Implementations

However, rather than using None as the _connection attribute, rpc_client’s internal implementation could instead use a placeholder which provides the same interface. Let’s say the expected interface of _connection in this case is just a send method that takes some bytes. We could initialize it with this:

class NotConnectedConnection(object):
    def __init__(self):
        self._buffer = b""
    def send(self, data):
        # Buffer writes until a real connection replaces this placeholder.
        self._buffer += data

This allows the code within rpc_client itself to blindly call self._connection.send whether it’s actually connected already or not; upon connection, it could un-buffer that data onto the ready connection.
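
Here is a minimal sketch of that hand-off; the buffering policy and the _connection_established hook are assumptions for illustration:

class RPCClient(object):
    def __init__(self):
        # Start with the placeholder; swap in the real thing on connect.
        self._connection = NotConnectedConnection()
    def send(self, data):
        # No None check needed: connected or not, send always works.
        self._connection.send(data)
    def _connection_established(self, real_connection):
        # Drain whatever the placeholder buffered, then replace it.
        buffered = self._connection._buffer
        self._connection = real_connection
        if buffered:
            real_connection.send(buffered)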

2b: Use an Explicit State Machine

Sometimes, you actually have quite a few states you need to manage, and this starts looking like an ugly proliferation of lots of weird little flags; various values which may be True or False, or None or not-None.

In those cases it’s best to be clear about the fact that there are multiple states, and to enumerate the valid transitions between them. Then, expose a method which always has the same signature and return type.

Using a state machine library like ClusterHQ’s “machinist” or my Automat can allow you to automate the process of checking all the states.

Automat, in particular, goes to great lengths to make your objects look like plain old Python objects. Providing an input is just calling a method on your object: my_state_machine.provide_an_input() and receiving an output is just examining its return value. So it’s possible to refactor your code away from having to check for None by using this library.

For example, the connection-handling example above could be dealt with in the RPC client using Automat like so:

from automat import MethodicalMachine

class RPCClient(object):
    _machine = MethodicalMachine()
    @_machine.state()
    def _connected(self):
        "We have a connection."
    @_machine.state()
    def _not_connected(self):
        "We have no connection."
    @_machine.input()
    def send_message(self, message):
        "Send a message now if we're connected, or later if not."
    @_machine.output()
    def _send_message_now(self, message):
        "Send a message immediately."
        # ...
    @_machine.output()
    def _send_message_later(self, message):
        "Enqueue a message for when we are connected."
        # ...
    @_machine.output()
    def _send_queued_messages(self, connection):
        "send all messages enqueued by _send_message_later"
        # ...
    @_machine.input()
    def connection_established(self, connection):
        "A connection was established."
    _connected.upon(send_message, enter=_connected, output=[_send_message_now])
    _not_connected.upon(send_message, enter=_connected,
                        output=[_send_message_later])
    _not_connected.upon(connection_established, enter=_connected,
                        output=[_send_queued_messages])

3: Make The Caller Account For All Cases With Callbacks

The absolute lowest-level way to deal with multiple possible states is to, instead of exposing an attribute that the caller has to retrieve and test, expose a function which takes multiple callbacks, one for each case. This way you can provide clear and immediate error feedback if the caller forgets to handle a case - meaning that they forgot to pass a callback. This is only really suitable if you can’t think of any other way to handle it, but it does at least provide a very clear expectation of the interface.

To re-use our connection-handling logic above, you might do something like this:

def try_to_send(self):
    def connection_present(connection):
        connection.send_message(my_message)
    def connection_not_present():
        self.enqueue_message_for_later(my_message)
    self.rpc_client.with_connection(connection_present,
                                    connection_not_present)

Notice that while this is slightly awkward, it has the nice property that the connection_present callback receives the value that it needs, whereas the connection_not_present callback doesn’t receive anything, because there’s nothing for it to receive.
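
For completeness, the library side of this is small; a sketch of what with_connection might look like on the RPC client, assuming a private _connection that may be None:

def with_connection(self, connection_present, connection_not_present):
    # Requiring both callbacks makes a forgotten case an immediate
    # TypeError at the call site rather than a latent AttributeError.
    if self._connection is not None:
        connection_present(self._connection)
    else:
        connection_not_present()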

The Zeroth Strategy

Of course, the best strategy, if you can get away with it, may be the non-strategy: refuse the temptation to provide a maybe-None, and just raise an exception when you are in a state where you can’t handle the request. If you intentionally raise a specific, meaningful exception type with a good error message, it will be a lot more pleasant to use your API than if return codes that the caller has to check for pop up all over the place, None or otherwise.
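
For example, a sketch of what that might look like for the RPC client above (the exception name is invented for illustration):

class NotConnectedError(Exception):
    """Raised when a message is sent before any connection exists."""

def send_message(self, message):
    if self._connection is None:
        # Fail loudly, specifically, and with advice, instead of
        # returning None or some other sentinel.
        raise NotConnectedError(
            "send_message() called before connection_established(); "
            "connect first or queue the message yourself")
    self._connection.send_message(message)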

The Principle Of The Thing

The underlying principle here is the same: when designing an API, always provide a consistent interface to your callers. An API is 1 to N: you have 1 API implementation to N callers. As N→∞, it becomes more important that any task that needs performing frequently is performed on the “1” side of the equation, and that you don’t force callers to repeat the same error checking over and over again. None is just one form of this, but it is a particularly egregious form.

by Glyph at September 17, 2015 10:04 PM

September 14, 2015

Twisted Matrix Laboratories

Twisted Devlog - September 2015

Hello everyone! This is something I'm going to try a bit more often -- writing about what's going on with Twisted, what changes are coming, what to expect, and how you can take advantage of what we're doing.

More Python 3

Twisted 15.4 has been released, and with it, the first Trial on Python 3. This means that downstream projects can now use the same testing tools for both Python 2 and Python 3. It is stable enough to be used by the Twisted project on our buildbots, but bugs may exist; if you find something that doesn't work as it should, please let us know by filing a ticket on the issue tracker.

Twisted 15.5 (coming Oct/Nov) will also contain the first release of twistd (the Twisted Daemon runner) on Python 3. The only plugin shipping with it right now is web, with most base features (file serving, running user-specified Resources). WSGI support and distributed web serving have not yet made it in, but the WSGI support is coming soon.

Twisted's porting status is now at ~50% by 'covered lines' (including tests).

More Developer Tools

Developers of Twisted should find the new tox.ini in our source tree useful. It provides a bunch of environments for tox (a testing helper that automatically builds test virtualenvs). Run "tox -l" in your Twisted checkout to see what environments are available.

Removing Python 2.6 Support

Twisted 15.4 was the last release with Python 2.6 support. It is no longer a supported platform, and its support code is being removed; Twisted 15.5 will no longer be importable on Pythons before 2.7 (for 2.x -- it is likewise unimportable on Py3 versions before 3.3).

Supported Platforms

This year, a handful of new platforms have been added to the fold -- FreeBSD 10.1 (Python 2.7/3.4), RHEL7 (Python 2.7), OS X 10.10 (Python 2.7), Fedora 21/22 (Python 2.7/3.4), Ubuntu 15.04 (Python 2.7/3.4), and Debian 8 (Python 2.7/3.4).

The older platforms which have been removed are RHEL6 (only has 2.6), Debian 6 (only has 2.6), Fedora 17/18/19 (EOL OS), and OS X 10.6 (only has 2.6, EOL OS).

Python 3.5 has now been released, and we will aim to make that a supported platform as soon as it is available in a released OS or official Ubuntu PPA.

Packaging Modernisation

More work is being done on bringing Twisted's packaging into the 21st Century.

In Twisted 15.3, we stopped supporting Twisted being packaged as 'subprojects'. There is now only one Twisted package. This has simplified our build tools considerably and made it less confusing when installing Twisted, just giving the one option with the whole distribution, rather than several with fuzzily-defined boundaries and unclear requirements.

Twisted 15.1 was the first release which uses the Setuptools extras_require feature, where extra dependencies can be installed using pip automatically. For example, to install Twisted and the dependencies required for TLS support, you can use "pip install twisted[tls]". A full list of the optional dependencies that can be installed in this manner can be found in our docs.

It's long been the case that Twisted needs a C compiler to be installed, but that may not be the case for much longer. There is a ticket which is working on moving Twisted's optional C Extensions to another package, _twistedextensions. The base Twisted package will then be installable without a C compiler, and users that require the C extensions provided in _twistedextensions will be able to get it on PyPI.

Since the base Twisted release will be pure-Python, we will now be able to start shipping wheels for both Python 2 (full release) and Python 3 (limited release, containing only the Py3 ported modules).

Additionally, .msi and .exe installers will no longer be released for Windows, making .whls the only binary installation method for Twisted's C extensions. A CExt .whl will additionally be shipped for Python 2.7 on OS X 10.10 (64bit). No C Extension package will be shipped for Python 3 on any platform, until it contains Python 3 compatible C Extensions.

Call for Sponsors & Volunteers

Twisted's lifeblood is volunteers, either personal or corporate. More people power means that we can get more done, more eyes reviewing code, more ideas and solutions that might solve our unique problems.

Glyph has written more about fundraising for Twisted, and why you should do so, here. If you would like to support the project financially, you can donate to Twisted through the Twisted Software Foundation/Software Freedom Conservancy, a 501(c)3 non-profit.

If you would like to volunteer your time to the project, please see our contributors documentation.

Until next time,

Amber Brown (HawkOwl)
Twisted Release Manager

by HawkOwl (noreply@blogger.com) at September 14, 2015 09:11 AM

September 13, 2015

Glyph Lefkowitz

Connecting To A Server On The Internet [Nightmare Difficulty]

The Setup

I recently switched internet service providers, from Comcast to WebPass. Comcast had been progressively upgrading our connection, and I honestly didn’t have any complaints, but WebPass is four times faster. Eight times faster, maybe? It’s so fast that I could demonstrate that my router hardware was the bottleneck and I needed to buy something physically capable of more bits per second. There’s gigabit Ethernet coming straight out of my wall, no modem required. So that’s exciting. The not-exciting part is that WebPass uses what is euphemistically called “carrier grade” NAT to deliver IPv4 service. This means I have no public IPv4 address at all.

This is exactly the grim meathook future that I predicted over a decade ago.

One interesting consequence of having no public IPv4 address is that I could no longer make inbound connections to my home network. When I’m roving around with my laptop, I like to be able to check in with my home servers. Plus, I’m like, a networking dude, or something, so I should know how this stuff works, and being able to connect to networks is a good first step to doing interesting things with them.

The Gear

I’m using OpenWrt (Chaos Calmer) firmware on my home router (although this also works on Barrier Breaker). If you configure its built-in IPv6 stuff naively, connecting to your home network is no problem at all! Every device on your network suddenly has its own public IP address on the Internet.

The Problem

In case it’s not clear: don’t do that. I can guarantee you that all of your computers and some things which aren’t obviously computers (game consoles, “smart” scales, various handheld devices, your “cloud” thermostat, probably your refrigerator, maybe some knives or guns that you didn’t know had internet access: anything that can connect to WiFi) are listening on a surprising array of ports, and every single one of them has some interesting security vulnerability which an attacker could potentially use to do bad stuff. You should of course keep your operating systems and device firmware updated to make sure you’ve always got the latest security patches, but even if you’re perfect at that, many devices simply don’t get updates. So the implicit firewall that NAT gave us is still important; except now we need it as an explicit firewall that actually blocks traffic, and not as an accident of the fact that devices don’t have their own externally-available addresses.

The Requirements

Given these parameters, the desired state of my router (and, I hope, some of yours) is:

  1. Explicitly deny all inbound IPv6 connections to all devices that aren’t expecting them, such as embedded devices and laptops.
  2. Explicitly allow all inbound IPv6 connections to specific devices and ports that are expecting them, like home servers.

Casual Difficulty: Rejecting Incoming Traffic

OpenWrt has a web interface, called “LuCI”, which has a nice firewall configuration application. You’ll want to use it to reject IPv6 inputs, like so:

Reject IPv6 Inputs

It also has a “port forwarding” interface, but forwarding a port is something that only really makes sense in an IPv4 NAT scenario. In an IPv6 scenario you don’t need to “forward” since you already have your own address. Accordingly, the interface populates the various IP address selectors with only your internal IPv4 addresses, and doesn’t work on IPv6 interfaces on the router.

Now we get to the really tricky part: allowing inbound traffic to that known device.

Normal Difficulty: Home Server Naming

First, we need to be able to identify that device. Since residential IPv6 prefix allocations are generally temporary, you can’t rely on your IP address to stay the same forever. This means we need to have some kind of dynamic DNS that updates periodically.

Personally, I have accomplished this with a little script using Apache libcloud that I have run in a crontab. The key part of that script is this command line, which works only on Linux:

ip -o -d -6 addr show primary scope global br0

that enumerates all globally-routable IPv6 addresses on a given interface (where the interface is br0 in that script, but may be something else on your computer; most likely eth0).

In my crontab, I have something like this:

@hourly ./.virtualenvs/libcloud/bin/python .../update_home_dns.py \
    rackspace_us my.dns.user home.mydomain.example.com

And I ran the script once manually to initialize the keyring entry with my API key in it.

Note: I would strongly recommend creating a restricted user that only has access to DNS records for this, if you’re doing it on a general cloud provider like Rackspace or Amazon. On Rackspace, this can be accomplished by going to the “Account” menu at the top right, “User Management”, “Create User”, and selecting the “Custom” radio button in the “Product Access” section. Then, log in as that user, go to the “Account Settings”, and copy the API key.

Nightmare Difficulty: ip6tables suffix matching

Now we can access the address of home.mydomain.example.com from the public Internet, but it doesn’t do us much good; it’s still firewalled off. And since the prefix allocation changes regularly, we can’t simply open up a port in the firewall with a hard-coded address as you might in a regular data-center environment.

However, we can use a quirk of IPv6 addressing to our advantage.

In case you’re not already familiar, let me briefly explain the structure of IPv6 addresses. An IPv6 address is a 128-bit number. That is a really, really big number, which means a really, really big set of addresses; unlike IPv4, your ISP doesn’t need to jealously hoard them and hand them out sparingly, one at a time.

One of the features of IPv6 is the ability to allocate your own individual addresses. When your provider gives you an IPv6 address, they give you a so-called “prefix allocation”, which is the first half or so - 64 bits - of an address. Devices on your network may then generate their own addresses that begin with those 64 bits.

Devices on your network then decide on their addresses in one of two ways:

  1. If they’re making an outgoing connection, they choose a random 64-bit number and use that as the suffix; a so-called “temporary” address.
  2. If they’re listening for incoming connections, they generate a 64-bit number based on their Ethernet card’s MAC address; a so-called “primary” address.

It is this primary address that the above ip command will print out. (If you see two addresses, and one starts with an f, it’s the other one; this post is already long enough without explaining the significance of that other address.)

Within iptables, the --source and --destination (or -s and -d) matching options are documented as having the syntax “address[/mask]”. The “mask” there is further documented as “... either an ... network mask (for iptables) or a plain number, specifying the number of 1’s at the left side of the network mask.” Normally, sources and destinations match network prefixes, because that’s what you would normally want to allow or deny traffic from: a prefix representing a network which in turn represents a specific group of computers. But, given that the IPv6 address for an interface on a given Ethernet card with a given MAC address will have a stable suffix derived from that MAC address, and a prefix which changes based on your current (dynamic) prefix allocation, we can write an ip6tables rule to allow connections based on the suffix by constructing an address that has a mask matching the last 64 bits of the address.

So, let’s say your ip command, above, output an address like 2001:1111:2222:3333:1234:5678:abcd:ef01. This means the stable part of your address is that trailing 1234:5678:abcd:ef01. And that means we want iptables to allow inbound traffic to a specific port. To allow incoming HTTPS on port 443, in the OpenWRT control panel, go to Network → Firewall, click the “Custom Rules” tab, and input a line something like this:

ip6tables -A forwarding_rule \
    -d ::1234:5678:abcd:ef01/::ffff:ffff:ffff:ffff \
    -p tcp -m tcp \
    --dport 443 \
    -m comment --comment "Inbound-HTTPS" \
    -o br-lan -j ACCEPT

This says that we are going to append to the forwarding_rule chain (a chain specific to OpenWRT), a rule matching packets whose destination is our suffix-matching mask with the suffix of the server we want to forward to, for the TCP protocol using the tcp matcher, whose destination port is 443, giving it a comment of “Inbound-HTTPS” so it’s identifiable when we’re looking at the firewall later, sending it out the br-lan interface (again, somewhat specific to OpenWRT), and jumping to the ACCEPT target, because we should let this through.
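
Incidentally, if you’d rather not split the address up by eye, the standard library’s ipaddress module (Python 3.3+) can compute the stable suffix for you; using the example address from above:

import ipaddress

# Keep only the low 64 bits -- the stable, MAC-derived suffix.
addr = ipaddress.IPv6Address("2001:1111:2222:3333:1234:5678:abcd:ef01")
suffix = int(addr) & ((1 << 64) - 1)
print(ipaddress.IPv6Address(suffix))  # ::1234:5678:abcd:ef01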

Game Over

Congratulations! You can now connect to a server on a port, in the wonderful world of tomorrow. I hope you look forward as much as I do to the day when this blog post will be totally irrelevant, and all home routers will simply come with a little UI that lets you connect a little picture of a device to a little picture of a cloud representing the internet, and maybe type a port number in.

by Glyph at September 13, 2015 06:30 AM

September 12, 2015

Glyph Lefkowitz

Software You Can Use

Python has a big problem. While it’s easy and fun to produce software in Python, it’s hard to produce software that people - especially laypeople who are not professional software developers - can use.

In the modern software ecosystem, there are a few places that you might want to use a program:

  1. On a server, by loading a web page.
  2. On a web page, by running it in your browser.
  3. As a command-line tool, on Mac,
  4. ... Windows,
  5. ... or Linux.
  6. As a desktop application, on Mac,
  7. ... Windows,
  8. ... or Linux.
  9. As a mobile application, on iOS,
  10. ... or Android,
  11. ... Or Windows Phone. (Just kidding.)

Out of these 10 scenarios, Python currently has half of a good deployment story for one of them: running an application on a server, as a back-end. This is a serious problem for the future of Python and one we need to figure out how to face as a community.

Even the “good” deployment story is somewhat convoluted, as you need to know about at least some Linux distribution’s package manager, and native dependencies, and Pip, and virtualenv, and wheels, and probably docker too.

If you want to run a Python application in your browser, your best bet right now is probably Brython. However, Brython is still in its infancy, and basic facilities like preparing your code for production to achieve acceptable start-up performance, and an explanation of how to use libraries, are missing. With big chunks like that missing it’s hard to advocate for Brython’s use in production.

Moving on to scenario 3, this may be one of the best-supported configurations; py2app actually works surprisingly well. But it’s still incredibly confusing for new users. Which native objects (dylibs and frameworks and data files) to bundle are options to py2app itself, and not something that can be handled automatically by libraries. So if, for example, PyGame depends on SDL.framework or libSDL.dylib, you as an application developer need to understand how to figure that out and specify that list.

On Windows, the situation gets worse. To work as a Windows executable, you need to bundle the Python interpreter, but unlike in an OS X application, you can’t just copy in a whole directory. So you end up needing a tool like PyInstaller or cx_Freeze. PyInstaller hasn’t seen a release in the last 2 years; it doesn’t support Python 3. It also doesn’t work: if I try to package the most basic Twisted program possible, with pyinstaller 2.1 I get “no module named zope.interface”, and if I try to package it with pyinstaller trunk, I get “no module named itertools”. cx_Freeze similarly can’t figure out how to include zope.interface no matter what I tell it to do. This problem isn’t specific to libraries that I use; most Python projects will run into it.

py2exe, on the other hand, only supports Python 3.3+, and so is unusable with a lot of important Python libraries.

For a GUI application for Linux, you might have some small hope of building a distro-specific package that users could install, but that would involve using a distro-specific toolchain that had nothing to do with Python, and you need to repeat that work for Debian, Ubuntu, Fedora, and whatever other distros you want to support.

All of these same tools are what I would use to build a stand-alone command-line executable for Windows, Mac, or Linux, and they all break down in similar ways.

In the mobile space, there is absolutely zero tooling included with the language to even get started there. It might be possible to use Kivy to get a build onto iOS or Android. I haven’t had an opportunity to test those. But they still require you to install Homebrew, and a C compiler, and a whole bunch of fairly specific platform tooling to get started, and there are lots of different ways that can go wrong.

So how do other languages stack up?

  1. In JavaScript, if you want an application in the browser, it’s as simple as ... writing some JavaScript.
  2. In JavaScript, if you want a desktop application, you can just grab Electron and be up and running in a few minutes.
  3. In JavaScript, if you want a command-line UNIX tool, you can grab nar and build something self-contained almost immediately.
  4. And of course, in Go, there’s no way to get anything but a fully functional self-contained executable at the end of the build process. Everything is fully redistributable by default.

As a community, Python needs a clear, well-documented, well-supported, modern way to produce build artifacts that are easy to create and easy to share. We need to have this for all popular platforms and the browser. This is a tricky problem: it requires knowledge of lots of fiddly build details.

This wheel has been re-invented, poorly, a dozen or so times. My list above was just a subset. In addition to py2app, py2exe, pyinstaller, cx_Freeze, and the Kivy bundling tools, we’ve also got terrarium, bbfreeze (which is unmaintained), pipsi, pex, and probably some others I don’t know about.

In order to compete with JavaScript and Go for developers’ attention, Python must be able to become an implementation detail and disappear when the user is running the program. This means that some of these tools (terrarium, pipsi, pex) are not suitable for this purpose because they are envelopes for deployment into an environment with an installed Python interpreter.

All of the tools I’m aware of that are trying to provide fully self-contained execution, though (pyinstaller, cx_freeze, bb-freeze, py2app) are poorly designed because they value optimized distribution size over actually working by default. Rather than reading setuptools metadata and discovering the full set of dependencies which have been declared to be required, all of these tools use weird AST-parsing heuristics and buggy path-traversal hacks to try to construct a guess as to the minimal set of files that might be required, then require the poor application developer to fill in the gaps. This means none of them work with namespace packages, none of them work properly with plugin systems or runtime configuration systems; generally, they don’t work correctly with late binding, which is one of Python’s greatest strengths. Of course, a full Python interpreter with the whole standard library is quite large. If we had a tool that worked well but produced very large executables, we could of course start adding an “optimized mode” to try to crunch things down for production.

And all this is to say nothing of the insanely intricate and detailed knowledge that every Python programmer eventually acquires about the C runtime semantics of their chosen platform. When a C compiler is required but missing, most tools still just emit tracebacks. When a shared library goes missing due to an OS upgrade or package removal, you just see whatever the dynamic linker thinks to report, no explanation of how to fix it or what to do next.

The Python packaging ecosystem has made great strides in the last few years; Pip, in particular, has gone from a buggy and insecure mess to a mostly workable software delivery mechanism for developers. There are still bugs, but they are getting dealt with at a reasonable clip. However, Pip only delivers software to developers, and still requires you to have a Python runtime, a build environment, and tricky command-line tools to get things in place for development. The Python community has effectively no tools to deliver software to users.

To sum up, we need a tool which:

  1. works by default, including with “tricky” packages with namespace packages, data files, and native dependencies
  2. produces useful, actionable error messages when something is missing and the build can’t be completed (like “you don’t have a C compiler installed” or “you need to install Homebrew and then brew install openssl”)
  3. can produce both command-line and GUI executables for the mac, windows, and linux (and, for bonus points, a web browser)

The bad news is that I don’t have the time to start this project myself, and I’m not sure who does. The worse news is that every day we don’t have this, more and more people are re-writing their user-facing tools and applications in JavaScript or Go or Swift or Java, to suit their target platform, because it is honestly easier to learn an entirely new programming language and toolchain, and rewrite an entire application than to figure out how to build a self-contained executable in Python right now.

The good news, though, is that it’s a simple matter of programming, and that all the core technologies for doing all the really hard things that need to be done (pip, and zipimport and macholib, for example) already exist. It’s just a simple matter of programming: wiring together the metadata from setuptools, determining native dependencies with something like otool or ldd (or whatever the equivalent is on Windows, I still haven’t figured that out myself), pulling them all into a bundle, tacking the Python interpreter on.
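
To illustrate just the first of those pieces: reading the declared dependency graph out of setuptools metadata, rather than guessing at it, is a few lines with pkg_resources. A sketch, assuming the distribution in question is already installed:

# Walk the full declared dependency closure of an installed distribution.
import pkg_resources

def dependency_closure(dist_name):
    seen = set()
    def walk(name):
        key = name.lower()
        if key in seen:
            return
        seen.add(key)
        for req in pkg_resources.get_distribution(name).requires():
            walk(req.project_name)
    walk(dist_name)
    return seen

print(sorted(dependency_closure("Twisted")))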

by Glyph at September 12, 2015 09:24 AM

September 04, 2015

Twisted Matrix Laboratories

Twisted 15.4 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.4, codenamed "Trial By Fire".

Twisted is continuing to forge ahead, with the highlights of this release being:
  • twisted.positioning, the rest of twisted.internet.endpoints, KQueueReactor (for real this time), and twisted.web.proxy/guard have all been ported to Python 3.
  • Trial has been ported to Python 3! This was made possible by a Python Software Foundation grant.
  • Twisted officially supports several more platforms: Py3.4 on FreeBSD, Py2.7/Py3.4 on Fedora 21/22, and Py2.7 on RHEL7.
  • Python 2.6 is no longer supported, and support for Debian 6 and RHEL 6 has been removed as a result.
  • Support for the EOL'd platforms of Fedora 17/18/19 has been removed.
  • Twisted has moved to requiring setuptools for installation.
  • twisted.python.failure.Failure's __repr__ now includes the exception message.
  • 19 tickets in total closed!
You can find the downloads at https://pypi.python.org/pypi/Twisted (or alternatively http://twistedmatrix.com/trac/wiki/Downloads). The NEWS file can be found at https://github.com/twisted/twisted/blob/trunk/NEWS.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,

Amber "Hawkie" Brown

by HawkOwl (noreply@blogger.com) at September 04, 2015 08:02 AM

August 12, 2015

Moshe Zadka

Kushiel’s Legacy (the Phedre Trilogy) — Sort of a book review

Below spoilers for the Kushiel’s Legacy books (first trilogy). If you have not read them, I highly recommend reading before looking below — they are pretty awesome. Technically, there are also spoilers for Lord of the Rings, though if you have not read it by now, I don’t know what else to do.

I want to start by talking about Lord of the Rings. As everyone knows, it tells the story of the failed attempt of the dwarves to restore their ancient homeland in a retrospective. I think they had a side-plot about a couple of short guys throwing a ring into a mountain or something? Anyway, back to the main point of LotR: the dwarves. The story of a people who everyone thinks of as just being greedy, master of goldsmithing and gem cutting, but with the desire to return to a homeland — but who wake an ancient evil upon returning there. Though Tolkien maintained he was not into allegories, this one is a pretty thin facade for talking about the struggles of the Jewish people. The reason for this long introduction to LotR is to explain how I read books: parochially, maybe, if they have Jews in them, the Jewish plot lines stir my heart much more than the story the author tried to write.

Well, I have to say, Jacqueline Carey knows how to write Jews. Admittedly, it took me until the end of Kushiel’s Dart to figure out that the Yeshuites were not supposed to be Jesuits. They are really an interesting take on Jews for Jesus. In a world that lacked Christianity, since Elua and his followers took on the role of taking down the Roman (“Tiberian”) empire, there are no evangelical religions. The Jews end up accepting Jesus as the “Mashiach” (not “Christ”, since the Romans aren’t there to take over the terms), and keep being…pretty Jewish. They are discriminated against in a heart-touching manner in Venice (ok, “La Serenissima”) and, like all good Jews, have fierce arguments about restoring their homeland. Near as I can tell, judging from estimated geography, they decide to make their homeland in Switzerland? Or maybe Lichtenstein? Belgium? Netherlands? In any case, the more devout, as always, pray for salvation while the younger and more practically minded take up the sword, and head up north to make a place where they will be free from discrimination.

All the while, the Jews in the Kushielverse keep studying the Talmud, and the more mystically minded of them take up something suspiciously like Qabbalah and seek the true name of God. They speak “Habiru” among each other, and have stories of the lost tribes. That is, of course, the most awesome part — the ten tribes have a “Judaism without Jesus” (what we would call, well, Judaism) since, having been sequestered in…Uganda, they have not heard of him. I, well, I certainly appreciate the subtle humor there.

[Please note that as far as I know, the author really intended the Yeshuites to add color to the religions in the Kushielverse, and actually fairly little of the plotline involves them — it is really the story of a masochist prostitute in a country where not being polyamorous is a blasphemy…]

by moshez at August 12, 2015 03:16 AM

August 04, 2015

Twisted Matrix Laboratories

Twisted 15.3 Released

On behalf of Twisted Matrix Labs, I am honoured to announce the release of Twisted 15.3.

We're marching confidently into the future, and this release demonstrates this:
  • 10+ modules ported to Python 3 (see NEWS for specifics)
  • Twisted Lore has been removed.
  • Twisted no longer releases as 'subprojects' -- there is only one Twisted, long may it reign.
  • twistd now uses cProfile (instead of Hotshot) as the default profiler.
  • 40+ more tickets closed overall.
You can find the downloads at https://pypi.python.org/pypi/Twisted (or alternatively http://twistedmatrix.com/trac/wiki/Downloads). The NEWS file can be found at https://github.com/twisted/twisted/blob/trunk/NEWS.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,

Amber "Hawkie" Brown

by HawkOwl (noreply@blogger.com) at August 04, 2015 07:07 AM

July 17, 2015

Glyph Lefkowitz

Sometimes, You Just Want A Button

I like making computers do stuff. I find that getting a computer to do what I want it to produces a tremendously empowering feeling. I think Python is a great language to use to tell computers what to do.

In my experience, the most empowering feeling (albeit often, admittedly, not the most useful or practical one) is making one’s own computer do something cool. But due to various historical accidents, the practical bent of the Python community mainly means that we get “server-side” or “cloud” computers to do things: in other words, “other people’s computers”.

If you, like me, are a metric standard tech industry hipster, “your own computer” means “Mac OS X”. Many of us have written little command-line tools to automate things, and thank goodness OS X is enough of a UNIX that that works nicely - and it’s often good enough. But sometimes, you just want to press a button and have a thing happen; sometimes a character grid isn’t an expressive enough output device. Sometimes you want to be able to take that same logical core, or some useful open source code, that you would use in the cloud, and instead, put it into an app. Sometimes you want to be able to learn about writing desktop apps while leveraging your existing portfolio of skills. I have done some of this, and intend to do more of it at work, and so I’d like to share with you some things I’ve learned about how to do it.

Back when I was a Linux user, this was a fairly simple proposition. All the world was GTK+ back then, in the blissful interlude between the ancient inconsistent mess when everything was Xt or Motif or Tk or Swing or XForms, and the modern inconsistent mess where everything is Qt or WxWidgets or Swing or some WebKit container with its own pile of gross stylesheet fruit salad.

If you had a little script that you wanted to put a GUI on back then, the process for getting started was apt-get install python-gtk and then just doing something like

import gtk
w = gtk.Window()
w.set_title("My Window")
b = gtk.Button("My Button")
w.add(b)
w.show_all()  # show the window and its children before the main loop
gtk.main()

and you were pretty much off to the races. If you wanted to, you could load a UI file that you made with glade, if you had some complicated fancy stuff to display, but that was optional and reasonably straightforward to use. I used to do that all the time, for various personal computing tasks. Of course, if I did that today, I’m sure they’d all laugh at me.

So today I’d like to show you how to do the same sort of thing with OS X.

To be clear, this is not a tutorial in Objective C, or a deep dive into Mac application programming. I’m going to make some assumptions about your skills: you’re a relatively experienced Python programmer, you’ve done some introductory Xcode material like the Temperature Converter tutorial, and you either already know or don’t mind figuring out the PyObjC bridged method naming conventions on your own.

If you are starting from scratch, and you don’t have any code yet, and you want to work in Objective C or Swift, the modern Mac development experience is even smoother than what I just described. You don’t need to install any “packages”, just Xcode, and you have a running app before you’ve written a single line of code, because it will give you a working-by-default template.

The problem is, if you’re a Python developer just trying to make a little utility for your own use, once you’ve got this lovely Objective C or Swift application ready for you to populate with interesting stuff, it’s a really confusing challenge to get your Python code in there. You can drag the Python.framework from homebrew into Xcode and then start trying to call some C functions like PyRun_SimpleString to get your program bootstrapped, but then you start running into tons of weird build issues. How do you copy your scripts to the app bundle? How do you invoke distutils? What about shared libraries that you’ve linked against? It’s hard enough to try to jam anything like setuptools or pip into the Xcode build system that you might as well give up and rewrite the logic; it’ll be less effort.

If you’re a Python developer you probably expect tools like virtualenv and pip to “just work”. You expect to be able to put a file into a folder and then import it without necessarily writing a whole build toolchain first. And if you haven’t worked with it a lot, you probably find Xcode’s UI bewildering. Not to mention the fact that you probably already have a text editor that you like already, and don’t want to spend a bunch of time coming up to speed on a new one just to display a button.

Luckily, of course, there’s pyobjc, which lets you write your whole application in Python, skipping the whole Xcode / Objective C / Swift side-show and just add a little UI logic. The problem I’m trying to address here is that “just adding a little UI logic” (rather than designing your program from the ground up as an App) in the Cocoa / ObjC universe is famously and maddeningly obscure. gtk.Window() is a reasonably straightforward first function to call; [[NSWindow alloc] initWithContentRect:styleMask:backing:defer:] not so much.

This is not even to mention the fact that all the documentation, all the tutorials, and all the community resources pretty much expect you to be working with .xibs or storyboards inside the Interface Builder portion of Xcode, and you’re really on your own if you are trying to do everything outside of that. py2app can help with some of the build issues, but it has its own problems you probably don’t want to be tackling if you’re just getting started; it’ll sap all your energy for actually coding.

I should mention before we finally dive in here that if you really just want to display a button, a text area, maybe a few fields, Toga is probably a better option for you than the route that I'm describing in this post; it's certainly a lot less work. What I am assuming here is that you want to be able to present a button at first, but gradually go on to mess around with arbitrary other OS X native APIs to experiment with the things you can do with a desktop that you can't do in the cloud, like displaying local notifications, tracking your location, recording the screen, controlling other applications, and so on.

So, what is an enterprising developer - who is still lazy enough for my rhetorical purposes here - to do? Surprisingly enough, I have a suggestion!

First, you want to make a new, empty Xcode project. So fire up Xcode, go to File, New, Project, and then:

Just-A-Button-1

Just-A-Button-2

Go ahead and give it a Git repository:

Just-A-Button-3

Now that you’ve got a shiny new blank project, you’ll need to create two resources in it: one, a user interface document, and the other, a Python file. So select File, New, and then choose OS X, User Interface, Empty:

Just-A-Button-4

I’m going to call this “MyUI”, but you can call it whatever:

Just-A-Button-5

As you can see, this creates a MyUI.xib file in your project, with nothing in it.

Just-A-Button-6

We want to put a window and a button into it, so, let’s start with that. Search for “window” in the bottom right, and drag the “Window” item onto the canvas in the middle:

Just-A-Button-7

Just-A-Button-8

Now we’ve got a UI file which contains a window. How can we display it? Normally, Xcode sets all this up for you, creating an application bundle, a build step to compile the .xib into a .nib, to copy it into the appropriate location, and code to load it for you. But, as we’ve discussed above, we’re too lazy for all that. Instead, we’re going to create a Python script that compiles the .xib automatically and then loads it.

You can do this with your favorite text editor. The relevant program looks like this:

import os
os.system("ibtool MyUI.xib --compile MyUI.nib")

from Foundation import NSData
from AppKit import NSNib

nib_data = NSData.dataWithContentsOfFile_(u"MyUI.nib")
(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(None, None))

from PyObjCTools.AppHelper import runEventLoop
runEventLoop()

Breaking this down one step at a time, what it’s doing is:

import os
os.system("ibtool MyUI.xib --compile MyUI.nib")

We run Interface Builder Tool, or ibtool, to convert the xib, which is a version-control-friendly XML document that Interface Builder can load, into a nib, which is a binary blob that AppKit can load at runtime. We don’t want to add a manual build step, so for now we can just have this script do its own building as soon as it runs.
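
As an aside (my suggestion, not something the rest of the post depends on), if os.system’s silent failure mode bothers you, a slightly more defensive spelling is:

import subprocess
# check_call raises CalledProcessError if ibtool fails, rather than
# silently continuing with a stale or missing MyUI.nib.
subprocess.check_call(["ibtool", "MyUI.xib", "--compile", "MyUI.nib"])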

Next, we need to load the data we just compiled:

nib_data = NSData.dataWithContentsOfFile_(u"MyUI.nib")

This needs to be loaded into an NSData because that’s how AppKit itself wants it prepared.

Finally, it’s time to load up that window we just drew:

(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(None, None))

This loads an NSNib (the “ib” in its name also refers to “Interface Builder”) with the init... method, and then creates all the objects inside it - in this case, just our Window object - with the instantiate... method. (We don’t care about the bundle to use, or the owner of the file, or the top level objects in the file yet, so we are just leaving those all as None intentionally.)

Finally, runEventLoop() just runs the event loop, allowing things to be displayed.

Now, in a terminal, you can run this program by creating a virtualenv, and doing pip install pyobjc-framework-Cocoa, and then python button.py. You should see your window pop up - although it will not take focus. Congratulations, you’ve made a window pop up!

One minor annoyance: you’re probably used to interrupting programs with ^C on the command line. In this case, the PyObjC helpers catch that signal, so instead you will need to use ^\ to hard-kill it until we can hook up some kind of “quit” functionality. This may cause crash dialogs to pop up if you don’t use virtualenv; you can just ignore them.

Of course, now that we’ve got a window, we probably want to do something with it, and this is where things get tricky. First, let’s just create a button; drag the button onto your window, and save the .xib:

[Screenshot: Just-A-Button-9]

And now, the moment of truth: how do we make clicking that button do anything? If you’ve ever done any Xcode tutorials or written ObjC code, you know this is the part where things traditionally hurt: you need to control-drag an action to a selector. Xcode figures out which selectors are available by magically knowing things about your code, and you can’t just click on a thing in the interface-creation component and say “trust me, this method exists”. Luckily, although the level of documentation for it these days makes it more of an easter egg than a feature, Xcode supports doing this with Python classes just as it does with Objective C or Swift classes.1

First though, we’ll need to tell Xcode our Python file exists. Go to the “File” menu, and “Add files to...”, and then select your Python file:

[Screenshot: Just-A-Button-10]

Here’s the best part: you don’t need to use Xcode’s editor at all; Xcode will watch that file for changes. So keep using your favorite text editor and change button.py to look like this:

import os
os.system("ibtool MyUI.xib --compile MyUI.nib")

from objc import IBAction
from Foundation import NSObject, NSData
from AppKit import NSNib

class Clicker(NSObject):
    @IBAction
    def clickMe_(self, sender):
        print("Clicked!")

the_clicker = Clicker.alloc().init()

nib_data = NSData.dataWithContentsOfFile_(u"MyUI.nib")
(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(the_clicker, None))

from PyObjCTools.AppHelper import runEventLoop
runEventLoop()

In other words, add a Clicker subclass of NSObject, give it a clickMe_ method decorated by objc.IBAction, taking one argument, and then make it do something you can see, like print something. Then, make a global instance of it, and pass it as the owner parameter to NSNib.

At this point it would probably be good to explain a little about what the “file’s owner” is and how loading nibs works.

When you instantiate a Nib in AppKit, you are creating a collection of graphical objects, connected to an object in your program that you construct. In other words, if you had a program that displayed a Person, you’d have a Person.nib and a Person class, and each time you wanted to show a Person you’d instantiate the Nib again with the new Person as the owner of that Nib. In the interface builder, this is represented by the “file’s owner” placeholder.
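
In code, that hypothetical Person example would look just like the loader we already wrote, with the new Person passed as the owner (names illustrative):

person = Person.alloc().init()  # a hypothetical NSObject subclass
nib_data = NSData.dataWithContentsOfFile_(u"Person.nib")
(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(person, None))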

I am explaining this because if you’re interested in reading this article, you’ve probably been interested enough in Mac programming to do something like the aforementioned Temperature Converter tutorial, but such tutorials almost universally just use the single default “main” nib that gets loaded when the application launches, so although they show you how to do many different things with UI elements, it’s not clear how the elements got there. This, here, is how you make new UI elements get there in the first place.

Back to our clicker example: now that we have a class with a method, we need to tell Xcode that clicking the button should call that method. So what we’re going to tell Xcode is that we expect this Nib to be owned by an instance of Clicker. To do this, go to MyUI.xib and select the “File’s Owner” (the thing that looks like a transparent cube), go to the “Identity Inspector” (the tiny icon that looks like a driver’s license on the right) and type “Clicker” in the “Class” field at the top.

If you’ve properly added button.py to your project and declared that class (as an NSObject), it should automatically complete as you start typing the name:

[Screenshot: Just-A-Button-11]

Now you need to connect your clickMe action to the button you already created. If you’ve properly declared your method as an IBAction, you should see it in the list of “received actions” in the “Connections Inspector” of the File’s Owner (the tiny icon that looks like a right-pointing arrow in a circle):

[Screenshot: Just-A-Button-12]

Drag from the circle to the right of the clickMe: method there to the button you’ve created, and you should see the connection get formed:

[Screenshot: Just-A-Button-13]

If you save your xib at this point and re-run your python file, you should be able to click on the button and see something happen.

Finally, we want to be able to not just get inputs from the GUI, but also produce outputs. To do this, we want to populate an outlet on our Clicker class with a pointer to an object in the Nib. We can do this by declaring a variable as an objc.IBOutlet(); simply add a from objc import IBOutlet, and change Clicker to read:

class Clicker(NSObject):
    label = IBOutlet()
    @IBAction
    def clickMe_(self, sender):
        self.label.setStringValue_(u"\N{CHECK MARK}")

In case you’re wondering where setStringValue_ comes from, it’s a method on NSTextField, since labels are NSTextFields.

Then we can place a label into our xib, and we can see that it is in fact an NSTextField in the Identity Inspector:

[Screenshot: Just-A-Button-14]

I’ve pre-filled mine out with a unicode “BALLOT X” character, for style points.

Then, we just need to make sure that the label attribute of Clicker actually points at this value; once again, select the File’s Owner, the Connections Inspector, and (if you declared your IBOutlet correctly), you should see a new “label” outlet. Drag the little circle to the right of that outlet, to the newly-created label object, and you should see the linkage in the same way the action was linked:

[Screenshot: Just-A-Button-15]

And there you have it! Now you have an application that can open a window, take input, and display output.

This is, as should be clear by now, not really the preferred way of making an application for OS X. There’s no app bundle, so there’s nothing to code-sign. You’ll get weird behavior in certain cases; for example, as you’ve probably already noticed, the window doesn’t come to the front when you launch the app like you might expect it to. But, if you’re a Python programmer, this should provide you with a quick scratch pad where you can test ideas, learn about how interface builder works, and glue a UI to existing Python code quickly before wrestling with complex integration and distribution issues.

Speaking of interfacing with existing Python code, of course you wouldn’t come to this blog and expect to get technical content without just a little bit of Twisted in it. So here’s how you hook up Twisted to an OS X GUI: instead of runEventLoop, you need to run your application like this:

# Before importing anything else...
from PyObjCTools.AppHelper import runEventLoop
from twisted.internet.cfreactor import install
reactor = install(runner=runEventLoop)

# ... then later, when you want to run it ...
reactor.run()

In your virtualenv, you’ll want to pip install twisted[osx_platform] to get all the goodies for OS X, including the GUI integration. Since all Twisted callbacks run on the main UI thread, you don’t need to know anything special to do stuff; you can call methods on whatever UI objects you have handy to make changes to them.
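
As a small illustration (mine, assuming the Clicker and label wiring from earlier), a timed Twisted call can poke the label directly:

import itertools
from twisted.internet import task

counter = itertools.count(1)

def tick():
    # Safe to touch AppKit objects here: cfreactor fires this on the UI thread.
    the_clicker.label.setStringValue_(u"tick %d" % (next(counter),))

task.LoopingCall(tick).start(1.0)  # update the label once a second
reactor.run()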

Finally, although I definitely don’t have room in this post to talk about all the edge cases here, I will address two particularly annoying ones; often if you’re writing a little app like this, you really want it to take keyboard focus, and by default this window will come up in the background. To fix that, you can do this right before starting the main loop:

from AppKit import NSApplication, NSApplicationActivationPolicyAccessory
app = NSApplication.sharedApplication()
app.setActivationPolicy_(NSApplicationActivationPolicyAccessory)
app.activateIgnoringOtherApps_(True)

And also, closing the window won’t quit the application, which might be pretty annoying if you want to get back to using your terminal, so a quick fix for that is:

class QuitWhenClosed(NSObject):
    def applicationShouldTerminateAfterLastWindowClosed_(self, app):
        return True
app.setDelegate_(QuitWhenClosed.alloc().init().retain())

(the “retain” is necessary because ObjC is not a garbage collected language, and app.delegate is a weak reference, so the QuitWhenClosed would be immediately freed (and there would be a later crash) if you didn’t hold it in a global variable or call retain() on it.)

You might have other problems with this technique, and this post definitely can’t fit solutions to all of them, but now that you can load a nib, create classes that interface builder can understand, and run a Twisted main loop, you should be able to use the instructions in other tutorials and resources relatively straightforwardly.

Happy hacking!


(Thanks to everyone who helped with this post. Of course my errors are entirely my own, but thanks especially to Ronald Oussoren for his tireless work on pyObjC and py2app over the years, as well as to Mark Eichin and Amber Brown for some proofreading and feedback before this was posted.)


  1. If it didn’t, we could get Xcode to produce the appropriate XML by simply writing appropriately-shaped classes in Objective C - or possibly Swift - and then never bothering to build them, since all Xcode cares about in this part of the process is that it can see the source. My understanding is that’s how it worked when PyObjC was first developed, and it might become necessary again if Xcode ever stops supporting Python. I have no idea if that’s likely, but unfortunately it seems clear Python isn’t a very popular language for mac apps. 

by Glyph at July 17, 2015 07:52 AM

July 10, 2015

Duncan McGreggor

Mastering matplotlib: Acknowledgments

The Book

Well, after nine months of hard work, the book is finally out! It's available both on Packt's site and Amazon.com. Getting up early every morning to write takes a lot of discipline, it takes even more to say "no" to enticing rabbit holes or herds of Yak with luxurious coats ripe for shaving ... (truth be told, I still did a bit of that).

The team I worked with at Packt was just amazing. Highly professional and deeply supportive, they were a complete pleasure with which to collaborate. It was the best experience I could have hoped for. Thanks, guys!

The technical reviewers for the book were just fantastic. I've stated elsewhere that my one regret was that the process with the reviewers did not have a tighter feedback loop. I would have really enjoyed collaborating with them from the beginning so that some of their really good ideas could have been integrated into the book. Regardless, their feedback as I got it later in the process helped make this book more approachable by readers, more consistent, and more accurate. The reviewers have bios at the beginning of the book -- read them, and look them up! These folks are all amazing!

The one thing that slipped in the final crunch was the acknowledgements, and I hope to make up for that here, as well as through various emails to everyone who provided their support, either directly or indirectly.

Acknowledgments

The first two folks I reached out to when starting the book were both physics professors who had published very nice matplotlib problems -- one set for undergraduate students and another from work at the National Radio Astronomy Observatory. I asked for their permission to adapt these problems to the API chapter, and they graciously granted it. What followed were some very nice conversations about matplotlib, programming, physics, education, and publishing. Thanks to Professor Alan DeWeerd, University of Redlands and Professor Jonathan W. Keohane, Hampden Sydney College. Note that Dr. Keohane has a book coming out in the fall from Yale University Press entitled Classical Electrodynamics -- it will contain examples in matplotlib.

Other examples adapted for use in the API chapter included one by Professor David Bailey, University of Toronto. Though his example didn't make it into the book, it gets full coverage in the Chapter 3 IPython notebook.

For one of the EM examples I needed to derive a particular equation for an electromagnetic field in two wires traveling in opposite directions. It's been nearly 20 years since my post-Army college physics, so I was very grateful for the existence and excellence of SymPy which enabled me to check my work with its symbolic computations. A special thanks to the SymPy creators and maintainers.

Please note that if there are errors in the equations, they are my fault! Not that of the esteemed professors or of SymPy :-)

Many of the examples throughout the book were derived from work done by the matplotlib and Seaborn contributors. The work they have done on the documentation in the past 10 years has been amazing -- the community is truly lucky to have such resources at their fingertips.

In particular, Benjamin Root is an astounding community supporter on the matplotlib mail list, helping users of every level with all of their needs. Benjamin and I had several very nice email exchanges during the writing of this book, and he provided some excellent pointers, as he was finishing his own title for Packt: Interactive Applications Using Matplotlib. It was geophysicist and matplotlib savant Joe Kington who originally put us in touch, and I'd like to thank Joe -- on everyone's behalf -- for his amazing answers to matplotlib and related questions on StackOverflow. Joe inspired many changes and adjustments in the sample code for this book. In fact, I had originally intended to feature his work in the chapter on advanced customization (but ran out of space), since Joe has one of the best examples out there for matplotlib transforms. If you don't believe me, check out his work on stereonets. There are many of us who hope that Joe will be authoring his own matplotlib book in the future ...

Olga Botvinnik, a contributor to Seaborn and PhD candidate at UC San Diego (and BioEng/Math double major at MIT), provided fantastic support for my Seaborn questions. Her knowledge, skills, and spirit of open source will help build the community around Seaborn in the years to come. Thanks, Olga!

While on the topic of matplotlib contributors, I'd like to give a special thanks to John Hunter for his inspiration, hard work, and passionate contributions which made matplotlib a reality. My deepest condolences to his family and friends for their tremendous loss.

Quite possibly the tool that had the single-greatest impact on the authoring of this book was IPython and its notebook feature. This brought back all the best memories from using Mathematica in school. Combined with the Python programming language, I can't imagine a better platform for collaborating on math-related problems or producing teaching materials for the same. These compliments are not limited to the user experience, either: the new architecture using ZeroMQ is a work of art. Nicely done, IPython community! The IPython notebook index for the book is available in the book's Github org here.

In Chapters 7 and 8 I encountered a bit of a crisis when trying to work with Python 3 in cloud environments. What was almost a disaster ended up being rescued by the work that Barry Warsaw and the rest of the Ubuntu team did in Ubuntu 15.04, getting Python 3.4.2 into the release and available on Amazon EC2. You guys saved my bacon!

Chapter 7's fictional case study examining the Landsat 8 data for part of Greenland was based on one of Milos Miljkovic's tutorials from PyData 2014, "Analyzing Satellite Images With Python Scientific Stack". I hope readers have just as much fun working with satellite data as I did. Huge thanks to NASA, USGS, the Landsat 8 teams, and the EROS facility in Sioux Falls, SD.

My favourite section in Chapter 8 was the one on HDF5. This was greatly inspired by Yves Hilpisch's presentation "Out-of-Memory Data Analytics with Python". Many thanks to Yves for putting that together and sharing with the world. We should all be doing more with HDF5.

Finally, and this almost goes without saying, the work that the Python community has done to create Python 3 has been just phenomenal. Guido's vision for the evolution of the language, combined with the efforts of the community, have made something great. I had more fun working on Python 3 than I have had in many years.

by Duncan McGreggor (noreply@blogger.com) at July 10, 2015 09:35 PM

June 27, 2015

Moshe Zadka

ncolony 0.0.2 released

Fixed:

  • Running scripts is now nicer (everything goes through “python -m ncolony <command>”)
  • Added a Code of Conduct
  • Releases are now based on versioneer
  • Cleaned up tox.ini
  • Added an HTTP health checker

Mostly an internal cleanup release.

Available via PyPI and GitHub!

by moshez at June 27, 2015 04:38 AM

June 26, 2015

Moshe Zadka

mainland: the main of your Python

I don’t like console-scripts. Among the things I dislike are their magicality and the actual code they produce. I mean, the autogenerated Python looks like:

#!/home/moshez/src/mainland/build/toxenv/bin/python2

# -*- coding: utf-8 -*-
import re
import sys

from tox import cmdline

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cmdline())

Notice the encoding header on an ASCII-only file, and the weird Windows-compatibility munging generated alongside a UNIX-style shebang that hard-codes the interpreter’s path? Truly, a file that only its generator could love.

Then again, I don’t much like what we have in Twisted, with the preamble:

#!/home/moshez/src/ncolony/build/env/bin/python2
# Copyright (c) Twisted Matrix Laboratories.
# See LICENSE for details.
import os, sys

try:
    import _preamble
except ImportError:
    sys.exc_clear()

sys.path.insert(0, os.path.abspath(os.getcwd()))

from twisted.scripts.twistd import run
run()

This time, it’s not auto-generated, it is artisanal. What it lacks in codegen ugliness it makes up for by importing an underscore-prefixed module at top level, and who doesn’t like a little bit of sys.path games?

I will note, furthermore, that neither of these options plays nice with something like pex. Worst of all, all of these options, regardless of their internal beauty (or horrible lack thereof), must be named veeeeery caaaaarefully to avoid collision with existing UNIX, Mac or Windows commands. Think this is easy? My Ubuntu has two commands called “chromium-browser” and “chromium-bsu” because Google didn’t check what popular games there are on Ubuntu before naming their browser.

Enter the best thing about Python, “-m”, which allows executing modules. What “-m” gains in awesomeness, it loses in the horribleness of the two-named-module problem, where a module is executed twice, once resulting in sys.modules[‘__main__’] and once in sys.modules[real_name], with hilarious consequences for class hierarchies, exceptions and other things relying on the identity of defined things.

Luckily, packages will automatically execute “package.__main__” if “python -m package” is called. Ideally, nobody would try to import __main__, but this means packages can contain only one command. Enter ‘mainland’, which defines a main that will, carefully and accurately, dispatch to the right module’s main without any code generation. It has not been released yet, but is available from GitHub. Pull requests and issues are happily accepted!
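
To make the “only one command per package” status quo concrete, here is a sketch of mine (illustrative names, not mainland’s actual API):

# mypackage/__main__.py -- executed by "python -m mypackage"
# Keep this file a thin shim: the real logic lives in an importable
# module, so nothing ever needs to import __main__ itself.
import sys

from mypackage import cli  # hypothetical module holding the real main()

sys.exit(cli.main(sys.argv[1:]))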

Edit: Released! 0.0.1 is up on PyPI and on GitHub

by moshez at June 26, 2015 03:13 AM

June 23, 2015

Moshe Zadka

On Codes of Conducts and “Protection”

Related: In Favor of Niceness, Community, and Civilization

I’ve seen elsewhere (thankfully, not my project, and no, I won’t link to it) that people want the code of conduct to “protect contributors from 3rd parties as well as 3rd parties from contributors”. I would like to first suggest the idea that this is…I don’t even know. The US government has literal nuclear bombs, as well as aircraft carriers, and it cannot enforce its law everywhere. The idea that an underfunded OSS project can do literally anything to 3rd parties is ludicrous. The only enforcement mechanism any project can have is “you can’t contribute here” (which, by necessity, only applies to those who wanted to contribute in the first place.)

So why would a bunch of people take on a code of conduct that will only limit them?

Because, to quote the related article above, “the death cry of liberalism is not ‘death to the unbelievers’, it is ‘if you’re nice, you can join our cuddle pile’.” Perhaps describing open source projects as “cuddle piles” is somewhat of an exaggeration, but the point remains. A code of conduct defines what “nice enough” is. The cuddle piles? Those conquer the world. WebKit is powering both Android and iPhone browsers, for example, making open source be, literally, the way people see the online world.

Adopting a code of conduct that says that we do not harass, we do not retaliate and we accept all people is a powerful statement. This is how we keep our own garden clear of the pests of prejudice, of hatred. Untended gardens end up wilderness. Well-tended gardens grow until we need to keep a fence around the wild and call it a “preserve”.

When I first saw the Rust code of conduct I thought “cool, but why does an open source project need a code of conduct”? Now I wonder how any open source project will survive without one. It will have to survive without me — in three years, I commit to not contributing to any project without a code of conduct, and I hope others will follow. Any project that does not have a CoC, in the interim, had better behave as though it has one.

Twisted is working hard on adopting a code of conduct, and I will check one into NColony soon (a simpler one, appropriate to what is, so far, a one-person show).

by moshez at June 23, 2015 04:22 AM

June 18, 2015

Moshe Zadka

Everything but the code

(Or, “So you want to build a new open-source Python project”)

This is not going to be a “here is a list of options, but you do what is right for you” pandering post. This is going to be a “this is 2015, there are right ways to do things, and here they are” post. This is going to be an opinionated, in-your-face, THIS IS BEST post. That said, if you think that anything here does not represent the best practices in 2015, please do leave a comment. Also, if you have specific opinions on most of these, you’re already ahead of the curve — the main target audience are people new to creating open-source Python projects, who could use a clear guide.

This post will also emphasize the new project. It is not worth it, necessarily, to switch to these things in an existing project — certainly not as a “whole-sale, stop the world and let’s change” thing. But when starting a new project with zero legacy, there is no reason to do things wrong.

tl;dr: Use GitHub, put an MIT license in “LICENSE”, README.rst with badges, use py.test and coverage, flake8 for static checking, tox to run tests, package with setuptools, document with sphinx on RTD, Travis CI/Coveralls for continuous integration, SemVer and versioneer for versioning, support Py2+Py3+PyPy, avoid C extensions, avoid requirements.txt.

Where

When publishing kids’ photos, you do it on Facebook, because that’s where everybody is. LinkedIn is where you connect with colleagues and lead your professional life. When publishing projects, put them on GitHub, for exactly the same reason. Have GitHub pull requests be the way contributors propose changes. (There is no need to use the “merge” button on GitHub — it is fine to merge changes via git and push. But the official “how do I submit a change” should be “pull request”, because that’s what people know).

License

Put the license in a file called “LICENSE” at the root. If you do not have a specific reason to choose otherwise, MIT is reasonably permissive and compatible. Otherwise, use something like the license chooser and remember the three most important rules:

  • Don’t create your own license
  • No, really, don’t create your own license
  • Don’t do it

At the end of the license file, you can have a list of the contributors. This is an easy place to credit them. It is a good idea to ask people who send in pull requests to add themselves to the contributor list in their first one (this allows them to spell their name and e-mail exactly the way they want to).

Note that if you use the GPL or LGPL, they will recommend putting it in a file called “COPYING”. Put it in “LICENSE” (the licenses explicitly allow it as an option, and it makes it easier for people to find the license if it always has the same name).

README

The GitHub default is README.md, but README.rst (restructured text) is perfectly supported via Sphinx, and is a better place to put Python-related documentation, because ReST plays better with Pythonic toolchains. It is highly encouraged to put badges on top of the document to link to CI status (usually Travis), ReadTheDocs and PyPI.

Testing

There are several reasonably good test runners. If there is no clear reason to choose one, py.test is a good default. “Using Twisted” is a good reason to choose trial. Using the built-in unittest runner is not a good option — there is a reason the cottage industry of test runners evolved. Using coverage is a no-brainer. It is good to run some functional tests too. Test runners should be able to help with this too, but even writing a Python program that fails if things are not working can be useful.

Distribute your tests alongside your code, by putting them under a subpackage called “tests” of the main package. This allows people who “pip install …” to run the tests, which means sending you bug reports is a lot easier.
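
As an illustration (the names are mine), a test shipped inside the package can be as small as:

# mypackage/tests/test_sanity.py
# Lives inside the package, so "pip install mypackage" ships it too.
import mypackage

def test_has_version():
    assert isinstance(mypackage.__version__, str)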

Static checking

There are a lot of tools for static checking of Python programs — pylint, flake8 and more. Use at least one. Using more is not completely free (more ways to have to say “ignore this, this is ok”) but can be useful to catch more static style issues. At worst, if there are local conventions that are not easily plugged into these checkers, write a Python program that will check for them and fail if they are violated.

Meta testing

Use tox. Put tox.ini at the root of your project, and make sure that “tox” (with no arguments) works and runs your entire test-suite. All unit tests, functional tests and static checks should be run using tox. It is not a bad idea to write a tox clause that builds and tests an installed wheel. This will require including all test code in the deployed package, which is a good idea.

Set tox to put all build artifacts in a build/ top-level directory.
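
A minimal tox.ini in this spirit might look like the following sketch (environment names and tool choices are illustrative, not prescriptive):

[tox]
envlist = py27, py34, pypy, flake8

[testenv]
deps =
    pytest
    coverage
commands =
    coverage run -m pytest mypackage/tests
    coverage report

[testenv:flake8]
deps = flake8
commands = flake8 mypackage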

Packaging

Have a setup.py file that uses setuptools. Tox will need it anyway to work correctly.
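
For instance, a minimal sketch (illustrative names; see the versioning and dependency advice below):

from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.0.1",  # in real life, let versioneer supply this
    packages=find_packages(),
    install_requires=["six>=1.9"],  # weak-versioned, per the advice below
    extras_require={"test": ["pytest", "coverage"]},
)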

Structure

It is unlikely that you have a good reason to take more than one top-level name in the package namespace. Barring completely unavoidable name conflicts, your PyPI package name should be the same as your Python package name should be the same as your GitHub project. Your Python package should live at the top-level, not under “src/” or “py/”.

Documentation

Use sphinx for prose documentation. Put it in doc/ with a relevant conf.py. Use either pydoctor or sphinx autodoc for API docs. Pydoctor has the potential for nicer docs, while sphinx is better integrated with ReadTheDocs. Configure ReadTheDocs to auto-build the documentation on every check-in.

Continuous Integration

If you enjoy owning your own machines, or platform diversity in testing really matters, use buildbot. Otherwise, take advantage of the free Travis CI, and configure your project with a .travis.yml that breaks your tox tests into one test per Travis clause. Integrate with coveralls to have coverage monitored.

Version management

Use SemVer. Take advantage of versioneer to help you manage it.

Release

A full run of “tox” should leave in its wake tested .zip and .whl files; a successful post-tag run, combined with versioneer, leaves behind properly-versioned, tested artifacts. The release script could be as simple as “tox && git tag $1 && (tox || (git tag -d $1; exit 1)) && cp …whl and zip locations… dist/”

GPG sign dist/ files, and then use “twine” to upload them to PyPI. Make sure to upload to TestPyPI first, and verify the upload, before uploading to PyPI. Twine is a great tool, but badly documented — among other things, it is hard to find information about .pypirc. “.pypirc” is an ini file, which needs to have the following sections:
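
Something like this (a reconstruction from convention, with placeholder usernames; the repository URLs were the defaults of the era):

[distutils]
index-servers =
    pypi
    testpypi

[pypi]
repository = https://pypi.python.org/pypi
username = your-username

[testpypi]
repository = https://testpypi.python.org/pypi
username = your-username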

.gitignore

  • build — all your build artifacts will go here
  • dist — this is where “ready to release” output will be
  • *.egg?info — this is an artifact of sdist that is really hard to put elsewhere
  • *.pyc — ignore byte-code files
  • .coverage — coverage artifact

Python versions

If all your dependencies support Python 2 and 3, support Python 2 and 3. That will almost certainly require using “six” (or one of its competitors, like “future”). Run your unit tests under both Python 2 and 3. Make sure to run your unit tests under PyPy, as well.
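
As a tiny illustration of the kind of difference six papers over (my example, not from the post):

import six

def greet(name):
    # six.text_type is unicode on Python 2 and str on Python 3.
    if not isinstance(name, six.text_type):
        name = name.decode("utf-8")
    return u"hello, " + name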

C extensions

Avoid, if possible. Certainly do not use C extensions for performance improvements before (1) making sure they’re needed (2) making sure they’re helpful (3) trying other performance improvements. Ideally structure your C extensions to be optional, and fall back to a slow(er) Python implementation if they are unavailable. If they speed up something more general than your specific needs, consider breaking them out into a C-only project which your Python will depend on.

If using C extensions, regardless of whether to improve performance or integrate with 3rd party libraries, use CFFI.
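
For a taste of the CFFI style (an illustration calling a libc function, not anything project-specific):

from cffi import FFI

ffi = FFI()
ffi.cdef("int atoi(const char *nptr);")  # declare the C signature we need
C = ffi.dlopen(None)  # None loads the standard C library on POSIX

print(C.atoi(b"42"))  # prints 42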

If C extensions have successfully been avoided, and Python 3 compatibility kept, build universal wheels.

requirements.txt

The only good “requirements.txt” file is a non-existing one. The “setup.py” file should have the dependencies (ideally as weak-versioned as possible, usually just a “>=” for a library that tends not to break backwards compatibility a lot). Tox will maintain the virtualenv needed based on the things in the tox file, and if needing to test against specific versions, this is where specific versions belong. The “requirements.txt” file belongs in Salt-style (Chef, Fab, …) configurations, Docker-style (Vagrant-, etc.) configurations or Pants-style (Buck-, etc.) build scripts when building a Pex. This is the “deployment configuration”, and needs to be decided by a deployer.

If your package has dependencies, they belong in setup.py. Use extras_require for test-only dependencies. Lists of test dependencies, and reproducible tests, can be configured in tox.ini. Tox will take care of pip-installing relevant packages in a virtualenv when running the tests.

Thanks

Thanks to John A. Booth, Dan Callahan, Robert Collins, Jack Diedrich and Steve Holden for their reviews and insightful comments. Any mistakes that remain are mine!

by moshez at June 18, 2015 02:31 AM

June 09, 2015

Glyph Lefkowitz

Sorry I Unfollowed You

Since Alex Gaynor wrote his seminal thinkpiece on the subject, “I Hope Twitter Goes Away”, I’ve been wrestling to define my relationship to this often problematic product.

On the one hand, Twitter has provided me with delightful interactions with human beings who I would not otherwise have had the opportunity to meet or interact with. If you are the sort of person who likes following people, four suggestions I’d make on that front are Melissa 🔔, Gary Bernhardt, Eevee and Matt Blaze, all of whom have blogs but none of whom I would have discovered without Twitter.

Twitter has also allowed me to reach a larger audience with my writing than I otherwise would have been able to. Lots of people click on links to this blog from Twitter either from following me directly or from a retweet. (Thank you, retweeters, one and all.)

On the other hand, the effect of using Twitter on my productivity is like having a constant, low-grade headache. While Twitter has never been a particularly bad distraction as measured by hours spent on it (I keep metrics on that, and it’s rarely even in the top 10), I feel like consulting Twitter is something I do when I am stuck, or having to think about something hard. “I’ll just check Twitter” is an easy way to “take a break” right at the moment that I ought to be thinking harder, eliminating distractions, mustering my will to focus.

This has been particularly stark for me as I’ve been trying to get some real writing done over the last couple of weeks and have been consistently drawing a blank. Given that I have a deadline coming up on Wednesday and another next Monday, something had to give.

Or, as Joss Whedon put it, when he quit Twitter:

If I’m going to start writing again, I have to go to the quiet place, and this is the least quiet place I’ve ever been in my life.

I’m an introvert, and using Twitter is more like being at a gigantic, awkward party all the time than any other online space I’ve ever been in.

There’s an irony here. Mostly what people like that I put on Twitter (and yes, I’ve checked) are announcements that link to other things, accomplishments in other areas, like a blog post, or a feature in Twisted, but using Twitter itself is inimical to completing those things.

I’m loath to abandon the positive aspects of Twitter. Some people also use Twitter as a replacement for RSS, and I don’t want to break the way they choose to pay attention to the stuff that I do. And a few of my friends communicate exclusively through direct messages.

The really “good” thing about Twitter is discovery. It enables you to discover people, content, and, eugh, “brands” that appeal to you. I have discovered things that I enjoy many times. The fundamental problem I am facing, which is a little bit hard to admit to oneself, is that I have discovered enough. I have enough games to play, enough books and articles to read, enough podcasts to listen to, enough movies to watch, enough code to write, enough open source libraries to investigate, that I will be busy for years based on what I already know.

For me, using Twitter’s timeline at this point to “discover” more things is like being at a delicious buffet, being so full I’m nauseous, and stuffing my pockets with shrimp “just in case” I’m hungry “when I get home” - and then, of course, not going home.

Even disregarding my desire to produce useful content, if I just want to enjoy consuming content more deeply, I have to take the time to engage with it properly.

So here’s what I’m doing:

  1. I am turning on the “anyone can direct message me” feature. We’ll see how that goes; I may have to turn it off again later. As always, I’d prefer you send email (or text me, if it’s time-critical).
  2. I am unfollowing literally everyone, and will not follow people in the future. Checking my timeline was the main information junk-food I want to avoid.
  3. Since my timeline, rather than mentions and replies, was my main source of distraction, I’ll continue paying attention to mentions and replies (at least for now; I’ll have to see if that becomes a problem in the absence of a timeline).
  4. In order to avoid producing such information junk-food myself, I’m going to try to directly tweet less, and put more things into brief blog posts so I have enough room to express them. I won’t say “not at all”, but most of the things that I put on Twitter would really be better as longer, more thoughtful articles.

Please note that there’s nothing prescriptive here. I’m outlining what I’m doing in the hopes that others might recognize similar problems with themselves - if everyone used Twitter this way, there would hardly be a point to the site.

Also, if I’ve unfollowed you, that doesn’t mean I’m not interested in what you have to say. I already have a way of keeping in touch with people’s more fully-formed ideas: I use Blogtrottr to deliver relevant blog articles to my email. If I previously followed you and you think I might not be reading your blog already (in most cases I believe I already am), please feel free to drop me a line with an RSS link.

by Glyph at June 09, 2015 12:41 AM

June 07, 2015

Twisted Matrix Laboratories

Twisted Fellowship 2015: Call for proposals

On behalf of the Software Freedom Conservancy and the Twisted project I'm happy to announce that we're looking for a paid maintainer for the Twisted project.

Funding a software developer to work as a maintainer will help Twisted grow as a project, and enable Twisted's development community to increase their output of innovative code for the public's benefit. Twisted has strict coding, testing, documentation, and review standards, which ensures excellent code quality, continually improving documentation and code test coverage, and minimal regressions. Code reviews are historically a bottleneck for getting new code merged. A funded maintainer will help alleviate this bottleneck, and speed Twisted's development.

You can read more about the 2015 fellowship at https://twistedmatrix.com/trac/wiki/Fellowship2015

by Itamar Turner-Trauring (noreply@blogger.com) at June 07, 2015 11:12 PM

June 06, 2015

Moshe Zadka

(Somewhat confused) Thoughts on Languages, Eco-systems and Development

There are a few interesting new languages that would be fun to play with: Rust, Julia, LFE, Go and D all have interesting takes on some domain. But ultimately, a language is only as interesting as the things it can do. It used to be that “things it can do” referred to the standard library. I am old enough to remember when “batteries included” was one of the most interesting benefits Python had — the language had dared to do such avant-garde things in ’99 as having a *built-in* URL-fetching library that behaved, pretty much, the same on all platforms. Out of the box. (It’s hard to convey how amazing this was in ’99.)

This is no longer the case. Now what distinguishes Python as “can do lots of things” is its built-in package management. Python, pip and virtualenv together give the ability to work on multiple Python projects that need different versions of libraries, without any interference. They are, to a first approximation, 100% reliable. With pip 7.0 caching built wheels, virtualenvs are even more disposable (I am starting to think pip uninstall is completely unnecessary). In fact, except for virtualenv itself, it is rare nowadays to install any Python module globally. There are some rusty corners, of course:

  • Uploading packages is…non-trivial. Any time the default tool is so bad that the official docs say “use this 3rd party library instead”, something is wrong.
  • The distinction between “PyPI name” and “Python package name”, while theoretically reasonable, is just annoying and hard to explain
  • (Related) There is no official PyPI-name-conflict-resolution procedure.

I’ll note npm, for example, clearly does better on these three (while doing worse on some things that Python does better). Of the languages mentioned above, it is nice to see that most have out-of-the-box built-in tools for ecosystem creation. Julia’s hilariously piggybacks on GitHub. Go’s…well…uses URLs as the equivalent of PyPI-level names. Rust has a pretty decent system in cargo and crates.io (the .toml file is kind of weird, but I’ve seen worse). While there is justifiable excitement about Rust’s approach to memory and thread-safety, it might be that it will win in the end based on having a better internal package management system than the low-level alternatives.

Note: OS package mgmt (yum, apt-get, brew and whatever Windows uses nowadays) is not a good fit for what developers need from a language-level package manager. Locality, quick package refreshes — these matter more to developers than to OS end-users.

by moshez at June 06, 2015 07:55 AM

June 05, 2015

Duncan McGreggor

Scientific Computing and the Joy of Language Interop

The scientific computing platform for Erlang/LFE has just been announced on the LFE blog. Though written in the Erlang Lisp syntax of LFE, it's fully usable from pure Erlang. It wraps the new py library for Erlang/LFE, as well as the ErlPort project. More importantly, though, it wraps Python 3 libs (e.g., math, cmath, statistics, and more to come) and the ever-eminent NumPy and SciPy projects (those are in-progress, with matplotlib and others to follow).

(That LFE blog post is actually a tutorial on how to use lsci for performing polynomial curve-fitting and linear regression, adapted from the previous post on Hy doing the same.)

With the release of lsci, one can now start to easily and efficiently perform computationally intensive calculations in Erlang/LFE (and any other Erlang Core-compatible language, e.g., Elixir, Joxa, etc.) That's super-cool, but it's not quite the point ...

While working on lsci, I found myself experiencing a great deal of joy. It wasn't just the fact that supervision trees in a programming language are insanely great. Nor just the fact that scientific computing in Python is one of the best in any language. It wasn't only being able to use two syntaxes that I love (LFE and Python) cohesively, in the same project. And it wasn't the sum of these either -- you probably see where I'm going with this ;-) The joy of these and many other fantastic aspects of inter-operation between multiple powerful computing systems is truly greater than the sum of its parts.

I've done a bunch of Julia lately and am a huge fan of this language as well. One of the things that Julia provides is explicit interop with Python. Julia is targeted at the world of scientific computing, aiming to be a compelling alternative to Fortran (hurray!), so their recognition of the enormous contribution the Python scientific computing community has made to the industry is quite wonderful to see.

A year or so ago I did some work with Clojure and LFE using Erlang's JInterface. Around the same time I was using LFE on top of Erjang, calling directly into Java without JInterface. This is the same sort of Joy that users of Jython have, and there are many more examples of languages and tools working to take advantage of the massive resources available in the computing community.

Obviously, language inter-op is not new. Various FFIs have existed for quite some time (I'm a big fan of the Common Lisp CFFI), but what is new (relatively, that is ... as I age, anything in the past 10 years is new) is that we are seeing this not just for programs reaching down into C/C++, but reaching across, to other higher-level languages, taking advantage of their great achievements -- without having to reinvent so many wheels.

When this level of cooperation, credit, etc., is done in the spirit of openness, peer-review, code-reuse, and standing on the shoulders of giants (or enough people to make giants!), we get joy. Beautiful, wonderful coding joy.

And it's so much greater than the sum of the parts :-)


by Duncan McGreggor (noreply@blogger.com) at June 05, 2015 02:53 PM

May 24, 2015

Twisted Matrix Laboratories

Twisted 15.2.1 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.2.1.

This is a bugfix release for the 15.2 series that fixes a regression in the new logging framework.

You can find the downloads on PyPI (or alternatively on the Twisted Matrix Labs website).

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
HawkOwl

by HawkOwl (noreply@blogger.com) at May 24, 2015 12:22 PM

May 23, 2015

Moshe Zadka

Unicode, UTF-8 and you

Unicode is not a panacea. Some people’s names can’t even be written in unicode. However, as far as universal encodings go, it is the best we have got — warts and all. It is the only reasonable way to represent text inside programs, except for very very specialized needs (no, you don’t qualify).

Now, programs are made of libraries, and often there are several layers of abstraction between the library and the program. Sometimes, some weird abstraction layer in the middle will make it hard to convey user configuration into the library’s guts. Code should figure things out itself, most of the time.

So, there are several ways to make dealing with unicode not-horrible.

Unicode internally

I’ve already mentioned it, but it bears repeating. Internal representation should use the language’s built-in type (str in Python 3, String in Java, unicode in Python 2). All formatting, templating, etc. should be, internally, represented as taking unicode parameters and returning unicode results.

Standards

Obviously, when interacting with an external protocol that allows the other side to specify encoding, follow the encoding it specifies. Your program should support, at least, UTF-8, UTF-16, UTF-32 and Latin-1 through Latin-9. When choosing output encoding, choose UTF-8 by default. If there is some way for the user to specify an encoding, allow choosing between that and UTF-16. Anything else should be under “Advanced…” or, possibly, not offered at all.

Non-standards

When reading input that is not marked with an encoding, attempt to decode as UTF-8, then as UTF-16 (most UTF-16 decoders will auto-detect endianness, but it is pretty easy to hand-hack if people put in the BOM). UTF-8/16 are unlikely to have false positives, so if either succeeds, it’s likely correct. Otherwise, decoding as ASCII and ignoring high-order bytes is often the best that can be done. If it is reasonable, allow user intervention in specifying the encoding.
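
A sketch of that strategy (my illustration, in the Python 2 spelling of the era):

def guess_decode(data):
    """Decode bytes of unknown encoding, per the strategy above."""
    for encoding in ("utf-8", "utf-16"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            pass
    # Fall back to ASCII, ignoring bytes with the high bit set.
    return data.decode("ascii", "ignore")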

When writing output, the default should be UTF-8. If it is non-trivial to allow user specification of the encoding, that is fine. If it is possible, UTF-16 should be offered (and a BOM should be prepended to the start of output). Other encodings are not recommended if there is no way to specify them: the reader will have to guess correctly. At the least, such options should be hidden behind an “Advanced…” setting.

The most popular I/O that does not have explicit encoding, or any way to specify one, is file names on UNIX systems. UTF-8 should be assumed, and reasonably recovered from when it proves false. No other encoding is reasonable (UTF-16 is uniquely unsuitable since UNIX filenames cannot have NULs, and other encodings cannot encode some characters).

by moshez at May 23, 2015 04:04 PM