Planet Twisted

September 04, 2015

Twisted Matrix Laboratories

Twisted 15.4 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.4, codenamed "Trial By Fire".

Twisted is continuing to forge ahead, with the highlights of this release being:
  • twisted.positioning, the rest of twisted.internet.endpoints, KQueueReactor (for real this time), and twisted.web.proxy/guard have all been ported to Python 3.
  • Trial has been ported to Python 3! This was made possible by a Python Software Foundation grant.
  • Twisted officially supports several more platforms: Py3.4 on FreeBSD, Py2.7/Py3.4 on Fedora 21/22, and Py2.7 on RHEL7.
  • Python 2.6 is no longer supported; support for Debian 6 and RHEL 6 has been removed as a result.
  • Support for the EOL'd platforms of Fedora 17/18/19 has been removed.
  • Twisted has moved to requiring setuptools for installation.
  • twisted.python.failure.Failure's __repr__ now includes the exception message.
  • 19 tickets in total closed!
You can find the downloads at https://pypi.python.org/pypi/Twisted (or alternatively http://twistedmatrix.com/trac/wiki/Downloads). The NEWS file can be found at https://github.com/twisted/twisted/blob/trunk/NEWS.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,

Amber "Hawkie" Brown

by HawkOwl (noreply@blogger.com) at September 04, 2015 08:02 AM

August 12, 2015

Moshe Zadka

Kushiel’s Legacy (the Phedre Trilogy) — Sort of a book review

Below are spoilers for the Kushiel’s Legacy books (first trilogy). If you have not read them, I highly recommend reading them before looking below — they are pretty awesome. Technically, there are also spoilers for Lord of the Rings, though if you have not read that by now, I don’t know what else to do.

I want to start by talking about Lord of the Rings. As everyone knows, it tells, in retrospect, the story of the dwarves’ failed attempt to restore their ancient homeland. I think they had a side-plot about a couple of short guys throwing a ring into a mountain or something? Anyway, back to the main point of LotR: the dwarves. The story of a people whom everyone thinks of as just being greedy masters of goldsmithing and gem cutting, but with the desire to return to a homeland — and who wake an ancient evil upon returning there. Though Tolkien maintained he was not into allegories, this one is a pretty thin facade for talking about the struggles of the Jewish people. The reason for this long introduction to LotR is to explain how I read books: parochially, maybe; if they have Jews in them, the Jewish plot lines stir my heart much more than the story the author tried to write.

Well, I have to say, Jacqueline Carey knows how to write Jews. Admittedly, it took me until the end of Kushiel’s Dart to figure out that the Yeshuites were not supposed to be Jesuits. They are really an interesting take on Jews for Jesus. In a world that lacked Christianity, since Elua and his followers took on the role of taking down the Roman (“Tiberian”) empire, there are no evangelical religions. The Jews end up accepting Jesus as the “Mashiach” (not “Christ”, since the Romans aren’t there to take over the terms), and keep being…pretty Jewish. They are discriminated against in a heart-touching manner in Venice (ok, “La Serenissima”) and, like all good Jews, have fierce arguments about restoring their homeland. Near as I can tell, judging from estimated geography, they decide to make their homeland in Switzerland? Or maybe Lichtenstein? Belgium? Netherlands? In any case, the more devout, as always, pray for salvation while the younger and more practically minded take up the sword, and head up north to make a place where they will be free from discrimination.

All the while, the Jews in the Kushielverse keep studying the Talmud, and the more mystically minded of them take up something suspiciously like Qabbalah and seek the true name of God. They speak “Habiru” among each other, and have stories of the lost tribes. That’s, of course, the most awesome part — the ten tribes have a “Judaism without Jesus” (what we would call, well, Judaism), since, having been sequestered in…Uganda, they have never heard of him. I, well, I certainly appreciate the subtle humor there.

[Please note that as far as I know, the author really intended the Yeshuites to add color to the religions in the Kushielverse, and actually fairly little of the plotline involves them — it is really the story of a masochist prostitute in a country where not being polyamorous is a blasphemy…]

by moshez at August 12, 2015 03:16 AM

August 04, 2015

Twisted Matrix Laboratories

Twisted 15.3 Released

On behalf of Twisted Matrix Labs, I am honoured to announce the release of Twisted 15.3.

We're marching confidently into the future, and this release demonstrates it:
  • 10+ modules ported to Python 3 (see NEWS for specifics)
  • Twisted Lore has been removed.
  • Twisted no longer releases as 'subprojects' -- there is only one Twisted, long may it reign.
  • twistd now uses cProfile (instead of Hotshot) as the default profiler.
  • 40+ more tickets closed overall.
You can find the downloads at https://pypi.python.org/pypi/Twisted (or alternatively http://twistedmatrix.com/trac/wiki/Downloads). The NEWS file can be found at https://github.com/twisted/twisted/blob/trunk/NEWS.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,

Amber "Hawkie" Brown

by HawkOwl (noreply@blogger.com) at August 04, 2015 07:07 AM

July 17, 2015

Glyph Lefkowitz

Sometimes, You Just Want A Button

I like making computers do stuff. I find that getting a computer to do what I want it to produces a tremendously empowering feeling. I think Python is a great language to use to tell computers what to do.

In my experience, the most empowering feeling (albeit often, admittedly, not the most useful or practical one) is making one’s own computer do something cool. But due to various historical accidents, the practical bent of the Python community mainly means that we get “server-side” or “cloud” computers to do things: in other words, “other people’s computers”.

If you, like me, are a metric standard tech industry hipster, “your own computer” means “Mac OS X”. Many of us have written little command-line tools to automate things, and thank goodness OS X is enough of a UNIX that that works nicely - and it’s often good enough. But sometimes, you just want to press a button and have a thing happen; sometimes a character grid isn’t an expressive enough output device. Sometimes you want to be able to take that same logical core, or some useful open source code, that you would use in the cloud, and instead, put it into an app. Sometimes you want to be able to learn about writing desktop apps while leveraging your existing portfolio of skills. I have done some of this, and intend to do more of it at work, and so I’d like to share with you some things I’ve learned about how to do it.

Back when I was a Linux user, this was a fairly simple proposition. All the world was GTK+ back then, in the blissful interlude between the ancient inconsistent mess when everything was Xt or Motif or Tk or Swing or XForms, and the modern inconsistent mess where everything is Qt or WxWidgets or Swing or some WebKit container with its own pile of gross stylesheet fruit salad.

If you had a little script that you wanted to put a GUI on back then, the process for getting started was apt-get install python-gtk and then just doing something like

import gtk
w = gtk.Window()
w.set_title("My Window")
b = gtk.Button("My Button")
w.add(b)
w.show_all()
gtk.main()

and you were pretty much off to the races. If you wanted to, you could load a UI file that you made with glade, if you had some complicated fancy stuff to display, but that was optional and reasonably straightforward to use. I used to do that all the time, for various personal computing tasks. Of course, if I did that today, I’m sure they’d all laugh at me.

So today I’d like to show you how to do the same sort of thing with OS X.

To be clear, this is not a tutorial in Objective C, or a deep dive into Mac application programming. I’m going to make some assumptions about your skills: you’re a relatively experienced Python programmer, you’ve done some introductory Xcode material like the Temperature Converter tutorial, and you either already know or don’t mind figuring out the PyObjC bridged method naming conventions on your own.

If you are starting from scratch, and you don’t have any code yet, and you want to work in Objective C or Swift, the modern Mac development experience is even smoother than what I just described. You don’t need to install any “packages”, just Xcode, and you have a running app before you’ve written a single line of code, because it will give you a working-by-default template.

The problem is, if you’re a Python developer just trying to make a little utility for your own use, once you’ve got this lovely Objective C or Swift application ready for you to populate with interesting stuff, it’s a really confusing challenge to get your Python code in there. You can drag the Python.framework from homebrew into Xcode and then start trying to call some C functions like PyRun_SimpleString to get your program bootstrapped, but then you start running into tons of weird build issues. How do you copy your scripts to the app bundle? How do you invoke distutils? What about shared libraries that you’ve linked against? It’s hard enough to try to jam anything like setuptools or pip into the Xcode build system that you might as well give up and rewrite the logic; it’ll be less effort.

If you’re a Python developer you probably expect tools like virtualenv and pip to “just work”. You expect to be able to put a file into a folder and then import it without necessarily writing a whole build toolchain first. And if you haven’t worked with it a lot, you probably find Xcode’s UI bewildering. Not to mention the fact that you probably already have a text editor that you like, and don’t want to spend a bunch of time coming up to speed on a new one just to display a button.

Luckily, of course, there’s pyobjc, which lets you write your whole application in Python, skipping the whole Xcode / Objective C / Swift side-show and just adding a little UI logic. The problem I’m trying to address here is that “just adding a little UI logic” (rather than designing your program from the ground up as an App) in the Cocoa / ObjC universe is famously and maddeningly obscure. gtk.Window() is a reasonably straightforward first function to call; [[NSWindow alloc] initWithContentRect:styleMask:backing:defer:] not so much.

This is not even to mention the fact that all the documentation, all the tutorials, and all the community resources pretty much expect you to be working with .xibs or storyboards inside the Interface Builder portion of Xcode, and you’re really on your own if you are trying to do everything outside of that. py2app can help with some of the build issues, but it has its own problems you probably don’t want to be tackling if you’re just getting started; it’ll sap all your energy for actually coding.

I should mention before we finally dive in here that if you really just want to display a button, a text area, maybe a few fields, Toga is probably a better option for you than the route that I'm describing in this post; it's certainly a lot less work. What I am assuming here is that you want to be able to present a button at first, but gradually go on to mess around with arbitrary other OS X native APIs to experiment with the things you can do with a desktop that you can't do in the cloud, like displaying local notifications, tracking your location, recording the screen, controlling other applications, and so on.

So, what is an enterprising developer - who is still lazy enough for my rhetorical purposes here - to do? Surprisingly enough, I have a suggestion!

First, you want to make a new, empty Xcode project. So fire up Xcode, go to File, New, Project, and then:

Just-A-Button-1

Just-A-Button-2

Go ahead and give it a Git repository:

Just-A-Button-3

Now that you’ve got a shiny new blank project, you’ll need to create two resources in it: one, a user interface document, and the other, a Python file. So select File, New, and then choose OS X, User Interface, Empty:

Just-A-Button-4

I’m going to call this “MyUI”, but you can call it whatever:

Just-A-Button-5

As you can see, this creates a MyUI.xib file in your project, with nothing in it.

Just-A-Button-6

We want to put a window and a button into it, so, let’s start with that. Search for “window” in the bottom right, and drag the “Window” item onto the canvas in the middle:

Just-A-Button-7

Just-A-Button-8

Now we’ve got a UI file which contains a window. How can we display it? Normally, Xcode sets all this up for you: it creates an application bundle, a build step that compiles the .xib into a .nib and copies it into the appropriate location, and code to load it for you. But, as we’ve discussed above, we’re too lazy for all that. Instead, we’re going to create a Python script that compiles the .xib automatically and then loads it.

You can do this with your favorite text editor. The relevant program looks like this:

import os
os.system("ibtool MyUI.xib --compile MyUI.nib")

from Foundation import NSData
from AppKit import NSNib

nib_data = NSData.dataWithContentsOfFile_(u"MyUI.nib")
(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(None, None))

from PyObjCTools.AppHelper import runEventLoop
runEventLoop()

Breaking this down one step at a time, what it’s doing is:

import os
os.system("ibtool MyUI.xib --compile MyUI.nib")

We run Interface Builder Tool, or ibtool, to convert the xib, which is a version-control-friendly XML document that Interface Builder can load, into a nib, which is a binary blob that AppKit can load at runtime. We don’t want to add a manual build step, so for now we can just have this script do its own building as soon as it runs.

Next, we need to load the data we just compiled:

nib_data = NSData.dataWithContentsOfFile_(u"MyUI.nib")

This needs to be loaded into an NSData because that’s how AppKit itself wants it prepared.

Finally, it’s time to load up that window we just drew:

(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(None, None))

This loads an NSNib (the “ib” in its name also refers to “Interface Builder”) with the init... method, and then creates all the objects inside it - in this case, just our Window object - with the instantiate... method. (We don’t care about the bundle to use, or the owner of the file, or the top level objects in the file yet, so we are just leaving those all as None intentionally.)

Finally, runEventLoop() just runs the event loop, allowing things to be displayed.

Now, in a terminal, you can run this program by creating a virtualenv, and doing pip install pyobjc-framework-Cocoa, and then python button.py. You should see your window pop up - although it will not take focus. Congratulations, you’ve made a window pop up!

One minor annoyance: you’re probably used to interrupting programs with ^C on the command line. In this case, the PyObjC helpers catch that signal, so instead you will need to use ^\ to hard-kill it until we can hook up some kind of “quit” functionality. This may cause crash dialogs to pop up if you don’t use virtualenv; you can just ignore them.

Of course, now that we’ve got a window, we probably want to do something with it, and this is where things get tricky. First, let’s just create a button; drag the button onto your window, and save the .xib:

Just-A-Button-9

And now, the moment of truth: how do we make clicking that button do anything? If you’ve ever done any Xcode tutorials or written ObjC code, you know that this is where things get tricky: you need to control-drag an action to a selector. It figures out which selectors are available by magically knowing things about your code, and you can’t just click on a thing in the interface-creation component and say “trust me, this method exists”. Luckily, although the level of documentation for it these days makes it on par with an easter egg, Xcode has support for doing this with Python classes just as it does with Objective C or Swift classes.1

First though, we’ll need to tell Xcode our Python file exists. Go to the “File” menu, and “Add files to...”, and then select your Python file:

Just-A-Button-10

Here’s the best part: you don’t need to use Xcode’s editor at all; Xcode will watch that file for changes. So keep using your favorite text editor and change button.py to look like this:

import os
os.system("ibtool MyUI.xib --compile MyUI.nib")

from objc import IBAction
from Foundation import NSObject, NSData
from AppKit import NSNib

class Clicker(NSObject):
    @IBAction
    def clickMe_(self, sender):
        print("Clicked!")

the_clicker = Clicker.alloc().init()

nib_data = NSData.dataWithContentsOfFile_(u"MyUI.nib")
(NSNib.alloc().initWithNibData_bundle_(nib_data, None)
 .instantiateWithOwner_topLevelObjects_(the_clicker, None))

from PyObjCTools.AppHelper import runEventLoop
runEventLoop()

In other words, add a Clicker subclass of NSObject, give it a clickMe_ method decorated by objc.IBAction, taking one argument, and then make it do something you can see, like print something. Then, make a global instance of it, and pass it as the owner parameter to NSNib.

At this point it would probably be good to explain a little about what the “file’s owner” is and how loading nibs works.

When you instantiate a Nib in AppKit, you are creating a collection of graphical objects, connected to an object in your program that you construct. In other words, if you had a program that displayed a Person, you’d have a Person.nib and a Person class, and each time you wanted to show a Person you’d instantiate the Nib again with the new Person as the owner of that Nib. In the interface builder, this is represented by the “file’s owner” placeholder.
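
To make that concrete, here is a minimal sketch using the same calls as above, assuming a hypothetical Person class and a matching Person.nib:

person_nib_data = NSData.dataWithContentsOfFile_(u"Person.nib")

def show_person(person):
    # Instantiate a fresh copy of the UI from the nib, owned by this Person.
    (NSNib.alloc().initWithNibData_bundle_(person_nib_data, None)
     .instantiateWithOwner_topLevelObjects_(person, None))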

I am explaining this because if you’re interested in reading this article, you’ve probably been interested enough in Mac programming to do something like the aforementioned Temperature Converter tutorial, but such tutorials almost universally just use the single default “main” nib that gets loaded when the application launches, so although they show you how to do many different things with UI elements, it’s not clear how the elements got there. This, here, is how you make new UI elements get there in the first place.

Back to our clicker example: now that we have a class with a method, we need to tell Xcode that clicking the button should call that method. So what we’re going to tell Xcode is that we expect this Nib to be owned by an instance of Clicker. To do this, go to MyUI.xib and select the “File’s Owner” (the thing that looks like a transparent cube), go to the “Identity Inspector” (the tiny icon that looks like a driver’s license on the right) and type “Clicker” in the “Class” field at the top.

If you’ve properly added button.py to your project and declared that class (as an NSObject), it should automatically complete as you start typing the name:

Just-A-Button-11

Now you need to connect your clickMe action to the button you already created. If you’ve properly declared your method as an IBAction, you should see it in the list of “received actions” in the “Connections Inspector” of the File’s Owner (the tiny icon that looks like a right-pointing arrow in a circle):

Just-A-Button-12

Drag from the circle to the right of the clickMe: method there to the button you’ve created, and you should see the connection get formed:

Just-A-Button-13

If you save your xib at this point and re-run your python file, you should be able to click on the button and see something happen.

Finally, we want to be able to not just get inputs from the GUI, but also produce outputs. To do this, we want to populate an outlet on our Clicker class with a pointer to an object in the Nib. We can do this by declaring a variable as an objc.IBOutlet(); simply add a from objc import IBOutlet, and change Clicker to read:

class Clicker(NSObject):
    label = IBOutlet()
    @IBAction
    def clickMe_(self, sender):
        self.label.setStringValue_(u"\N{CHECK MARK}")

In case you’re wondering where setStringValue_ comes from, it’s a method on NSTextField, since labels are NSTextFields.

Then we can place a label into our xib; and we can see it is in fact an NSTextField in the Identity Inspector:

Just-A-Button-14

I’ve pre-filled mine out with a unicode “BALLOT X” character, for style points.

Then, we just need to make sure that the label attribute of Clicker actually points at this value; once again, select the File’s Owner, the Connections Inspector, and (if you declared your IBOutlet correctly), you should see a new “label” outlet. Drag the little circle to the right of that outlet, to the newly-created label object, and you should see the linkage in the same way the action was linked:

Just-A-Button-15

And there you have it! Now you have an application that can open a window, take input, and display output.

This is, as should be clear by now, not really the preferred way of making an application for OS X. There’s no app bundle, so there’s nothing to code-sign. You’ll get weird behavior in certain cases; for example, as you’ve probably already noticed, the window doesn’t come to the front when you launch the app like you might expect it to. But, if you’re a Python programmer, this should provide you with a quick scratch pad where you can test ideas, learn about how interface builder works, and glue a UI to existing Python code quickly before wrestling with complex integration and distribution issues.

Speaking of interfacing with existing Python code, of course you wouldn’t come to this blog and expect to get technical content without just a little bit of Twisted in it. So here’s how you hook up Twisted to an OS X GUI: instead of runEventLoop, you need to run your application like this:

# Before importing anything else...
from PyObjCTools.AppHelper import runEventLoop
from twisted.internet.cfreactor import install
reactor = install(runner=runEventLoop)

# ... then later, when you want to run it ...
reactor.run()

In your virtualenv, you’ll want to pip install twisted[osx_platform] to get all the goodies for OS X, including the GUI integration. Since all Twisted callbacks run on the main UI thread, you don’t need to know anything special to do stuff; you can call methods on whatever UI objects you have handy to make changes to them.
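
For example, here is a minimal sketch, reusing the_clicker and its label from the earlier example and the reactor installed as above, that updates the UI from a Twisted timed call:

from twisted.internet import task

def tick():
    # This runs on the main UI thread, so touching AppKit objects is fine.
    the_clicker.label.setStringValue_(u"tick")

task.deferLater(reactor, 2.0, tick)
reactor.run()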

Finally, although I definitely don’t have room in this post to talk about all the edge cases here, I will address two particularly annoying ones; often if you’re writing a little app like this, you really want it to take keyboard focus, and by default this window will come up in the background. To fix that, you can do this right before starting the main loop:

from AppKit import NSApplication, NSApplicationActivationPolicyAccessory
app = NSApplication.sharedApplication()
app.setActivationPolicy_(NSApplicationActivationPolicyAccessory)
app.activateIgnoringOtherApps_(True)

And also, closing the window won’t quit the application, which might be pretty annoying if you want to get back to using your terminal, so a quick fix for that is:

class QuitWhenClosed(NSObject):
    def applicationShouldTerminateAfterLastWindowClosed_(self, app):
        return True
app.setDelegate_(QuitWhenClosed.alloc().init().retain())

(the “retain” is necessary because ObjC is not a garbage collected language, and app.delegate is a weak reference, so the QuitWhenClosed would be immediately freed (and there would be a later crash) if you didn’t hold it in a global variable or call retain() on it.)

You might have other problems with this technique, and this post definitely can’t fit solutions to all of them, but now that you can load a nib, create classes that interface builder can understand, and run a Twisted main loop, you should be able to use the instructions in other tutorials and resources relatively straightforwardly.

Happy hacking!


(Thanks to everyone who helped with this post. Of course my errors are entirely my own, but thanks especially to Ronald Oussoren for his tireless work on pyObjC and py2app over the years, as well as to Mark Eichin and Amber Brown for some proofreading and feedback before this was posted.)


  1. If it didn’t, we could get Xcode to produce the appropriate XML by simply writing appropriately-shaped classes in Objective C - or possibly Swift - and then never bothering to build them, since all Xcode cares about in this part of the process is that it can see the source. My understanding is that’s how it worked when PyObjC was first developed, and it might always become necessary again if Xcode ever stops supporting Python. I have no idea if that’s likely, but unfortunately it seems clear Python isn’t a very popular language for mac apps. 

by Glyph at July 17, 2015 07:52 AM

July 10, 2015

Duncan McGreggor

Mastering matplotlib: Acknowledgments

The Book

Well, after nine months of hard work, the book is finally out! It's available both on Packt's site and Amazon.com. Getting up early every morning to write takes a lot of discipline; it takes even more to say "no" to enticing rabbit holes or herds of yaks with luxurious coats ripe for shaving ... (truth be told, I still did a bit of that).

The team I worked with at Packt was just amazing. Highly professional and deeply supportive, they were a complete pleasure with which to collaborate. It was the best experience I could have hoped for. Thanks, guys!

The technical reviewers for the book were just fantastic. I've stated elsewhere that my one regret was that the process with the reviewers did not have a tighter feedback loop. I would have really enjoyed collaborating with them from the beginning so that some of their really good ideas could have been integrated into the book. Regardless, their feedback as I got it later in the process helped make this book more approachable by readers, more consistent, and more accurate. The reviewers have bios at the beginning of the book -- read them, and look them up! These folks are all amazing!

The one thing that slipped in the final crunch was the acknowledgements, and I hope to make up for that here, as well as through various emails to everyone who provided their support, either directly or indirectly.

Acknowledgments

The first two folks I reached out to when starting the book were both physics professors who had published very nice matplotlib problems -- one set for undergraduate students and another from work at the National Radio Astronomy Observatory. I asked for their permission to adapt these problems to the API chapter, and they graciously granted it. What followed were some very nice conversations about matplotlib, programming, physics, education, and publishing. Thanks to Professor Alan DeWeerd, University of Redlands and Professor Jonathan W. Keohane, Hampden Sydney College. Note that Dr. Keohane has a book coming out in the fall from Yale University Press entitled Classical Electrodynamics -- it will contain examples in matplotlib.

Other examples adapted for use in the API chapter included one by Professor David Bailey, University of Toronto. Though his example didn't make it into the book, it gets full coverage in the Chapter 3 IPython notebook.

For one of the EM examples I needed to derive a particular equation for an electromagnetic field in two wires traveling in opposite directions. It's been nearly 20 years since my post-Army college physics, so I was very grateful for the existence and excellence of SymPy which enabled me to check my work with its symbolic computations. A special thanks to the SymPy creators and maintainers.

Please note that if there are errors in the equations, they are my fault! Not that of the esteemed professors or of SymPy :-)

Many of the examples throughout the book were derived from work done by the matplotlib and Seaborn contributors. The work they have done on the documentation in the past 10 years has been amazing -- the community is truly lucky to have such resources at their fingertips.

In particular, Benjamin Root is an astounding community supporter on the matplotlib mail list, helping users of every level with all of their needs. Benjamin and I had several very nice email exchanges during the writing of this book, and he provided some excellent pointers, as he was finishing his own title for Packt: Interactive Applications Using Matplotlib. It was geophysicist and matplotlib savant Joe Kington who originally put us in touch, and I'd like to thank Joe -- on everyone's behalf -- for his amazing answers to matplotlib and related questions on StackOverflow. Joe inspired many changes and adjustments in the sample code for this book. In fact, I had originally intended to feature his work in the chapter on advanced customization (but ran out of space), since Joe has one of the best examples out there for matplotlib transforms. If you don't believe me, check out his work on stereonets. There are many of us who hope that Joe will be authoring his own matplotlib book in the future ...

Olga Botvinnik, a contributor to Seaborn and PhD candidate at UC San Diego (and BioEng/Math double major at MIT), provided fantastic support for my Seaborn questions. Her knowledge, skills, and spirit of open source will help build the community around Seaborn in the years to come. Thanks, Olga!

While on the topic of matplotlib contributors, I'd like to give a special thanks to John Hunter for his inspiration, hard work, and passionate contributions which made matplotlib a reality. My deepest condolences to his family and friends for their tremendous loss.

Quite possibly the tool that had the single-greatest impact on the authoring of this book was IPython and its notebook feature. This brought back all the best memories from using Mathematica in school. Combined with the Python programming language, I can't imagine a better platform for collaborating on math-related problems or producing teaching materials for the same. These compliments are not limited to the user experience, either: the new architecture using ZeroMQ is a work of art. Nicely done, IPython community! The IPython notebook index for the book is available in the book's Github org here.

In Chapters 7 and 8 I encountered a bit of a crisis when trying to work with Python 3 in cloud environments. What was almost a disaster ended up being rescued by the work that Barry Warsaw and the rest of the Ubuntu team did in Ubuntu 15.04, getting Python 3.4.2 into the release and available on Amazon EC2. You guys saved my bacon!

Chapter 7's fictional case study examining the Landsat 8 data for part of Greenland was based on one of Milos Miljkovic's tutorials from PyData 2014, "Analyzing Satellite Images With Python Scientific Stack". I hope readers have just as much fun working with satellite data as I did. Huge thanks to NASA, USGS, the Landsat 8 teams, and the EROS facility in Sioux Falls, SD.

My favourite section in Chapter 8 was the one on HDF5. This was greatly inspired by Yves Hilpisch's presentation "Out-of-Memory Data Analytics with Python". Many thanks to Yves for putting that together and sharing with the world. We should all be doing more with HDF5.

Finally, and this almost goes without saying, the work that the Python community has done to create Python 3 has been just phenomenal. Guido's vision for the evolution of the language, combined with the efforts of the community, have made something great. I had more fun working on Python 3 than I have had in many years.

by Duncan McGreggor (noreply@blogger.com) at July 10, 2015 09:35 PM

June 27, 2015

Moshe Zadka

ncolony 0.0.2 released

Fixed:

Mostly internal cleanup release: running scripts is now nicer (all through “python -m ncolony <command>”), added Code of Conduct, releasing based on versioneer, cleaned up tox.ini, added HTTP healthchecker.

Available via PyPI and GitHub!

by moshez at June 27, 2015 04:38 AM

June 26, 2015

Moshe Zadka

mainland: the main of your Python

I don’t like console-scripts. Among what I dislike is their magicality and the actual code produced. I mean, the autogenerated Python looks like:

#!/home/moshez/src/mainland/build/toxenv/bin/python2

# -*- coding: utf-8 -*-
import re
import sys

from tox import cmdline

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cmdline())

Notice the encoding header for an ASCII-only file, and the weird Windows-compatibility regex generated alongside a Unix-style shebang that also hardcodes the interpreter's path? Truly, a file that only its generator could love.

Then again, I don’t much like what we have in Twisted, with the preamble:

#!/home/moshez/src/ncolony/build/env/bin/python2
# Copyright (c) Twisted Matrix Laboratories.
# See LICENSE for details.
import os, sys

try:
    import _preamble
except ImportError:
    sys.exc_clear()

sys.path.insert(0, os.path.abspath(os.getcwd()))

from twisted.scripts.twistd import run
run()

This time, it’s not auto-generated, it is artisanal. What it lacks in codegen ugliness it makes up in importing an under-module at top-level, and who doesn’t like a little bit of sys.path games?

I will further note that neither of these options would play nice with something like pex. Worst of all, all of these options, regardless of their internal beauty (or horrible lack thereof), must be named veeeeery caaaaarefully to avoid collisions with existing UNIX, Mac or Windows commands. Think this is easy? My Ubuntu has two commands called “chromium-browser” and “chromium-bsu” because Google didn’t check what popular games there were on Ubuntu before naming their browser.

Enter the best thing about Python, “-m” which allows executing modules. What “-m” gains in awesomeness, it loses in the horribleness of the two-named-module, where a module is executed twice, once resulting in sys.modules[‘__main__’] and once in sys.modules[real_name], with hilarious consequences for class hierarchies, exceptions and other things relying on identity of defined things.

Luckily, packages will automatically execute “package.__main__” if “python -m package” is called. Ideally, nobody would try to import __main__, but this means packages can contain only one command. Enter ‘mainland‘, which defines a main which will, carefully and accurately, dispatch to the right module’s main without any code generation. It has not been released yet, but is available from GitHub. Pull requests and issues are happily accepted!
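
To illustrate the idea, here is a hand-rolled sketch of that kind of dispatch (this is not mainland’s actual API; the package and subcommand modules are hypothetical):

# mypackage/__main__.py
import sys

from mypackage import migrate, serve  # hypothetical subcommand modules

COMMANDS = {'serve': serve.main, 'migrate': migrate.main}

def main(argv):
    command, rest = argv[0], argv[1:]
    return COMMANDS[command](rest)

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

Running it is then just “python -m mypackage serve”, with no generated scripts and no sys.path games.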

Edit: Released! 0.0.1 is up on PyPI and on GitHub

by moshez at June 26, 2015 03:13 AM

June 23, 2015

Moshe Zadka

On Codes of Conducts and “Protection”

Related: In Favor of Niceness, Community, and Civilization

I’ve seen elsewhere (thankfully, not my project, and no, I won’t link to it) that people want the code of conduct to “protect contributors from 3rd parties as well as 3rd parties from contributors”. I would like to first suggest the idea that this is…I don’t even know. The US government has literal nuclear bombs, as well as aircraft carriers, and it cannot enforce its law everywhere. The idea that an underfunded OSS project can do literally anything to 3rd parties is ludicrous. The only enforcement mechanism any project can have is “you can’t contribute here” (which, by necessity, only applies to those who wanted to contribute in the first place).

So why would a bunch of people take on a code of conduct that will only limit them?

Because, to quote the related article above, “the death cry of liberalism is not ‘death to the unbelievers’, it is ‘if you’re nice, you can join our cuddle pile’.” Perhaps describing open source projects as “cuddle piles” is somewhat of an exaggeration, but the point remains. A code of conduct defines what “nice enough” is. The cuddle piles? Those conquer the world. WebKit is powering both Android and iPhone browsers, for example, making open source, literally, the way people see the online world.

Adopting a code of conduct that says that we do not harass, we do not retaliate and we accept all people is a powerful statement. This is how we keep our own garden clear of the pests of prejudice, of hatred. Untended gardens end up wilderness. Well-tended gardens grow until we need to keep a fence around the wild and call it a “preserve”.

When I first saw the Rust code of conduct I thought “cool, but why does an open source project need a code of conduct”? Now I wonder how any open source project will survive without one. It will have to survive without me — in three years, I commit to not contribute to any project without a code of conduct, and I hope others will follow. Any project that does not have a CoC, in the interim, better behave as though it has one.

Twisted is working hard on adopting a code of conduct, and I will check one into NColony soon (a simpler one, appropriate to what is, so far, a one-person show).

by moshez at June 23, 2015 04:22 AM

June 18, 2015

Moshe Zadka

Everything but the code

(Or, “So you want to build a new open-source Python project”)

This is not going to be a “here is a list of options, but you do what is right for you” pandering post. This is going to be a “this is 2015, there are right ways to do things, and here they are” post. This is going to be an opinionated, in-your-face, THIS IS BEST post. That said, if you think that anything here does not represent the best practices in 2015, please do leave a comment. Also, if you have specific opinions on most of these, you’re already ahead of the curve — the main target audience are people new to creating open-source Python projects, who could use a clear guide.

This post will also emphasize the new project. It is not worth it, necessarily, to switch to these things in an existing project — certainly not as a “whole-sale, stop the world and let’s change” thing. But when starting a new project with zero legacy, there is no reason to do things wrong.

tl;dr: Use GitHub, put MIT license in “LICENSE”, README.rst with badges, use py.test and coverage, flake8 for static checking, tox to run tests, package with setuptools, document with sphinx on RTD, Travis CI/Coveralls for continuous integration, SemVer and versioneer for versioning, support Py2+Py3+PyPy, avoid C extensions, avoid requirements.txt.

Where

When publishing kids’ photos, you do it on Facebook, because that’s where everybody is. LinkedIn is where you connect with colleagues and lead your professional life. When publishing projects, put them on GitHub, for exactly the same reason. Have GitHub pull requests be the way contributors propose changes. (There is no need to use the “merge” button on GitHub — it is fine to merge changes via git and push. But the official “how do I submit a change” should be “pull request”, because that’s what people know).

License

Put the license in a file called “LICENSE” at the root. If you do not have a specific reason to choose otherwise, MIT is reasonably permissive and compatible. Otherwise, use something like the license chooser and remember the three most important rules:

  • Don’t create your own license
  • No, really, don’t create your own license
  • Don’t do it

At the end of the license file, you can have a list of the contributors. This is an easy place to credit them. It is a good idea to ask people who send in pull requests to add themselves to the contributor list in their first one (this allows them to spell their name and e-mail exactly the way they want to).

Note that if you use the GPL or LGPL, they will recommend putting it in a file called “COPYING”. Put it in “LICENSE” (the licenses explicitly allow it as an option, and it makes it easier for people to find the license if it always has the same name).

README

The GitHub default is README.md, but README.rst (restructured text) is perfectly supported via Sphinx, and is a better place to put Python-related documentation, because ReST plays better with Pythonic toolchains. It is highly encouraged to put badges on top of the document to link to CI status (usually Travis), ReadTheDocs and PyPI.

Testing

There are several reasonably good test runners. If there is no clear reason to choose one, py.test is a good default. “Using Twisted” is a good reason to choose trial. Using the built-in unittest runner is not a good option — there is a reason the cottage industry of test runners evolved. Using coverage is a no-brainer. It is good to run some functional tests too. Test runners should be able to help with this too, but even writing a Python program that fails if things are not working can be useful.

Distribute your tests alongside your code, by putting them under a subpackage called “tests” of the main package. This allows people who “pip install …” to run the tests, which means sending you bug reports is a lot easier.
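
Concretely, a layout along these lines (all names are placeholders) keeps the tests importable from the installed package:

mypackage/
    __init__.py
    core.py
    tests/
        __init__.py
        test_core.py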

Static checking

There are a lot of tools for static checking of Python programs — pylint, flake8 and more. Use at least one. Using more is not completely free (more ways to have to say “ignore this, this is ok”) but can be useful to catch more static style issues. At worst, if there are local conventions that are not easily plugged into these checkers, write a Python program that will check for them and fail if those are violated.

Meta testing

Use tox. Put tox.ini at the root of your project, and make sure that “tox” (with no arguments) works and runs your entire test-suite. All unit tests, functional tests and static checks should be run using tox. It is not a bad idea to write a tox clause that builds and tests an installed wheel. This will require including all test code in the deployed package, which is a good idea.

Set tox to put all build artifacts in a build/ top-level directory.
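
As a concrete starting point, a minimal tox.ini along these lines might look like the following sketch (the package name and interpreter list are placeholders):

[tox]
envlist = py27, py34, pypy, flake8
# keep tox's own artifacts under build/
toxworkdir = {toxinidir}/build/tox

[testenv]
deps =
    pytest
    pytest-cov
commands = py.test --cov=mypackage mypackage/tests

[testenv:flake8]
deps = flake8
commands = flake8 mypackage

A plain “tox” then runs everything; “tox -e flake8” runs a single environment.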

Packaging

Have a setup.py file that uses setuptools. Tox will need it anyway to work correctly.
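
A bare-bones setup.py in that spirit might look like the following sketch (the name, version and dependencies are all placeholders):

from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.0.1",  # or let versioneer manage this, as discussed below
    description="A short description",
    packages=find_packages(),
    install_requires=["requests>=2.0"],
    extras_require={"test": ["pytest", "pytest-cov"]},
)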

Structure

It is unlikely that you have a good reason to take more than one top-level name in the package namespace. Barring completely unavoidable name conflicts, your PyPI package name should be the same as your Python package name should be the same as your GitHub project. Your Python package should live at the top-level, not under “src/” or “py/”.

Documentation

Use sphinx for prose documentation. Put it in doc/ with a relevant conf.py. Use either pydoctor or sphinx autodoc for API docs. pydoctor has the potential for nicer docs; sphinx is better integrated with ReadTheDocs. Configure ReadTheDocs to auto-build the documentation on every check-in.

Continuous Integration

If you enjoy owning your own machines, or platform diversity in testing really matters, use buildbot. Otherwise, take advantage of the free Travis CI and configure your project with a .travis.yml that breaks your tox tests into one test per Travis clause. Integrate with coveralls to have coverage monitored.
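
Here is a sketch of the corresponding .travis.yml, assuming the tox environments from the earlier sketch (the TOX_ENV values must match whatever your tox.ini actually defines):

language: python
python: 2.7
env:
  - TOX_ENV=py27
  - TOX_ENV=py34
  - TOX_ENV=pypy
  - TOX_ENV=flake8
install:
  - pip install tox coveralls
script:
  - tox -e $TOX_ENV
after_success:
  - coveralls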

Version management

Use SemVer. Take advantage of versioneer to help you manage it.

Release

A full run of “tox” should leave tested .zip and .whl files in its wake; a successful post-tag run of tox, combined with versioneer, leaves behind tested artifacts carrying the right version. The release script could be as simple as “tox && git tag $1 && (tox || (git tag -d $1; exit 1)) && cp …whl and zip locations… dist/”.

GPG sign dist/ files, and then use “twine” to upload them to PyPI. Make sure to upload to TestPyPI first, and verify the upload, before uploading to PyPI. Twine is a great tool, but badly documented — among other things, it is hard to find information about .pypirc. “.pypirc” is an ini file, which needs to have the following sections:
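
For example (usernames and passwords are placeholders, and the repository URLs are the ones in use as of 2015):

[distutils]
index-servers =
    pypi
    testpypi

[pypi]
repository = https://pypi.python.org/pypi
username = your-username
password = your-password

[testpypi]
repository = https://testpypi.python.org/pypi
username = your-username
password = your-password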

.gitignore

  • build — all your build artifacts will go here
  • dist — this is where “ready to release” output will be
  • *.egg-info — this is an artifact of sdist that is really hard to put elsewhere
  • *.pyc — ignore byte-code files
  • .coverage — coverage artifact

Python versions

If all your dependencies support Python 2 and 3, support Python 2 and 3. That will almost certainly require using “six” (or one of its competitors, like “future”). Run your unit tests under both Python 2 and 3. Make sure to run your unit tests under PyPy, as well.

C extensions

Avoid, if possible. Certainly do not use C extensions for performance improvements before (1) making sure they’re needed (2) making sure they’re helpful (3) trying other performance improvements. Ideally structure your C extensions to be optional, and fall back to a slow(er) Python implementation if they are unavailable. If they speed up something more general than your specific needs, consider breaking them out into a C-only project which your Python will depend on.

If using C extensions, regardless of whether to improve performance or integrate with 3rd party libraries, use CFFI.
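
For a taste of what that looks like, here is a minimal CFFI sketch in ABI mode, calling a libc function; dlopen(None) works on Unix-like systems, while on Windows you would pass an explicit library name:

from cffi import FFI

ffi = FFI()
ffi.cdef("int abs(int);")   # declare the C signature we intend to call
C = ffi.dlopen(None)        # load the standard C library's symbols
assert C.abs(-42) == 42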

If C extensions have successfully been avoided, and Python 3 compatibility kept, build universal wheels.
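
Marking a wheel as universal is a one-line setup.cfg stanza:

[bdist_wheel]
universal = 1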

requirements.txt

The only good “requirements.txt” file is a non-existing one. The “setup.py” file should have the dependencies (ideally as weak-versioned as possible, usually just a “>=” for a library that tends not to break backwards compatibility a lot). Tox will maintain the virtualenv needed based on the things in the tox file, and if needing to test against specific versions, this is where specific versions belong. The “requirements.txt” file belongs in Salt-style (Chef, Fab, …) configurations, Docker-style (Vagrant-, etc.) configurations or Pants-style (Buck-, etc.) build scripts when building a Pex. This is the “deployment configuration”, and needs to be decided by a deployer.

If your package has dependencies, they belong in setup.py. Use extras_require for test-only dependencies. Lists of test dependencies, and reproducible tests, can be configured in tox.ini. Tox will take care of pip-installing relevant packages in a virtualenv when running the tests.

Thanks

Thanks to John A. Booth, Dan Callahan, Robert Collins, Jack Diedrich, and Steve Holden for their reviews and insightful comments. Any mistakes that remain are mine!

by moshez at June 18, 2015 02:31 AM

June 09, 2015

Glyph Lefkowitz

Sorry I Unfollowed You

Since Alex Gaynor wrote his seminal thinkpiece on the subject, “I Hope Twitter Goes Away”, I’ve been wrestling to define my relationship to this often problematic product.

On the one hand, Twitter has provided me with delightful interactions with human beings who I would not otherwise have had the opportunity to meet or interact with. If you are the sort of person who likes following people, four suggestions I’d make on that front are Melissa 🔔, Gary Bernhardt, Eevee and Matt Blaze, all of whom have blogs but none of whom I would have discovered without Twitter.

Twitter has also allowed me to reach a larger audience with my writing than I otherwise would have been able to. Lots of people click on links to this blog from Twitter either from following me directly or from a retweet. (Thank you, retweeters, one and all.)

On the other hand, the effect of using Twitter on my productivity is like having a constant, low-grade headache. While Twitter has never been a particularly bad distraction as measured by hours spent on it (I keep metrics on that, and it’s rarely even in the top 10), I feel like consulting Twitter is something I do when I am stuck, or having to think about something hard. “I’ll just check Twitter” is an easy way to “take a break” right at the moment that I ought to be thinking harder, eliminating distractions, mustering my will to focus.

This has been particularly stark for me as I’ve been trying to get some real writing done over the last couple of weeks and have been consistently drawing a blank. Given that I have a deadline coming up on Wednesday and another next Monday, something had to give.

Or, as Joss Whedon put it, when he quit Twitter:

If I’m going to start writing again, I have to go to the quiet place, and this is the least quiet place I’ve ever been in my life.

I’m an introvert, and using Twitter is more like being at a gigantic, awkward party all the time than any other online space I’ve ever been in.

There’s an irony here. Mostly what people like that I put on Twitter (and yes, I’ve checked) are announcements that link to other things, accomplishments in other areas, like a blog post, or a feature in Twisted, but using Twitter itself is inimical to completing those things.

I’m loath to abandon the positive aspects of Twitter. Some people also use Twitter as a replacement for RSS, and I don’t want to break the way they choose to pay attention to the stuff that I do. And a few of my friends communicate exclusively through direct messages.

The really “good” thing about Twitter is discovery. It enables you to discover people, content, and, eugh, “brands” that appeal to you. I have discovered things that I enjoy many times. The fundamental problem I am facing, which is a little bit hard to admit to oneself, is that I have discovered enough. I have enough games to play, enough books and articles to read, enough podcasts to listen to, enough movies to watch, enough code to write, enough open source libraries to investigate, that I will be busy for years based on what I already know.

For me, using Twitter’s timeline at this point to “discover” more things is like being at a delicious buffet, being so full I’m nauseous, and stuffing my pockets with shrimp “just in case” I’m hungry “when I get home” - and then, of course, not going home.

Even disregarding my desire to produce useful content, if I just want to enjoy consuming content more deeply, I have to take the time to engage with it properly.

So here’s what I’m doing:

  1. I am turning on the “anyone can direct message me” feature. We’ll see how that goes; I may have to turn it off again later. As always, I’d prefer you send email (or text me, if it’s time-critical).
  2. I am unfollowing literally everyone, and will not follow people in the future. Checking my timeline was the main information junk-food I want to avoid.
  3. Since my timeline, rather than mentions and replies, was my main source of distraction, I’ll continue paying attention to mentions and replies (at least for now; I’ll have to see if that becomes a problem in the absence of a timeline).
  4. In order to avoid producing such information junk-food myself, I’m going to try to directly tweet less, and put more things into brief blog posts so I have enough room to express them. I won’t say “not at all”, but most of the things that I put on Twitter would really be better as longer, more thoughtful articles.

Please note that there’s nothing prescriptive here. I’m outlining what I’m doing in the hopes that others might recognize similar problems with themselves - if everyone used Twitter this way, there would hardly be a point to the site.

Also, if I’ve unfollowed you, that doesn’t mean I’m not interested in what you have to say. I already have a way of keeping in touch with people’s more fully-formed ideas: I use Blogtrottr to deliver relevant blog articles to my email. If I previously followed you and you think I might not be reading your blog already (in most cases I believe I already am), please feel free to drop me a line with an RSS link.

by Glyph at June 09, 2015 12:41 AM

June 07, 2015

Twisted Matrix Laboratories

Twisted Fellowship 2015: Call for proposals

On behalf of the Software Freedom Conservancy and the Twisted project I'm happy to announce that we're looking for a paid maintainer for the Twisted project.

Funding a software developer to work as a maintainer will help Twisted grow as a project, and enable Twisted's development community to increase their output of innovative code for the public's benefit. Twisted has strict coding, testing, documentation, and review standards, which ensures excellent code quality, continually improving documentation and code test coverage, and minimal regressions. Code reviews are historically a bottleneck for getting new code merged. A funded maintainer will help alleviate this bottleneck, and speed Twisted's development.

You can read more about the 2015 fellowship at https://twistedmatrix.com/trac/wiki/Fellowship2015

by Itamar Turner-Trauring (noreply@blogger.com) at June 07, 2015 11:12 PM

June 06, 2015

Moshe Zadka

(Somewhat confused) Thoughts on Languages, Eco-systems and Development

There are a few interesting new languages that would be fun to play with: Rust, Julia, LFE, Go and D all have interesting takes on some domain. But ultimately, a language is only as interesting as the things it can do. It used to be that “things it can do” referred to the standard library. I am old enough to remember when “batteries included” was one of the most interesting benefits Python had — the language had dared, in ’99, to do such avant-garde things as having a *built-in* URL-fetching library that behaved, pretty much, the same on all platforms. Out of the box. (It’s hard to convey how amazing this was in ’99.)

This is no longer the case. Now what distinguishes Python as “can do lots of things” is its built-in package management. Python, pip and virtualenv together give the ability to work on multiple Python projects that need different versions of libraries, without any interference. They are, to a first approximation, 100% reliable. With pip 7.0 caching wheels, virtualenvs are even more disposable (I am starting to think pip uninstall is completely unnecessary). In fact, except for virtualenv itself, it is rare nowadays to install any Python module globally. There are some rusty corners, of course:

  • Uploading packages is…non-trivial. Any time the default tool is so bad that the official docs say “use this 3rd party library instead”, something is wrong.
  • The distinction between “PyPI name” and “Python package name”, while theoretically reasonable, is just annoying and hard to explain
  • (Related) There is no official PyPI-name-conflict-resolution procedure.

I’ll note npm, for example, clearly does better on these three (while doing worse on some things that Python does better). Of the languages mentioned above, it is nice to see that most have out-of-the-box built-in tools for ecosystem creation. Julia’s hilariously piggybacks on GitHub. Go’s…well…it uses URLs as the equivalent of PyPI-level names. Rust has a pretty decent system in cargo and crates.io (the .toml file is kind of weird, but I’ve seen worse). While there is justifiable excitement about Rust’s approach to memory and thread-safety, it might be that it will win in the end based on having a better internal package management system than the low-level alternatives.

Note: OS package mgmt (yum, apt-get, brew and whatever Windows uses nowadays) is not a good fit for what developers need from a language-level package manager. Locality, quick package refreshes — these matter more to developers than to OS end-users.

by moshez at June 06, 2015 07:55 AM

June 05, 2015

Duncan McGreggor

Scientific Computing and the Joy of Language Interop

The scientific computing platform for Erlang/LFE has just been announced on the LFE blog. Though written in the Erlang Lisp syntax of LFE, it's fully usable from pure Erlang. It wraps the new py library for Erlang/LFE, as well as the ErlPort project. More importantly, though, it wraps Python 3 libs (e.g., math, cmath, statistics, and more to come) and the ever-eminent NumPy and SciPy projects (those are in-progress, with matplotlib and others to follow).

(That LFE blog post is actually a tutorial on how to use lsci for performing polynomial curve-fitting and linear regression, adapted from the previous post on Hy doing the same.)

With the release of lsci, one can now start to easily and efficiently perform computationally intensive calculations in Erlang/LFE (and any other Erlang Core-compatible language, e.g., Elixir, Joxa, etc.) That's super-cool, but it's not quite the point ...

While working on lsci, I found myself experiencing a great deal of joy. It wasn't just the fact that supervision trees in a programming language are insanely great. Nor just the fact that scientific computing in Python is one of the best in any language. It wasn't only being able to use two syntaxes that I love (LFE and Python) cohesively, in the same project. And it wasn't the sum of these either -- you probably see where I'm going with this ;-) The joy of these and many other fantastic aspects of inter-operation between multiple powerful computing systems is truly greater than the sum of its parts.

I've done a bunch of Julia lately and am a huge fan of this language as well. One of the things that Julia provides is explicit interop with Python. Julia is targeted at the world of scientific computing, aiming to be a compelling alternative to Fortran (hurray!), so their recognition of the enormous contribution the Python scientific computing community has made to the industry is quite wonderful to see.

A year or so ago I did some work with Clojure and LFE using Erlang's JInterface. Around the same time I was using LFE on top of  Erjang, calling directly into Java without JInterface. This is the same sort of Joy that users of Jython have, and there are many more examples of languages and tools working to take advantage of the massive resources available in the computing community.

Obviously, language inter-op is not new. Various FFIs have existed for quite some time (I'm a big fan of the Common Lisp CFFI), but what is new (relatively, that is ... as I age, anything in the past 10 years is new) is that we are seeing this not just for programs reaching down into C/C++, but reaching across, to other higher-level languages, taking advantage of their great achievements -- without having to reinvent so many wheels.

When this level of cooperation, credit, etc., is done in the spirit of openness, peer-review, code-reuse, and standing on the shoulders of giants (or enough people to make giants!), we get joy. Beautiful, wonderful coding joy.

And it's so much greater than the sum of the parts :-)


by Duncan McGreggor (noreply@blogger.com) at June 05, 2015 02:53 PM

May 24, 2015

Twisted Matrix Laboratories

Twisted 15.2.1 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.2.1.

This is a bugfix release for the 15.2 series that fixes a regression in the new logging framework.

You can find the downloads on PyPI (or alternatively on the Twisted Matrix Labs website).

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
HawkOwl

by HawkOwl (noreply@blogger.com) at May 24, 2015 12:22 PM

May 23, 2015

Moshe Zadka

Unicode, UTF-8 and you

Unicode is not a panacea. Some people’s names can’t even be written in unicode. However, as far as universal encodings go, it is the best we have got — warts and all. It is the only reasonable way to represent text inside programs, except for very very specialized needs (no, you don’t qualify).

Now, programs are made of libraries, and often there are several layers of abstraction between the library and the program. Sometimes, some weird abstraction layer in the middle will make it hard to convey user configuration into the library’s guts. Code should figure things out itself, most of the time.

So, there are several ways to make dealing with unicode not-horrible.

Unicode internally

I’ve already mentioned it, but it bears repeating. Internal representation should use the language’s built-in type (str in Python 3, String in Java, unicode in Python 2). All formatting, templating, etc. should be, internally, represented as taking unicode parameters and returning unicode results.
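
As a minimal sketch of that rule (the function names and the 'utf-8' assumption at the boundary are only illustrative):

def read_greeting(stream):
    raw = stream.read()           # bytes from the outside world
    text = raw.decode('utf-8')    # decode once, at the boundary
    return format_greeting(text)

def format_greeting(name):
    # Internal helpers take and return the built-in text type (str in
    # Python 3, unicode in Python 2) -- never raw bytes.
    return u'Hello, {}!'.format(name)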

Standards

Obviously, when interacting with an external protocol that allows the other side to specify encoding, follow the encoding it specifies. Your program should support, at least, UTF-8, UTF-16, UTF-32 and Latin-1 through Latin-9. When choosing output encoding, choose UTF-8 by default. If there is some way for the user to specify an encoding, allow choosing between that and UTF-16. Anything else should be under “Advanced…” or, possibly, not at all.

Non-standards

When reading input that is not marked with an encoding, attempt to decode as UTF-8, then as UTF-16 (most UTF-16 decoders will auto-detect endianness, but it is pretty easy to hand-hack if people put in the BOM). UTF-8/16 are unlikely to have false positives, so if either succeeds, it’s likely correct. Otherwise, decoding as ASCII and ignoring high-order bytes is often the best that can be done. If it is reasonable, allow user intervention in specifying the encoding.
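
A rough sketch of that decision procedure (the function name is made up; real code may want to handle BOMs more carefully):

def guess_decode(data):
    # UTF-8 first: it is very unlikely to decode garbage successfully.
    for encoding in ('utf-8', 'utf-16'):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            pass
    # Last resort: keep the ASCII range and drop high-order bytes.
    return data.decode('ascii', errors='ignore')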

When writing output, the default should be UTF-8. If it is non-trivial to allow user specification of the encoding, that is fine. If it is possible, UTF-16 should be offered (and BOM should be prepended to start-of-output). Other encodings are not recommended if there is no way to specify them: the reader will have to guess correctly. At the least, giving the user such options should be hidden behind an “Advanced…” option.

The most popular I/O that does not have explicit encoding, or any way to specify one, is file names on UNIX systems. UTF-8 should be assumed, and reasonably recovered from when it proves false. No other encoding is reasonable (UTF-16 is uniquely unsuitable since UNIX filenames cannot have NULs, and other encodings cannot encode some characters).
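
In Python 3, the surrogateescape error handler is the standard way to make that recoverable (os.fsdecode does roughly this for you on UTF-8 systems); a hand-rolled sketch:

def filename_to_text(raw_name):
    # Assume UTF-8; bytes that do not decode are smuggled through as
    # surrogates, so the original bytes can be recovered later by
    # encoding with the same error handler.
    return raw_name.decode('utf-8', errors='surrogateescape')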

by moshez at May 23, 2015 04:04 PM

May 19, 2015

Twisted Matrix Laboratories

Twisted 15.2.0 Released

On behalf of Twisted Matrix Labs, I'm honoured to announce the release of Twisted 15.2.

Bringing not only headlining features but also a lot of incremental improvements, this release has got plenty to like:

  • twisted.logger has landed! This is a brand-new, feature-rich logging framework.
  • Python 3.4 is now a supported platform for all the Py3 ported modules.
  • twisted.trial.unittest.TestCase's assertEqual, assertTrue, and assertFalse methods now pass through the standard library's more informative failure messages.
  • twisted.python.filepath.FilePath now supports Unicode (text) paths properly, and includes as{Bytes,Text}Mode methods for interacting with APIs that require a text/bytes-only FilePath.
  • twisted.mail.smtp.sendmail now supports ESMTP and provides a high-level interface for sending mail.
  • The following parts of Twisted are now ported to Python 3:
    • twisted.internet.process
    • twisted.cred.credentials
    • twisted.python.modules
    • twisted.internet.kqreactor
    • twisted.internet.endpoints.ProcessEndpoint
    • twisted.web.static
  • Over 50 tickets closed since 15.1.

You can find the downloads on PyPI (or alternatively the Twisted website).

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
HawkOwl

by HawkOwl (noreply@blogger.com) at May 19, 2015 06:51 AM

May 09, 2015

Glyph Lefkowitz

Separate your Fakes and your Inspectors

When you are writing unit tests, you will commonly need to write duplicate implementations of your dependencies to test against systems which do external communication or otherwise manipulate state that you can’t inspect. In other words, test fakes. However, a “test fake” is just one half of the component that you’re building: you’re also generally building a test inspector.

As an example, let’s consider the case of this record-writing interface that we may need to interact with.

class RecordWriter(object):
    def write_record(self, record):
        "..."

    def close(self):
        "..."

This is a pretty simple interface; it can write out a record, and it can be closed.

Faking it out is similarly easy:

class FakeRecordWriter(object):
    def write_record(self, record):
        pass
    def close(self):
        pass

But this fake record writer isn’t very useful. It’s a simple stub; if our application writes any interesting records out, we won’t know about it. If it closes the record writer, we won’t know.

The conventional way to correct this problem, of course, is to start tracking some state, so we can assert about it:

class FakeRecordWriter(object):
    def __init__(self):
        self.records = []
        self.closed = False

    def write_record(self, record):
        if self.closed:
            raise IOError("cannot write; writer is closed")
        self.records.append(record)

    def close(self):
        if self.closed:
            raise IOError("cannot close; writer is closed")
        self.closed = True

This is a very common pattern in test code. However, it’s an antipattern.

We have exposed 2 additional, apparently public attributes to application code: .records and .closed. Our original RecordWriter interface didn’t have either of those. Since these attributes are public, someone working on the application code could easily, inadvertently access them. Although it’s unlikely that an application author would think that they could read records from a record writer by accessing .records, it’s plausible that they might add a check of .closed before calling .close(), to make sure they won’t get an exception. Such a mistake might happen because their IDE auto-suggested the completion, for example.

The resolution for this antipattern is to have a separate “fake” object, exposing only the public attributes that are also on the object being faked, and an “inspector” object, which exposes only the functionality useful to the test.

class WriterState(object):
    def __init__(self):
        self.records = []
        self.closed = False

    def raise_if_closed(self):
        if self.closed:
            raise ValueError("already closed")


class _FakeRecordWriter(object):
    def __init__(self, writer_state):
        self._state = writer_state

    def write_record(self, record):
        self._state.raise_if_closed()
        self._state.records.append(record)

    def close(self):
        self._state.raise_if_closed()
        self._state.closed = True


def create_fake_writer():
    state = WriterState()
    return state, _FakeRecordWriter(state)

In this refactored example, we now have a top-level entry point of create_fake_writer, which always creates a pair of WriterState and thing-which-is-like-a-RecordWriter. The type of _FakeRecordWriter can now be private, because it’s no longer interesting on its own; it exposes nothing beyond the methods it’s trying to fake.

Whenever you’re writing test fakes, consider writing them like this, to ensure that you can hand application code the application-facing half of your fake, and test code the test-facing half of the fake, and not get them mixed up.
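
For instance, a test might use the pair like this (run_export here stands in for whatever application code is under test):

def test_export_writes_and_closes():
    state, writer = create_fake_writer()
    # The application code only ever sees the RecordWriter-shaped half...
    run_export(writer)
    # ...while the test asserts against the inspector half.
    assert state.records == ["expected record"]
    assert state.closed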

by Glyph at May 09, 2015 06:52 AM

April 30, 2015

Moshe Zadka

Lessons learned in porting to PyPy

I’m running the ncolony tests with pypy so I can add PyPy support. I expected it to be a no-op — but it turned out I have ingrained expectations that are no longer true; moreover, PyPy is right, and I’m wrong.

  • “hello” is “hello” -> not necessarily true. PyPy does not intern strings like CPython does.
  • file(name, ‘w’).write(“string”) is bad and you should not do that. PyPy is garbage collected, not ref counted, so the file might not be closed for a while. (A short sketch of both fixes follows below.)
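
A short sketch of the portable versions of both idioms:

greeting = "".join(["he", "llo"])   # a "hello" that may not be interned

# Compare values, not identities; interning is an implementation detail,
# so `greeting is "hello"` may be False on PyPy (or even on CPython).
assert greeting == "hello"

# Close files deterministically instead of relying on refcounting.
with open("out.txt", "w") as f:
    f.write("string")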

That was it! It was actually a pleasant experience :)

by moshez at April 30, 2015 02:13 AM

April 13, 2015

Twisted Matrix Laboratories

Twisted 15.1.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.1.0 -- just in time for the PyCon sprints!

This is not a big release, but does have some nice-to-haves:

- You can now install Twisted's optional dependencies more easily -- for example, pip install twisted[tls] installs Twisted with TLS support.
- twisted.web.static.File allows defining a custom resource for rendering forbidden pages.
- Twisted's MSN support is now deprecated.
- More documentation has been added on how Trial finds tests.
- ...and 26 other closed tickets containing bug fixes, feature enhancements, and documentation.

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively the Twisted website). The NEWS file is also available for reading.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Hawkie Owl

by HawkOwl (noreply@blogger.com) at April 13, 2015 08:23 AM

April 10, 2015

Moshe Zadka

On Authentication in Web Applications

So you are writing a web application. Maybe it’s “2.0”, maybe it’s “1.0”. Maybe it’s 3.0? I am not sure if that’s a thing. Anyway, the important thing is that you want to know that people are who they say they are (I guess not if you’re reimplementing 4Chan or other anonymous-only applications). So you have some buttons that say “Log in” and “Sign up”. In the “Sign Up” page, you have the user enter in their e-mail address (twice, of course, because otherwise they’ll misspell it) and the password (twice, same reason), and possibly a username. You check the password is “strong enough”, and you have a little widget that rates it “weak”, “fair”, “strong” or whatever descriptive words you feel like today. Of course, being a civilized person, you store the passwords on the backend salted and hashed. You make sure that the password comparison is resistant to side-channel attacks. You add a CAPTCHA for the “forgot password” page to prevent mass-attacking it. Since you know your users are almost certainly using the same password on other sites, that do not do all of that, you also offer a 2-factor authentication scheme using Google Authenticator and falling back to SMS codes. Of course, before you store an SMS number as the fallback, you send it a trial code to make sure it is correct (right, WordPress.com?)

And, of course, no users use your 2-factor scheme, one day they get phished, and all their accounts are compromised.

Please note that the above paragraph is the absolute minimum you should do to be a responsible site that says “sign up with your e-mail and password”. Also highly recommended is participating in a white-hat bug bounty, hiring dedicated pen-testers and having a lot of server-side heuristics to detect a brute-force attack and shut it down immediately.

Do you know who has the resources to do all of that correctly? I can think of two companies that get it all correct. Their names start with consecutive letters of the alphabet… :)

Yep, Facebook and Google actually have the security teams and expertise to check every single one of those boxes (with the exception of the silly little widget that rates your password strength which never in the history of mankind has ever caused a user to choose a different password, because they only remember one password for all the sites they use and it doesn’t change.) Please, for the love of kittens, puppies and hedgehogs, put little “Sign-up with Facebook” and “Sign-up with Google+” widgets on your web-site. If you are worried about Facebook and Google “capturing” your users, just make sure to grab their (verified!) e-mail addresses when they sign up through OAuth, so that if you ever need to do the authentication yourself, all you need to do is just have a “Forgot/recover password” widget, and you are on your way. You can even e-mail your users to tell them “Hey, FB/GOOG screwed us over, so start logging in with your password, and here is how you can recover the password”.

There really is no excuse not to offer this, in 2015. Unless you think your security team is roughly as good as Facebook’s.

by moshez at April 10, 2015 05:02 AM

April 02, 2015

Glyph Lefkowitz

Not Funny

What?

Today’s “joke” from the PSF about PyCon Havana was not funny, and, speaking as a PSF Fellow, I do not endorse it.

What’s Not Funny?

Honestly I’m not sure where I could find a punch-line in this. I just don’t see much there.

But if I look for something that’s supposed to be “funny”, here’s what I see:

  1. Cuba is a backward country without sufficient technology to host a technical conference, and it is absurd and therefore “funny” that we could hold PyCon there.
  2. We are talking about PyCon US; despite the recent thaw in relations, decades of hostility that have torn families apart make it “funny” that US citizens would go to Cuba for a conference.

These things aren’t funny.

Some Non-Reasons I’m Writing This

A common objection when someone speaks up about a subject like this is that it’s “just a joke”. That anyone speaking up and saying that offensive things aren’t funny somehow dislikes the very concept of humor. I don’t know why people think that, but I guess I need to make it clear: I am not an enemy of joy. That is not why I’m saying something.

I’m also not Cuban, I have no Cuban relatives, and until this incident I didn’t even know I had friends of Cuban extraction, so I am not personally insulted by this. That means another common objection will crop up: some will ask if I’m just looking for an excuse to get offended, to write about taking offense and get attention for it.

So let me assure you, that personally, this is not the kind of attention that I want. I really didn’t want to write this post. It’s awkward. I really don’t want to be having these types of conversations. I want to get attention for the software I write, not for my opinions about tacky blog posts.

Why, Then?

I might not know many Cuban python programmers personally, but I’d love to meet some. I’d love to meet anyone who cares about programming. Meeting diverse people from all over the world and working with them on code has been one of the great joys of my life. I love the fact that the Python community facilitates that and tries hard to reach out to people and to make them feel welcome.

I am writing this because I know that, somewhere out there, there’s a Cuban programmer, or a kid who will grow up to be one, who might see that blog post, and think that the Python community, or the software industry, thinks that they’re a throw-away punch line. I want them to know that I don’t think they’re a punch line. I want them to know that the python community doesn’t think they’re a punch line. I want them to know that they are not a punch line, and I want them to pursue their interest in programming exactly as far as it takes them and not push them away.

These people are real, they are listening, and if you tell me to just “lighten up” you are saying that your enjoyment of a joke is more important than their membership in our community.

It’s Not Just Me

The PSF is paying attention. The chairman of the PSF has acknowledged the problematic nature of the “joke”. Several of my friends in the Python community spoke up before I did (here, here, here, here, here, here, here), and I am very grateful for their taking the community to task and keeping us true to ideals of inclusiveness and empathy.

That doesn’t excuse the public statement, made using official channels, which was in very poor taste. I am also very disappointed in certain people within the PSF[1] who seem intent on doubling down on this mistake rather than trying to do something to correct it.


  1. names withheld to avoid a pile-on, but you know who you are and you should be ashamed. 

by Glyph at April 02, 2015 05:47 AM

March 23, 2015

Moshe Zadka

Deploying With Docker

Glyph wrote about an interesting way to deploy Python applications to Docker. I wanted to try it out, and exercise ncolony a little. The result is NaNoAuto, which is still very preliminary. It was mostly written as a way to test out deploying to Docker and exercise ncolony. Several notes that I learned the hard way:

  • On Ubuntu, all docker client commands need to run as root to access the docker daemon’s UNIX domain socket. I think there is a way to run it on a host/port and ask for a username/password, but I have not figured that out.
  • Glyph’s scripts need a recent version of Docker. The built-in version for Ubuntu is too old — make sure to install docker from Docker’s apt repository.
  • The entry point you give Docker must not exit — and so must not daemonize. Glyph’s app used an explicit reactor running it (well, even worse — Klein’s implicit reactor). Civilized people, of course, use ‘twistd’. When building things like a civilized person, please remember to pass the “--nodaemonize” flag (AKA “-n”) to twistd.
  • On my personal TODO list is to add a hash checker to the runner, so that the wheels installed from the internet can have their hashes matched. It is possible that “peep” could play a role here.
  • The scripts are divided into “build-base” and “do-build”. “build-base” is the long-running but fairly stable part — it only changes if you change the basic platform (OS, Python interpreter, needed non-Pythonic parts). The “do-build” part is fairly fast, and needs to be run for every change to a Python file.

Note that this still means a little bit of overhead on top of building a new wheel for each change to a Python file. It should be possible to write something on top of “docker exec” that will copy Python files directly from source to container so as to facilitate even quicker edit/debug cycles.

by moshez at March 23, 2015 03:42 AM

March 22, 2015

Glyph Lefkowitz

Headcanon

My Castle headcanon has always been that, when they finally catch up with Mal (oh, and they definitely do catch up with him; the idea that no faction within the Alliance would seek revenge for what he’s done is laughable) they decide that they can, in fact, “make people better”, and he is no exception. After the service he has done in exposing the corruption and cover-ups behind Miranda, they can’t just dispose of him, so they want to rehabilitate him and make him a productive, contributing member of alliance society.

They can’t simply re-format his brain directly, of course. It wouldn’t be compatible with his personality, and his underlying connectome would simply reject the overlaid neural matrix; it would degrade over time, and he would have to return for treatments far too regularly for it to be practical.

The most fitting neural re-programming they can give him, of course, would be to have him gradually acclimate to becoming a lawman. So “Richard Castle” begins as an anti-authoritarian man-child and acquiesces, bit by bit, to the necessity of becoming an agent of the enforcement of order.

My favorite thing about the current season is that, while it is already obvious that my interpretation is correct, this season has given Mal a glimmer of hope. Clearly the reprogramming isn’t working, and aspects of his real life are coming through.

They really can’t take the sky from him.

by Glyph at March 22, 2015 08:16 AM

March 14, 2015

Moshe Zadka

Announcing: NColony — the infrastructure for your Python

For the last few days, I have been working on infrastructure for writing multi-process servers: basically, something that would be a good start-command for your Docker, or something you put in your ‘@reboot’ crontab to bring up the entire app. It is still not tested, mature or production ready — there are still a lot of unimplemented features. I am putting the announcement up now, though, because at least everything that still needs to be done (as far as I know) to get it feature-ready is documented as an issue.

The code, issues and so on are all on the Github for ncolony. Patches for existing issues are welcome, as well as new issues. Questions, comments and code-critiques are also more than welcome.

by moshez at March 14, 2015 02:08 AM

March 06, 2015

Glyph Lefkowitz

Deploying Python Applications with Docker - A Suggestion

Deploying python applications is much trickier than it should be.

Docker can simplify this, but even with Docker, there are a lot of nuances around how you package your python application, how you build it, how you pull in your python and non-python dependencies, and how you structure your images.

I would like to share with you a strategy that I have developed for deploying Python apps that deals with a number of these issues. I don’t want to claim that this is the only way to deploy Python apps, or even a particularly right way; in the rapidly evolving containerization ecosystem, new techniques pop up every day, and everyone’s application is different. However, I humbly submit that this process is a good default.

Rather than equivocate further about its abstract goodness, here are some properties of the following container construction idiom:

  1. It reduces build times from a naive “sudo setup.py install” by using Python wheels to cache repeatably built binary artifacts.
  2. It reduces container size by separating build containers from run containers.
  3. It is independent of other tooling, and should work fine with whatever configuration management or container orchestration system you want to use.
  4. It uses existing Python tooling of pip and virtualenv, and therefore doesn’t depend heavily on Docker. A lot of the same concepts apply if you have to build or deploy the same Python code into a non-containerized environment. You can also incrementally migrate towards containerization: if your deploy environment is not containerized, you can still build and test your wheels within a container and get the advantages of containerization there, as long as your base image matches the non-containerized environment you’re deploying to. This means you can quickly upgrade your build and test environments without having to upgrade the host environment on finicky continuous integration hosts, such as Jenkins or Buildbot.

To test these instructions, I used Docker 1.5.0 (via boot2docker, but hopefully that is an irrelevant detail). I also used an Ubuntu 14.04 base image (as you can see in the docker files) but hopefully the concepts should translate to other base images as well.

In order to show how to deploy a sample application, we’ll need a sample application to deploy; to keep it simple, here’s some “hello world” sample code using Klein:

# deployme/__init__.py
from klein import run, route

@route('/')
def home(request):
    request.setHeader("content-type", "text/plain")
    return 'Hello, world!'

def main():
    run("", 8081)

And an accompanying setup.py:

from setuptools import setup, find_packages

setup (
    name             = "DeployMe",
    version          = "0.1",
    description      = "Example application to be deployed.",
    packages         = find_packages(),
    install_requires = ["twisted>=15.0.0",
                        "klein>=15.0.0",
                        "treq>=15.0.0",
                        "service_identity>=14.0.0"],
    entry_points     = {'console_scripts':
                        ['run-the-app = deployme:main']}
)

Generating certificates is a bit tedious for a simple example like this one, but in a real-life application we are likely to face the deployment issue of native dependencies, so to demonstrate how to deal with that issue, this setup.py depends on the service_identity module, which pulls in cryptography (which depends on OpenSSL) and its dependency cffi (which depends on libffi).

To get started telling Docker what to do, we’ll need a base image that we can use for both build and run images, to ensure that certain things match up; particularly the native libraries that are used to build against. This also speeds up subsequent builds, by giving a nice common point for caching.

In this base image, we’ll set up:

  1. a Python runtime (PyPy)
  2. the C libraries we need (the libffi6 and openssl ubuntu packages)
  3. a virtual environment in which to do our building and packaging
# base.docker
FROM ubuntu:trusty

RUN echo "deb http://ppa.launchpad.net/pypy/ppa/ubuntu trusty main" > \
    /etc/apt/sources.list.d/pypy-ppa.list

RUN apt-key adv --keyserver keyserver.ubuntu.com \
                --recv-keys 2862D0785AFACD8C65B23DB0251104D968854915
RUN apt-get update

RUN apt-get install -qyy \
    -o APT::Install-Recommends=false -o APT::Install-Suggests=false \
    python-virtualenv pypy libffi6 openssl

RUN virtualenv -p /usr/bin/pypy /appenv
RUN . /appenv/bin/activate; pip install pip==6.0.8

The apt options APT::Install-Recommends and APT::Install-Suggests are just there to prevent python-virtualenv from pulling in a whole C development toolchain with it; we’ll get to that stuff in the build container. In the run container, which is also based on this base container, we will just use virtualenv and pip for putting the already-built artifacts into the right place. Ubuntu expects that these are purely development tools, which is why it recommends installation of python development tools as well.

You might wonder “why bother with a virtualenv if I’m already in a container”? This is belt-and-suspenders isolation, but you can never have too much isolation.

It’s true that in many cases, perhaps even most, simply installing stuff into the system Python with Pip works fine; however, for more elaborate applications, you may end up wanting to invoke a tool provided by your base container that is implemented in Python, but which requires dependencies managed by the host. By putting things into a virtualenv regardless, we keep the things set up by the base image’s package system tidily separated from the things our application is building, which means that there should be no unforeseen interactions, regardless of how complex the application’s usage of Python might be.

Next we need to build the base image, which is accomplished easily enough with a docker command like:

$ docker build -t deployme-base -f base.docker .;

Next, we need a container for building our application and its Python dependencies. The dockerfile for that is as follows:

# build.docker
FROM deployme-base

RUN apt-get install -qy libffi-dev libssl-dev pypy-dev
RUN . /appenv/bin/activate; \
    pip install wheel

ENV WHEELHOUSE=/wheelhouse
ENV PIP_WHEEL_DIR=/wheelhouse
ENV PIP_FIND_LINKS=/wheelhouse

VOLUME /wheelhouse
VOLUME /application

ENTRYPOINT . /appenv/bin/activate; \
           cd /application; \
           pip wheel .

Breaking this down, we first have it pulling from the base image we just built. Then, we install the development libraries and headers for each of the C-level dependencies we have to work with, as well as PyPy’s development toolchain itself. Then, to get ready to build some wheels, we install the wheel package into the virtualenv we set up in the base image. Note that the wheel package is only necessary for building wheels; the functionality to install them is built in to pip.

Note that we then have two volumes: /wheelhouse, where the wheel output should go, and /application, where the application’s distribution (i.e. the directory containing setup.py) should go.

The entrypoint for this image is simply running “pip wheel” with the appropriate virtualenv activated. It runs against whatever is in the /application volume, so we could potentially build wheels for multiple different applications. In this example, I’m using pip wheel . which builds the current directory, but you may have a requirements.txt which pins all your dependencies, in which case you might want to use pip wheel -r requirements.txt instead.

At this point, we need to build the builder image, which can be accomplished with:

$ docker build -t deployme-builder -f build.docker .;

This builds a deployme-builder that we can use to build the wheels for the application. Since this is a prerequisite step for building the application container itself, you can go ahead and do that now. In order to do so, we must tell the builder to use the current directory as the application being built (the volume at /application) and to put the wheels into a wheelhouse directory (one called wheelhouse will do):

$ mkdir -p wheelhouse;
$ docker run --rm \
         -v "$(pwd)":/application \
         -v "$(pwd)"/wheelhouse:/wheelhouse \
         deployme-builder;

After running this, if you look in the wheelhouse directory, you should see a bunch of wheels built there, including one for the application being built:

$ ls wheelhouse
DeployMe-0.1-py2-none-any.whl
Twisted-15.0.0-pp27-none-linux_x86_64.whl
Werkzeug-0.10.1-py2-none-any.whl
cffi-0.9.0-py2-none-any.whl
# ...

At last, time to build the application container itself. The setup for that is very short, since most of the work has already been done for us in the production of the wheels:

# run.docker
FROM deployme-base

ADD wheelhouse /wheelhouse
RUN . /appenv/bin/activate; \
    pip install --no-index -f wheelhouse DeployMe

EXPOSE 8081

ENTRYPOINT . /appenv/bin/activate; \
           run-the-app

During build, this dockerfile pulls from our shared base image, then adds the wheelhouse we just produced as a directory at /wheelhouse. The only shell command that needs to run in order to get the wheels installed is pip install TheApplicationYouJustBuilt, with two options: --no-index to tell pip “don’t bother downloading anything from PyPI, everything you need should be right here”, and -f wheelhouse which tells it where “here” is.

The entrypoint for this one activates the virtualenv and invokes run-the-app, the setuptools entrypoint defined above in setup.py, which should be on the $PATH once that virtualenv is activated.

The application build is very simple, just

$ docker build -t deployme-run -f run.docker .;

to build the docker file.

Similarly, running the application is just like any other docker container:

$ docker run --rm -it -p 8081:8081 deployme-run

You can then hit port 8081 on your docker host to load the application.

The command-line for docker run here is just an example; for instance, I’m passing --rm just so that running this example won’t clutter up your container list. Your environment will have its own way to call docker run, how to get your VOLUMEs and EXPOSEd ports mapped, and discussing how to orchestrate your containers is out of scope for this post; you can pretty much run it however you like. Everything the image needs is built in at this point.

To review:

  1. have a common base container that contains all your non-Python (C libraries and utilities) dependencies. Avoid installing development tools here.
  2. use a virtualenv even though you’re in a container to avoid any surprises from the host Python environment
  3. have a “build” container that just makes the virtualenv and puts wheel and pip into it, and runs pip wheel
  4. run the build container with your application code in a volume as input and a wheelhouse volume as output
  5. create an application container by starting from the same base image and, once again not installing any dev tools, pip install all the wheels that you just built, turning off access to PyPI for that installation so it goes quickly and deterministically based on the wheels you’ve built.

While this sample application uses Twisted, it’s quite possible to apply this same process to just about any Python application you want to run inside Docker.

I’ve put a sample project up on Github which contains all the files referenced here, as well as “build” and “run” shell scripts that combine the necessary docker commandlines to go through the full process to build and run this sample app. While it defaults to the PyPy runtime (as most networked Python apps generally should these days, since performance is so much better than CPython), if you have an application with a hard CPython dependency, I’ve also made a branch and pull request on that project for CPython, and you can look at the relatively minor patch required to get it working for CPython as well.

Now that you have a container with an application in it that you might want to deploy, my previous write-up on a quick way to securely push stuff to a production service might be of interest.

(Once again, thanks to my employer, Rackspace, for sponsoring the time for me to write this post. Thanks also to Shawn Ashlee and Jesse Spears for helping me refine these ideas and listening to me rant about them. However, that expression of gratitude should not be taken as any kind of endorsement from any of these parties as to my technical suggestions or opinions here, as they are entirely my own.)

by Glyph at March 06, 2015 10:58 PM

March 03, 2015

Moshe Zadka

sphinx-quickstart: Bad idea, or worst idea?

The sphinx documentation suggests getting started by running the sphinx-quickstart “wizard”. Inspired by such beloved pieces of software as PowerPoint, sphinx-quickstart will ask you questions about your project and then manufacture a bunch of files which, when built, produce pretty documentation. This…this is wrong. We don’t start writing a new project in Python by having a wizard ask a bunch of questions and then manufacture setup.py. We document, we have samples, and we let people who are, ideally, capable of writing complicated Python modules create those using their favorite editor.

One might think perhaps this is a design limitation of sphinx, but it is not really. It’s just a silly little thing in the docs. For a minimal (and somewhat typical) sphinx set-up, all you need is:

index.rst

 
PROJECT-NAME-HERE
=================

.. toctree::
   :maxdepth: 2

conf.py

master_doc = 'index'
project = 'PROJECT'
copyright = '2015, YOUR NAME'
author = 'YOUR NAME'
version = release = '0.0.1'

For building the documentation, sphinx-quickstart will helpfully create a Makefile (one that you call recursively from your Python Makefile, I guess) and a Windows batch file [in sphinx-quickstart’s defense, it asks whether to do those things]. Again, one might have the impression that perhaps sphinx is so complicated one might need those. Here is the complicated command needed to build the documentation:

sphinx-build -b html . _build/html

Sphinx is pretty well documented. Starting from this skeleton, it is easy to add extensions, change configuration options, etc. There is absolutely no excuse for sphinx-quickstart, a blemish on an otherwise pretty nifty documentation system. If you were also turned off by needing to run a quickstart script that fills your source directory with things, fear no more. Build your skeleton yourself, and enjoy beautiful docs!
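
For example, enabling an extension is just one more line in the same conf.py (sphinx.ext.autodoc ships with sphinx itself; the theme line is only a suggestion):

# conf.py, continued
extensions = ['sphinx.ext.autodoc']   # pull docstrings into the docs
html_theme = 'alabaster'              # or whatever theme you have installed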

by moshez at March 03, 2015 12:18 PM

February 13, 2015

Glyph Lefkowitz

According To...?

I believe that web browsers must start including the ultimate issuer in an always-visible user interface element.

You are viewing this website at glyph.twistedmatrix.com. Hopefully securely.

We trust that the math in the cryptographic operations protects our data from prying eyes. However, trusting that the math says the content is authentic and secure is useless unless you know who your computer is talking to. The HTTPS/TLS system identifies your interlocutor by their domain name.

In other words, you trust that these words come from me because glyph.twistedmatrix.com is reasonably associated with me. If the lock on your web browser’s title bar was next to the name stuff-glyph-says.stealing-your-credit-card.example.com, presumably you might be more skeptical that the content was legitimate.

But... the cryptographic primitives require a trust root - somebody that you “already trust” - meaning someone that your browser already knows about at the time it makes the request - to tell you that this site is indeed glyph.twistedmatrix.com. So you read these words as if they’re the world according to Glyph, but according to whom is it according to me?

If you click on some obscure buttons (in Safari and Firefox you click on the little lock; in Chrome you click on the lock, then “Connection”) you should see that my identity as glyph.twistedmatrix.com has been verified by “StartCom Class 1 Primary Intermediate Server CA” who was in turn verified by “StartCom Certification Authority”.

But if you do this, it only tells you about this one time. You could click on a link, and the issuer might change. It might be different for just one script on the page, and there’s basically no way to find out. There are more than 50 different organizations which could tell your browser to trust that this content is from me, several of whom have already been compromised. If you’re concerned about government surveillance, this list includes the governments of Hong Kong, Japan, France, the Netherlands, Turkey, as well as many multinational corporations vulnerable to secret warrants from the USA.

Sometimes it’s perfectly valid to trust these issuers. If I’m visiting a website describing some social services provided to French citizens, it would of course be reasonable for that to be trusted according to the government of France. But if you’re reading an article on my website about secure communications technology, probably it shouldn’t be glyph.twistedmatrix.com brought to you by the China Internet Network Information Center.

Information security is all about the user having some expectation and then a suite of technology ensuring that that expectation is correctly met. If the user’s expectation of the system’s behavior is incorrect, then all the technological marvels in the world making sure that behavior is faithfully executed will not help their actual security at all. Without knowing the issuer though, it’s not clear to me what the user’s expectation is supposed to be about the lock icon.

The security authority system suffers from being a market for silver bullets. Secure websites are effectively resellers of the security offered to them by their certificate issuers; however, the customers are practically unable to even see the trade mark - the issuer name - of the certificate authority ultimately responsible for the integrity and confidentiality of their communications, so they have no information at all. The website itself also has next to no information because the certificate authorities themselves are under no regulatory obligation to disclose or verify their security practices.

Without seeing the issuer, there’s no way for “issuer reputation” to be a selling point, which means there’s no market motivation for issuers to do a really good job securing their infrastructure. There’s no way for average users to notice if they are the victims of a targeted surveillance attack.

So please, browser vendors, consider making this information available to the general public so we can all start making informed decisions about who to trust.

by Glyph at February 13, 2015 08:52 PM

January 30, 2015

Twisted Matrix Laboratories

Twisted 15.0.0 Released

Hi everyone!

On behalf of Twisted Matrix Laboratories, I am honoured to announce that Twisted 15.0.0 is here!

Highlights are:
  • SSLv3 is disabled by default by endpoints created by twisted.internet.endpoints.serverFromString and twisted.internet.endpoints.clientFromString.
  • inlineCallbacks now has introductory documentation, and now supports using the return statement with a value on Python 3.
  • twisted.web.client.Agent now supports using UNIX sockets.
  • ProcessEndpoint now has flow control, which makes it useful for many more protocols
  • A whole bunch of bug fixes and other improvements, with 70+ closed tickets.
You can find the downloads on PyPI (or alternatively the Twisted Matrix website). The full details of what’s new in the release are contained in the NEWS file.

As usual, many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code and documentation, and all the people building great things with Twisted!

Twisted Regards,
HawkOwl

by HawkOwl (noreply@blogger.com) at January 30, 2015 10:13 AM

January 25, 2015

Glyph Lefkowitz

Security as Stencil

Image Credit: Horia Varlan

On the Internet, it’s important to secure all of your communications.

There are a number of applications which purport to give you “secure chat”, “secure email”, or “secure phone calls”.

The problem with these applications is that they advertise their presence. Since “insecure chat”, “insecure email” and “insecure phone calls” all have a particular, detectable signature, an interested observer may easily detect your supposedly “secure” communication. Not only that, but the places that you go to obtain them are suspicious in their own right. In order to visit Whisper Systems, you have to be looking for “secure” communications.

This allows the adversary to use “security” technologies such as encryption as a sort of stencil, to outline and highlight the communication that they really want to be capturing. In the case of the NSA, this dumps anyone who would like to have a serious private conversation with a friend into the same bucket, from the perspective of the authorities, as a conspiracy of psychopaths trying to commit mass murder.

The Snowden documents already demonstrate that the NSA does exactly this; if you send a normal email, they will probably lose interest and ignore it after a little while, whereas if you send a “secure” email, they will store it forever and keep trying to crack it to see what you’re hiding.

If you’re running a supposedly innocuous online service or writing a supposedly harmless application, the hassle associated with setting up TLS certificates and encryption keys may seem like a pointless distraction. It isn’t.

For one thing, if you have anywhere that user-created content enters your service, you don’t know what they are going to be using it to communicate. Maybe you’re just writing an online game but users will use your game for something as personal as courtship. Can we agree that the state security services shouldn’t be involved in that? Even if you were specifically writing an app for dating, you might not anticipate that the police will use it to show up and arrest your users so that they will be savagely beaten in jail.

The technology problems that “secure” services are working on are all important. But we can’t simply develop a good “secure” technology, consider it a niche product, and leave it at that. Those of us who are software development professionals need to build security into every product, because users expect it. Users expect it because we are, in a million implicit ways, telling them that they have it. If we put a “share with your friend!” button into a user interface, that’s a claim: we’re claiming that the data the user indicates is being shared only with their friend. Would we want to put in a button that says “share with your friend, and with us, and with the state security apparatus, and with any criminal who can break in and steal our database!”? Obviously not. So let’s stop making the “share with your friend!” button actually do that.

Those of us who understand the importance of security and are in the business of creating secure software must, therefore, take on the Sisyphean task of not only creating good security, but of competing with the insecure software on its own turf, so that people actually use it. “Slightly worse to use than your regular email program, but secure” is not good enough. (Not to mention the fact that existing security solutions are more than “slightly” worse to use). Secure stuff has to be as good as or better than its insecure competitors.

I know that this is a monumental undertaking. I have personally tried and failed to do something like this more than once. As Rabbi Tarfon put it, though:

It is not incumbent upon you to complete the work, but neither are you at liberty to desist from it.

by Glyph at January 25, 2015 12:16 AM

January 12, 2015

Glyph Lefkowitz

The Glyph

As you may have seen me around the Internet, I am typically represented by an obscure symbol.

The “Glyph” Glyph

I have been asked literally hundreds of times about the meaning of that symbol, and I’ve always been cryptic in response because I felt that a full explanation is too much work. Since the symbol is something I invented, the invention is fairly intricate, and it takes some time to explain, describing it in person requires a degree of sustained narcissism that I’m not really comfortable with.

You all keep asking, though, and I really do appreciate the interest, so thanks to those of you who have asked over and over again: here it is. This is what the glyph means.

Ulterior Motive

I do have one other reason that I’m choosing to publish this particular tidbit now. Over the course of my life I have spent a lot of time imagining things, doing world-building for games that I have yet to make or books that I have yet to write. While I have published fairly voluminously at this point on technical topics (more than once on actual paper), as well as spoken about them at conferences, I haven’t made many of my fictional ideas public.

There are a variety of reasons for this (not the least of which that I have been gainfully employed to write about technology and nobody has ever wanted to do that for fiction) but I think the root cause is because I’m afraid that these ideas will be poorly received. I’m afraid that I’ll be judged according to the standards for the things that I’m now an expert professional at – software development – for something that I am a rank amateur at – writing fiction. So this problem is only going to get worse as I get better at the former and keep not getting practice at the latter by not publishing.

In other words, I’m trying to break free of my little hater.

So this represents the first – that I recall, at least – public sharing of any of the Divunal source material, since the Twisted Reality Demo Server was online 16 years ago. It’s definitely incomplete. Some of it will probably be bad; I know. I ask for your forbearance, and with it, hopefully I will publish more of it and thereby get better at it.

Backstory

I have been working on the same video game, off and on, for more or less my entire life. I am an extremely distractable person, so it hasn’t seen that much progress - at least not directly - in the last decade or so. I’m also relentlessly, almost pathologically committed to long-term execution of every dumb idea I’ve ever had, so any minute now I’m going to finish up with this event-driven networking thing and get back to the game. I’ll try to avoid spoilers, in case I’m lucky enough for any of you ever actually play this thing.

The symbol comes from early iterations of that game, right about the time that it was making the transition from Zork fan-fiction to something more original.

Literally translated from the in-game language, the symbol is simply an ideogram that means “person”, but its structure is considerably more nuanced than that simple description implies.

The world where Divunal takes place, Divuthan, was populated by a civilization that has had digital computers for tens of thousands of years, so their population had effectively co-evolved with automatic computing. They no longer had a concept of static, written language on anything like paper or books. Ubiquitous availability of programmable smart matter meant that the language itself was three dimensional and interactive. Almost any nuance of meaning which we would use body language or tone of voice to convey could be expressed in varying how letters were proportioned relative to each other, what angle they were presented at, and so on.

Literally every Divuthan person’s name is some variation of this ideogram.

So a static ideogram like the one I use would ambiguously reference a person, but additional information would be conveyed by diacritical marks consisting of other words, by the relative proportions of sizes, colors, and adornments of various parts of the symbol, indicating which person it was referencing.

However, the game itself is of the post-apocalyptic variety, albeit one of the more hopeful entries in that genre, since restoring life to the world is one of the player’s goals. One of the things that leads to the player’s entrance into the world is a catastrophe that has mysteriously caused most of the inhabitants to disappear and disabled or destroyed almost all of their technology.

Within the context of the culture that created the “glyph” symbol in the game world, it wasn’t really intended to be displayed in the form that you see it. The player would first see such a symbol after entering a ruined, uninhabited residential structure. A symbol like this, referring to a person, would typically have adornments and modifications indicating a specific person, and it would generally be animated in some way.

The display technology used by the Divuthan civilization was all retained-mode, because I imagined that a highly advanced display technology would minimize power cost when not in use (much like e-paper avoids bleeding power by constantly updating the screen). When functioning normally, this was an irrelevant technical detail, of course; the displays displayed what you want them to display. But after a catastrophe that has disrupted network connectivity and ruined a lot of computers, this detail is important because many of the displays were still showing static snapshots of a language intended to use motion and interactivity as ways to convey information.

As the player wandered through the environment, they would find some systems that were still active, and my intent was (or “is”, I suppose, since I do still hold out hope that I’ll eventually actually make some version of this...) that the player would come to see the static, dysfunctional environment around them as melancholy, and set about restoring function to as many of these devices as possible in order to bring the environment back to life. Some of this would be represented quite concretely as time-travel puzzles later in the game actually allowed the players to mitigate aspects of the catastrophe that broke everything in the first place, thereby “resurrecting” NPCs by preventing their disappearance or death in the first place.

Coen

COEN

Coen refers to the self, the physical body, the notion of “personhood” abstractly. The minified / independent version is an ideogram for just the head, but the full version as it is presented in the “glyph” ideogram is a human body: the crook at the top is the head (facing right); the line through the middle represents the arms, and the line going down represents the legs and feet.

This is the least ambiguous and nuanced of all the symbols. The one nuance is that if used in its full form with no accompanying ideograms, it means “corpse”, since a body which can’t do anything isn’t really a person any more.

Kset

KSET

This is the trickiest ideogram to pronounce. The “ks” is meant to be voiced as a “click-hiss” noise, the “e” has a flat tone like a square wave from a synthesizer, and the “t” is very clipped. It is intended to reference the power-on sound that some of the earliest (remember: 10s of thousands of years before the main story, so it’s not like individuals have a memory of the way these things sounded) digital computers in Divuthan society made.

Honestly though if you try to do this properly it ends up sounding a lot like the English word “cassette”, which I assure you is fitting but completely unintentional.

Kset refers to algorithms and computer programs, but more generally, thought and the life of the mind.

This is a reference to the “Ee” spell power rune in the 80s Amiga game, Dungeon Master, which sadly I can’t find any online explanation of how the manual described it. It is an object poised on a sharp edge, ready to roll either left or right - in other words, a symbolic representation of a physical representation of the algorithmic concept of a decision point, or the computational concept of a branch, or a jump instruction.

Edec

EDEC

Edec refers to connectedness. It is an ideogram reflecting a social graph, with the individual below and their many connections above them. It’s the general term for “social relationship” but it’s also the general term for “network protocol”. When Divuthan kids form relationships, they often begin by configuring a specific protocol for their communication.

This is how boundary-setting within friendships and work environments (and, incidentally, flirting) works; they use meta-protocol messages to request expanded or specialized interactions for use within the context of their dedicated social-communication channels.

Unlike most of these other ideograms, its pronunciation is not etymologically derived from an onomatopoeia, but rather from an acronym identifying one of the first social-communication protocols (long since obsoleted).

Zenk

ZENK

“Zenk” is the ideogram for creation. It implies physical, concrete creations but denotes all types of creation, including intellectual products.

The ideogram represents the Divuthan version of an anvil, which, due to certain quirks of Divuthan materials science that are beyond the scope of this post, doubles as the generic idea of a “work surface”. So you could also think of it as a desk with two curved legs. This is the only ideogram which represents something still physically present in modern, pre-catastrophe Divuthan society. In fact, workshop surfaces are often stylized to look like a Zenk radical, as are work-oriented computer terminals (which are basically an iPad-like device the size of a dinner table).

The pronunciation, “Zenk”, is an onomatopoeia, most closely resembled in English by “clank”; the sound of a hammer striking an anvil.

Lesh

LESH

“Lesh” is the ideogram for communication. It refers to all kinds of communication - written words, telephony, video - but it implies persistence.

The bottom line represents a sheet of paper (or a mark on that sheet of paper), and the diagonal line represents an ink brush making a mark on that paper.

This predates the current co-evolutionary technological environment because, appropriately for a society featured in a text-based adventure game, the dominant cultural groups within this civilization developed a shared obsession with written communication and symbolic manipulation before they had access to devices which could digitally represent all of it.

All Together Now

There is an overarching philosophical concept of “person-ness” that this glyph embodies in Divuthan culture: although individuals vary, the things that make up a person are being (the body, coen), thinking (the mind, kset), belonging (the network, edec), making (tools, zenk) and communicating (paper and pen, lesh).

In summary, if a Divuthan were to see my little unadorned avatar icon next to something I have posted on twitter, or my blog, the overall impression that it would elicit would be something along the lines of:

“I’m just this guy, you know?”

And To Answer Your Second Question

No, I don’t know how it’s pronounced. It’s been 18 years or so and I’m still working that bit out.

by Glyph at January 12, 2015 09:00 AM

December 28, 2014

Duncan McGreggor

Improved Python Support in Erlang/LFE

The previous post on Python support in Erlang/LFE made Hacker News this week, climbing in fits and starts to #19 on the front page. That resulted in the biggest spike this blog has seen in several months.

It's a shame, in a way, since it came a few days too early: there's a new library out for the Erlang VM (written in LFE) which makes it much easier to use Python from Erlang (the language from Sweden that's famous for impressing both your mum and your cats).

The library is simply called py. It's a wrapper for ErlPort, providing improved usability for Python-specific code as well as an Erlang process supervision tree for the ErlPort Python server. It has an extensive README that not only walks through the usual examples in LFE, but gives a full accounting of usage in Erlang's more common Prolog-inspired syntax. The LFE Blog has a new post with code examples as well as a demonstration of the py supervision tree (e.g., killing Python server processes and having them restart automatically) which hasn't actually made it into the README yet -- so get it while it's hot!

The most exciting bits are yet to come: there are open tickets for:
  • work on multiple Python server processes
  • scheduling code execution across them, and
  • full Python distribution infrastructure with parallel execution.
This could drastically change the picture for compute-intensive tasks in Erlang, Elixir, LFE, and Joxa. The Erlang VM was never intended to excel at the sort of problems that Python has traditionally focused on... yet it provides the sort of infrastructure that the Python community has been agonizing over for more than a decade. For Pythonistas, this may not be a very big deal ... but for the Erlang and functional programming communities, the LFE py project could be a life-saver for any number of projects which need easy access to the strengths of Python.


by Duncan McGreggor (noreply@blogger.com) at December 28, 2014 01:09 AM

December 26, 2014

Duncan McGreggor

ErlPort: Using Python from Erlang/LFE

Update 1: This post has a sequel here.

Update 2: There is a new LFE library that provides more idiomatic access to Python from LFE/Erlang by wrapping ErlPort and creating convenience functions. Lisp macros were, of course, involved in its making.

This is a short little blog post I've been wanting to get out there ever since I ran across the erlport project a few years ago. Erlang was built for fault-tolerance. It had a goal of unprecedented uptimes, and these have been achieved. It powers 40% of our world's telecommunications traffic. It's capable of supporting amazing levels of concurrency (remember the 2007 announcement about the performance of YAWS vs. Apache?).

With this knowledge in mind, a common mistake by folks new to Erlang is to think these performance characteristics will be applicable to their own particular domain. This has often resulted in failure, disappointment, and the unjust blaming of Erlang. If you want to process huge files, do lots of string manipulation, or crunch tons of numbers, Erlang's not your bag, baby. Try Python or Julia.

But then, you may be thinking: I like supervision trees. I have long-running processes that I want to be managed per the rules I establish. I want to run lots of jobs in parallel on my 64-core box. I want to run jobs in parallel over the network on 64 of my 64-core boxes. Python's the right tool for these jobs, but I wish I could manage them with Erlang.

(There are sooo many other options for the use cases above, many of them really excellent. But this post is about Erlang/LFE :-)).

Traditionally, if you want to run other languages with Erlang in a reliable way that doesn't bring your Erlang nodes down with badly behaved code, you use Ports. (more info is available in the Interoperability Guide). This is what JInterface builds upon (and, incidentally, allows for some pretty cool integration with Clojure). However, this still leaves a pretty significant burden for the Python or Ruby developer for any serious application needs (quick one-offs that only use one or two data types are not that big a deal).

erlport was created by Dmitry Vasiliev in 2009 in an effort to solve just this problem, making it easier to use and integrate Erlang with more common languages like Python and Ruby. The project is still maintained, and in fact has just received a few updates. Below, we'll demonstrate some usage in LFE with Python 3.
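
To give a sense of the Python half of the equation: the code ErlPort calls into can be an ordinary Python 3 module, since ErlPort itself handles converting arguments and return values between Erlang terms and Python objects for simple data types. Here's a minimal, hypothetical sketch (the module and function names are made up for illustration) of the sort of thing an Erlang or LFE process could invoke through ErlPort:

# mathutils.py - a hypothetical, plain Python 3 module.
# ErlPort can call ordinary functions like these from the Erlang VM;
# nothing ErlPort-specific is needed on this side for simple data types.

def double_all(numbers):
    """Return a new list with every element doubled."""
    return [n * 2 for n in numbers]


def mean(numbers):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(numbers) / len(numbers)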

If you want to follow along, there's a demo repo you can check out:
Change into the repo directory and set up your Python environment:
Next, switch over to the LFE directory, and fire up a REPL:
Note that this will first download the necessary dependencies and compile them (that's what the [snip] is eliding).

Now we're ready to take erlport for a quick trip down to the local:
And that's all there is to it :-)

Perhaps in a future post we can dive into the internals, showing you more of the glory that is erlport. Even better, we could look at more compelling example usage, approaching some of the functionality offered by such projects as Disco or Anaconda.


by Duncan McGreggor (noreply@blogger.com) at December 26, 2014 06:06 AM

December 22, 2014

Jp Calderone

Asynchronous Object Initialization - Patterns and Antipatterns

I caught Toshio Kuratomi's post about asyncio initialization patterns (or anti-patterns) on Planet Python. This is something I've dealt with a lot over the years using Twisted (one of the sources of inspiration for the asyncio developers).

To recap, Toshio wondered about a pattern involving asynchronous initialization of an instance. He wondered whether it was a good idea to start this work in __init__ and then explicitly wait for it in other methods of the class before performing the distinctive operations required by those other methods. Using asyncio (and using Toshio's example with some omissions for simplicity) this looks something like:


class Microblog:
    def __init__(self, ...):
        loop = asyncio.get_event_loop()
        self.init_future = loop.run_in_executor(None, self._reading_init)

    def _reading_init(self):
        # ... do some initialization work,
        # presumably expensive or otherwise long-running ...

    @asyncio.coroutine
    def sync_latest(self):
        # Don't do anything until initialization is done
        yield from self.init_future
        # ... do some work that depends on that initialization ...

It's quite possible to do something similar to this when using Twisted. It only looks a little bit different:


class Microblog:
    def __init__(self, ...):
        self.init_deferred = deferToThread(self._reading_init)

    def _reading_init(self):
        # ... do some initialization work,
        # presumably expensive or otherwise long-running ...

    @inlineCallbacks
    def sync_latest(self):
        # Don't do anything until initialization is done
        yield self.init_deferred
        # ... do some work that depends on that initialization ...

Despite the differing names, these two pieces of code basically do the same thing:

  • run _reading_init in a thread from a thread pool
  • whenever sync_latest is called, first suspend its execution until the thread running _reading_init has finished running it

Maintenance costs

One thing this pattern gives you is an incompletely initialized object. If you write m = Microblog() then m refers to an object that's not actually ready to perform all of the operations it supposedly can perform. It's either up to the implementation or the caller to make sure to wait until it is ready. Toshio suggests that each method should do this implicitly (by starting with yield self.init_deferred or the equivalent). This is definitely better than forcing each call-site of a Microblog method to explicitly wait for this event before actually calling the method.

Still, this is a maintenance burden that's going to get old quickly. If you want full test coverage, it means you now need twice as many unit tests (one for the case where the method is called before initialization is complete and another for the case where the method is called after this has happened). At least. Toshio's _reading_init method actually modifies attributes of self, which means there are potentially many more than just two possible cases. Even if you're not particularly interested in having full automated test coverage (... for some reason ...), you still have to remember to add this yield statement to the beginning of all of Microblog's methods. It's not exactly a ton of work, but it's one more thing to remember any time you maintain this code. And this is the kind of mistake that creates a race condition you might not immediately notice - which means you may ship the broken code to clients and only discover the problem when they start complaining about it.

Diminished flexibility

Another thing this pattern gives you is an object that does things as soon as you create it. Have you ever had a class with a __init__ method that raised an exception as a result of a failing interaction with some other part of the system? Perhaps it did file I/O and got a permission denied error or perhaps it was a socket doing blocking I/O on a network that was clogged and unresponsive. Among other problems, these cases are often difficult to report well because you don't have an object to blame the problem on yet. The asynchronous version is perhaps even worse since a failure in this asynchronous initialization doesn't actually prevent you from getting the instance - it's just another way you can end up with an incompletely initialized object (this time, one that is never going to be completely initialized and whose use is unsafe in difficult-to-reason-about ways).

Another related problem is that it removes one of your options for controlling the behavior of instances of that class. It's great to be able to control everything a class does just by the values passed in to __init__ but most programmers have probably come across a case where behavior is controlled via an attribute instead. If __init__ starts an operation then instantiating code doesn't have a chance to change the values of any attributes first (except, perhaps, by resorting to setting them on the class - which has global consequences and is generally icky).

Loss of control

A third consequence of this pattern is that instances of classes which employ it are inevitably doing something. It may be that you don't always want the instance to do something. It's certainly fine for a Microblog instance to create a SQLite3 database and initialize a cache directory if the program I'm writing that uses it actually intends to host a blog. It's most likely the case that other useful things can be done with a Microblog instance, though. Toshio's own example includes a post method which doesn't use the SQLite3 database or the cache directory. His code correctly doesn't wait for init_future at the beginning of his post method - but this should leave the reader wondering why we need to create a SQLite3 database if all we want to do is post new entries.

Using this pattern, the SQLite3 database is always created - whether we want to use it or not. There are other reasons you might want a Microblog instance that hasn't initialized a bunch of on-disk state too - one of the most common is unit testing (yes, I said "unit testing" twice in one post!). A very convenient thing for a lot of unit tests, both of Microblog itself and of code that uses Microblog, is to compare instances of the class. How do you know you got a Microblog instance that is configured to use the right cache directory or database type? You most likely want to make some comparisons against it. The ideal way to do this is to be able to instantiate a Microblog instance in your test suite and use its == implementation to compare it against an object given back by some API you've implemented. If creating a Microblog instance always goes off and creates a SQLite3 database, then at the very least your test suite is going to be doing a lot of unnecessary work (making it slow) and at worst perhaps the two instances will fight with each other over the same SQLite3 database file (which they must share since they're meant to be instances representing the same state). Another way to look at this is that inextricably embedding the database connection logic into your __init__ method has taken control away from the user. Perhaps they have their own database connection setup logic. Perhaps they want to re-use connections or pass in a fake for testing. Saving a reference to that object on the instance for later use is a separate operation from creating the connection itself. They shouldn't be bound together in __init__ where you have to take them both or give up on using Microblog.

Alternatives

You might notice that these three observations I've made all sound a bit negative. You might conclude that I think this is an antipattern to be avoided. If so, feel free to give yourself a pat on the back at this point.

But if this is an antipattern, is there a pattern to use instead? I think so. I'll try to explain it.

The general idea behind the pattern I'm going to suggest comes in two parts. The first part is that your object should primarily be about representing state and your __init__ method should be about accepting that state from the outside world and storing it away on the instance being initialized for later use. It should always represent complete, internally consistent state - not partial state as asynchronous initialization implies. This means your __init__ methods should mostly look like this:


class Microblog(object):
    def __init__(self, cache_dir, database_connection):
        self.cache_dir = cache_dir
        self.database_connection = database_connection

If you think that looks boring - yes, it does. Boring is a good thing here. Anything exciting your __init__ method does is probably going to be the cause of someone's bad day sooner or later. If you think it looks tedious - yes, it does. Consider using Hynek Schlawack's excellent characteristic package (full disclosure - I contributed some ideas to characteristic's design and Hynek occasionally says nice things about me (I don't know if he means them, I just know he says them)).
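
As a hedged illustration (untested; check the characteristic documentation for the exact decorator options), the boring __init__ above could be generated rather than hand-written, which also gets you the == comparison that the unit-testing discussion above relies on, plus a useful repr:

# A hypothetical sketch using characteristic to generate the boilerplate:
# __init__, __repr__, and __eq__ over the two named attributes.
from characteristic import attributes

@attributes(["cache_dir", "database_connection"])
class Microblog(object):
    pass

# The generated __init__ takes the attributes as keyword arguments:
# blog = Microblog(cache_dir="/tmp/blog-cache", database_connection=connection)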

The second part of the idea is an acknowledgement that asynchronous initialization is a reality of programming with asynchronous tools. Fortunately __init__ isn't the only place to put code. Asynchronous factory functions are a great way to wrap up the asynchronous work sometimes necessary before an object can be fully and consistently initialized. Put another way:


class Microblog(object):
    # ... __init__ as above ...

    @classmethod
    @asyncio.coroutine
    def from_database(cls, cache_dir, database_path):
        # ... or make it a free function, not a classmethod, if you prefer
        loop = asyncio.get_event_loop()
        database_connection = yield from loop.run_in_executor(None, cls._reading_init)
        return cls(cache_dir, database_connection)

Notice that the setup work for a Microblog instance is still asynchronous but initialization of the Microblog instance is not. There is never a time when a Microblog instance is hanging around partially ready for action. There is setup work and then there is a complete, usable Microblog.
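
The same shape works with Twisted, too. Here is a hedged sketch of the equivalent factory; open_database is a hypothetical helper standing in for whatever blocking setup work needs to run in a thread:

# A sketch of the same named-constructor pattern with Twisted primitives.
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet.threads import deferToThread

class Microblog(object):
    # ... __init__ as above ...

    @classmethod
    @inlineCallbacks
    def from_database(cls, cache_dir, database_path):
        # open_database is a hypothetical blocking helper; run it in a
        # thread pool and build a fully initialized instance from its result.
        database_connection = yield deferToThread(open_database, database_path)
        returnValue(cls(cache_dir, database_connection))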

This addresses the three observations I made above:

  • Methods of Microblog never need to concern themselves with worries about whether the instance has been completely initialized yet or not.
  • Nothing happens in Microblog.__init__. If Microblog has some methods which depend on instance attributes, any of those attributes can be set after __init__ is done and before those other methods are called. If the from_database constructor proves insufficiently flexible, it's easy to introduce a new constructor that accounts for the new requirements (named constructors mean never having to overload __init__ for different competing purposes again).
  • It's easy to treat a Microblog instance as an inert lump of state. Simply instantiating one (using Microblog(...)) has no side-effects. The special extra operations required if one wants the more convenient constructor are still available - but elsewhere, where they won't get in the way of unit tests and unplanned-for uses.

I hope these points have made a strong case for one of these approaches being an anti-pattern to avoid (in Twisted, in asyncio, or in any other asynchronous programming context) and for the other being a useful pattern that provides convenient, expressive constructors while at the same time keeping object initializers unsurprising and maximally useful.

by Jean-Paul Calderone (noreply@blogger.com) at December 22, 2014 12:54 AM

December 09, 2014

Glyph Lefkowitz

Docker Dev to Prod in Just A Few Easy Steps

It seems that docker is all the rage these days. Docker has popularized a powerful paradigm for repeatable, isolated deployments of pretty much any application you can run on Linux. There are numerous highly sophisticated orchestration systems which can leverage Docker to deploy applications at massive scale. At the other end of the spectrum, there are quick ways to get started with automated deployment or orchestrated multi-container development environments.

When you're just getting started, this dazzling array of tools can be as bewildering as it is impressive.

A big part of the promise of docker is that you can build your app in a standard format on any computer, anywhere, and then run it. As docker.com puts it:

“... run the same app, unchanged, on laptops, data center VMs, and any cloud ...”

So when I started approaching docker, my first thought was: before I mess around with any of this deployment automation stuff, how do I just get an arbitrary docker container that I've built and tested on my laptop shipped into the cloud?

There are a few documented options that I came across, but they all had drawbacks, and didn't really make the ideal tradeoff for just starting out:

  1. I could push my image up to the public registry and then pull it down. While this works for me on open source projects, it doesn't really generalize.
  2. I could run my own registry on a server, and push it there. I can either run it plain-text and risk the unfortunate security implications that implies, deal with the administrative hassle of running my own certificate authority and propagating trust out to my deployment node, or spend money on a real TLS certificate. Since I'm just starting out, I don't want to deal with any of these hassles right away.
  3. I could re-run the build on every host where I intend to run the application. This is easy and repeatable, but unfortunately it means that I'm missing part of that great promise of docker - I'm running potentially subtly different images in development, test, and production.

I think I have figured out a fourth option that is super fast to get started with, as well as being reasonably secure.

What I have done is:

  1. run a local registry
  2. build an image locally - testing it until it works as desired
  3. push the image to that registry
  4. use SSH port forwarding to "pull" that image onto a cloud server, from my laptop

Before running the registry, you should set aside a persistent location for the registry's storage. Since I'm using boot2docker, I stuck this in my home directory, like so:

me@laptop$ mkdir -p ~/Documents/Docker/Registry

To run the registry, you need to do this:

me@laptop$ docker pull registry
...
Status: Image is up to date for registry:latest
me@laptop$ docker run --name registry --rm=true -p 5000:5000 \
    -e GUNICORN_OPTS=[--preload] \
    -e STORAGE_PATH=/registry \
    -v "$HOME/Documents/Docker/Registry:/registry" \
    registry

To briefly explain each of these arguments: --name is just there so I can quickly identify this as my registry container in docker ps and the like; --rm=true is there so that I don't create detritus from subsequent runs of this container; -p 5000:5000 exposes the registry to the docker host; -e GUNICORN_OPTS=[--preload] is a workaround for a small bug; -e STORAGE_PATH=/registry tells the registry to look in /registry for its images; and the -v option points /registry at the directory we previously created above.

It's important to understand that this registry container only needs to be running for the duration of the commands below. Spin it up, push and pull your images, and then you can just shut it down.

Next, you want to build your image, tagging it with the localhost.localdomain:5000 registry prefix.

me@laptop$ cd MyDockerApp
me@laptop$ docker build -t localhost.localdomain:5000/mydockerapp .

Assuming the image builds without incident, the next step is to send the image to your registry.

me@laptop$ docker push localhost.localdomain:5000/mydockerapp

Once that has completed, it's time to "pull" the image on your cloud machine, which - again, if you're using boot2docker, like me, can be done like so:

me@laptop$ ssh -t -R 127.0.0.1:5000:"$(boot2docker ip 2>/dev/null)":5000 \
    mycloudserver.example.com \
    'docker pull localhost.localdomain:5000/mydockerapp'

If you're on Linux and simply running Docker on a local host, then you don't need the "boot2docker" command:

me@laptop$ ssh -t -R 127.0.0.1:5000:127.0.0.1:5000 \
    mycloudserver.example.com \
    'docker pull localhost.localdomain:5000/mydockerapp'

Finally, you can now run this image on your cloud server. You will of course need to decide on appropriate configuration options for your applications such as -p, -v, and -e:

me@laptop$ ssh mycloudserver.example.com \
    'docker run -d --restart=always --name=mydockerapp \
        -p ... -v ... -e ... \
        localhost.localdomain:5000/mydockerapp'

To avoid network round trips, you can even run the previous two steps as a single command:

me@laptop$ ssh -t -R 127.0.0.1:5000:"$(boot2docker ip 2>/dev/null)":5000 \
    mycloudserver.example.com \
    'docker pull localhost.localdomain:5000/mydockerapp && \
     docker run -d --restart=always --name=mydockerapp \
        -p ... -v ... -e ... \
        localhost.localdomain:5000/mydockerapp'

I would not recommend setting up any intense production workloads this way; those orchestration tools I mentioned at the beginning of this article exist for a reason, and if you need to manage a cluster of servers you should probably take the time to learn how to set up and manage one of them.

However, as far as I know, there's also nothing wrong with putting your application into production this way. If you have a simple single-container application, then this is a reasonably robust way to run it: the docker daemon will take care of restarting it if your machine crashes, and running this command again (with a docker rm -f mydockerapp before docker run) will re-deploy it in a clean, reproducible way.

So if you're getting started exploring docker and you're not sure how to get a couple of apps up and running just to give it a spin, hopefully this can set you on your way quickly!

(Many thanks to my employer, Rackspace, for sponsoring the time for me to write this post. Thanks also to Jean-Paul Calderone, Alex Gaynor, and Julian Berman for their thoughtful review. Any errors are surely my own.)

by Glyph at December 09, 2014 02:14 AM

November 28, 2014

Glyph Lefkowitz

Public or Private?

If I am creating a new feature in library code, I have two choices with the implementation details: I can make them public - that is, exposed to application code - or I can make them private - that is, for use only within the library.


If I make them public, then the structure of my library is very clear to its clients. Testing is easy, because the public structures may be manipulated and replaced as the tests dictate. Inspection is easy, because all the data is exposed for clients to manipulate. Developers are happy when they can manipulate and test things easily. If I select "public" as the general rule, then developers using my library will be happy, because they'll be able to inspect and test everything quite easily whether I specifically designed in support for that or not.

However, now that they're public, I have to support them in their current form into the foreseeable future. Since I tend to maintain the libraries I work on, and maintenance necessarily means change, a large public API surface means a lot of ongoing changes to exposed functionality, which means a constant stream of deprecation warnings and deprecated feature removals. Without private implementation details, there's no axis on which I can change my software without deprecating older versions. Developers hate keeping up with deprecation warnings or having their applications break when a new version of a library comes out, so if I adopt "public" as the general rule, developers will be unhappy.


If I make them private, then the structure of my library is a lot easier to understand by developers, because the API surface is much smaller, and exposes only the minimum necessary to accomplish their tasks. Because the implementation details are private, when I maintain the library, I can add functionality "for free" and make internal changes without requiring any additional work from developers. Developers like it when you don't waste their time with trivia and make it easier to get to what they want right away, and they love getting stuff "for free", so if I adopt "private" as the general rule, developers will be happy.

However, now that they're private, there's potentially no way to access that functionality for unforeseen use-cases, and testing and inspection may be difficult unless the functionality in question was designed with an above-average level of care. Since most functionality is, by definition, designed with an average level of care, that means that there will inevitably be gaps in these meta-level tools until the functionality has already been in use for a while, which means that developers will need to report bugs and wait for new releases. Developers don't like waiting for the next release cycle to get access to functionality that they need to get work done right now, so if I adopt "private" as the general rule, developers will be unhappy.

Hmm.

by Glyph at November 28, 2014 11:25 PM

Duncan McGreggor

Scientific Computing with Hy and IPython

This blog post is a bit different than other technical posts I've done in the past in that the majority of the content is not on the blog or in gists; instead, it is in an IPython notebook. Having adored Mathematica back in the 90s, you can imagine how much I love the IPython Notebook app. I'll have more to say on that at a future date.

I've been doing a great deal of NumPy and matplotlib again lately, for hours every day. In conjunction with the new features in Python 3, this has been quite a lot of fun -- the most fun I've had with Python in years (thanks Guido, et al!). As you might have guessed, I'm also using it with Erlang (specifically, LFE), but that too is for a post yet to come.

With all this matplotlib and numpy work in standard Python, I've been going through Lisp withdrawals and needed to work with it from a fresh perspective. Needless to say, I had an enormous amount of fun doing this. Naturally, I decided to share with folks how one can do the latest and greatest with the tools of Python scientific computing, but in the syntax of the Python community's best kept secret: Clojure-Flavoured Python (Github, Twitter, Wikipedia).

(Spoiler: observed data and polynomial curve fitting)
Looking about for ideas, I decided to see what Clojure's Incanter project had for tutorials, and immediately found what I was looking for: Linear regression with higher-order terms, a 2009 post by David Edgar Liebke.

Nearly every cell in the tutorial notebook is in Hy, and for that we owe a huge thanks to yardsale8 for his Hy IPython magics code. For those that love Python and Lisp equally, who are familiar with the ecosystems' tools, Hy offers a wonderful option for being highly productive with a language supporting Lisp- and Clojure-style macros. You can get your work done, have a great time doing it, and let that inner code artist out!

(In fact, I've started writing a macro for one of the examples in the tutorial, offering a more Lisp-like syntax for creating class methods. We'll see what Paul Tagliamonte has to say about it when it's done ... !)

If you want to check out the notebook code and run it locally, just do the following:

This will do the following:

  • Create a virtualenv using Python 3
  • Download all the dependencies, and then
  • Start up the notebook using a local IPython HTTP server

If you just want to read along, you're more than welcome to do that as well, thanks to the IPython NBViewer service. Here's the link: Scientific Computing with Hy: Linear Regressions.

One thing I couldn't get working was the community-provided code for generating tables of contents in IPython notebooks. If you have any expertise in this area, I'd love to get your feedback to see how I need to configure the custom ihy IPython profile for this tutorial.

Without that, I've opted for the manual approach and have provided a table of contents here:
If all goes well, you will enjoy that as much as I did :-)

More soon ...


by Duncan McGreggor (noreply@blogger.com) at November 28, 2014 08:11 PM

November 21, 2014

Duncan McGreggor

The Secret History of Lambda

Being a bit of an origins nut (I always want to know how something came to be or why it is a certain way), one of the things that always bothered me with regard to Lisp was that no one seemed to talking about the origin of lambda in the lambda calculus. I suppose if I wasn't lazy, I'd have gone to a library and spent some time looking it up. But since I was lazy, I used Wikipedia. Sadly, I never got what I wanted: no history of lambda. [1] Well, certainly some information about the history of the lambda calculus, but not the actual character or term in that context.

Why lambda? Why not gamma or delta? Or Siddham ṇḍha?

To my great relief, this question was finally answered when I was reading one of the best Lisp books I've ever read: Peter Norvig's Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp. I'll save my discussion of that book for later; right now I'm going to focus on the paragraph at location 821 of my Kindle edition of the book. [2]

The story goes something like this:
  • Between 1910 and 1913, Alfred Whitehead and Bertrand Russell published three volumes of their Principia Mathematica, a work whose purpose was to derive all of mathematics from basic principles in logic. In these tomes, they cover two types of functions: the familiar descriptive functions (defined using relations), and then propositional functions. [3]
  • Within the context of propositional functions, the authors make a typographical distinction between free variables and bound variables or functions that have an actual name: bound variables use circumflex notation, e.g. x̂(x+x). [4]
  • Around 1928, Church (and then later, with his grad students Stephen Kleene and J. B. Rosser) started attempting to improve upon Russell and Whitehead regarding a foundation for logic. [5]
  • Reportedly, Church stated that the use of x̂ in the Principia was for class abstractions, and he needed to distinguish that from function abstractions, so he used ∧x [6] or ^x [7] for the latter.
  • However, these proved to be awkward for different reasons, and an uppercase lambda was used: Λx. [8].
  • More awkwardness followed, as this was too easily confused with other symbols (perhaps uppercase delta? logical and?). Therefore, he substituted the lowercase λ. [9]
  • John McCarthy was a student of Alonzo Church and, as such, had inherited Church's notation for functions. When McCarthy invented Lisp in the late 1950s, he used the lambda notation for creating functions, though unlike Church, he spelled it out. [10] 
It seems that our beloved lambda [11], then, is an accident in typography more than anything else.

Somehow, this endears lambda to me even more ;-)



[1] As you can see from the rest of the footnotes, I've done some research since then and have found other references to this history of the lambda notation.

[2] Norvig, Peter (1991-10-15). Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp (Kindle Locations 821-829). Elsevier Science - A. Kindle Edition. The paragraph in question is quoted here:
The name lambda comes from the mathematician Alonzo Church’s notation for functions (Church 1941). Lisp usually prefers expressive names over terse Greek letters, but lambda is an exception. A better name would be make-function. Lambda derives from the notation in Russell and Whitehead’s Principia Mathematica, which used a caret over bound variables: x̂(x + x). Church wanted a one-dimensional string, so he moved the caret in front: ^x(x + x). The caret looked funny with nothing below it, so Church switched to the closest thing, an uppercase lambda, Λx(x + x). The Λ was easily confused with other symbols, so eventually the lowercase lambda was substituted: λx(x + x). John McCarthy was a student of Church’s at Princeton, so when McCarthy invented Lisp in 1958, he adopted the lambda notation. There were no Greek letters on the keypunches of that era, so McCarthy used (lambda (x) (+ x x)), and it has survived to this day.
[3] http://plato.stanford.edu/entries/pm-notation/#4

[4] Norvig, 1991, Location 821.

[5] History of Lambda-calculus and Combinatory Logic, page 7.

[6] Ibid.

[7] Norvig, 1991, Location 821.

[8] Ibid.

[9] Looking at Church's works online, he uses lambda notation in his 1932 paper A Set of Postulates for the Foundation of Logic. His preceding papers upon which the seminal 1932 is based On the Law of Excluded Middle (1928) and Alternatives to Zermelo's Assumption (1927), make no reference to lambda notation. As such, A Set of Postulates for the Foundation of Logic seems to be his first paper that references lambda.

[10] Norvig indicates that this is simply due to the limitations of the keypunches in the 1950s that did not have keys for Greek letters.

[11] Alex Martelli is not a fan of lambda in the context of Python, and though a good friend of Peter Norvig, I've heard Alex refer to lambda as an abomination :-) So, perhaps not beloved for everyone. In fact, Peter Norvig himself wrote (see above) that a better name would have been make-function.


by Duncan McGreggor (noreply@blogger.com) at November 21, 2014 09:12 PM

Maths and Programming: Whence Recursion?

As a manager in the software engineering industry, one of the things that I see on a regular basis is a general lack of knowledge from less experienced developers (not always "younger"!) with regard to the foundations of computing and the several related fields of mathematics. There is often a great deal of focus on what the hottest new thing is, or how the industry can be changed, or how we can innovate on the decades of profound research that has been done. All noble goals.

Notably, another trend I've recognized is that in a large group of devs, there are often a committed few who really know their field and its history. That is always so amazing to me and I have a great deal of admiration for the commitment and passion they have for their art. Let's have more of that :-)

As for myself, these days I have many fewer hours a week which I can dedicate to programming compared to what I had 10 years ago. This is not surprising, given my career path. However, what it has meant is that I have to be much more focused when I do get those precious few hours a night (and sometimes just a few per week!). I've managed this in an ad hoc manner by taking quick notes about fields of study that pique my curiosity. Over time, these get filtered and a few pop to the top that I really want to give more time.

One of the driving forces of this filtering process is my never-ending curiosity: "Why is it that way?" "How did this come to be?" "What is the history behind that convention?" I tend to keep these musings to myself, exploring them at my leisure, finding some answers, and then moving on to the next question (usually this takes several weeks!).

However, given the observations of the recent years, I thought it might be constructive to ponder aloud, as it were. To explore in a more public forum, to set an example that the vulnerability of curiosity and "not knowing" is quite okay, that even those of us with lots of time in the industry are constantly learning, constantly asking.

My latest curiosity has been around recursion: who first came up with it? How did it make its way from abstract maths to programming languages? How did it enter the consciousness of so many software engineers (especially those who are at ease in functional programming)? It turns out that an answer to this is actually quite closely related to a previous post I wrote on the secret history of lambda. A short version goes something like this:

Giuseppe Peano wanted to establish a firm foundation for logic and maths in general. As part of this, he ended up creating consistent axioms around the hard-to-define natural numbers, counting, and arithmetic operations (which utilized recursion). While visiting a conference in Europe, Bertrand Russell was deeply impressed by the dialectic talent of Peano and his unfailing clarity; he queried Peano as to his secret for success (Peano told him) and then asked for all of his published works. Russell proceeded to study these quite deeply. With this in his background, he eventually co-wrote the Principia Mathematica. Later, Alonzo Church (along with his grad students) sought to improve upon this, and in the process Alonzo Church ended up developing the lambda calculus. His student, John McCarthy, later created the first functional programming language, Lisp, utilizing concepts from the lambda calculus (recursion and function composition).

In the course of reading between 40-50 mathematics papers (including various histories) over the last week, I have learned far more than I had originally intended. So much so, in fact, that I'm currently working on a very fun recursion tutorial that not only covers the usual practical stuff, but steps the reader through programming implementations of the Peano axioms, arithmetic definitions, the Ackermann function, and parts of the lambda calculus.
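
To give a flavour of the sort of material involved (an illustrative sketch only, not an excerpt from that tutorial), here are Peano-style addition and the Ackermann function, both defined purely by recursion in Python:

# Peano-style arithmetic: addition defined in terms of the successor
# function, plus the (rapidly growing) Ackermann function - both examples
# of definitions that are recursive all the way down.

def successor(n):
    return n + 1

def add(m, n):
    # m + 0 = m;  m + S(n) = S(m + n)
    if n == 0:
        return m
    return successor(add(m, n - 1))

def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

assert add(2, 3) == 5
assert ackermann(2, 3) == 9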

I've got a few more blog post ideas cooking that dive into functions, their history and evolution. We'll see how those pan out. Even more exciting, though, was having found interesting papers discussing the evolution of functions and the birth of category theory from algebraic topology. This, needless to say, spawned a whole new trail of research, papers, and books... and I've got some great ideas for future blog posts/tutorials around this topic as well. (I've encountered category theory before, but watching it appear unsearched and unbidden in the midst of the other reading was quite delightful).

In closing, I enjoy reading not only the original papers (and correspondence between great thinkers of a previous era), but also the meanderings and rediscoveries of my peers. I've run across blog posts like this in the past, and they were quite enchanting. I hope that we continue to foster that in our industry, and that we see more examples of it in the future.

Keep on questing ;-)


by Duncan McGreggor (noreply@blogger.com) at November 21, 2014 09:10 PM

October 22, 2014

Moshe Zadka

Clash of Clans: War strategy tips

[I wanted a central place to put my CoC Clan War strategy tips so I can refer people to them.]

Defense

  • A town hall outside the walls is great for farming, but your war base should have amazingly good protection of the town hall, otherwise they get, essentially, a free one star (together with the one star for 50% that people can get by just taking down the outermost buildings with an archer rush, this almost always means free two stars).
  • Air defenses should be well protected; otherwise even a simple “take out the air defenses” plan together with a castle dragon can let even a relatively low-level town hall take out pretty impressive bases.
  • Do not bother with walls around your town hall. If giants get to the point where they have nothing left to do but take out your town hall, you have bigger problems; otherwise, archers will take out your town hall while leaving the wall intact. Instead, use walls to protect the giants’ favorite targets: wizard towers and mortars.
  • Avoid the “eggshell” design: one strong outer wall with no internal walls. Once the outer wall is breached, your base is effectively wall-free.

Offense

  • Meta tip #1: Remember to question your instincts if you raid for trophies or loot a lot. War has different incentive schemas and different pros and cons.
  • Meta tip #2: [Continuation of meta tip #1] You have more than 30 seconds to figure out your strategy. You can look at the enemy village, and even tap buildings to see ranges. Spend some time thinking about your strategy before sending in your first troop.
  • If you are not the first to attack, watch replays of all previous attacks. Make sure to memorize:
    • Location of traps
    • Location of hidden tesla towers
    • Composition of clan castle
  • Go for the three stars. A one-star attack is almost useless, and a two-star attack is worth only 66% of a three-star attack. Only go for the one-star if nothing else seems realistic.
  • The clan castle almost certainly has troops in it. Probably a mix of archers and wizards.
    • Check the range of the clan castle ahead of time, and plan how to draw the troops out.
    • Giants and, to a lesser extent, hog riders, are extremely vulnerable to castle troops, because they do not fight back. Clear castle troops before sending them in, if possible.
    • Make sure to draw out troops and kill them with either a lightning spell or archers/wizards (you’ll probably need the equivalent of 20-30 housing spaces, in either archers or wizards, to demolish a 20-25 sized castle).
    • If you draw out troops, try drawing them out to an area that’s free of other defensive buildings.
    • Make sure you drew out all troops: some will sometimes stay in. Better safe than sorry.
  • If you are going to send in dragons or healers, it is probably a good idea to get rid of the air defenses using giants or hog riders first. Air defenses are extremely effective against flying troops. [This does not apply if you plan your attack taking the air defenses into account and are OK with the flying troops having a limited lifetime.]
  • Walls should be dealt with, if possible, by wall breakers rather than by leaving the giants there hacking at the wall. First, this is an inefficient use of the giants’ hit points, and worse, it wastes a lot of time. Time is of the essence for getting all three stars.
  • Send troops (somewhat) en-masse. Sending the archers or wizards one-by-one just lets the defensive buildings take them out one by one. Even better, send mass amounts of archers or wizards hidden behind giants to let giants soak up the pain while the archers take out buildings.

by moshez at October 22, 2014 03:52 PM

October 13, 2014

Ashwini Oruganti

Open Source Retreat: Afterword

As I wrap up my time with Stripe’s Open Source Retreat, I have quite some things to be thankful for. (Also, I just always wanted to give a thank-you speech.) So here goes:

I am grateful to Glyph Lefkowitz, Ying Li, and Christopher Armstrong for their long-suffering patience and support. Thanks also to Allen Short, who coaxed work out of me when I badly needed to feel capable of it.

I would like to thank Jessica McKellar, for being generally and specifically brilliant, for teaching me how to be a meticulous manager, and for helping me find my things when I lost them in a scary building.

I am grateful to Stripe, for giving me a place to be. And the monitor. To all the people there who showed me where the good coffee is hidden: thank you! Thanks too to those who went climbing with me, pretended to not hear me when I said I wanted to stop, and patiently helped me improve.

Thanks to friends near and far – for helping me deal with dead things, for teaching me all about alligators and crocodiles, and for helping me pronounce words correctly.

I am grateful to the authors of the good books I steal phrases from.

And lastly, I am grateful to my family, for spoiling me always.

October 13, 2014 10:54 PM