2 months ago

In reviews I often call out implicit boolean conditions:

while True:
    data = fd.read(4096)
    if not data:
        break

I usually wave my arms a bit, repeat "explicit is better than implicit", and maybe recall that one time this saved my bacon.
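A minimal sketch of what the explicit form might look like, assuming the concern is that `not data` is true for `None`, `0`, and every empty container alike, not only for the empty read that signals end-of-file. The helper name `read_all` is made up for illustration.

```python
import io


def read_all(fd, chunk_size=4096):
    """Read a binary stream to the end, testing for EOF explicitly."""
    chunks = []
    while True:
        data = fd.read(chunk_size)
        # Explicit: an empty read means end-of-file. A stray None raises
        # TypeError here instead of quietly ending the loop.
        if len(data) == 0:
            break
        chunks.append(data)
    return b"".join(chunks)


content = read_all(io.BytesIO(b"x" * 10000))
```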

Now I have numbers to back me up.

Read on →
4 months ago

It's quicker to build a set and test for membership than it is to build a list and do the same test:

$ python3 -m timeit -c '2 in {1,2}'
10000000 loops, best of 3: 0.0259 usec per loop
$ python3 -m timeit -c '2 in [1,2]'
10000000 loops, best of 3: 0.0335 usec per loop

With the overhead of hashing and the more complex data structure I expected a set to be slower than a linear search of a short list, in this example at least, but it turns out that sets are faster for even just a couple of elements, and the list only falls further behind as the element count goes up.

The one time a list is faster is when the element being looked for is the very first:

$ python3 -m timeit -c '1 in [1,2]'
10000000 loops, best of 3: 0.0213 usec per loop

YMMV. For example, objects with custom __hash__ implementations may skew the results back in favour of using a list or tuple. Running within pypy may even eliminate the difference altogether.
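To repeat the measurement from within Python rather than from the shell, something like the following should do. It is a sketch only: `best_time` is a made-up helper, and the repeat and loop counts are arbitrary choices, so expect different absolute numbers on your machine.

```python
import timeit


def best_time(statement, repeat=3, number=100000):
    """Return the best observed time per loop, in seconds."""
    timings = timeit.repeat(statement, repeat=repeat, number=number)
    return min(timings) / number


for statement in ("2 in {1, 2}", "2 in [1, 2]", "1 in [1, 2]"):
    usec = best_time(statement) * 1e6
    print("%-12s %.4f usec per loop" % (statement, usec))
```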

7 months ago

Our mission was to port MAAS from Python 2.7 to Python 3.5.

This post has a rundown of how we prepared, how we did the port itself, and what we learned. It follows on from Porting MAAS to Python 3 which gives an overview of the port. They're both written from an engineering perspective, but this post contains a lot more technical detail.

What we did to prepare

We used a few features of Python 2.7 that are meant to help prepare for a port to Python 3, and we also devised one of our own. At the top of every module and script we:

  • imported unicode_literals, absolute_import, and print_function from __future__,

  • selected new-style classes by default with __metaclass__ = type,

  • forbade the use of str with str = None.

The last of these forced the use of bytes or unicode and so brought decisions about encoding and decoding to the fore. Sadly it couldn't prevent implicit coercion between these types.
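Put together, the prelude described in the list above would look roughly like the string below. This sketch runs it in a scratch namespace (`module_namespace`, a name invented here) so that the str shim doesn't leak into the rest of the session; in a real module the three lines would simply sit at the top of the file.

```python
MODULE_PRELUDE = """\
from __future__ import absolute_import, print_function, unicode_literals

__metaclass__ = type  # Python 2: a bare `class Foo:` becomes new-style.
str = None            # Any later call to str(...) now fails loudly.
"""

# Stand-in for a module's globals.
module_namespace = {}
exec(MODULE_PRELUDE, module_namespace)

try:
    eval("str(123)", module_namespace)
except TypeError as error:
    outcome = "rejected: %s" % error
else:
    outcome = "allowed"

print(outcome)
```

Under Python 3 the __future__ imports are accepted but change nothing, which is why the same prelude still runs there.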

With an irritating tenacity I would also recommend the use of dict.view{keys,values,items} in code reviews.

In Python 2.7 these methods exhibit the closest behaviour to Python 3's dict.{keys,values,items}. They're also converted cleanly by 2to3 whereas, for example, dict.keys() is converted to list(dict.keys()) and dict.iterkeys() is converted to iter(dict.keys()).

The intermediate lists that arise are somewhat wasteful and very often unnecessary, but it's hard for 2to3 to know this because the dict fixer doesn't consider the context (and doing so may be a task more suited to a human in any case). Using the dict.view* variants gives it a hint.

Unfortunately old habits die hard, and I ended up spending a lot of time manually reverting these kinds of changes from 2to3's patches.
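The difference between a view and the intermediate list that 2to3 inserts is easy to see in Python 3. The dictionary contents here are invented for the example.

```python
inventory = {"node-1": "ready", "node-2": "deployed"}

keys_view = inventory.keys()        # A live view; nothing is copied.
keys_copy = list(inventory.keys())  # What 2to3 writes for Python 2's keys().

inventory["node-3"] = "new"

print(sorted(keys_view))  # Reflects the later insertion.
print(sorted(keys_copy))  # A snapshot taken before it.
```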

The process

  1. A bug in 2to3 meant that all __future__ imports needed to be reformatted onto a single line (see reformat-future-imports-on-single-line.py):

    bzr ls --kind=file --recursive --versioned --null | \
      xargs -r0 python python3/reformat-future-imports-on-single-line.py
  2. The str = None lines also needed to be removed (see remove-str-equals-none-shim.py):

    bzr ls --kind=file --recursive --versioned --null | \
      xargs -r0 python python3/remove-str-equals-none-shim.py
  3. We converted MAAS's code directory by directory, but worked with patches instead of getting 2to3 to write directly:

    2to3 --nofix=callable src/${subcomponent} > \
  4. We reviewed the patches to sanity-check them and to remove unnecessary conversions, then committed each patch and applied it:

    patch -p0 < python3/fix-${subcomponent}.diff
  5. The __metaclass__ = type lines and all remaining shims were next to go (see remove-all-shims.py):

    bzr ls --kind=file --recursive --versioned --null | \
      xargs -r0 python python3/remove-all-shims.py
  6. We got the tests passing, committing as we went. Problematic tests were skipped like so:


We did this last step for tests that depended on code that had not yet been ported. Instead of pushing that work onto the stack we would just skip the tests and move on. Later on we revisited these tests (which, marked distinctively, were easy to find) and got them all working.
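As an illustration of skipping with a distinctive marker, here is one way it could be done with the standard unittest.skip decorator. The marker text, test case, and test names are all hypothetical; they are not taken from MAAS.

```python
import unittest

# Hypothetical marker: a distinctive string that is easy to grep for later.
SKIP_REASON = "XXX: PY3 PORT: depends on code that is not yet ported"


class TestExample(unittest.TestCase):

    @unittest.skip(SKIP_REASON)
    def test_needs_unported_code(self):
        self.fail("never reached while the skip is in place")

    def test_already_ported(self):
        self.assertEqual(3 // 2, 1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
result = unittest.TestResult()
suite.run(result)
```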


On the usefulness of annotations

Python 3.5 has the typing module in the standard library, the use of which results in quite readable type annotations. This was more useful than I expected.

I was often trying to keep in mind many disparate parts of the code base and I found it was much more convenient to have type information in the function signature rather than in the docstring, or discernible only from reading the code or call-sites.

I started to pine for tooling to enforce those annotations. Duck-typing doesn't mean that anything goes: arguments still need to look and quack like the duck you're expecting.

ABCs and this new and related typing module make it possible to describe the ducks you're looking for. It seems a shame not to take full advantage of it.

I could not get mypy to install. From what I can tell, this is the big boss of type annotations in Python. It can statically analyze your program and discover typing mistakes. But I didn't have time to figure out what was wrong and learn how to use it. Another day.

However, I would settle for checking annotations at run-time if I could get it working quickly, so I put together the short typecheck module.

By decorating functions and methods with @typecheck.typed and adding annotations I could make type-related issues shallower, by which I mean that the code would crash closer to where the problem originated. This made an immediate difference, especially when unravelling byte/Unicode string issues.

This approach is imperfect and simplistic, sure. There's none of that uncanny magic you get with, say, Haskell, where a program that merely compiles actually stands a good chance of doing what you meant, first time. But it is valuable all the same; it is another layer of defence.

Annotations and typecheck combined also replace the need to document the types of function arguments and return values, and remove the risk of that documentation being out of date, a state towards which documentation rapidly decays.
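To give a flavour of run-time checking, here is a deliberately small decorator in the same spirit. It is not the typecheck module itself: it only understands annotations that are plain classes and ignores anything from typing, and the function `encode_name` is invented for the example.

```python
import functools
import inspect


def typed(func):
    """Check plain-class annotations at call time (illustrative sketch)."""
    signature = inspect.signature(func)
    variadic = (
        inspect.Parameter.VAR_POSITIONAL,
        inspect.Parameter.VAR_KEYWORD,
    )

    def check(label, value, expected):
        # Missing annotations and anything that is not a plain class
        # (typing generics, strings) are ignored in this sketch.
        if expected is inspect.Signature.empty:
            return
        if not isinstance(expected, type):
            return
        if not isinstance(value, expected):
            raise TypeError("%s: expected %s, got %s" % (
                label, expected.__name__, type(value).__name__))

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            parameter = signature.parameters[name]
            if parameter.kind not in variadic:
                check(name, value, parameter.annotation)
        result = func(*args, **kwargs)
        check("return value", result, signature.return_annotation)
        return result

    return wrapper


@typed
def encode_name(name: str) -> bytes:
    return name.encode("utf-8")
```

Calling encode_name(b"maas") now fails with a TypeError that names the offending argument, rather than with an AttributeError from somewhere inside the function body.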

The Big One: Byte and Unicode strings

Almost all difficulties in this port were caused by Python 2's automatic coercion of byte strings into Unicode strings and vice-versa. That one language feature has a lot of sloppy code to answer for. It has also made it hard for even the most systematic developer to live free from the shadow of UnicodeError and its spawn.

It is cold-sweat-inducing to realise that the following code in Python 2 that works fine:

import json
from urllib2 import urlopen

response = urlopen('http://example.com/')
data = json.load(response)

is actually complete bollocks because it disregards the encoding of the response (i.e. the charset in the Content-Type header).

Python 3 forces you to fix this, but the temptation is to do something like:

data = json.loads(response.read().decode("utf-8"))

which is a different class of bollocks because, although UTF-8 is common, it still disregards the encoding of the response. So, Python 3 gives us a big shove in the right direction but can't yet magically fix faulty reasoning.
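A sketch of decoding according to the response's declared charset, under a few assumptions: the helper name `decode_json_body` is invented, the body and Content-Type header have already been read from the response, and UTF-8 is used as the fallback when no charset is declared. The standard library's email.message.Message does the header parsing.

```python
import json
from email.message import Message


def decode_json_body(body, content_type, default_charset="utf-8"):
    """Decode a JSON body using the charset from its Content-Type header."""
    message = Message()
    message["Content-Type"] = content_type
    # Falls back to default_charset when the header declares no charset.
    charset = message.get_content_charset(default_charset)
    return json.loads(body.decode(charset))


body = '{"name": "caf\u00e9"}'.encode("iso-8859-1")
data = decode_json_body(body, "application/json; charset=ISO-8859-1")
```

With urllib.request in Python 3 the same parsing is available directly on the response, as response.headers.get_content_charset().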

We used unicode_literals in our code. In code reviews we would check for correct encoding and decoding. We forbade the use of str. These things helped, I am sure, but I expected far fewer surprises from our own code; you might even say I was shocked at how much coercion between bytes and unicode was going on once Python 3 was there to coax it out.

Fixing these issues was, at a guess, over half of the work required to port MAAS.

In retrospect I wish there had been a way to disable automatic coercion in Python 2 although I suspect it would have been unworkable in practice; that's Python 3's big feature after all. A more selective Unicode-only literals feature with a corresponding unicodeonly type (and a converse bytesonly type) that Python would never automatically coerce to a byte string might have been a workable way to improve the sorry string story in Python 2.

Sorting disparate types

Python 3 doesn't allow sorting of mixed types unless they explicitly support comparison with one another. However, one important part of MAAS relied on exactly that.

MAAS's Web API publishes a description document; a blob of JSON that describes all the objects and calls available. The CLI client downloads this once and refers back to it when generating sub-commands and options. When the server's API is updated we need to detect that the client is working from an outdated description.

To do this, the server renders a canonical representation of the description document and calculates an SHA1 hash from it. This is included in the description that the client downloads, and the server also sends it in an X-MAAS-API-Hash header in every HTTP response. The client can compare the server's hash with the local hash; if they differ, the API has changed.
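The shape of that check can be sketched as follows. This is an assumption-laden stand-in, not MAAS's code: the function name `description_hash` is invented, and json.dumps with sort_keys=True takes the place of MAAS's own canonical rendering.

```python
import hashlib
import json


def description_hash(description):
    """Hash a canonical rendering of an API description (sketch only)."""
    canonical = json.dumps(
        description, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


server_side = {"resources": ["nodes", "zones"], "doc": "MAAS API"}
client_copy = {"doc": "MAAS API", "resources": ["nodes", "zones"]}
updated = {"doc": "MAAS API", "resources": ["nodes", "zones", "tags"]}

same = description_hash(server_side) == description_hash(client_copy)
changed = description_hash(server_side) != description_hash(updated)
```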

Rendering the canonical representation is where the problem lies. We want to ensure a consistent ordering, and we had relied on Python 2's built-in rules for a few types:

None < Numeric/Boolean < String < Tuple

We reproduced this by creating wrappers — KeyCanonicalNone, KeyCanonicalNumeric, KeyCanonicalString, and KeyCanonicalTuple — that sort correctly with respect to one another. A function, key_canonical, wraps disparate objects according to type, and can be used with sorted:

sorted(disparate_objects, key=key_canonical)

This solved our problem and we were back in business.
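For illustration, the same ordering can be approximated without wrapper classes by returning a (rank, value) tuple from the key function. The sketch below reuses the name key_canonical for readability, but it is a simplification and not the KeyCanonical* implementation described above.

```python
def key_canonical(obj):
    """Return a sort key giving None < numbers < strings < tuples."""
    if obj is None:
        return (0,)
    if isinstance(obj, (int, float)):  # bool is a subclass of int.
        return (1, obj)
    if isinstance(obj, str):
        return (2, obj)
    if isinstance(obj, tuple):
        # Tuples are keyed element by element so that mixed contents
        # remain comparable.
        return (3, tuple(key_canonical(item) for item in obj))
    raise TypeError("Cannot sort %r canonically" % (obj,))


disparate_objects = ["b", None, 2, (1, "a"), True, "a", (None, 3)]
ordered = sorted(disparate_objects, key=key_canonical)
```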

Things that 2to3 misses

I'm a Bad Person because these are bugs and I didn't capture enough context at the time to be able to report them, nor have I tried to reproduce them since:

  • string.letters is not automatically changed to string.ascii_letters.

  • Imports of __builtin__ are changed to builtins, but references to __builtin__ are not updated.

  • Imports of urllib2 are changed to urllib.*, but some references are missed.


  • Not all of twisted.conch has been ported. This means that we can no longer support the little-known introspect service in MAAS. It's a niche service for developer-driven debugging and it's not enabled by default, so we dropped it.

  • 2to3 converts things like isinstance(thing, (bytes, unicode)) to isinstance(thing, (bytes, str)), but it's likely that we only want either str or bytes in Python 3.

  • sudo_write_file conflated its core mission (writing a file as another user via sudo) with encoding the file content. I changed it to instead raise TypeError if the given content is not a byte string, so that encoding must be done by the caller.

  • atomic_write also conflated its mission: it expected text content and silently encoded it as UTF-8. It will now raise TypeError if the content is not a byte string; again, encoding must be done by the caller.

  • TFTP paths are always byte strings. Other paths are often, but not always, represented as Unicode strings. This caused some difficulty.

  • Integer division: we had to change many expressions like a / b into a // b to ensure integer results.

  • When testing web interactions, content coming from Django is always a byte string. We used django.conf.settings.DEFAULT_CHARSET to decode. Strictly, however, we should have checked the Content-Type header.

  • Python 3 cushions us by wrapping sys.std{in,out,err} in io.TextIOWrappers, but when forking processes you are presented with the underlying reality: byte streams. The question arises: which character encoding should we use? The LANG and LC_* environment variables typically coordinate these kinds of understandings between processes. A new select_c_utf8_locale() function was created to select the C.UTF-8 locale. For cooperating applications this will mean we can reliably use UTF-8.

  • Command-line arguments given to subprocess's functions should be Unicode strings, and it will encode them as appropriate. I did check further: that bit is implemented in C, but the result is very similar to calling os.fsencode().

  • Python 3 has only new-style classes. Classes that explicitly inherit from object can be amended to implicitly inherit from object instead.

  • No one seems to have paid any attention to the years of deprecation warnings about Exception.message. I conclude that deprecation warnings are more useful as retrospective justifications for breaking someone else's application than they are useful in getting that person to update their application in time.
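The "bytes only, the caller encodes" rule from the sudo_write_file and atomic_write items can be sketched like this. The function `atomic_write_bytes` is a hypothetical stand-in written for this example; it is not MAAS's atomic_write.

```python
import os
import tempfile


def atomic_write_bytes(content, filename):
    """Atomically replace filename with content, which must be bytes."""
    if not isinstance(content, bytes):
        raise TypeError("Content must be bytes, got: %r" % (content,))
    directory = os.path.dirname(os.path.abspath(filename))
    # Write to a temporary file in the same directory, then rename it
    # into place so readers never observe a half-written file.
    fd, temp_name = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(content)
        os.replace(temp_name, filename)
    except BaseException:
        if os.path.exists(temp_name):
            os.unlink(temp_name)
        raise
```

The caller then makes the encoding decision in plain sight, for example atomic_write_bytes(text.encode("utf-8"), path).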

That's it

I hope you find this useful when making your own plans. Good luck!

7 months ago

MAAS, up to and including version 1.9, is a Python 2.7 application. We have wanted to move to Python 3 for a long time but it hasn't been feasible. Recently the wind changed and we decided to port MAAS to Python 3.5 for MAAS's 2.0 release.

The patch for the port ended up at 152564 lines, but it was actually a mostly straightforward piece of work. We used 2to3 everywhere, with some hand-holding, and only had to port a few bits entirely by hand.

It was not perfect. MAAS did not operate out-of-the-box afterwards. We honestly expected that; in fact, we chose that! We worked until the unit test suite passed under Python 3.5 and then landed it straight away because:

  • We did not want this large branch to be long-lived. The cost of maintaining something like that over time, especially when derived from a busy trunk branch, is horrendous. The risk of complete failure can go up rapidly if, say, project priorities shift and the branch languishes, even for a short while.

  • Perhaps more significantly, we decided it was good enough, and that there was a greater chance of success if we put it right in front of everyone in a kaizen-like stop-the-line approach. I had done most of the early porting work, then two others helped out in a week-long crunch to get it landable, but getting everyone in the team ironing out the last creases gave us the best chance of success.

So, it was not perfect, but it worked. We've fixed the issues that came up and have not looked back.

Python 3 is worth it

The MAAS team is learning its way around the shiny new stuff in Python 3 and increasingly taking advantage of it. It's already clear that Python 3.5 is a better language than Python 2.7. The standard library is more consistent, better organised, and has several useful additions.

The split of byte strings and Unicode strings is The Big One though. The port to Python 3 would have been worth it for that alone. Python 3.5 fortunately makes it an easier split to live with because of the reintroduction of %-formatting for byte strings.
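A quick illustration of %-formatting on byte strings in Python 3.5 and later; the header being built is just an example value.

```python
# Interpolating bytes and integers into a bytes template.
header = b"%s: %d\r\n" % (b"Content-Length", 42)

# Text is not coerced: it has to be encoded explicitly first.
try:
    b"%s" % ("text",)
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True
```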

Why had we not done this before?

MAAS's biggest dependencies are Django and Twisted. Django, until somewhat recently, had no mature Python 3 port, but version 1.8 changed that. The story is somewhat similar for Twisted.

Prior to Python 3.3 it wasn't considered possible to port Twisted to Python 3 without breaking compatibility with Python 2, so the porting effort is fairly new. A testament to Twisted's modularity is that it can be ported bit by bit: Twisted in Python 3 is incomplete, but the parts that have been ported are rock solid.

Two modules were missing though: twisted.protocols.amp (which we use for RPC) and twisted.web.wsgi (the WSGI container in which we host Django). With guidance and plenty of discussion and reviews from the good people in the Twisted development community, I ported both of those (Twisted tickets 6833 and 7993 for the curious). They'll be in an upcoming release of Twisted.

By the time both Django and Twisted were ready, almost all of MAAS's other dependencies were ready too, and the work to port the remainder was small.

South to Django

Django posed one additional special problem for MAAS: the upgrade path.

MAAS has relied on South to migrate the PostgreSQL database as MAAS is upgraded. South is gone in Django 1.7, replaced by Django's own migrations system. Handing over a database from South to Django is fine, as long as the database is in the state defined by the final South migration, mirrored by the initial state of Django's native migrations.

However, we really wanted and needed MAAS to support upgrades from any supported 1.x release to any supported 2.x release, meaning we can't guarantee that MAAS has applied all known South migrations before a version of MAAS based on Django >=1.7 has to run with it.

Another member of the MAAS team, Blake, built the mechanism to address this. Out of necessity it is a sausage factory — that is, you should not look inside if you like sausages (or your database) — but it's a good one; I hope he'll write about it in more detail. Needless to say, it works, and your bare-metal clouds can be upgraded directly to MAAS in Ubuntu Xenial when it's released.

All done

This port was an intense piece of work, but fun too, educational certainly, and it helps MAAS's future. Nothing has caused us to regret our decision; the opposite in fact.

We're running MAAS entirely on Python 3.5 in development, and it'll be available in Ubuntu Xenial soon in the form of MAAS 1.10. This is essentially MAAS 1.9 running on Python 3.5. We're already pushing towards a solid 2.0 release to coincide with the release of Xenial.

Next: Porting MAAS to Python 3: The (More) Technical Bits

about 1 year ago

Near the end of Transactions in MAAS I mentioned post-commit hooks. These are a mechanism that MAAS uses for making changes to external systems once a database transaction has been committed.


Database transactions in any piece of software are not guaranteed to be committed, be it because of bugs, errors, choice, or because the database rejects the transaction due to a serialisation conflict.

This makes it less than safe to change the outside world from within a transaction. Transactions can be rolled-back, but things Out There often do not have that property.

Suppose I make an EatPizza RPC call to a pizza-eating robot's web API within a transaction. That transaction later fails and is retried n times by MAAS. The robot would rupture and short-circuit from the n + 1 pizzas in its belly.

MAAS has to interact with a lot of things outside of the database that do not have database-like properties. When MAAS needs to perform such an interaction it arranges for it to happen in a post-commit hook, which is run once the current transaction has been fully committed by PostgreSQL.
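The idea can be shown with a toy model. Everything in this sketch is invented for illustration (the `PostCommitHooks` class, the `transaction` context manager, and the simulated conflicts); MAAS's real mechanism is not shown here. Hooks registered during a transaction run only if it commits and are thrown away if it fails.

```python
from contextlib import contextmanager


class PostCommitHooks:
    """Collect callables to run once a transaction has committed."""

    def __init__(self):
        self._hooks = []

    def add(self, func, *args):
        self._hooks.append((func, args))

    def fire(self):
        hooks, self._hooks = self._hooks, []
        for func, args in hooks:
            func(*args)

    def reset(self):
        self._hooks = []


@contextmanager
def transaction(hooks):
    """Pretend transaction: fire hooks on success, drop them on failure."""
    try:
        yield
    except BaseException:
        hooks.reset()  # Rolled back: the outside world stays untouched.
        raise
    else:
        hooks.fire()   # Committed: now it is safe to act.


pizzas_eaten = []
hooks = PostCommitHooks()


def feed_robot(attempt):
    with transaction(hooks):
        hooks.add(pizzas_eaten.append, "pizza")
        if attempt < 3:
            raise RuntimeError("serialisation conflict")


for attempt in (1, 2, 3):
    try:
        feed_robot(attempt)
    except RuntimeError:
        continue
```

After two failed attempts and one success the robot has eaten exactly one pizza.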

Read on →
about 1 year ago

In MAAS, to ensure that a function is run within its own database transaction, decorate it with @transactional:

from maasserver.utils.orm import transactional

@transactional
def do_something_databasey():
    ...

If a transaction is already in progress when do_something_databasey is called, it will instead be called within a savepoint.
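The transaction-or-savepoint behaviour can be imitated with nothing but the standard library. The decorator below is a sketch against sqlite3, not MAAS's Django-based @transactional; it assumes the connection is passed as the first argument and was opened with isolation_level=None so that transactions are controlled explicitly. The table and function names are invented.

```python
import functools
import sqlite3


def transactional(func):
    """Run func in a transaction, or in a savepoint if one is open."""

    @functools.wraps(func)
    def wrapper(conn, *args, **kwargs):
        if conn.in_transaction:
            begin, commit = "SAVEPOINT nested", ["RELEASE nested"]
            rollback = ["ROLLBACK TO nested", "RELEASE nested"]
        else:
            begin, commit, rollback = "BEGIN", ["COMMIT"], ["ROLLBACK"]
        conn.execute(begin)
        try:
            result = func(conn, *args, **kwargs)
        except BaseException:
            for statement in rollback:
                conn.execute(statement)
            raise
        for statement in commit:
            conn.execute(statement)
        return result

    return wrapper


@transactional
def add_node(conn, name, fail=False):
    conn.execute("INSERT INTO node (name) VALUES (?)", (name,))
    if fail:
        raise ValueError("simulated failure")


@transactional
def add_nodes(conn):
    add_node(conn, "kept")  # Runs inside a savepoint.
    try:
        add_node(conn, "discarded", fail=True)
    except ValueError:
        pass  # Only the inner savepoint is rolled back.


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE node (name TEXT)")
add_nodes(conn)
names = [row[0] for row in conn.execute("SELECT name FROM node")]
```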

That's it.

Now for the why.

Read on →
about 1 year ago

MAAS has a couple of function decorators that are designed to help blocking code work with non-blocking code: synchronous and asynchronous.

The blocking, or synchronous, code in MAAS is primarily though not solely the realm of Django, which handles all web API calls and some web views. The biggest responsibility of Django these days is the ORM and database migrations.

Django doesn't do non-blocking, or asynchronous, at all. For that MAAS uses Twisted. Twisted has many useful pieces that can be used in other projects, but most of the time the Twisted reactor is what you want.

Read on →