Home ¦ Posts ¦ RSS

High Performance Python review

I had been meaning to read this book for a while now. It is a nice collections of strategies to deal efficiently with CPU, memory and I/O related problems which you might encounter while writing Python applications. The amount of topics covered is good but not hyper-extensive, which on another hand makes it a pleasant and relaxing read.

High performance Python book cover

The book is peculiar because it brings together high-level topics such as idiomatic Python development while taking a look at the impact on low-level parameters like memory fragmentation and number of context-switches being generated.

A few funny/interesting things I found in this book:

Did you know that appending items to a list can become a very expensive operation under certain circumstances? cPython implements lists using arrays of fixed size and it copies data to bigger arrays when the content exceeds the maximum size of the array being appended to. This is the rule followed in cPython 2.7 to figure out the size of new arrays being allocated:

M = (N >> 3) + (N < 9 ? 3 : 6)

Where M is the size of the new array, and N is the size of the current one. The rational behind this is that an append operations might mark the start of a series of append operations, which can be accomodated in a faster way if the fixed-size array has enough spare space to hold new items.

Something equally surprising happens when inserting new items in a dictionary. As we add more items to a dictionary, we will eventually hit a point where cPython copies the hash table contents to a bigger hash table structure in order to avoid hitting too many collisions. This means that certain key insert operation will have a O(N) cost (although the amortized cost will still be O(1) because these operations only happen under certain pressure conditions).

Long story short: your application may slow down temporarily when crunching large amounts of data using lists and dictionaries.

Another interesting chapter is the one that introduces vectorization/SIMD techniques and Numpy to speed-up certain number-crunching operations. While it's not something I can immediately apply to any of the projects I'm working on, I've been wanting to give Numpy a look for quite some time now and this book was the perfect excuse to do so.

Conclusions

These are just a few examples of the kind of information you will find in the book. Head to the the table of contents for more info.

If you're interested in understanding how certain performance-related details work in Python, this is definitely a solid resource and a pleasant read.

© Giuseppe Ciotta. Built using Pelican. Theme on github.