Know the basic Data Structures

In python Dictionaries and sets use hash tables that have O(1) time complexity. As The Hitchhiker’s Guide states:

… it is often a good idea to use sets or dictionaries instead of lists in cases where:

– The collection will contain a large number of items

– You will be repeatedly searching for items in the collection

– You do not have duplicate items.

For a performance cheat sheet, refer to Time Complexity. For a nice, accessible, and visual book on algorithms, refer to Algorithms

However in case of list using:

for value in set(list_name)

is also expensive because of the list-to-set cast.


Reduce ‘not so useful’ memory usage

Avoid + operator on strings. Let us have the following case:

a = 'line_a\n'
a += 'line_b\n'
a += 'line_c\n'

Instead, use this:

a = ['line1', 'line2', 'line3']
'\n'.join(a)

Similarly, use formatting instead of using the addition operator:

# slow
text = 'hello' + variable + 'world!'
# faster
text = 'hello %s world!' % variable
# fastest
text = 'hellow {} world'.format(variable)

This not only makes the code faster but also more readable. Also, doing

if variable == True

is also more inefficient than,

if variable

This is, of course, more pythonic too!


Utilize generators

In case of large sets of results, go for generators. For example:

def a_fun(a):
    result = []
    for i in a:
        result.append(i*2)
    return(result)

Takes more time than using a generator i.e,

def a_fun(a):
    for i in a:
        yield(i*2)

which can then be taken to a list.

Generators give you lazy evaluation. You use them by iterating over them: either explicitly with ‘for’ or implicitly, by passing it to any function or construct that iterates.

You can understand generators as iterating and operating over the list one by one instead of iterating over all of it at once.


Use built-in functions and libraries

A lot of built-in functions which are implemented in C are made for efficiency and easy usability. Some of these functions are sum, max, any, map, etc. For example, if we have a list of strings str_list then,

upper_list = []
for string in str_list:
    upper_list.append(string.upper())

is slower than,

upper_list = map(str.upper, str_list)

This is also called ‘list comprehension’. Which is a bit more pythonic too!

Guido’s Python Patterns — An Optimization Anecdote is a great read:

If you feel the need for speed, go for built-in functions — you can’t beat a loop written in C. Check the library manual for a built-in function that does what you want.

Another useful library is collections, one of its useful functions is:

from collections import Counter
s = 'convert the character count of this string to a dictionary'
Counter(s)

This will return the dictionary with the count as values and character of the string as keys.


Think about the calculations and functions outside the loop

Iterating and doing the calculations is slower than doing it outside the iteration when it can be done so! Let us have a look at a case where you have a big iterator and you need to do some regex operation:

for i in iter_list:
    m = re.search(r'regex_expression', i)

Is slower instead, compile the regex at once and then use the cached version of the same.

reg_ex = re.compile(r'regex_expression')
for i in iter_list:
    m = reg_ex.search(i)

Also in case of classes, assigning the methods to a local variable is more efficient than calling them directly and this helps in improving the performance majorly in case of loops. For example:

myfun = my_object.myFun
for i in range(n):
    myfun(i)

How to spot performance issues

We can’t just guess performance, therefore using a profiler is a better option like

python -m cProfile [-o output_file] [-s sort_order] myscript.py

To time your code.


There are still many other ways that I may not have discovered yet, which I will be discovering and sharing with you time to time. Please let me know in comments if there is some topic for which I can be of help!

Author –  Gaganpreet Singh