How to Profile Python Code: Accurately Measure the Performance of your Software

As software engineers, we are paid to create high-quality code that performs.

Performance can be measured in a number of ways: the problems it solves, its uptime, or a slew of quantitative metrics.

Speed is unquestionably an important component of software performance. In this article, I’ll show you how to measure the speed of Python code using a process called “profiling”, step-by-step.

What is Profiling?

To measure the speed of Python code, we use a process called ‘profiling’. Profiling measures the speed of your code and also provides additional information like:

The number of times each method was called
The average runtime of each method per call
The total time spent running each function

There are a number of terms that you should familiarize yourself with before you start profiling your first application:

Method profiling: Measuring the performance of methods and functions
Line profiling: Measuring the performance of an application line-by-line
APM Tools: APM stands for application performance management. These are off-the-shelf tools that you can pair with your applications to monitor their performance in real-time. I would say that these could be useful for production-level business applications but might not be worth the cost for weekend warriors building side projects.

Fortunately, there are a number of Python modules that make it easy to profile your code. We’ll explore them in the next section of this article.

The Profiling Ecosystem

Python has a robust ecosystem for profiling code. The major components are:

cProfile: a C extension suitable for profiling long-running programs. cProfile is recommend for most Python users.
Profile: a pure Python module that adds significant technical overhead to profiled programs. Unless you’re trying to extend your profiler beyond the functionality provided by cProfile, I would recommend against using the profile module.
pstats: a Python module for formatting data and generating reports from the information generated by profile and cProfile.

In this article, I will demonstrate how to profile your code using the functionality provided by cProfile and pstats. Specifically, I will provide an easy template where you can paste in your code and get a nicely-formatted report on the runtimes of the functions it contains.

Step 1: Finish Writing and Debugging Your Code

While this step is basically a formality, it’s worth mentioning that you need to finish and debug the code first. Said differently, your code needs to run properly before you can profile it. Spend time completing and debugging your code before

Step 2: Download or Fork my Profiler Repository

I’ve created a GitHub repository to accompany this blog post. In it, you will see a file called profiler.py. This is the file we’ll be using to profile your code.

If you’re not a GitHub user (you should be), I have also embedded the profiler.py file below:

import cProfile, pstats, io

def profile(fnc):
    
    def inner(*args, **kwargs):
        
        pr = cProfile.Profile()
        pr.enable()
        retval = fnc(*args, **kwargs)
        pr.disable()
        s = io.StringIO()
        sortby = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats()
        print(s.getvalue())
        return retval

    return inner

@profile
def code_to_profile():    
    ###################################
    # Insert code to be profiled here
    ###################################
    pass
    
%time datafeeds_builder()

Open up some sort of Python environment (like a Jupyter Notebook, vim, or VS Code) and open the profiler.py file.

Step 3: Enter the Code You Wish to Profile

In the profiler.py file, you’ll see the following code block:

@profile

def code_to_profile():    

    ###################################

    # Insert code to be profiled here

    ###################################

    pass

Within the commented block, replace it with the code you’d like to profile. For the purpose of this tutorial, we’ll be profiling a very simple program called code-to-test.py, which imports population data from an open source GitHub repository and prints each country’s name along with its population.

The entire program we’re profiling is only 13 lines long. I have embedded it below.

import numpy as np
import pandas as pd

populationData = pd.read_json('https://raw.githubusercontent.com/samayo/country-json/master/src/country-by-population.json')
stringDataFrame = pd.DataFrame(pd.np.empty((0,1)))
for index, row in populationData.iterrows():
    stringDataFrame.append(pd.Series(row[0] + ' has population ' + str(row[1])),ignore_index = True)

i=0
while i < 10**8:
    i+=1

print('The code is done!')

Step 4: Run the Profiler

Once the commented section is replaced with the code you’d like to profile, you can now see the output of the profiler. As an example, here is the output of the profiler executing code-to-test.py. Please take a moment to observe this output.

The rows of this quasi-spreadsheet profiler output are explained below:

ncalls: the number of times each functions was called
tottime: the total time spent running each function
percall: the total time spent running the function divided by the number of calls
cumtime: the total time spent running each function, as well as the functions that it itself called (in other words, its dependency functions)
percall: cumtime divided by the number of calls

To be clear, tottime and cumtime are indeed different. Here is the best explanation (found on this StackOverflow thread):

tottime is the total time spent in the function alone. cumtime is the total time spent in the function plus all functions that this function called.

From time to time when profiling code, you may also see something like this:

CPU times: user 2 µs, sys: 0 ns, total: 2 µs

Wall time: 5.72 µs

This means that the CPU actually performed calculations for 2 microseconds, while the program took a total of 5.7 microseconds to really complete. If you’re ever confused about what CPU time or wall time means, it is helpful to remember that ‘wall time’ really means ‘time as measured by a clock on the wall’.

Why are CPU time and wall time different? It is usually because the CPU sends a request across a network and must wait for a response before continuing.

In this case, the pd.read_json() function sends a request to GitHub for the population statistics database and must wait for the request to be served before printing the strings specified by the program.

Final Thoughts

In this article, I explained how to measure the performance of Python code using a profiler. I also included links to a GitHub repository with a profiler already built, and some prepopulated code for you to profile yourself.

One of my goals for 2020 is to create many tutorials on Python programming. If you’d like to suggest a tutorial, please email me. You can also join my mailing list to stay up-to-date on my latest articles.

Written on March 24th, 2020