Multi-processing in Python

March 13, 2015
12-1 PM
3425 Sterling Hall

Attending

Kalin Kiesling

I am a first-year grad student in nuclear engineering, currently developing software to aid in computational nuclear engineering tasks.

Multiprocessing in Python

Today will be a discussion of using Python's multiprocessing module. You can follow along with what we will be doing today here.

Why might you want to use multiprocessing?

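If you have functions within a single Python script that do not depend on one another and could run at the same time, then Python’s multiprocessing module is for you. Note that “multiprocessing” here does not mean threading: the module starts separate operating-system processes, each with its own Python interpreter, so the functions can run in parallel on different processor cores. The operating system decides which core each process runs on.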

Documentation for the module can be found here.

How to Use

To use the package, simply add this to the top of your Python script:

import multiprocessing

Several names can also be imported individually, such as Pool, Process, Queue, and Pipe. Today we will go over Pool and Process specifically.

The Process Class

from multiprocessing import Process

This class runs a function f(x) in a single new process. There are two main chunks of code needed in the script:

  1. Any function f(x) that you wish to run.

  2. The process that calls that function.
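The two pieces can be sketched as follows. This is a minimal illustration rather than part of process_example.py: the names square and run_square are made up for this sketch, and a Queue is used only because a Process does not hand back the function's return value directly. It is written to run under both Python 2 and Python 3.

```python
from multiprocessing import Process, Queue


def square(x, results):
    # Piece 1: the function to run. It executes in the child process
    # and sends its result back through the queue.
    results.put(x * x)


def run_square(x):
    # Piece 2: the process that calls the function.
    results = Queue()
    # args must be a tuple of the positional arguments for the target
    p = Process(target=square, args=(x, results))
    p.start()
    value = results.get()   # read the result before joining
    p.join()                # wait for the child process to finish
    return value


if __name__ == '__main__':
    print(run_square(4))
```

Arguments reach the function through args, and anything the function needs to give back has to travel through a channel such as a Queue or Pipe.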

In our example in process_example.py, we will demonstrate how to use this class by getting the process id in a few different ways.

We will start by retrieving the current process id for the script without launching any extra processes.

import os

current = os.getpid()
print "Current process:", current

We should get “Current process: x”, where x is the process id of the running script.

Next we will create a function get_id() that will give us the current process id and the parent process id.

def get_id():   
    pp = os.getppid()   # parent process
    p = os.getpid()     # current process
    print "parent process:", pp
    print "current process:", p

We will then call that function by creating a new process.

p = Process(target=get_id)
p.start()
p.join()

We create the process with p = Process(target=get_id), where target specifies the function to run in the new process. We then start the process with p.start() and wait for it to finish with p.join().

We should see that the current process id is now a new number (often x+1, although the operating system makes no such guarantee) and that the parent process id is x, the id of the main script. If we launch more processes, the parent process id stays at x while each new process gets its own id.
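Launching several processes in a row might look like the sketch below. The helper names report_ids and launch are hypothetical, and the ids are sent back through a Queue instead of being printed in the child so that the launcher can compare them. One assumption to keep in mind: the children report the launcher as their parent only when they are started directly by it, which is the traditional behaviour on Linux.

```python
import os
from multiprocessing import Process, Queue


def report_ids(results):
    # Runs in a child process: send (parent id, own id) to the launcher.
    results.put((os.getppid(), os.getpid()))


def launch(n):
    results = Queue()
    procs = [Process(target=report_ids, args=(results,)) for _ in range(n)]
    for p in procs:
        p.start()
    ids = [results.get() for _ in procs]   # drain the queue before joining
    for p in procs:
        p.join()
    return ids


if __name__ == '__main__':
    print("launcher id: %d" % os.getpid())
    for parent_id, child_id in launch(3):
        print("parent: %d  child: %d" % (parent_id, child_id))
```

Every child should report the same parent id, while the child ids differ from the launcher's id and from one another.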

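Notes: create processes inside an if __name__ == '__main__': guard, so that the launching code does not run again when a child process imports the script. This is required on Windows.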

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    p.join()

The Pool Class

from multiprocessing import Pool

The Pool class is similar to Process except that you can control a pool of processes. This is a good class to use if the function returns a value.

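In the most basic case, you can create a Pool instance with no arguments, which starts one worker process per CPU core, and submit the function with apply_async(). We can use our get_id example from before in the same way (see here); below it appears as get_id_print.

p = Pool()
r = p.apply_async(get_id_print)
r.wait()   # block until the worker has finished running the function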

We can also use Pool with a function that returns a value, using r.get() to retrieve the return value. For example, if our get_id function now returns a list of ids [pp, p] (get_id_return below), we can retrieve them like so:

p2 = Pool()
r2 = p2.apply_async(get_id_return)
vals = r2.get()
print "List of ids:", vals
print "Parent id:", vals[0]
print "Current id:", vals[1]
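apply_async() can also pass arguments to the function and be called several times before any result is collected. The sketch below assumes a hypothetical function ids_with_label and helper collect, neither of which is in the example script; the pool size of 2 and the 30-second timeout are arbitrary choices.

```python
import os
from multiprocessing import Pool


def ids_with_label(label):
    # Runs in a worker process; returns the label plus [parent id, worker id].
    return label, [os.getppid(), os.getpid()]


def collect(labels):
    pool = Pool(processes=2)
    # Submit every task first; args must be a tuple, hence the trailing comma.
    pending = [pool.apply_async(ids_with_label, args=(lab,)) for lab in labels]
    # get() blocks until that task's result is ready (or the timeout expires).
    results = [r.get(timeout=30) for r in pending]
    pool.close()   # no more tasks will be submitted
    pool.join()    # wait for the workers to exit
    return results


if __name__ == '__main__':
    for label, ids in collect(["a", "b", "c"]):
        print("%s: parent %d, worker %d" % (label, ids[0], ids[1]))
```

Because there are only two workers, at least two of the three tasks will report the same worker id.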

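Another great use for Pool is its map method, which calls the function once for each item in a list, spreads those calls across the pool's worker processes, and returns the results in the same order as the inputs.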

def f(x):
    r = x*2
    return r

p4 = Pool()
r4 = p4.map(f, [1,2,3]) 
print r4
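A self-contained version of the map example might look like the following sketch. The names double and double_all and the default of two workers are illustrative assumptions. The pool is closed and joined explicitly so that the code runs under both Python 2 and Python 3.

```python
from multiprocessing import Pool


def double(x):
    return x * 2


def double_all(values, workers=2):
    pool = Pool(processes=workers)
    try:
        # map blocks until every call has finished and keeps input order
        return pool.map(double, values)
    finally:
        pool.close()   # no more tasks will be submitted
        pool.join()    # wait for the workers to exit


if __name__ == '__main__':
    print(double_all([1, 2, 3]))   # [2, 4, 6]
```

The results come back in input order even though the individual calls may finish in any order on the workers.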

Notes

Tips and Tricks

Lightning Discussions

Treat rotation?

We can take turns bringing in lunch-time treats.
