Often, a parallel algorithm will require moving data between the engines. One way of accomplishing this is by doing a pull and then a push using the multiengine client. However, this will be slow as all the data has to go through the controller to the client and then back through the controller, to its final destination.
A much better way of moving data between engines is to use a message passing library, such as the Message Passing Interface (MPI) [MPI]. IPython’s parallel computing architecture has been designed from the ground up to integrate with MPI. This document describes how to use MPI with IPython.
If you want to use MPI with IPython, you will need to install:
Note
The mpi4py package is not a strict requirement. However, you need to have some way of calling MPI from Python. You also need some way of making sure that MPI_Init() is called when the IPython engines start up. There are a number of ways of doing this and a good number of associated subtleties. We highly recommend just using mpi4py as it takes care of most of these problems. If you want to do something different, let us know and we can help you get started.
To use code that calls MPI, there are typically two things that MPI requires.
There are a couple of ways that you can start the IPython engines and get these things to happen.
The easiest approach is to use the mpiexec mode of ipcluster, which will first start a controller and then a set of engines using mpiexec:
$ ipcluster mpiexec -n 4
This approach is best as interrupting ipcluster will automatically stop and clean up the controller and engines.
If you want to start the IPython engines using the mpiexec, just do:
$ mpiexec -n 4 ipengine --mpi=mpi4py
This requires that you already have a controller running and that the FURL files for the engines are in place. We also have built in support for PyTrilinos [PyTrilinos], which can be used (assuming is installed) by starting the engines with:
mpiexec -n 4 ipengine --mpi=pytrilinos
Once the engines are running with MPI enabled, you are ready to go. You can now call any code that uses MPI in the IPython engines. And, all of this can be done interactively. Here we show a simple example that uses mpi4py [mpi4py].
First, lets define a simply function that uses MPI to calculate the sum of a distributed array. Save the following text in a file called psum.py:
from mpi4py import MPI
import numpy as np
def psum(a):
s = np.sum(a)
return MPI.COMM_WORLD.Allreduce(s,MPI.SUM)
Now, start an IPython cluster in the same directory as psum.py:
$ ipcluster mpiexec -n 4
Finally, connect to the cluster and use this function interactively. In this case, we create a random array on each engine and sum up all the random arrays using our psum() function:
In [1]: from IPython.kernel import client
In [2]: mec = client.MultiEngineClient()
In [3]: mec.activate()
In [4]: px import numpy as np
Parallel execution on engines: all
Out[4]:
<Results List>
[0] In [13]: import numpy as np
[1] In [13]: import numpy as np
[2] In [13]: import numpy as np
[3] In [13]: import numpy as np
In [6]: px a = np.random.rand(100)
Parallel execution on engines: all
Out[6]:
<Results List>
[0] In [15]: a = np.random.rand(100)
[1] In [15]: a = np.random.rand(100)
[2] In [15]: a = np.random.rand(100)
[3] In [15]: a = np.random.rand(100)
In [7]: px from psum import psum
Parallel execution on engines: all
Out[7]:
<Results List>
[0] In [16]: from psum import psum
[1] In [16]: from psum import psum
[2] In [16]: from psum import psum
[3] In [16]: from psum import psum
In [8]: px s = psum(a)
Parallel execution on engines: all
Out[8]:
<Results List>
[0] In [17]: s = psum(a)
[1] In [17]: s = psum(a)
[2] In [17]: s = psum(a)
[3] In [17]: s = psum(a)
In [9]: px print s
Parallel execution on engines: all
Out[9]:
<Results List>
[0] In [18]: print s
[0] Out[18]: 187.451545803
[1] In [18]: print s
[1] Out[18]: 187.451545803
[2] In [18]: print s
[2] Out[18]: 187.451545803
[3] In [18]: print s
[3] Out[18]: 187.451545803
Any Python code that makes calls to MPI can be used in this manner, including compiled C, C++ and Fortran libraries that have been exposed to Python.
[MPI] | Message Passing Interface. http://www-unix.mcs.anl.gov/mpi/ |
[mpi4py] | (1, 2) MPI for Python. mpi4py: http://mpi4py.scipy.org/ |
[OpenMPI] | Open MPI. http://www.open-mpi.org/ |
[PyTrilinos] | PyTrilinos. http://trilinos.sandia.gov/packages/pytrilinos/ |