This section gives an overview of IPython’s sophisticated and powerful architecture for parallel and distributed computing. This architecture abstracts out parallelism in a very general way, which enables IPython to support many different styles of parallelism including:
Most importantly, IPython enables all types of parallel applications to be developed, executed, debugged and monitored interactively. Hence, the I in IPython. The following are some example usage cases for IPython:
The IPython architecture consists of three components:
These components live in the IPython.kernel package and are installed with IPython. They do, however, have additional dependencies that must be installed. For more information, see our installation documentation.
The IPython engine is a Python instance that takes Python commands over a network connection. Eventually, the IPython engine will be a full IPython interpreter, but for now, it is a regular Python interpreter. The engine can also handle incoming and outgoing Python objects sent over a network connection. When multiple engines are started, parallel and distributed computing becomes possible. An important feature of an IPython engine is that it blocks while user code is being executed. Read on for how the IPython controller solves this problem to expose a clean asynchronous API to the user.
The IPython controller provides an interface for working with a set of engines. At an general level, the controller is a process to which IPython engines can connect. For each connected engine, the controller manages a queue. All actions that can be performed on the engine go through this queue. While the engines themselves block when user code is run, the controller hides that from the user to provide a fully asynchronous interface to a set of engines.
Note
Because the controller listens on a network port for engines to connect to it, it must be started before any engines are started.
The controller also provides a single point of contact for users who wish to utilize the engines connected to the controller. There are different ways of working with a controller. In IPython these ways correspond to different interfaces that the controller is adapted to. Currently we have two default interfaces to the controller:
Advanced users can easily add new custom interfaces to enable other styles of parallelism.
Note
A single controller and set of engines can be accessed through multiple interfaces simultaneously. This opens the door for lots of interesting things.
For each controller interface, there is a corresponding client. These clients allow users to interact with a set of engines through the interface. Here are the two default clients:
By default (as long as pyOpenSSL is installed) all network connections between the controller and engines and the controller and clients are secure. What does this mean? First of all, all of the connections will be encrypted using SSL. Second, the connections are authenticated. We handle authentication in a capability based security model [Capability]. In this model, a “capability (known in some systems as a key) is a communicable, unforgeable token of authority”. Put simply, a capability is like a key to your house. If you have the key to your house, you can get in. If not, you can’t.
In our architecture, the controller is the only process that listens on network ports, and is thus responsible to creating these keys. In IPython, these keys are known as Foolscap URLs, or FURLs, because of the underlying network protocol we are using. As a user, you don’t need to know anything about the details of these FURLs, other than that when the controller starts, it saves a set of FURLs to files named something.furl. The default location of these files is the ~./ipython/security directory.
To connect and authenticate to the controller an engine or client simply needs to present an appropriate FURL (that was originally created by the controller) to the controller. Thus, the FURL files need to be copied to a location where the clients and engines can find them. Typically, this is the ~./ipython/security directory on the host where the client/engine is running (which could be a different host than the controller). Once the FURL files are copied over, everything should work fine.
Currently, there are three FURL files that the controller creates:
More details of how these FURL files are used are given below.
A detailed description of the security model and its implementation in IPython can be found here.
To use IPython for parallel computing, you need to start one instance of the controller and one or more instances of the engine. Initially, it is best to simply start a controller and engines on a single host using the ipcluster command. To start a controller and 4 engines on you localhost, just do:
$ ipcluster local -n 4
More details about starting the IPython controller and engines can be found here
Once you have started the IPython controller and one or more engines, you are ready to use the engines to do something useful. To make sure everything is working correctly, try the following commands:
In [1]: from IPython.kernel import client
In [2]: mec = client.MultiEngineClient()
In [4]: mec.get_ids()
Out[4]: [0, 1, 2, 3]
In [5]: mec.execute('print "Hello World"')
Out[5]:
<Results List>
[0] In [1]: print "Hello World"
[0] Out[1]: Hello World
[1] In [1]: print "Hello World"
[1] Out[1]: Hello World
[2] In [1]: print "Hello World"
[2] Out[1]: Hello World
[3] In [1]: print "Hello World"
[3] Out[1]: Hello World
Remember, a client also needs to present a FURL file to the controller. How does this happen? When a multiengine client is created with no arguments, the client tries to find the corresponding FURL file in the local ~./ipython/security directory. If it finds it, you are set. If you have put the FURL file in a different location or it has a different name, create the client like this:
mec = client.MultiEngineClient('/path/to/my/ipcontroller-mec.furl')
Same thing hold true of creating a task client:
tc = client.TaskClient('/path/to/my/ipcontroller-tc.furl')
You are now ready to learn more about the MultiEngine and Task interfaces to the controller.
Note
Don’t forget that the engine, multiengine client and task client all have different furl files. You must move each of these around to an appropriate location so that the engines and clients can use them to connect to the controller.
[Capability] | Capability-based security, http://en.wikipedia.org/wiki/Capability-based_security |