IPython’s IPython.kernel package exposes the full power of the Python interpreter over a TCP/IP network for the purposes of parallel computing. This feature brings up the important question of IPython’s security model. This document gives details about this model and how it is implemented in IPython’s architecture.
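To enable parallel computing, IPython has a number of different processes that run. These processes are discussed at length in the IPython documentation and are summarized here: the IPython engine, which executes the Python commands and processes the data it is sent; the IPython controller, to which engines and clients connect; and the IPython client, which the user uses to send commands and data to the engines.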
Collectively, these three processes are called the IPython kernel.
These three processes communicate over TCP/IP connections with a well defined topology. The IPython controller is the only process that listens on TCP/IP sockets. Upon starting, an engine connects to a controller and registers itself with the controller. These engine/controller TCP/IP connections persist for the lifetime of each engine.
The IPython client also connects to the controller using one or more TCP/IP connections. These connections persist for the lifetime of the client only.
A given IPython controller and set of engines typically has a relatively short lifetime, usually corresponding to the duration of a single parallel simulation performed by a single user. Finally, the controller, engines and client processes normally execute with the permissions of that same user. More specifically, the controller and engines are not executed as root or with any other superuser permissions.
When running the IPython kernel to perform a parallel computation, a user utilizes the IPython client to send Python commands and data through the IPython controller to the IPython engines, where those commands are executed and the data processed. The design of IPython ensures that the client is the only access point for the capabilities of the engines. That is, the only way of addressing the engines is through a client.
A user can utilize the client to instruct the IPython engines to execute arbitrary Python commands. These Python commands can include calls to the system shell, access to the filesystem, etc., as required by the user’s application code. From this perspective, when a user runs an IPython engine on a host, that engine has the same capabilities and permissions as the user themselves (as if they were logged onto the engine’s host with a terminal).
All TCP/IP connections between the client and controller as well as the engines and controller are fully encrypted and authenticated. This section describes the details of the encryption and authentication approach used within IPython.
IPython uses the Foolscap network protocol [Foolscap] for all communications between processes. The details of IPython’s security model are therefore directly related to those of Foolscap, and much of the following discussion is really a discussion of the security that is built into Foolscap.
For encryption purposes, IPython and Foolscap use the well-known Secure Sockets Layer (SSL) protocol [RFC5246]. We use the implementation of this protocol provided by the OpenSSL project through the pyOpenSSL [pyOpenSSL] Python bindings to OpenSSL.
IPython clients and engines must also authenticate themselves with the controller. This is handled in a capabilities based security model [Capability]. In this model, the controller creates a strong cryptographic key or token that represents each set of capabilities that the controller offers. Any party who has this key and presents it to the controller has full access to the corresponding capabilities of the controller. This model is analogous to using a physical key to gain access to physical items (capabilities) behind a locked door.
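The capability model described above can be illustrated with a minimal sketch. This is not Foolscap's or IPython's implementation: the `CapabilityTable` class, its method names, and the capability label are hypothetical, and the token comes from Python's standard `secrets` module rather than from Foolscap.

```python
import hmac
import secrets


class CapabilityTable:
    """Hypothetical illustration: maps unguessable tokens to capabilities."""

    def __init__(self):
        self._by_token = {}

    def grant(self, capability):
        # 40 random bytes (320 bits), an illustrative size, as URL-safe text.
        token = secrets.token_urlsafe(40)
        self._by_token[token] = capability
        return token

    def lookup(self, presented):
        # Each comparison runs in constant time, so timing does not reveal
        # how much of a guessed token was correct.
        for token, capability in self._by_token.items():
            if hmac.compare_digest(token.encode(), presented.encode()):
                return capability
        return None


table = CapabilityTable()
engine_key = table.grant("register-engine")
granted = table.lookup(engine_key)        # holder of the key gets access
denied = table.lookup("not-the-key")      # anyone else gets nothing
print(granted, denied)
```

Whoever holds `engine_key` can use the capability. The table stores no notion of identity, which is the sense in which the key works like a physical key.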
For a capabilities based authentication system to prevent unauthorized access, two things must be ensured:
The keys in Foolscap are called Foolscap URLs or FURLs. The following section gives details about how these FURLs are created in Foolscap. The IPython controller creates a number of FURLs for different purposes:
Upon starting, the controller creates these different FURLs and writes them to files in the user-read-only directory $HOME/.ipython/security. Thus, only the user who starts the controller has access to the FURLs.
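Writing a secret to a location that only its owner can read can be sketched as follows. This is a POSIX-oriented illustration, not the controller's actual code: the helper `write_private_file`, the file name `example.furl`, and the placeholder contents are all assumptions, and the demonstration writes to a temporary directory rather than to $HOME/.ipython/security.

```python
import os
import stat
import tempfile


def write_private_file(directory, name, contents):
    """Write `contents` to directory/name so that only the owner can read it."""
    # Owner-only directory (rwx------). The mode passed to makedirs is
    # filtered by the umask, so chmod afterwards to make it explicit.
    os.makedirs(directory, mode=0o700, exist_ok=True)
    os.chmod(directory, 0o700)
    path = os.path.join(directory, name)
    # Create the file as rw------- from the start, so it is never briefly
    # readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(contents)
    return path


# Demonstration in a throwaway directory.
demo_dir = os.path.join(tempfile.mkdtemp(), "security")
demo_path = write_private_file(demo_dir, "example.furl", "pb://placeholder\n")
file_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
print(oct(file_mode))
```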
For an IPython client or engine to authenticate with a controller, it must present the appropriate FURL to the controller upon connecting. If the FURL matches what the controller expects for a given capability, access is granted. If not, access is denied. The exchange of FURLs is done after encrypted communications channels have been established to prevent attackers from capturing them.
Note
The FURL is similar to an unsigned private key in SSH.
In this section we detail the precise security handshake that takes place at the beginning of any network connection in IPython. For the purposes of this discussion, the SERVER is the IPython controller process and the CLIENT is the IPython engine or client process.
Upon starting, all IPython processes do the following:
Upon starting, the IPython controller also does the following:
For a CLIENT to be able to connect to the SERVER and access a capability of that SERVER, the CLIENT must have knowledge of the FURL for that SERVER’s capability. This typically requires that the file containing the FURL be moved from the SERVER’s host to the CLIENT’s host. This is done by the end user who started the SERVER and wishes to have a CLIENT connect to the SERVER.
When a CLIENT connects to the SERVER, the following handshake protocol takes place:
The public/private key pair associated with each process’s x509 certificate is completely hidden from this handshake protocol. The keys are, however, used internally by OpenSSL as part of the SSL handshake protocol. Each process keeps its own private key hidden and sends its peer only the public key (embedded in the certificate).
Finally, when the CLIENT requests access to a particular SERVER capability, the following happens:
There are a number of potential security vulnerabilities present in IPython’s architecture. In this section we discuss those vulnerabilities and detail how the security architecture described above prevents them from being exploited.
The IPython client can instruct the IPython engines to execute arbitrary Python code with the permissions of the user who started the engines. If an attacker were able to connect their own hostile IPython client to the IPython controller, they could instruct the engines to execute code.
This attack is prevented by the capabilities based client authentication performed after the encrypted channel has been established. The relevant authentication information is encoded into the FURL that clients must present to gain access to the IPython controller. By limiting the distribution of those FURLs, a user can grant access to only authorized persons.
It is highly unlikely that a client FURL could be guessed by an attacker in a brute force guessing attack. A given instance of the IPython controller only runs for a relatively short amount of time (on the order of hours). Thus an attacker would have only a limited amount of time to test a search space of size 2**320. Furthermore, even if a controller were to run for a longer amount of time, this search space is quite large (larger, for instance, than that of a typical username/password pair).
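To give a sense of scale, the arithmetic can be worked through directly. The guessing rate, the 12-hour run, and the 8-character alphanumeric password below are illustrative assumptions, not figures from IPython or Foolscap. Only the 2**320 search space comes from the text.

```python
import math

SEARCH_SPACE = 2 ** 320          # figure quoted in the text
GUESSES_PER_SECOND = 10 ** 9     # assumed attacker rate, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 3600

# On average an attacker must try half the space before hitting the key.
expected_guesses = SEARCH_SPACE // 2
expected_years = expected_guesses / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

# Chance of success during an assumed 12-hour controller run at that rate.
guesses_in_run = GUESSES_PER_SECOND * 12 * 3600
success_probability = guesses_in_run / SEARCH_SPACE

# For comparison: an 8-character password drawn from 62 alphanumerics.
password_bits = 8 * math.log2(62)

print(f"expected years to guess: {expected_years:.3e}")
print(f"success probability in 12 h: {success_probability:.3e}")
print(f"password entropy: {password_bits:.1f} bits vs 320 bits")
```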
If an attacker were able to connect a hostile engine to a user’s controller, the user might unknowingly send sensitive code or data to the hostile engine. This attacker’s engine would then have full access to that code and data.
This type of attack is prevented in the same way as the unauthorized client attack, through the usage of the capabilities based authentication scheme.
It is also possible that an attacker could try to convince a user’s IPython client or engine to connect to a hostile IPython controller. That controller would then have full access to the code and data sent between the IPython client and the IPython engines.
Again, this attack is prevented through the FURLs, which ensure that a client or engine connects to the correct controller. Note that the FURLs also encode the IP address and port that the controller is listening on, so there is little chance of mistakenly connecting to a controller running on a different IP address and port.
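How a FURL carries the controller's address can be sketched with a small parser. The layout `pb://<tub-id>@<host>:<port>/<name>` used here is an assumption about the simple single-location form of a FURL. The helper `parse_furl` and the sample value are made up for illustration and are not part of Foolscap's API.

```python
import re
from typing import NamedTuple


class ParsedFurl(NamedTuple):
    tub_id: str   # identifies the process being connected to
    host: str     # address the controller listens on
    port: int     # port the controller listens on
    name: str     # unguessable name of the capability


# Assumed layout: pb://<tub-id>@<host>:<port>/<name>
_FURL_RE = re.compile(
    r"^pb://(?P<tub_id>[^@/]+)@(?P<host>[^:/,]+):(?P<port>\d+)/(?P<name>.+)$"
)


def parse_furl(furl):
    match = _FURL_RE.match(furl.strip())
    if match is None:
        raise ValueError("not a FURL in the assumed single-location form")
    return ParsedFurl(
        match["tub_id"], match["host"], int(match["port"]), match["name"]
    )


# Made-up value for illustration; not a real FURL.
example = parse_furl("pb://exampletubid@192.0.2.10:10105/examplename")
print(example.host, example.port)
```

Because the address travels inside the same string as the secret, a client given the right FURL has no separate host or port setting to get wrong.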
When starting an engine or client, a user must specify which FURL to use for that connection. Thus, in order to introduce a hostile controller, the attacker must convince the user to use the FURLs associated with the hostile controller. As long as a user is diligent in only using FURLs from trusted sources, this attack is not possible.
A number of other measures are taken to further limit the security risks involved in running the IPython kernel.
First, by default, the IPython controller listens on random port numbers. While this can be overridden by the user, in the default configuration, an attacker would have to do a port scan to even find a controller to attack. When coupled with the relatively short running time of a typical controller (on the order of hours), an attacker would have to work extremely hard and extremely fast to even find a running controller to attack.
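One common way to listen on an unpredictable port is to let the operating system choose it. The sketch below shows that technique with the standard `socket` module. It is an assumption that this resembles what the controller does, since the text does not say how its random ports are picked, and `listen_on_random_port` is a hypothetical helper.

```python
import socket


def listen_on_random_port(host="127.0.0.1"):
    """Bind a TCP listening socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS to pick any free ephemeral port.
    sock.bind((host, 0))
    sock.listen()
    # getsockname() reports the port that was actually assigned.
    return sock, sock.getsockname()[1]


server_sock, chosen_port = listen_on_random_port()
print("listening on port", chosen_port)
server_sock.close()
```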
Second, much of the time, especially when run on supercomputers or clusters, the controller is running behind a firewall. Thus, for an engine or client to connect to the controller:
or:
In either case, an attacker is presented with additional barriers that prevent attacking or even probing the system.
IPython’s architecture has been carefully designed with security in mind. The capabilities based authentication model, in conjunction with the encrypted TCP/IP channels, addresses the core potential vulnerabilities in the system, while still enabling users to use the system in open networks.
Can you clarify the roles of the certificate and its keys versus the FURL, which is also called a key?
The certificate created by IPython processes is a standard public key x509 certificate that is used by the SSL handshake protocol to set up an encrypted channel between the controller and the IPython engine or client. The public and private keys associated with this certificate are used only by the SSL handshake protocol in setting up this encrypted channel.
The FURL serves a completely different and independent purpose from the key pair associated with the certificate. When we refer to a FURL as a key, we are using the word “key” in the capabilities based security model sense. This has nothing to do with “key” in the public/private key sense used in the SSL protocol.
With that said, the FURL is used as a cryptographic key to grant IPython engines and clients access to particular capabilities that the controller offers.
Is the controller creating a self-signed certificate? Is it created per instance/session, as a one-time setup, or each time the controller is started?
The Foolscap network protocol, which handles the SSL protocol details, creates a self-signed x509 certificate using OpenSSL for each IPython process. The lifetime of the certificate is handled differently for the IPython controller and the engines/client.
For the IPython engines and client, the certificate is only held in memory for the lifetime of its process. It is never written to disk.
For the controller, the certificate can be created anew each time the controller starts or it can be created once and reused each time the controller starts. If at any point, the certificate is deleted, a new one is created the next time the controller starts.
How is the private key (associated with the certificate) distributed?
In the usual implementation of the SSL protocol, the private key is never distributed. We follow this standard always.
Many SSL connections only perform one sided authentication (the server to the client). How is the client authentication in IPython’s system related to SSL authentication?
We perform a two way SSL handshake in which both parties request and verify the certificate of their peer. This mutual authentication is handled by the SSL handshake and is separate and independent from the additional authentication steps that the CLIENT and SERVER perform after an encrypted channel is established.
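The idea of a two way handshake can be sketched with Python's standard `ssl` module. This swaps in the standard library for pyOpenSSL and Foolscap, which IPython actually uses, and it does not reproduce how Foolscap decides which self-signed certificates to accept. It only shows what it means for both sides to require a certificate from their peer.

```python
import ssl

# Server side: in addition to presenting its own certificate, require one
# from the connecting peer. Server contexts do not do this by default.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.verify_mode = ssl.CERT_REQUIRED

# Client side: PROTOCOL_TLS_CLIENT already requires and verifies the
# server's certificate by default.
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

# A real deployment would also call load_cert_chain() on each context to
# install that process's own certificate and private key, and
# load_verify_locations() to say which peer certificates to trust.
print(server_ctx.verify_mode, client_ctx.verify_mode)
```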
[RFC5246] <http://tools.ietf.org/html/rfc5246>