The IPython Hub stores all task requests and results in a database. Currently supported backends are: MongoDB, SQLite (the default), and an in-memory DictDB. The most common use case for this is clients requesting results for tasks they did not submit, via:
In [1]: rc.get_result(task_id)
However, since we have this DB backend, we provide a direct query method in the client for users who want deeper introspection into their task history. The db_query() method of the Client is modeled after MongoDB queries, so if you have used MongoDB it should look familiar. In fact, when the MongoDB backend is in use, the query is relayed directly. However, when using other backends, the interface is emulated and only a subset of queries is possible.
See also
MongoDB query docs: http://www.mongodb.org/display/DOCS/Querying
Client.db_query() takes a dictionary query object, with keys from the TaskRecord key list, and values of either exact values to test, or MongoDB queries, which are dicts of The form: {'operator' : 'argument(s)'}. There is also an optional keys argument, that specifies which subset of keys should be retrieved. The default is to retrieve all keys excluding the request and result buffers. db_query() returns a list of TaskRecord dicts. Also like MongoDB, the msg_id key will always be included, whether requested or not.
TaskRecord keys:
Key | Type | Description |
---|---|---|
msg_id | uuid(bytes) | The msg ID |
header | dict | The request header |
content | dict | The request content (likely empty) |
buffers | list(bytes) | buffers containing serialized request objects |
submitted | datetime | timestamp for time of submission (set by client) |
client_uuid | uuid(bytes) | IDENT of client’s socket |
engine_uuid | uuid(bytes) | IDENT of engine’s socket |
started | datetime | time task began execution on engine |
completed | datetime | time task finished execution (success or failure) on engine |
resubmitted | datetime | time of resubmission (if applicable) |
result_header | dict | header for result |
result_content | dict | content for result |
result_buffers | list(bytes) | buffers containing serialized request objects |
queue | bytes | The name of the queue for the task (‘mux’ or ‘task’) |
pyin | <unused> | Python input (unused) |
pyout | <unused> | Python output (unused) |
pyerr | <unused> | Python traceback (unused) |
stdout | str | Stream of stdout data |
stderr | str | Stream of stderr data |
MongoDB operators we emulate on all backends:
Operator | Python equivalent |
---|---|
‘$in’ | in |
‘$nin’ | not in |
‘$eq’ | == |
‘$ne’ | != |
‘$ge’ | > |
‘$gte’ | >= |
‘$le’ | < |
‘$lte’ | <= |
The DB Query is useful for two primary cases:
To get all msg_ids that are not completed, only retrieving their ID and start time:
In [1]: incomplete = rc.db_query({'complete' : None}, keys=['msg_id', 'started'])
All jobs started in the last hour by me:
In [1]: from datetime import datetime, timedelta
In [2]: hourago = datetime.now() - timedelta(1./24)
In [3]: recent = rc.db_query({'started' : {'$gte' : hourago },
'client_uuid' : rc.session.session})
All jobs started more than an hour ago, by clients other than me:
In [3]: recent = rc.db_query({'started' : {'$le' : hourago },
'client_uuid' : {'$ne' : rc.session.session}})
Result headers for all jobs on engine 3 or 4:
In [1]: uuids = map(rc._engines.get, (3,4))
In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids }, keys='result_header')