Python caches

The fastest code is the code you don’t run



dogpile.cache:

dogpile.cache is a caching API which provides a generic interface to caching backends of any variety, and additionally provides API hooks which integrate these cache backends with the locking mechanism of dogpile.

It’s well done, but being threading-based it doesn’t integrate especially smoothly with futures and the modern concurrency of tornado or asyncio.

There is an advanced variant targeting higher-performance Redis deployments.

The joblib cache looks convenient, but I can’t work out whether it is multi-write safe, or whether it is supposed to be invoked only from a single master process and thus needs no locking:

Transparent and fast disk-caching of output value: a memoize or make-like functionality for Python functions that works well for arbitrary Python objects, including very large numpy arrays. Separate persistence and flow-execution logic from domain logic or algorithmic code by writing the operations as a set of steps with well-defined inputs and outputs: Python functions. Joblib can save their computation to disk and rerun it only if necessary.

It looks convenient, and is the easiest mmap-compatible solution I know, but it supports only function memoization. So if you want to access results some other way, or to access partial results, things can get convoluted unless your code factors naturally into memoized functions.
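A minimal sketch of that memoization interface, caching to a throwaway temporary directory (the function is a stand-in for real work):

```python
import tempfile

from joblib import Memory

# A throwaway cache directory for this sketch; any writable path works.
memory = Memory(tempfile.mkdtemp(), verbose=0)

@memory.cache
def expensive_square(x):
    # Stand-in for a slow computation over, say, a large numpy array.
    return x * x

print(expensive_square(4))  # computed and persisted to disk
print(expensive_square(4))  # reloaded from the disk cache
```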

On that tip, klepto (part of the pathos project) is also focused on scientific computation, and attempts to be even cleverer than joblib.

Klepto extends python’s lru_cache to utilize different keymaps and alternate caching algorithms, such as lfu_cache and mru_cache. While caching is meant for fast access to saved results, klepto also has archiving capabilities, for longer-term storage. Klepto uses a simple dictionary-style interface for all caches and archives, and all caches can be applied to any python function as a decorator. Keymaps are algorithms for converting a function’s input signature to a unique dictionary, where the function’s results are the dictionary value. Thus for y = f(x), y will be stored in cache[x] (e.g. {x:y}).

Klepto provides both standard and ‘safe’ caching, where safe caches are slower but can recover from hashing errors. Klepto is intended to be used for distributed and parallel computing, where several of the keymaps serialize the stored objects. Caches and archives are intended to be read/write accessible from different threads and processes. Klepto enables a user to decorate a function, save the results to a file or database archive, close the interpreter, start a new session, and reload the function and its cache.

However, pathos isn’t especially active, and despite many lofty goals, in practice it seems to provide little that you can’t already get from joblib. Is this a case of the perfect being the enemy of the good?

cachetools extends the Python 3 lru_cache reference implementation with additional cache classes such as TTLCache and LFUCache.
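The shared decorator pattern, shown here with the stdlib lru_cache that cachetools builds on:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    # Memoization turns this naive recursion from exponential to linear.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040
print(fib.cache_info())  # hit/miss statistics for the cache
```

cachetools keeps this interface but lets you swap in its own cache objects (TTLCache, LFUCache, RRCache and so on) via its `cached` decorator.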

