cache-manager is a lightweight, Pythonic cache for files with an SQLite registry. It lets you:
- Store files under a cache directory while tracking metadata in SQLite.
- Version entries automatically based on a stable key derived from a URI/parameters.
- Track and query item status (UNINITIALIZED, WRITE, READY, FAILED, TRASH).
- Attach type-aware attributes (int, float, varchar, datetime, text) and search by them.
- Open plain and compressed files (gz, zip, tar(.gz|.bz2)) via a unified interface.
- Clean up stale records/files and keep the best/most recent ready entries.
This is ideal for reproducible data pipelines and ETL steps where you want deterministic, discoverable artifacts.
Clone and install in editable mode (no extra tools required):
git clone https://github.com/saezlab/cache-manager.git
cd cache-manager
python -m venv .venv
source .venv/bin/activate
pip install -e .Alternatively, if you prefer Poetry:
git clone https://github.com/saezlab/cache-manager.git
cd cache-manager
poetry installThe API centers around two types: Cache (manager) and CacheItem (one file + metadata). Create a cache, create or retrieve items, write files, mark them READY, and open them later.
Minimal example:
import cache_manager as cm
from cache_manager._status import Status
cache = cm.Cache(path="./my_cache")
item = cache.create(
uri="https://example.org/data.tsv",
params={"dataset": "demo", "version": 1},
attrs={"species": "human", "rows": 1200},
status=Status.WRITE.value,
filename="data.tsv",
)
with open(item.path, "w", encoding="utf-8") as f:
f.write("col_a\tcol_b\n1\t2\n")
item.ready()
best = cache.best(
uri="https://example.org/data.tsv",
params={"dataset": "demo", "version": 1},
)
print(best, best.path)Run the included example script which downloads a real dataset and caches it:
python scripts/hello_cache_manager.pyThere is no global config file; you configure the cache per instance:
Cache(path: str | None = None, pkg: str | None = None)path: explicit directory for cache (contains the SQLite registry and files).pkg: if set, uses an OS-specific cache directory for that application name viaplatformdirs(e.g., on Linux:~/.cache/<pkg>).
Common item fields:
uri(str): a canonical identifier used for the hash key (together withparams).params(dict): serialized to the stable key; changing them yields a new key.attrs(dict): typed attributes persisted to attribute tables for rich queries.status(int): fromcache_manager._status.Status(READY, WRITE, etc.).filename(str): filename to be used in the cache; extension is auto-inferred.
Logging/session helpers are available under cache_manager.session and cache_manager.log if you want simple trace output.
- Create or reuse an item with
best_or_new.
import os
import cache_manager as cm
from cache_manager._status import Status
cache = cm.Cache(path="./my_cache")
uri = "https://example.org/report.csv"
params = {"year": 2026, "cohort": "A"}
item = cache.best_or_new(
uri=uri,
params=params,
attrs={"kind": "report", "format": "csv"},
filename="report.csv",
new_status=Status.WRITE.value,
)
if item.status != Status.READY.value or not os.path.exists(item.path):
with open(item.path, "w", encoding="utf-8") as f:
f.write("id,value\n1,42\n")
item.ready()
print("Using:", item.path)- Query by attributes and metadata.
import cache_manager as cm
from cache_manager._status import Status
cache = cm.Cache(path="./my_cache")
cache.create(
uri="demo://sample-1",
attrs={"project": "alpha", "batch": 1, "score": 0.95},
status=Status.READY.value,
)
cache.create(
uri="demo://sample-2",
attrs={"project": "alpha", "batch": 2, "score": 0.71},
status=Status.READY.value,
)
ids = cache.by_attrs({"project": "alpha", "batch": 2})
print("matching ids:", ids)
items = cache.search(uri="demo://sample-2", status=Status.READY.value)
for it in items:
print(it.version_id, it.attrs)- Open a cached file through
CacheItem.open.
import cache_manager as cm
from cache_manager._status import Status
cache = cm.Cache(path="./my_cache")
item = cache.create(
uri="demo://text-file",
filename="hello.txt",
status=Status.WRITE.value,
)
with open(item.path, "w", encoding="utf-8") as f:
f.write("hello\nworld\n")
item.ready()
opened = item.open(default_mode="r", encoding="utf-8", large=True)
print(next(iter(opened.result)).strip())Contributions are welcome! A typical flow:
Please open issues and pull requests on GitHub. If you plan a larger change, consider discussing it in an issue first. (A dedicated CONTRIBUTING.md may be added later.)
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
OmniPath Team - omnipathdb@gmail.com
Project page: https://github.com/saezlab/cache-manager