Add local disk cache for remote data chunks #99

Merged
merged 52 commits into from
Apr 1, 2025

Conversation

sd109
Member

@sd109 sd109 commented Feb 3, 2025

TODO:

  • Figure out how to test this (unit tests, integration tests, benchmark tests, all three?)
  • Make cache location (and maybe size?) configurable via Ansible
  • Check cache-related error handling and HTTP error responses

maxstack and others added 6 commits January 20, 2025 08:28
…Podman

- change Ansible collection from community.docker to containers.podman
- update all Ansible tasks to use podman instead of docker; we install podman-docker for a docker-compatible CLI, but the Ansible playbook itself needs to be podman-specific
- move group_vars/reductionist -> group_vars/all so all "reductionist_" prefixed vars can be used across plays, specifically Step and Reductionist
- update documentation
@sd109 sd109 marked this pull request as draft February 3, 2025 18:10
sd109 and others added 4 commits February 7, 2025 18:04
…ged user account that'll be running the Reductionist, and any other, containers.

Linger must be enabled so that the podman process keeps running after the user has logged out of the session.
Deployment documentation updated.
Reductionist can be used with HTTPS, either with the optional Ansible playbook Step deployment or with third-party certificates. Documentation updated for installation of third-party certificates.
 * the on-disk path to the cache
 * the TTL of cache entries
 * whether a cache hit refreshes the chunk's TTL
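
A minimal sketch of how these three settings might interact, written as illustrative Python rather than taken from the PR. `TtlDiskCache`, its parameters, and the injected `clock` are all hypothetical names chosen for the example; the in-memory expiry map is also an assumption made to keep the sketch short.

```python
import hashlib
import tempfile
from pathlib import Path


class TtlDiskCache:
    """Hypothetical on-disk chunk cache (not the PR's ChunkCache)."""

    def __init__(self, path, ttl_seconds, refresh_on_hit, clock):
        self.path = Path(path)                # on-disk path to the cache
        self.ttl_seconds = ttl_seconds        # TTL of cache entries
        self.refresh_on_hit = refresh_on_hit  # does a hit refresh the TTL?
        self.clock = clock                    # callable returning "now" in seconds
        self.expires_at = {}                  # key -> expiry timestamp
        self.path.mkdir(parents=True, exist_ok=True)

    def _file_for(self, key):
        # Hash the key so any request string maps to a safe file name.
        return self.path / hashlib.sha256(key.encode()).hexdigest()

    def put(self, key, data):
        self._file_for(key).write_bytes(data)
        self.expires_at[key] = self.clock() + self.ttl_seconds

    def get(self, key):
        expiry = self.expires_at.get(key)
        if expiry is None or expiry <= self.clock():
            return None  # miss, or the entry has expired
        if self.refresh_on_hit:
            self.expires_at[key] = self.clock() + self.ttl_seconds
        return self._file_for(key).read_bytes()


def demo(refresh_on_hit):
    """Read the same chunk at t=6 and t=12 with a 10 second TTL."""
    now = [0.0]
    with tempfile.TemporaryDirectory() as tmp:
        cache = TtlDiskCache(tmp, 10, refresh_on_hit, clock=lambda: now[0])
        cache.put("bucket/object:0-100", b"chunk")
        now[0] = 6.0
        first = cache.get("bucket/object:0-100")
        now[0] = 12.0
        second = cache.get("bucket/object:0-100")
        return first, second
```

Under these assumptions, `demo(True)` returns the chunk on both reads because the hit at t=6 pushes the expiry to t=16, while `demo(False)` returns `None` on the second read because the entry expired at t=10.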
Reductionist and others added 4 commits March 4, 2025 15:10
Features:
 * configure a maximum size for the cache
 * the cache is pruned periodically so that expired entries are purged and their cache files deleted
 * pruning also removes excess files that haven't yet expired, to keep within the configured maximum cache size
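
The two pruning passes described above could look roughly like the following Python sketch, which works on cache metadata only. The `prune` function and the `(expires_at, size_bytes)` layout are hypothetical, and evicting the entries closest to expiry first is an assumption: the commit message does not say which unexpired files are chosen.

```python
def prune(entries, now, max_size_bytes):
    """Return (kept, removed_keys) for a hypothetical metadata map.

    entries maps key -> (expires_at, size_bytes).
    """
    # Pass 1: purge everything that has already expired.
    kept = {k: v for k, v in entries.items() if v[0] > now}
    removed = [k for k in entries if k not in kept]

    # Pass 2: if still over the limit, evict unexpired entries,
    # soonest-to-expire first (assumed policy).
    total = sum(size for _, size in kept.values())
    for key in sorted(kept, key=lambda k: kept[k][0]):
        if total <= max_size_bytes:
            break
        total -= kept[key][1]
        removed.append(key)
        del kept[key]
    return kept, removed


entries = {
    "a": (5, 40),   # already expired at now=10
    "b": (20, 40),
    "c": (30, 40),
    "d": (40, 40),
}
kept, removed = prune(entries, now=10, max_size_bytes=80)
```

Here `a` is dropped as expired, and `b` is then evicted to bring the remaining 120 bytes down to the 80 byte limit, leaving `c` and `d`. A real implementation would also delete the corresponding cache files for each removed key.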
…ctionist by using the ChunkCache object as an interfacing wrapper for ActiveStorageError.

Currently loads and saves state to disk rather than keeping it in memory; keeping it in memory could be a future performance improvement.
maxstack added 13 commits March 11, 2025 10:18
…he operation handler deciding which to use based on app settings.

Ideally, move the decision even further up by registering one S3 download handler in the shared state, with the actual one used being based on app settings. This would allow an HTTP handler to be registered in the same way. This proves "fun" with the lifetime of objects.
Change tests over to temporary directories so they'll never fail due to an existing directory.
Remove the explicit cache wipe on exit; it was only used for tests and isn't needed with temporary directories.
… coming from requests in a multi-threaded async API. This protects the cache from concurrent access and the request that downloaded the chunk doesn't have to wait for it to be committed to the cache before it can be returned to the client.
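
One way to get both properties (serialized cache access, and a response that doesn't wait on the cache write) is a single background writer fed by a queue. The Python sketch below is only an illustration of that general pattern, not the PR's code; `WriteBehindCache`, `handle_request`, and the in-memory `_store` standing in for the disk cache are all hypothetical.

```python
import queue
import threading


class WriteBehindCache:
    """Hypothetical cache whose writes are committed by one worker thread."""

    def __init__(self):
        self._store = {}                 # stands in for the on-disk cache
        self._lock = threading.Lock()    # guards _store against concurrent access
        self._pending = queue.Queue()    # chunks waiting to be committed
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _drain(self):
        # Only this thread ever writes, so commits are serialized.
        while True:
            key, data = self._pending.get()
            with self._lock:
                self._store[key] = data
            self._pending.task_done()

    def submit(self, key, data):
        # Enqueue and return immediately; the caller never waits for the commit.
        self._pending.put((key, data))

    def get(self, key):
        with self._lock:
            return self._store.get(key)

    def flush(self):
        # Block until every queued write has been committed (useful in tests).
        self._pending.join()


def handle_request(cache, key, download):
    """Serve from the cache if possible, otherwise download and enqueue."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    data = download(key)
    cache.submit(key, data)  # committed in the background
    return data              # returned to the client straight away
```

The trade-off in this sketch is that a second request arriving before the queued write has been committed will download the chunk again, which is harmless but wasteful.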
maxstack added 10 commits March 17, 2025 18:06
…will fit into the cache after it is pruned; we'll use this to ensure the chunk we're adding can be accommodated.

Add associated tests.
Add Reductionist build configuration to group_vars/all to:
 - Specify the location of the repo
 - Disable cloning to the repo location; this is useful if the location already exists with changes that you don't want to lose
@sd109 sd109 marked this pull request as ready for review March 21, 2025 16:54
@sd109 sd109 changed the title WIP: Add local disk cache for remote data chunks Add local disk cache for remote data chunks Mar 22, 2025
@sd109 sd109 force-pushed the feat/disk-cache branch from f64ef3d to 063a084 Compare March 24, 2025 23:20
@sd109 sd109 force-pushed the feat/disk-cache branch from db2c255 to 195e663 Compare March 27, 2025 19:25
@sd109 sd109 merged commit 5916eb2 into main Apr 1, 2025
8 checks passed
@sd109 sd109 deleted the feat/disk-cache branch April 1, 2025 10:35
2 participants