Commit 5916eb2

sd109, maxstack and Reductionist authored
Add local disk cache for remote data chunks (#99)
* WIP: cached crate experiments
* Add Bytes serde feature
* Functional disk cache implementation
* Add prometheus counter for cache misses
* Ensure client ID is included in chunk cache lookup key
* Refactor to allow reinstating resource manager
* Update the podman installation play to enable linger on the unprivileged user account that'll be running the Reductionist, and any other, containers. We need linger enabled for the podman process to continue running after the user's logged out of the session. Deployment documentation updated.
* Allow Prometheus and HAProxy to be installed as the non-privileged user. Reductionist can be used with https, either with the optional Ansible playbook Step deployment or with third party certificates. Documentation updated for installation of 3rd party certificates.
* Implement an object chunk cache on disk where we can configure:
  * the on disk path to the cache
  * the TTL of cache entries
  * whether a cache hit refreshes the chunk's TTL
* Refactor the chunk cache so it stores each chunk in a separate file. Features:
  * configure a maximum size for the cache
  * pruned periodically so expired entries are purged and the cache files deleted
  * pruning removes excess files, that haven't yet expired, to keep within configured maximum cache size
* Add example of simplified cache implementation
* Add SimpleChunkCache pruning, with tests, and integrate with the Reductionist by using the ChunkCache object as an interfacing wrapper for ActiveStorageError. Currently loads and saves state to disk, rather than keeping in memory, which could be a future performance improvement.
* Remove unneeded dependencies, added during development of the caching feature
* Provide an S3 download handler and a cached S3 download handler with the operation handler deciding which to use based on app settings. Ideally move the decision even further up, registering one S3 download handler in the shared state, the actual one used being based on app settings. This would allow an http handler to be registered in the same way. Proves "fun" with the lifetime of objects.
* Panic if the cache directory already exists. Change tests over to temporary directories so they'll never fail due to an existing directory.
* Remove test portion relating to cache wipe on init. Remove explicit cache wipe on exit, only used for test and not needed for temporary directories.
* Channel all cache commits through a mpsc channel to serialise commits coming from requests in a multi-threaded async API. This protects the cache from concurrent access and the request that downloaded the chunk doesn't have to wait for it to be committed to the cache before it can be returned to the client.
* Allow configuration of the pruning interval, in seconds, of cache chunks based on ttl.
* Make "cargo clippy" happy.
* Configure chunk cache test deployment.
* Deploy a local build of the Reductionist.
* Update Reductionist tag.
* Tweak what I think is needed to deploy our local build.
* Correct the volume setup for the chunk cache.
* Tweak Reductionist deployment in group vars.
* Tweak the Reductionist's ansible build so we can:
  - control where the Reductionist is checked out
  - control the clone altogether, disable it and we'll use an existing checkout without overwriting local changes
* Allow the tokio mpsc buffer size to be configured, the number of chunks we'll queue up before blocking requests.
* Add the chunk cache queue size to the group_vars.
* Print out our command line args.
* Add "headroom_bytes" to the pruning so we can ensure this many bytes will fit into the cache after it is pruned, we'll use this to ensure the chunk we're adding can be accommodated. Add associated tests.
* Tidy up adding code documentation.
* Change over to tempfile::TempDir for temporary directory creation due to a security issue with tempdir::TempDir.
* Change boolean assert_eq to assert to keep clippy happy.
* Remove duplicated cache error.
* Debug trait no longer needed on ResourceManager.
* Fix for group_vars/all
* More sensible default for the cache path that works with or without the chunk cache enabled.
* Misc fixes
* Re-adopt an existing cache.
* Fix broken resource manager memory permits implementation
* Add S3 client auth check to cached object
* Run compliance suite with cache enabled as part of CI test suite
* Add chunk_cache_bypass_auth flag
* Update documentation in line with the chunk cache additions
* Fix minor typos

---------

Co-authored-by: Max Norton <[email protected]>
Co-authored-by: Reductionist <[email protected]>
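The pruning behaviour described in the commit message (TTL expiry, a maximum cache size, and "headroom_bytes" reserved for the incoming chunk) can be illustrated with a minimal in-memory sketch. This is not the Reductionist's implementation: `CacheIndex`, `Entry` and every method name below are hypothetical, nothing is written to disk, and evicting the entries closest to expiry first is an assumption, since the commit message does not state an eviction order.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Metadata for one cached chunk (hypothetical type, for illustration only).
struct Entry {
    size_bytes: u64,
    expires_at: Instant,
}

/// In-memory model of a size-limited, TTL-based chunk cache index.
struct CacheIndex {
    entries: HashMap<String, Entry>,
    max_size_bytes: u64,
    ttl: Duration,
}

impl CacheIndex {
    fn new(max_size_bytes: u64, ttl: Duration) -> Self {
        Self { entries: HashMap::new(), max_size_bytes, ttl }
    }

    fn total_bytes(&self) -> u64 {
        self.entries.values().map(|e| e.size_bytes).sum()
    }

    fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }

    /// Remove expired entries, then evict unexpired entries (closest to
    /// expiry first, an assumed policy) until `headroom_bytes` more would
    /// fit under `max_size_bytes`. Returns the removed keys so that a caller
    /// could delete the matching cache files.
    fn prune(&mut self, now: Instant, headroom_bytes: u64) -> Vec<String> {
        let mut removed = Vec::new();

        // Pass 1: purge entries whose TTL has elapsed.
        let expired: Vec<String> = self
            .entries
            .iter()
            .filter(|(_, e)| e.expires_at <= now)
            .map(|(k, _)| k.clone())
            .collect();
        for key in expired {
            self.entries.remove(&key);
            removed.push(key);
        }

        // Pass 2: evict unexpired entries until the size budget is met.
        let budget = self.max_size_bytes.saturating_sub(headroom_bytes);
        let mut by_expiry: Vec<(String, Instant, u64)> = self
            .entries
            .iter()
            .map(|(k, e)| (k.clone(), e.expires_at, e.size_bytes))
            .collect();
        by_expiry.sort_by_key(|(_, expires_at, _)| *expires_at);
        let mut total = self.total_bytes();
        for (key, _, size) in by_expiry {
            if total <= budget {
                break;
            }
            self.entries.remove(&key);
            total -= size;
            removed.push(key);
        }
        removed
    }

    /// Record a chunk, pruning first with the chunk's size as headroom.
    /// Returns `None` if the chunk can never fit, otherwise the evicted keys.
    fn insert(&mut self, key: &str, size_bytes: u64, now: Instant) -> Option<Vec<String>> {
        if size_bytes > self.max_size_bytes {
            return None;
        }
        self.entries.remove(key); // a re-inserted key replaces its old entry
        let evicted = self.prune(now, size_bytes);
        let entry = Entry { size_bytes, expires_at: now + self.ttl };
        self.entries.insert(key.to_string(), entry);
        Some(evicted)
    }
}
```

With a 100-byte limit, inserting two 40-byte chunks and then a third causes the oldest chunk to be evicted, because 40 bytes of headroom must fit before the new chunk is recorded. Passing `now` explicitly keeps the sketch deterministic to test without sleeping.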
1 parent 2740d92 commit 5916eb2

17 files changed: +1621 -52 lines changed

Diff for: .github/workflows/pull-request.yml (+40 -6)
@@ -99,6 +99,13 @@ jobs:
             sleep 1;
           done
 
+      - name: Create artifacts directory
+        run: mkdir artifacts
+
+      #####
+      # Test without local disk cache
+      #####
+
       - name: Run active storage container
         run: make run
 
@@ -108,16 +115,45 @@ jobs:
             sleep 1;
           done
 
-      - name: Create artifacts directory
-        run: mkdir artifacts
-
       - name: Run compliance test suite
         run: pytest -s > artifacts/pytest.log
 
       - name: Get active storage logs
         run: docker logs reductionist > artifacts/reductionist.log
         if: always()
 
+      - name: Stop active storage container
+        run: make stop
+        if: always()
+
+      #####
+      # Test with local disk cache
+      #####
+
+      - name: Run active storage container with local disk cache
+        run: make run-with-cache
+
+      - name: Wait for active storage server to start
+        run: |
+          until curl -if http://localhost:8080/.well-known/reductionist-schema; do
+            sleep 1;
+          done
+
+      - name: Run compliance test suite
+        run: pytest -s > artifacts/pytest-with-cache.log
+
+      - name: Get active storage logs
+        run: docker logs reductionist > artifacts/reductionist-with-cache.log
+        if: always()
+
+      - name: Stop active storage container
+        run: make stop
+        if: always()
+
+      #####
+      # Clean up steps
+      #####
+
       - name: Upload artifacts
         uses: actions/upload-artifact@v4
         with:
@@ -129,9 +165,6 @@ jobs:
         run: scripts/minio-stop
         if: always()
 
-      - name: Stop active storage container
-        run: make stop
-        if: always()
   deployment-test:
     runs-on: ubuntu-latest
     steps:
@@ -183,6 +216,7 @@ jobs:
           sudo ip a
           sudo ip r
         if: failure()
+
   dependency-review:
     runs-on: ubuntu-latest
     if: github.event_name == 'pull_request'
