Description
When we first implemented Warehouse, we could pretty easily handle purging all of `/simple/` or all of `*/json`, or even the entire site at once, without much issue.
In the intervening years our traffic has grown considerably, and these keys now have too much blast radius for us to use them safely. Purging all of `/simple/`, for instance, would likely grind our backends to a halt. We recently had to purge all of the `project/$NAME` keys for all of PyPI; @ewdurbin did that by iterating over all of the projects over a 2-3 hour span, and even that impacted the health of our backends.
In the normal course of operations we rarely need such large-scale purges, but we do need them from time to time, and we should make sure that our tooling for doing so is still safe at our scale.
A few ideas:
- Ditch the large-scale surrogate keys and, similar to the sitemap bucketing, assign each item to a bucket and add tooling to purge by bucket.
  - This has the problem that we have to decide up front (at least indirectly) how big our buckets need to be, but the proper bucket size will change over time as we get more traffic.
- Provide tooling to do essentially what @ewdurbin did today: take our entire project space and issue individual purges, with tunable parameters to control the blast radius.
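The second idea could be sketched roughly like this. Everything here is hypothetical, not existing Warehouse code: `throttled_purge`, `batch_size`, and `delay_seconds` are made-up names for the tunable parameters, and the actual CDN purge call is injected as `purge_fn` so the pacing logic stays independent of any particular API.

```python
import time
from typing import Callable, Iterable


def throttled_purge(
    keys: Iterable[str],
    purge_fn: Callable[[str], None],
    batch_size: int = 50,        # hypothetical tunable: keys purged per batch
    delay_seconds: float = 1.0,  # hypothetical tunable: pause between batches
) -> int:
    """Purge surrogate keys in small batches, pausing between batches so
    the origin can absorb the resulting wave of cache misses.

    Returns the total number of keys purged.
    """
    purged = 0
    batch: list[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) >= batch_size:
            for k in batch:
                purge_fn(k)
            purged += len(batch)
            batch = []
            time.sleep(delay_seconds)  # blast-radius control: let backends recover
    # Flush any remaining partial batch.
    for k in batch:
        purge_fn(k)
    purged += len(batch)
    return purged
```

Dialing `batch_size` down and `delay_seconds` up trades total purge time for backend health, which is the same trade-off made by hand during the purge described above.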
We also probably want to consider providing finer-grained surrogate keys than we currently do. We can do an individual purge for a project, but that purges everything for that project, even unaffected content. We probably want a per-project key for each "category of endpoint": a `json/$project` key, a `simple/$project` key, and a `ui/$project` key.
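A minimal sketch of how those finer-grained keys might be built, with a response tagged by both the existing coarse key and a new per-category one. The helper name and key formats are illustrative, not Warehouse's actual API:

```python
def surrogate_keys_for(project_name: str, category: str) -> list[str]:
    """Build the surrogate keys to attach to one response.

    Includes the broad per-project key we have today, plus a narrower
    per-category key so that, e.g., a change affecting only the JSON API
    can be purged without also evicting /simple/ or UI pages.
    """
    return [
        f"project/{project_name}",     # existing coarse key: purges everything
        f"{category}/{project_name}",  # finer key: json/$project, simple/$project, ui/$project
    ]
```

A purge for only the JSON view of a project would then hit `json/$project`, leaving the still-valid `simple/$project` and `ui/$project` cache entries alone.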