fix: allow to setup redis maxmemory #984
Conversation
This PR points to a serious issue: the course structure cache can grow without bounds when many changes are made to the courses. Am I getting this right? While I recognize that this is a serious issue, and I thank you for suggesting a fix, your approach has some important drawbacks:
I think that the "right" fix would be to not override the default timeout, as was suggested by @ormsbee here. But I'm open to other approaches. |
I didn't realize that Redis wouldn't evict something unless it had an explicit TTL set on it. I think it's fine to put a timeout on the structure cache entries, but it should be a long timeout (a week maybe?). I definitely wouldn't want to go to the default (2 hours). |
Thanks for commenting, Dave. We can definitely bump the current timeout from 7200s to 1 week. Would you be open to a PR that removed the explicit `None` timeout override? I can confirm that
|
@regisb: I was thinking this over last night. If having any timeout is sufficient to trigger Redis's LRU eviction policy, let's just set it for something that's absurdly high, like a year. I have on one occasion been saved by having structure cache entries > 1 month old. Tag me on that for a PR, and I'll merge it. |
I might be wrong, but I don't think this is the case. In Redis, LRU eviction will only work if there is a defined maximum memory value. I would like to avoid defining such a value. If we do define such a value, then LRU eviction will kick in even for keys that do not have an expiration time. Thus, defining a very large timeout will not resolve our issue. |
@regisb You are right, we would need to define the `maxmemory` value as well. |
So if someone makes a PR that sets the timeout for structures to a week, I will merge it. But I really think that we do need to put some value for `maxmemory`. |
@regisb Would you agree to creating a patch for this instead, so that people can define whatever parameters they need for their Redis installation? Also, this fix can be backported to other Tutor versions more easily than an edx-platform change. What do you think? |
Sorry for my slow response...
The flip side of that is that for smaller platforms, unused and useless cached entries would just accumulate until reaching the maxmemory, which would be a waste of resources.
Yes, absolutely. We need to make it possible to customize Redis settings. Here's what I propose: introduce a "redis-conf" patch in the Redis configuration file, so that users can append custom settings (such as maxmemory) to it.
That way, the default would be to not have a maxmemory for Redis, but people would be able to set it based on their requirements using a simple patch. |
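A minimal sketch of what such a customization could look like from the user's side, assuming the patch ends up being named "redis-conf" (the name used later in this thread) and using Tutor's Python plugin API; the values are only examples, not defaults:

```python
# plugin.py -- illustrative Tutor plugin; assumes a "redis-conf" patch is rendered
# into the Redis configuration file (patch name taken from this discussion).
from tutor import hooks

hooks.Filters.ENV_PATCHES.add_item(
    (
        "redis-conf",
        # Example values only: cap Redis memory and evict least-recently-used keys.
        "maxmemory 4gb\nmaxmemory-policy allkeys-lru\n",
    )
)
```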
That plan works for me, and I think it puts us in a better place than we are now.
Just to be clear, I mean to say that Redis should have a conservative `maxmemory` value set by default. |
Yes, I agree, but I'm really wondering what this conservative number should be. Maybe 8G? According to my experiments, we can hardcode such a value. I opened an upstream PR here: openedx/edx-platform#34210 |
See the discussion here: #984
And the upstream PR here: openedx/edx-platform#34210

The tl;dr is that the Redis course structure cache was growing without bounds. While the upstream fix should resolve that issue, we decided that Tutor should have a maxmemory limit and an eviction policy set for operational safety.

To manually expire existing keys, run:

tutor local run cms ./manage.py cms shell -c "from django.core.cache import caches; c = caches['course_structure_cache']; [c.expire(key, 604800) for key in c.keys('*')]"
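For readability, here is the same expiry loop written out, assuming the course structure cache is backed by django-redis (whose cache client provides `keys()` and `expire()`):

```python
# Run inside the CMS Django shell (e.g. via `tutor local run cms ./manage.py cms shell`).
# Assumes django-redis as the cache backend, which exposes keys() and expire().
from django.core.cache import caches

ONE_WEEK = 604800  # seconds

structure_cache = caches["course_structure_cache"]
for key in structure_cache.keys("*"):  # django-redis extension: list matching keys
    structure_cache.expire(key, ONE_WEEK)  # set a TTL so Redis can eventually drop the key
```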
I agree on the patch, but the default `maxmemory` should probably be lower, maybe 4GB. |
OK, let's go with 4GB. |
See the discussion here: #984
And the upstream PR here: openedx/edx-platform#34210

The tl;dr is that the Redis course structure cache was growing without bounds. While the upstream fix should resolve that issue, we decided that Tutor should have a maxmemory limit and an eviction policy set for operational safety. Thus, Redis now has a 4gb maxmemory. If you need more memory on your instance, you should implement the "redis-conf" patch.

To manually expire existing keys, run:

tutor local run cms ./manage.py cms shell -c "from django.core.cache import caches; c = caches['course_structure_cache']; [c.expire(key, 604800) for key in c.keys('*')]"
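As a quick sanity check after deploying such a limit, the effective settings and eviction activity can be inspected with redis-py; the connection details below are assumptions, not Tutor defaults:

```python
# Illustrative check with redis-py; host and port are assumptions for a local deployment.
import redis

r = redis.Redis(host="localhost", port=6379)
print(r.config_get("maxmemory"))              # configured memory limit, in bytes
print(r.config_get("maxmemory-policy"))       # configured eviction policy
print(r.info("memory")["used_memory_human"])  # current memory usage
print(r.info("stats")["evicted_keys"])        # how many keys have been evicted so far
```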
See discussion here: overhangio/tutor#984

This is a breaking change that will explicitly set the timeout of course structure cache entries to 1 week, instead of being unlimited. If you wish to revert to the former behaviour, you should set `course_structure_cache["TIMEOUT"] = None`.

The course structure cache keys were previously set with an explicit timeout of `None`, which was overriding the cache's default timeout of 2h. This was OK in environments where the cache is configured with a maximum memory limit and an automatic key eviction strategy, but in others (such as Tutor) the course structure cache could grow infinitely. It was agreed that course structure cache keys should be long-lived, but should no longer have an unlimited timeout. Thus, we set the TTL to 1 week here.

We could also configure Tutor to use a cache eviction policy, but that means we need to set a `maxmemory` value in Redis. It's not possible to set a value that will be appropriate for everyone:

- If it's higher than the total memory (e.g. in small instances), the server will crash before the cache is filled.
- If it's too low (e.g. in large instances), the whole platform will abruptly slow down as many cache entries are suddenly evicted.

The question of whether Tutor should define a cache eviction policy is still under discussion, but it can be resolved independently of this change.
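For context, here is a minimal sketch of the kind of Django cache configuration this change affects; the backend and location are placeholders, and only the `TIMEOUT` line reflects the behaviour described above:

```python
# Illustrative Django CACHES entry, not the literal edx-platform configuration.
# BACKEND and LOCATION are placeholders; the point is the TIMEOUT value.
CACHES = {
    "course_structure_cache": {
        "BACKEND": "django_redis.cache.RedisCache",  # assumption: Redis-backed cache
        "LOCATION": "redis://redis:6379/1",          # assumption: example Redis URL
        "TIMEOUT": 60 * 60 * 24 * 7,                 # 1 week, instead of None (never expire)
    },
}
```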
Closed by #1001. |
Description
In some of our production environments, we have experienced steadily increasing Redis memory usage. After some research, we found that this code (with no major changes since the Nutmeg release) was storing course structure data in Redis without a TTL, since the expected cache layer was `memcached`, which is configured by default to use an LRU eviction policy.

The above code stored the data under keys of the following format: `course_structure.{{db}}{{version_id}}`. As a result, a lot of outdated/unused data (due to changes in the courses) was kept in Redis indefinitely.

Note: although the majority of the keys stored in Redis were associated with sessions, those already had a TTL configured and did not represent the majority of the data. It was estimated (using a sample) that every key in the `course_structure` namespace held about 0.3 MB of data (for small courses), and there were around 54k keys there, i.e. roughly 16 GB in total.

To configure an eviction policy in Redis, we need to configure two parameters:

- `maxmemory`: sets the Redis memory usage limit.
- `maxmemory-policy`: sets the eviction policy used by Redis when the `maxmemory` limit is reached.

See Key eviction for more information.
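As an illustration of how these two parameters work together, here is how they could be applied at runtime with redis-py (the values and connection details are examples only; in Tutor they would normally be placed in the Redis configuration file via a patch instead):

```python
# Illustrative only: apply a memory cap and an eviction policy at runtime.
# Host/port, the 4gb cap, and the allkeys-lru policy are examples, not this PR's exact defaults.
import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory", "4gb")                 # cap Redis memory usage
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least-recently-used keys when the cap is hit
```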
The problem was already discussed here back in the Juniper release, and it remains the same.
This PR solves that issue by configuring the above parameters.