Disk cache failure with large db sizes #7793
Comments
I guess you already have
What is your platform?
Is it feasible to provide an option to not use the ldb disk cache at all and allow us to rely on memcache exclusively? Our main platforms are RHEL8 & 9. Thanks again
No. Cache is the only way a responder can get info from a backend: https://sssd.io/contrib/architecture.html
What is the real use case that is slow? Could you maybe profile an issue in your env?
What I saw in that ticket is described in https://bugzilla.redhat.com/show_bug.cgi?id=1886492#c6
Hi,
We have a relatively large AD deployment with provider = ldap and are severely affected by this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1886492
We have tried to mitigate the issue with:
Despite this, certain hosts that see access from a higher number of users (NFS servers) still grow the database too quickly. This is the current performance of a user lookup once the memcache entry expires; it gets progressively worse until sssd can no longer answer queries:
id user, db 22M -> 7.0s
id user, db 43M -> 14s
id user, db 100M -> 30s
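As a rough check on these figures, the lookup time per MB of cache can be computed from the three measurements above. This is only a sketch: the variable and function names are illustrative, and the numbers are the ones quoted above.

```python
# (db size in MB, `id user` time in seconds), taken from the measurements above
samples = [(22, 7.0), (43, 14.0), (100, 30.0)]

def seconds_per_mb(samples):
    """Return lookup time divided by cache size for each sample."""
    return [secs / size_mb for size_mb, secs in samples]

rates = seconds_per_mb(samples)
for (size_mb, secs), rate in zip(samples, rates):
    print(f"{size_mb}M: {secs}s -> {rate:.2f} s/MB")
```

The ratio stays at roughly 0.3 s/MB across all three sizes, so lookup time appears to grow about linearly with the db size. That would be consistent with a full scan of the cache rather than an indexed lookup, though three data points cannot confirm it.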
We were counting on purging the disk cache frequently enough with ldap_purge_cache_timeout, but we found that once the db reaches a certain size, the ldap purge process is unable to complete (it looks like a timeout, but there is no detailed information even at the highest debug level on the ldap backend). So the db does not shrink. The purge is also a blocking operation that hangs queries while it runs, so running it frequently is less than ideal.
Ultimately, as the db grows further, sssd becomes unresponsive and the only way to recover is to delete the disk cache manually and restart the service.
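To track how lookup time degrades as the db grows, something like the following could be run periodically on an affected host. It is a minimal sketch, not a supported tool: the helper `measure_lookup`, the domain name `example.com` in the cache path, and the user `someuser` are all placeholders, and the path assumes the default cache location under /var/lib/sss/db.

```python
import os
import subprocess
import time

def measure_lookup(cache_path, command):
    """Return (cache size in MB, wall-clock seconds taken by command)."""
    size_mb = os.path.getsize(cache_path) / (1024 * 1024)
    start = time.perf_counter()
    # Output is discarded; only the elapsed time matters here.
    subprocess.run(command, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    elapsed = time.perf_counter() - start
    return size_mb, elapsed

if __name__ == "__main__":
    # Placeholders: adjust the domain name and user for your setup.
    cache = "/var/lib/sss/db/cache_example.com.ldb"
    if os.path.exists(cache):
        size_mb, elapsed = measure_lookup(cache, ["id", "someuser"])
        print(f"db {size_mb:.0f}M -> {elapsed:.1f}s")
```

The timing only reflects the disk cache when the user's memcache entry has already expired; otherwise the lookup is answered from memcache and returns quickly regardless of db size.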
We have had a case open with Red Hat for over a year. They opened an internal case, SSSD-5812, to which we have no access, and we have been given no updates in that time.
We understand that the disk cache performance might be related to missing indexes, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1886492, but it's not clear why that bug was marked as CLOSED WONTFIX or whether there is a plan to resolve it.
Ultimately for us it would be acceptable to have an option to disable the disk cache completely and rely exclusively on memcache, but I understand that is not supported currently? Or if the cache purge timeout issue can be resolved that would also work for us.
Any suggestions appreciated
Thanks