Description
Background
We've had an ecosystem running for a couple of weeks now, and there are 17,774 run records in the CouchDB database.
This volume was unexpected. After investigating, I found our RAS cleanup process wasn't working correctly; that is now resolved and will delete ~11k of the 17k+ records. HOWEVER, the reason I started looking into this was a huge performance slowdown for API calls to the ecosystem, often resulting in 504s and group runs failing within our polling CI run. Our processes absolutely hammer the API endpoints and we need them to be performant.
The ecosystem should be able to handle having 20k+ records in the database.
I think the key thing here is that we are constantly querying the DSS and RAS using:
- group
- runName
- runId
- from
- to
- detail=methods (we always want this)

via the RAS API, which will map to db fields (a sketch of the corresponding CouchDB query follows below).
...and then we often query via our Eclipse plugin on requestor and owner, then maybe on certain tags.
We also search regularly on streamName via the Streams API, which will relate to a db field.
Additionally on namespace & propertyName via the CPS API, which will relate to db fields.
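
For illustration, a runName filter at the API level corresponds to a CouchDB Mango `/_find` request roughly like the sketch below; without a matching index, CouchDB answers by scanning every document and flags this with a "warning" field in the response. The database name (`galasa_run`), the flat field name, and the server URL are assumptions here, not the actual RAS document schema.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FindRunsByName {
    public static void main(String[] args) throws Exception {
        // Hypothetical database and field names; the real RAS documents may nest runName differently.
        String query = """
            { "selector": { "runName": "U134115" }, "limit": 100 }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://couchdb.example.com:5984/galasa_run/_find"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(query))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // With no index covering "runName", CouchDB still answers, but the body
        // includes a "warning" that no matching index was found, i.e. the query
        // scanned the whole database.
        System.out.println(response.body());
    }
}
```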
I suspect no form of indexing has been set up on any of the fields that are regularly queried by customers. This results in a full db scan, which would explain the poor performance. CouchDB sets up a default "primary index" on the document ID (i.e. runId), which explains why searching on that is very quick.
There are a number of ways to set up indexing in CouchDB, which are described well in this blog.
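
As a minimal sketch of the Mango (JSON) index route, something like the following could create an index covering the fields we filter on most. The database name, field names, and index name are assumptions; the real RAS documents may nest these fields, in which case the index definition would need the nested paths.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateRunIndex {
    public static void main(String[] args) throws Exception {
        // Hypothetical database name, field names, and index name - adjust to the real RAS schema.
        String indexDefinition = """
            {
              "index": { "fields": ["runName", "group", "requestor"] },
              "name": "runName-group-requestor-index",
              "type": "json"
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://couchdb.example.com:5984/galasa_run/_index"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(indexDefinition))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // CouchDB replies with {"result":"created", ...} on success,
        // or {"result":"exists", ...} if an equivalent index is already present.
        System.out.println(response.body());
    }
}
```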
Some evidence:
- 10.3s (!!!) for a GET on https://<server>/api/ras/runs?runname=U134115
- 483ms for a GET on the same run, but using https://<server>/api/ras/runs?runId=cdb-db069dde-a163-40da-ae5e-ccd6910cf24d-1766079846823-U134115
- 8.6s for a GET on the group that contains the above run, using https://<server>/api/ras/runs?group=yueeYxJiFl
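
One way to confirm which index (if any) a given selector would use, and so pin down where the slow responses come from, is CouchDB's `/_explain` endpoint, which reports the chosen index without executing the query. Again, the database and field names below are assumed for the sake of the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExplainRunQuery {
    public static void main(String[] args) throws Exception {
        // Same hypothetical selector as the slow runname query above.
        String query = """
            { "selector": { "runName": "U134115" } }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://couchdb.example.com:5984/galasa_run/_explain"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(query))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // The response's "index" block names the index CouchDB would use;
        // "_all_docs" here means it would fall back to a full scan over the
        // primary (document ID) index.
        System.out.println(response.body());
    }
}
```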
Tasks
- <task>