-
Notifications
You must be signed in to change notification settings - Fork 60
[SCIM 4/4]: Draw the rest of the owl #9276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The majority of this PR fills out the CrdbScimProviderStore implementation that the scim2-rs crate's Provider object will use to implement SCIM where CRDB is the durable store for users and groups, and adds a boat load of integration tests. The notion of a silo user being "active" or not has also been added to support users being deactivated by the SCIM client. Non-SCIM silo users should not be affected.
| for user in users { | ||
| let groups = self | ||
| .get_user_groups_for_user_in_txn(conn, user.identity.id.into()) | ||
| .await?; | ||
|
|
||
| let SiloUser::Scim(user) = user.into() else { | ||
| // With the user provision type filter, this should never be | ||
| // another type. | ||
| unreachable!(); | ||
| }; | ||
|
|
||
| returned_users.push(convert_to_scim_user(user, groups)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be able to be done in one query with the list of user IDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow on PR by @david-crespo in #9325
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is classified as a performance optimization I am inclined to say that we should punt on the addition of a follow on PR until after R17 is cut. The main reasoning behind this is all of the testing done so far by me, James, Angela, and Jay has been against the current implementation and the window for R17 is closing asap.
I would like to give the perf optimaztion more time to soak as we already know some of this code will have to change as we have to fix oxidecomputer/scim2-rs#7 relatively soon anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about running 1000+ queries in a single GET request — has that been exercised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this gist we compared the time to list users before and after my changes and it was 75-80% faster after the optimization, at least on my machine. However both response times were tolerable — the slower one was 260-290ms. The fast one was 62-65ms. On @papertigers machine the slow version was more like 800-900ms.
https://gist.github.com/papertigers/5aa9f341797e0a82e66c3f8188c9701f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified one of the integration tests to do the following:
- Create 1000 users
- Create a group with those 1000 users as member
- Captured the time it takes to hit the GET
/Usersendpoint
On my machine over multiple runs it seemed to take ~700-900ms.
Again, I think this should be improved upon and I suspect many optimizations can be made here. Regardless I think some of this code will need to be modified anyways to support pagination.
Thoughts on having this go in as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to get this landed as is and follow up with tuning work afterwards to unblock customers waiting to validate their integration workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if it's tolerable for now and considered a starting point anyway then that's fine.
| let SiloUser::Scim(user) = user.into() else { | ||
| // With the user provision type filter, this should never be another | ||
| // type. | ||
| unreachable!(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to 500 here? This will be a 500, I guess, but doing it explicitly lets you stick a message on it. This could be an argument against using diesel::result::Error as your error type for these functions. Most of the datastore functions return external::Error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started prototyping a new error type that can be returned however the transaction retry calls are expecting a diesel::result::Error . So I don't think we can return another type there, which is probably fine since we do have the filter on the query already.
| let maybe_other_user = dsl::silo_user | ||
| .filter(dsl::silo_id.eq(self.authz_silo.id())) | ||
| .filter(dsl::user_provision_type.eq(model::UserProvisionType::Scim)) | ||
| .filter(lower(dsl::user_name).eq(lower(user_request.name.clone()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the DB itself isn't enforcing case-sensitive uniqueness, is it possible that two concurrent requests could write different cases of the same name at the same time? You might want a unique index like CREATE UNIQUE INDEX ON silo_user (silo_id, LOWER(user_name)) WHERE user_provision_type = 'scim' AND time_deleted IS NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think two concurrent requests could race in this way because of the other calls to lower in the filters but a belt and suspenders approach here is the right one.
done in 6285129
| // v | ||
| // KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"), | ||
| KnownVersion::new(203, "scim-users-and-groups-lower"), | ||
| KnownVersion::new(202, "scim-actor-audit-log"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops I didn't realize this PR already touched this and can probably collapse it into one version bump. Will fix this soonish when back at the keyboard.
The majority of this PR fills out the CrdbScimProviderStore implementation that the scim2-rs crate's Provider object will use to implement SCIM where CRDB is the durable store for users and groups, and adds a boat load of integration tests.
The notion of a silo user being "active" or not has also been added to support users being deactivated by the SCIM client. Non-SCIM silo users should not be affected.