AJ-1347: entityQuery streams entities #2665

Merged
davidangb merged 19 commits into develop from da_AJ-1347_streamingEntityQuery
Jan 17, 2024

Conversation

@davidangb davidangb commented Dec 13, 2023

Ticket: https://broadworkbench.atlassian.net/browse/AJ-1347

Moves the entityQuery API to a streaming model to reduce the large memory spikes that occur when users call this API heavily.

Manually tested in a BEE, above and beyond the unit/swatomation tests.

Also, I inspected the response headers when running this branch vs. hitting dev directly. When running this branch, I see an extra Transfer-Encoding: chunked header, indicating a streaming response.

High-Level Design

At a high level, the old code flow was:
route definition in EntityApiService -> EntityService.queryEntities -> LocalEntityProvider.queryEntities
This returned a fully materialized EntityQueryResponse, which was serialized to JSON and sent to the user.

The new code flow is:
route definition in EntityApiService -> EntityService.queryEntitiesSource -> LocalEntityProvider.queryEntitiesSource
This returns a tuple of (EntityQueryResultMetadata, Source[Entity]). The EntityQueryResultMetadata is fully materialized, but the Source[Entity] is a stream. EntityStreamingUtils.createResponseSource processes this tuple to send response bytes to the user incrementally.
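
As a rough illustration of the incremental-response idea, here is a minimal sketch that uses a plain Scala Iterator as a stand-in for the Akka Source. The Metadata and Entity case classes, the JSON field names, and createResponseChunks are hypothetical simplifications, not the actual signature or output of EntityStreamingUtils.createResponseSource.

```scala
object ResponseSketch {
  // Hypothetical, simplified stand-ins for EntityQueryResultMetadata and Entity.
  final case class Metadata(unfilteredCount: Int, filteredCount: Int)
  final case class Entity(name: String, entityType: String)

  // Minimal JSON string quoting (backslash and double quote only); enough for this sketch.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def entityJson(e: Entity): String =
    s"""{"name":${quote(e.name)},"entityType":${quote(e.entityType)}}"""

  // Emits the response as a lazy sequence of chunks:
  // one prefix carrying the already-materialized metadata,
  // one chunk per entity (comma-separated), and one closing suffix.
  // Entities are pulled from the iterator only as chunks are consumed.
  def createResponseChunks(metadata: Metadata, entities: Iterator[Entity]): Iterator[String] = {
    val prefix =
      s"""{"resultMetadata":{"unfilteredCount":${metadata.unfilteredCount},"filteredCount":${metadata.filteredCount}},"results":["""
    val body = entities.zipWithIndex.map { case (e, i) =>
      if (i == 0) entityJson(e) else "," + entityJson(e)
    }
    Iterator.single(prefix) ++ body ++ Iterator.single("]}")
  }
}
```

Because the metadata is known before any entity is read, it can go in the first chunk; the entity array is then written one element at a time, so the full result list never has to sit in memory at once.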

In this PR:

  • EntityStreamingUtils.createResponseSource is new
  • multiple methods were moved to make them composable and accessible to calling code
  • multiple db-related methods were tweaked to return SqlStreamingAction instead of ReadAction
  • The EntityStreamingUtils.gatherEntities method is extracted from EntityService and modified to allow results that are not sorted strictly by entity id. This method still requires all rows of a given entity id to be contiguous in the result set, but the ids themselves do not have to be in purely ascending or descending order.
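
The contiguity requirement on gatherEntities can be sketched as follows. This is an assumption-laden illustration using a plain Iterator rather than an Akka Source; the Row and Entity shapes and this gatherEntities signature are hypothetical and do not mirror the real method.

```scala
object GatherSketch {
  // Hypothetical row shape: one row per (entity, attribute) pair, as a join might return.
  final case class Row(entityId: Long, entityName: String, attrName: String, attrValue: String)
  final case class Entity(name: String, attributes: Map[String, String])

  // Lazily folds each run of contiguous rows sharing an entityId into one Entity.
  // Only the rows of the entity currently being assembled are held in memory.
  def gatherEntities(rows: Iterator[Row]): Iterator[Entity] = {
    val buffered = rows.buffered
    new Iterator[Entity] {
      def hasNext: Boolean = buffered.hasNext
      def next(): Entity = {
        val first = buffered.next()
        var attrs = Map(first.attrName -> first.attrValue)
        // Consume rows until the id changes; no ordering of ids is assumed,
        // only that rows for the same id arrive next to each other.
        while (buffered.hasNext && buffered.head.entityId == first.entityId) {
          val r = buffered.next()
          attrs = attrs + (r.attrName -> r.attrValue)
        }
        Entity(first.entityName, attrs)
      }
    }
  }
}
```

With rows arriving for ids 3, 3, 1, 2, 2, this sketch yields three entities in that arrival order. If rows for one id were split by rows for another, the sketch would emit that entity twice with partial attributes, which is why contiguity still matters.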

PR checklist

  • Include the JIRA issue number in the PR description and title
  • Make sure Swagger is updated if API changes
    • ...and Orchestration's Swagger too!
  • If you changed anything in model/, then you should publish a new official rawls-model and update rawls-model in Orchestration's dependencies.
  • Get two thumbsworth of PR review
  • Verify all tests go green, including CI tests
  • Squash commits and merge to develop (branches are automatically deleted after merging)
  • Inform other teams of any substantial changes via Slack and/or email

@@ -48,6 +49,8 @@ class LocalEntityProviderSpec
with MockitoTestUtils {
import driver.api._

implicit override val patienceConfig: PatienceConfig = PatienceConfig(timeout = scaled(Span(3000, Millis)))

Contributor Author

I found some tests in this class were timing out intermittently; the class was using the default timeout of 150ms. Increasing the timeout made it stable. The streaming changes likely altered timing.

@davidangb davidangb marked this pull request as ready for review January 11, 2024 15:18
Contributor

@jladieu jladieu left a comment

Confirmed this looks like Scala. 😈


val entitySource = EntityStreamingUtils.gatherEntities(dataSource, dbSource)

(metadata, entitySource)
Contributor

@dvoet dvoet Jan 17, 2024

This probably works but the structure is confusing. I think there are actually 2 separate transactions:

  1. calls dataAccess.entityQuery.loadEntityPageCounts
  2. calls dataAccess.entityQuery.loadEntityPageSource

Structurally, it looks like 2 is within the transaction that includes 1 but it is not. Yet 2 uses information from 1 before 1 completes. I would like to see this broken up a little more cleanly.
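
One possible shape for that separation, sketched with standard-library Futures and Iterators only. The names loadCounts, loadPage, and query are hypothetical stand-ins for the two steps above and are not the PR's code; the point is only that step 2 is constructed after step 1 has completed.

```scala
import scala.concurrent.{ExecutionContext, Future}

object TwoStepSketch {
  final case class PageCounts(unfiltered: Int, filtered: Int)

  // Hypothetical stand-in for step 1: a short, self-contained query that completes on its own.
  def loadCounts(all: Vector[String], term: String)(implicit ec: ExecutionContext): Future[PageCounts] =
    Future(PageCounts(all.size, all.count(_.contains(term))))

  // Hypothetical stand-in for step 2: builds a lazy stream; nothing is read until it is consumed.
  def loadPage(all: Vector[String], term: String): Iterator[String] =
    all.iterator.filter(_.contains(term))

  // Step 2 is created only after step 1 has finished, so the dependency
  // between the two steps is visible in the structure of the code.
  def query(all: Vector[String], term: String)(implicit
    ec: ExecutionContext
  ): Future[(PageCounts, Iterator[String])] =
    loadCounts(all, term).map { counts =>
      val page = if (counts.filtered == 0) Iterator.empty else loadPage(all, term)
      (counts, page)
    }
}
```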

@davidangb
Contributor Author

Two 504 Gateway Time-out errors in swatomation; I believe these are unrelated. Jenkins retest.

}

case _ =>
traceWithParent("loadEntityPage", parentContext) { childContext =>
Contributor

This tracing does not encompass streaming the results; it probably covers only the time up to the start of streaming. There is probably not much to do about it right now; I am just calling it out.
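
The effect can be reproduced in miniature with a lazy Iterator. In this sketch, traced is a hypothetical stand-in for traceWithParent that just logs when the wrapped block starts and ends; it is not the real tracing API.

```scala
import scala.collection.mutable

object TraceSketch {
  // Hypothetical stand-in for traceWithParent: records when the span opens and closes around `body`.
  def traced[A](name: String, log: mutable.Buffer[String])(body: => A): A = {
    log += s"$name:start"
    try body
    finally { log += s"$name:end" }
  }

  def loadPage(log: mutable.Buffer[String]): Iterator[Int] =
    traced("loadEntityPage", log) {
      // Building the iterator is cheap; rows are produced only when the caller pulls them,
      // which happens after the traced block has already returned.
      Iterator(1, 2, 3).map { i => log += s"row:$i"; i }
    }
}
```

In this sketch the span's start and end are both logged before any row entry appears, because the rows are only produced once the caller consumes the returned iterator.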

@davidangb davidangb merged commit 8bfdb89 into develop Jan 17, 2024
10 checks passed
@davidangb davidangb deleted the da_AJ-1347_streamingEntityQuery branch January 17, 2024 19:37
3 participants