-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AJ-1347: entityQuery streams entities #2665
Conversation
@@ -48,6 +49,8 @@ class LocalEntityProviderSpec | |||
with MockitoTestUtils { | |||
import driver.api._ | |||
|
|||
implicit override val patienceConfig: PatienceConfig = PatienceConfig(timeout = scaled(Span(3000, Millis))) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some tests in this class were timing out intermittently; it was using the default timeout of 150ms. Increasing the timeout made it stable. The changes to streaming likely tweaked timing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed this looks like Scala. 😈
|
||
val entitySource = EntityStreamingUtils.gatherEntities(dataSource, dbSource) | ||
|
||
(metadata, entitySource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably works but the structure is confusing. I think there are actually 2 separate transactions:
- calls
dataAccess.entityQuery.loadEntityPageCounts
- calls
dataAccess.entityQuery.loadEntityPageSource
Structurally, it looks like 2 is within the transaction that includes 1 but it is not. Yet 2 uses information from 1 before 1 completes. I would like to see this broken up a little more cleanly.
Two |
} | ||
|
||
case _ => | ||
traceWithParent("loadEntityPage", parentContext) { childContext => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tracing does not encompass streaming the results, it probably just encompasses up until the start of streaming. Probably not much to do about it right now, I am just calling it out.
Ticket: https://broadworkbench.atlassian.net/browse/AJ-1347
Moves the
entityQuery
API to a streaming model, in an effort to reduce large memory spikes when users hit this API hard.Manually tested in a BEE, above and beyond the unit/swatomation tests.
Also, I inspected the response headers when running this branch vs. hitting dev directly. When running this branch, I see an extra
Transfer-Encoding: chunked
header, indicating a streaming response.High-Level Design
At a high level, the old code flow was:
route definition in EntityApiService -> EntityService.queryEntities -> LocalEntityProvider.queryEntities
this returned a fully materialized
EntityQueryResponse
, which was serialized to JSON and sent to the user.The new code flow is:
route definition in EntityApiService -> EntityService.queryEntitiesSource -> LocalEntityProvider.queryEntitiesSource
this returns a tuple of
(EntityQueryResultMetadata, Source[Entity])
. TheEntityQueryResultMetadata
is fully materialized, but theSource[Entity]
is a stream. This tuple is processed throughEntityStreamingUtils.createResponseSource
to incrementally send response bytes to the user.In this PR:
EntityStreamingUtils.createResponseSource
is newSqlStreamingAction
instead ofReadAction
EntityStreamingUtils.gatherEntities
method is extracted out ofEntityService
and modified to allow results which are not sorted strictly by entity id. This method still requires all rows of a given entity id to be contiguous in the result set - but they do not have to be sorted purely ascending or descending.PR checklist
model/
, then you should publish a new officialrawls-model
and updaterawls-model
in Orchestration's dependencies.