Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Allow incremental sync from CLI #3171

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

zack-codaio
Copy link
Contributor

Very non-elegant first stab at this. Mostly looking for directional confirmation before I think about how to clean up types and structure it better.

Some assumptions (please correct me if these are off base):

  • If we do MyTable>incremental, we'll run the continuation loop for the sync. If we see a completion, we'll shove that into the execution context and stop. Is it possible for a sync formula to return a continuation and a completion in the same shot?
  • If we have a chained incremental sync, we'll just run the sync one more time after the initial sync. We don't go above that.
  • In contrast with executeGetPermissions, this chained command isn't actually happening with the set of results in each continuation call. Instead, we are letting all of the continuations run and then doing another sync call.

Some follow-ups:

  • Chained incremental works in the test case, but I need to actually check it with a "real" pack through the actual CLI. I'm not sure if we should be baking in more handling for completions inside of execution.ts itself, it feels like a lot of the handling is actually in the fake pack and just assuming that there's a bit in the executionContext.
  • Because of where the chaining happens (after the full sync, rather than interleaved with continuation chunks) I need to update the types. We want to return Sync as a valid formula type from parsing the chained calls, but we don't want to pass it to the function that operates on continuation chunks.

Copy link
Collaborator

@jonathan-codaio jonathan-codaio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is reasonable

@@ -1097,7 +1140,7 @@ async function chainCommandOnSyncResult({
}: {
rows: any[];
formulaSpecification: SyncFormulaSpecification;
chainedCommand: ChainedCommandFormulaSpecification;
chainedCommand: GetPermissionsFormulaSpecification;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth seeing if you can still accept the full type and have a case here that calls into executeSyncFormulaWithContinuations(), so doing mutual recursion. If it's clean that seems conceptually reasonable. If it's ugly we can stick with something closer to what you have but clean up the names and types so it's clearer that not all chain types are covered by this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue I was seeing was that this chainCommandOnSyncResult() function is actually being called on chunks defined by continuations rather than on completions

I'm leaning towards having getFormulaSpecAndChainedCommand() return some bit that specifies whether the chained command should be happening on each continuation page (like we do with permissions) or if it should happen after the full first sync up to a completion happens. If we do that, then I don't know if we want to put those effects into the same function, since they'll be called at different times and you'd have to know that to work within the function properly.

Type wise, getFormulaSpecAndChainedCommand() could still return ChainedCommandFormulaSpecification, but then maybe that would be a union type of something like InterleavedChainedCommandFormulaSpecification (GetPermissions for now) and SubsequentChainedCommandFormulaSpecification (Sync)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah great observation. I hadn't noticed the difference that one does chaining by interleaving and one does chaining only at the end. That does complicate the conceptual model and reuse of the existing code.

I think you're on the right track to add a dimension that distinguishes interleaving from post-completion, and with dedicated handler functions for each case I think the opportunities for recursion should become clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up changing getFormulaSpecAndChainedCommands() to return an interleavedChainedCommand and a subsequentChainedCommand, and returning the command in the right slot based on the formula type.

Then, when those commands are provided, we can pass them through to the continuation loop or the completion loop. Also pulled all of that junk into its own function again executeSyncFormulaWithOptionalChaining() since it was getting a little unwieldy to keep in the main function, and pulling it out makes printing the right thing easier (just print whatever we choose to return).

@@ -260,6 +262,30 @@ export async function executeFormulaOrSyncFromCLI({
bundlePath,
});
printFull(chainedCommand ? chainedCommandResults : result);
} else if (formulaSpecification.type === FormulaType.Sync && chainedCommand?.type === FormulaType.Sync) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hmm maybe we should move to a new enum for chained command types, it's not that intuitive that Sync chained with Sync means "incremental sync", at least to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one comment that I haven't addressed directly. AFAICT, the default sync (at least through the simulator) does just stop if we hit a completion without a continuation, so I'm not sure if there's currently a way to just say "run the sync, looping through all continuations AND completions until there's actually nothing left"

Given that, the default Sync seems to be "run the Sync through the first continuation loop", and Sync > Sync really is what we're doing, just with a different executionContext in the second run.

That being said, I could definitely change the chain from >incremental to >sync if you think that would be clearer.

Also open to adding some new enum or type if you think that's the way to go, it just felt a little bit better to me to operate with the existing formula specifications.

@@ -1097,7 +1140,7 @@ async function chainCommandOnSyncResult({
}: {
rows: any[];
formulaSpecification: SyncFormulaSpecification;
chainedCommand: ChainedCommandFormulaSpecification;
chainedCommand: GetPermissionsFormulaSpecification;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue I was seeing was that this chainCommandOnSyncResult() function is actually being called on chunks defined by continuations rather than on completions

I'm leaning towards having getFormulaSpecAndChainedCommand() return some bit that specifies whether the chained command should be happening on each continuation page (like we do with permissions) or if it should happen after the full first sync up to a completion happens. If we do that, then I don't know if we want to put those effects into the same function, since they'll be called at different times and you'd have to know that to work within the function properly.

Type wise, getFormulaSpecAndChainedCommand() could still return ChainedCommandFormulaSpecification, but then maybe that would be a union type of something like InterleavedChainedCommandFormulaSpecification (GetPermissions for now) and SubsequentChainedCommandFormulaSpecification (Sync)

@@ -260,6 +262,30 @@ export async function executeFormulaOrSyncFromCLI({
bundlePath,
});
printFull(chainedCommand ? chainedCommandResults : result);
} else if (formulaSpecification.type === FormulaType.Sync && chainedCommand?.type === FormulaType.Sync) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one comment that I haven't addressed directly. AFAICT, the default sync (at least through the simulator) does just stop if we hit a completion without a continuation, so I'm not sure if there's currently a way to just say "run the sync, looping through all continuations AND completions until there's actually nothing left"

Given that, the default Sync seems to be "run the Sync through the first continuation loop", and Sync > Sync really is what we're doing, just with a different executionContext in the second run.

That being said, I could definitely change the chain from >incremental to >sync if you think that would be clearer.

Also open to adding some new enum or type if you think that's the way to go, it just felt a little bit better to me to operate with the existing formula specifications.

@@ -426,6 +445,15 @@ export function makeFormulaSpec(manifest: BasicPackDefinition, formulaNameInput:
formulaName: formulaOrSyncName,
};
}
if (metadataFormulaTypeStr === 'incremental') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence on if this should be here. I was originally hard coding this formulaSpec return in getFormulaSpecAndChainedCommands(), but if I put the parsing here then it's colocated with the :permissions parsing and we can just always run makeFormulaSpec() a single time.

The negative is that doing an explicit MyTableName:incremental doesn't really do what you might expect, which in my mind would be to run the incremental sync with a pre-specified completion in the executionContext. What actually happens is that there's not currently a way to specify the completion from the CLI (AFAICT), and we'll just do a regular Sync, and the :incremental part is kind of a noop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to have this colocated here. If you're able to detect that MyTableName:incremental was called directly and not via chaining, you can just have it throw an error for now and indicate that direct invocation is not yet supported, but add a TODO and that's something we can consider adding if useful. (You'd probably just have to pass a JSON string representing the completion on the CLI like we do for other complex formula types.)

@@ -1097,7 +1140,7 @@ async function chainCommandOnSyncResult({
}: {
rows: any[];
formulaSpecification: SyncFormulaSpecification;
chainedCommand: ChainedCommandFormulaSpecification;
chainedCommand: GetPermissionsFormulaSpecification;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up changing getFormulaSpecAndChainedCommands() to return an interleavedChainedCommand and a subsequentChainedCommand, and returning the command in the right slot based on the formula type.

Then, when those commands are provided, we can pass them through to the continuation loop or the completion loop. Also pulled all of that junk into its own function again executeSyncFormulaWithOptionalChaining() since it was getting a little unwieldy to keep in the main function, and pulling it out makes printing the right thing easier (just print whatever we choose to return).

@zack-codaio zack-codaio force-pushed the zack-incremental-sync-via-cli branch from 1ca2714 to 8ed1762 Compare March 7, 2025 19:10
Copy link
Collaborator

@jonathan-codaio jonathan-codaio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, overall looks good, I think mostly it's just about hardening the types but the implementation is looking nice.

@@ -1044,7 +1149,8 @@ async function executeSyncFormulaWithContinuations({
);
}
result.push(...response.result);
if (chainedCommand) {
// Skip Sync here, since we'll handle incremental syncs after this full sync loop happens
if (chainedCommand && chainedCommand.type !== FormulaType.Sync) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you type chainedCommand at line 1117 as InterleavedChainedCommandFormulaSpecification then you can just delete the extra check here since if you have a chained command here it'll be guaranteed to be applicable. This is already the case at runtime, just need to tighten up the types.

@@ -426,6 +445,15 @@ export function makeFormulaSpec(manifest: BasicPackDefinition, formulaNameInput:
formulaName: formulaOrSyncName,
};
}
if (metadataFormulaTypeStr === 'incremental') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to have this colocated here. If you're able to detect that MyTableName:incremental was called directly and not via chaining, you can just have it throw an error for now and indicate that direct invocation is not yet supported, but add a TODO and that's something we can consider adding if useful. (You'd probably just have to pass a JSON string representing the completion on the CLI like we do for other complex formula types.)

Comment on lines +344 to +345
interleavedChainedCommand?: InterleavedChainedCommandFormulaSpecification;
subsequentChainedCommand?: SubsequentChainedCommandFormulaSpecification;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would just beef up the types here. It'll be a little verbose but it will ultimately both model the behavior more directly and be more apparent from the types that incremental is involved.

There's a comment that says We will only return at most a single command across these two types which is generally signal to user a union type, which is more overhead, but it makes this self-enforcing. So we can have something like:

enum ChainedCommandType {
  Interleaved = 'Interleaved'
  Subsequent = 'Subsequent'
}

interface InterleavedChainedCommand {
  type: ChainedCommandType.Interleaved;
  // I think we can dispense with InterleavedChainedCommandFormulaSpecification now, for our
  // current purposes ChainedCommand defined below is our new union type, but if we add more
  // chained command in the future the formulaSpec here can expand to a union type.
  formulaSpec: GetPermissionsFormulaSpecification;
}

interface SubsequentChainedCommand {
  type: ChainedCommandType.Subsequent;
  // These two fields can later become part of another union type if there are more subsequent
  // commands in the future. The point here is that the command text 'incremental' is
  // bound together wit ha sync formula spec, so you can do business logic by looking
  // for the command called 'incremental' which is really clear and explicit, and still use
  // the existing formulaSpec to pass around to existing helpers.
  commandText: 'incremental';
  formulaSpec: SyncFormulaSpecification;
}

type ChainedCommand = InterleavedChainedCommand | SubsequentChainedCommand;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants