Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalogDiff object gives incomplete information about primary key changes #16092

Closed
teallarson opened this issue Aug 29, 2022 · 11 comments · Fixed by airbytehq/airbyte-protocol#50
Assignees
Labels
team/compose type/bug Something isn't working

Comments

@teallarson
Copy link
Contributor

teallarson commented Aug 29, 2022

When I change a primary key in a source that has an existing connection and then run a refresh, I get back the following in connection.cataglogDiff:

Screen Shot 2022-08-26 at 8.09.24 PM.png

which doesn't give much useful information about how the table changed in this case. In this table, I got back an entry that looked like this both when changing the primary key in a table with a single primary key, and in adding or removing a field from a composite primary key on a table.

It would be nice to get the information about the primary key change, perhaps in a similar format to how we get back information about field type changes.

@gosusnp
Copy link
Contributor

gosusnp commented Aug 29, 2022

From our previous discussion, should we do the same for cursor fields? It could be a different story, but the logic itself feels very close here.

@teallarson
Copy link
Contributor Author

A source-defined cursor changing could change what data is being synced. So yes, it seems relevant to expose that. Good catch!

@alovew
Copy link
Contributor

alovew commented Sep 2, 2022

This came up in a conversation I was having with @andyjih, and I think we might want to think about distinguishing two things: schema updates and connection updates.

I think we had been using catalogDiff on the backend for source schema updates, which just means changes to the schema of the streams within the connection. A user changing a primary key or cursor field is a change within their connection catalog settings, not a schema update.

This is a bug we should fix but also wanted to highlight how maybe either the naming or how our models/API endpoints are structured are causing some confusion. Maybe instead of catalogDiff we think about different kinds of changes like schemaDiff (returning schema changes from the source schema) and connectionDiff (which includes all connection changes, including possibly the schema diff).

Just some thoughts I wanted to get down!

@teallarson
Copy link
Contributor Author

That makes sense! In this particular context I was thinking about Sources that have a source-defined primary key/cursor field, not when a user changes it in the UI.

@krishnaglick
Copy link
Contributor

During grooming it was decided that this is not likely occurring frequently so it's been moved to Low priority.
This will likely need some FE work to display when a primary key changes.

@teallarson
Copy link
Contributor Author

teallarson commented Oct 26, 2023

@flutra-osmani / @malikdiarra since you've both been working w/ schema changes lately...

We still return an empty array of transforms + label the connection has having a non-breaking change when a primary key is added (but was an existing column). Here's how that ends up looking to a user (this is the "fully open" state):
Screenshot 2023-10-26 at 10 36 21 AM

Can we either (a) not label this as a schema change on the connection or (b) return the field in the transforms array with a new transform type to reflect what happened ("primary key changed" or something?)

@pmossman
Copy link
Contributor

pmossman commented Nov 6, 2023

Refining notes:

  • Noticed while watching full story, still a poor UX that wasn't prioritized.
  • Core issue is that a primary key change triggers a diff, but isn't reported as a transform in the array that is returned to the front-end

@flutra-osmani
Copy link
Contributor

I have tried to reproduce this locally but I don't see it. This is my setup:

  1. I created a new stream pk with two fields and no index in a Postgres source, used in this connection. I sync the data and one row that exists in the pk stream is synced in the destination. When you create a stream with no PK (primary key) you'll have Full refresh | Overwrite sync mode automatically selected, and no cursor or PK defined (unlike the stream above in this screenshot).
Screenshot 2023-11-08 at 3 09 04 PM 2. Then I add an index to the existing field `username` in the source and press `Refresh source schema`. No schema changes are detected. No pop-up window shows up. This is the payload sent with the request `/connections/get` Screenshot 2023-11-08 at 3 12 22 PM

@flutra-osmani
Copy link
Contributor

Similarly, I removed the index from stream pk and then refreshed the schema again and a pop up was displayed for a few seconds indicating that no schema change was detected.

So, at least this seems to be behaving as (a) not label this as a schema change on the connection

@teallarson
Copy link
Contributor Author

teallarson commented Nov 9, 2023

https://www.loom.com/share/76b7c452581f4e12a58432fe984d67e5
I think I misspoke earlier and said index. It does have to have a primary key constraint to get picked up during discoverSchema and added to the catalog that gets diffed.

Edit: this should solve the weird UX I think. It seems like a "nice to have" to someday show the PK/cursor changes... but not worth the amount of changes required in the protocol to do it.

@flutra-osmani
Copy link
Contributor

https://www.loom.com/share/76b7c452581f4e12a58432fe984d67e5 I think I misspoke earlier and said index. It does have to have a primary key constraint to get picked up during discoverSchema and added to the catalog that gets diffed.

I see, thanks for the context.

Edit: this should solve the weird UX I think. It seems like a "nice to have" to someday show the PK/cursor changes... but not worth the amount of changes required in the protocol to do it.

Yes, agree. I think your change is a good compromise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team/compose type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants