Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement safe rename_table #54

Closed
wants to merge 16 commits into from

Conversation

pirj
Copy link
Contributor

@pirj pirj commented Oct 3, 2021

partially addresses #53 (rename_table, but not rename_column)

Copy link
Contributor Author

@pirj pirj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open for any feedback.

I think I'm missing rename_table in IdemPotentStatements, should I add it there?

lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
@pirj pirj force-pushed the implement-safe-rename_table branch from fd2ba97 to c954c32 Compare October 3, 2021 14:18
@rchoquet rchoquet requested review from ThHareau and rchoquet October 4, 2021 15:06
README.md Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
lib/safe-pg-migrations/plugins/statement_insurer.rb Outdated Show resolved Hide resolved
@rchoquet
Copy link
Contributor

rchoquet commented Oct 5, 2021

I think I'm missing rename_table in IdemPotentStatements, should I add it there?

Yes, the rationale behind Idempotent statements is to be able to retry migration if part of it failed.
Let's say if you do

rename_table :foo, :bar
add_column :bar, :col, :text

if the add_column fails for any reason (let's say a lock timeout), you would not be able to retry the migration as the table foo would not exist anymore (and thus the RENAME statement would fail)

(it has to do with disabling transaction in migration, which could be improved)

@rchoquet
Copy link
Contributor

rchoquet commented Oct 5, 2021

And thanks a lot for the contribution!

@pirj pirj force-pushed the implement-safe-rename_table branch from c954c32 to 82b2c7f Compare October 6, 2021 10:46
pirj added 15 commits October 6, 2021 21:38
$ rake test
MiniTest::Unit::TestCase is now Minitest::Test. From /Users/pirj/source/safe-pg-migrations/test/useless_statement_logger_test.rb:5:in `<top (required)>'
If we execute our code, that is designed to be atomic, in a new
transaction, and it will be a nested transaction, if the transaction
fails, the outer, parent transaction will not be rolled back due to the
uncontrollable Active Record behaviour.
@pirj pirj force-pushed the implement-safe-rename_table branch from ac9ae73 to 2ee1e05 Compare October 6, 2021 18:38
@pirj pirj mentioned this pull request Oct 6, 2021
@pirj
Copy link
Contributor Author

pirj commented Oct 6, 2021

Tests pass. Please have another look.

@pirj pirj requested a review from rchoquet October 6, 2021 18:41
yield
else
# Open a new transaction
transaction(requires_new: true, joinable: false) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need those options if we are not in a transaction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the sound of my OCD chiming in.
You're right, we don't need requires_new since we make sure there is no parent transaction.

I'm on the fence of keeping joinable: false or not.

If we keep it as false, that means that if the block passed to all_or_nothing_transaction opens a transaction, and that transaction will fail, what is outside the nested transaction will not fail. This would be perplexing for a migration to be applied in parts out of order, or if certain parts are swallowed without notice.

If we set joinable to true, it doesn't really mean that we are protected from this behaviour. If the nested transaction will be opened with requires_new: true, it would work the same way as if we open the parent one with joinable: false.

That all leads me to a though it's not such a bad idea to override transaction, and do a super(requires_new: false, joinable: true). Does this sound reasonable to you?

Thanks for bringing this up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires_new: false, joinable: true seems to be the default options for transaction, unless I missed something.
(which looks like a sane default to me)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, yes. I think it boils down to:

  1. Make a note that the block passed to this method should open transactions with the same default options. To make sure they are all joined into one.
  2. Double-check if transaction_open? that it is a joinable transaction.

Do you agree?

all_or_nothing_transaction do
super(table_name, new_name) # Actually rename the table

execute "CREATE VIEW #{quoted_table_name} AS SELECT * FROM #{quoted_new_name}"
Copy link
Contributor

@rchoquet rchoquet Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thought about it, there is one gotcha with this approach: ActiveRecord does not manage to find the primary key when using a view (ie: Model.primary_key is nil).

In the happy path, it should not be problem, as this information is cached until the application is reloaded, but if for any reason (let's say rollback of the application code) the application reloads, it will fail to determine the primary key from the view, so a simple Model.find(1) would fail.

I see two options:

  • print a message to indicate that the primary key should be specified explicitly in the model class (self.primary_key = :id)
  • do it the other way around: create the view with the new table name, and then in a second migration, drop the view and rename the table. This way developers are forced to indicate self.primary_key = :id as a lot of tests will probably fail with the model relying on the view. A bit less elegant than your approach though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

I see what you mean. It is practically possible even without rollback. Say, during the deployment, the traffic increases, and we start up another node with the old code that will query the schema info and would fail to fetch PK info.

Actually, I favour the second approach. We don't leave the rubbish temporary views behind with a barely visible "remove me later" schema comment. There is no warning to ignore.
The first migration runs and can make sure that revision A (cached PK info) and revision B (model points to the view with the new name and explicit self.primary_key =) are compatible to the schema.

The only problem I can think of is that those two migrations (create new view, drop the view and rename the table) should not be deployed together. Wondering if there is a mechanism to prevent this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'm aware of, we could probably do something at a higher level, to prevent those two migrations from running in a single db:migrate, but it's not bullet proof (and probably complicated to implement).
I think outputting a warning to the user could do the trick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(that's the drawback of this approach, we can not simply override rename_table anymore, we most likely need to provide additional DSL method)

# add_column :accounts, :primary, :boolean
# end
# end
def all_or_nothing_transaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the naming, "all or nothing" and "transaction" seem to be redundant (but I get the idea). I'd go for something like ensure_in_transaction, up to you though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more like ensure_not_to_open_a_nested_transaction.

@pirj
Copy link
Contributor Author

pirj commented Oct 10, 2021

It seems to me that the new approach, when we introduce a view with the new name, is prone to the same problem with inferring PKs. Imagine, the deployment went fine, and the new code has self.table_name= set to the new name. This code will similarly fail to infer PKs from the view, won't it?

The only reasonable solution I can think of is to explicitly tell AR what the key is, as you suggest in the option 1.

There is one subtle, but important difference between "create a view with the new name; drop the view and rename the table" vs "rename the table and create a view with the old name". For the first approach, we could potentially raise an error if self.primary_key= is not explicitly set on the model.

However, it is not strictly necessary that:

  • models code is even loaded during the migrations
  • the application is a monolith, and all models pointing to the renamed table are in the same repository where migrations are
  • we might not be able to reliably find the model that is backed by the renamed table

This leads me to a thought that what we can reasonably recommend is:

  1. Explicitly set self.primary_key= on models backed by the renamed table, and deploy
  2. Use our enhanced rename_table helper in the migration, and deploy
  3. Remove the now redundant self.primary_key= and remove the view for the old name

This, certainly, is not mistake-proof and should be done consciously and thoroughly.

It still feels like an improvement over the approach currently suggested by strong_migrations.

What do you think? No offence taken if you feel that this approach fails to align with the philosophy of this gem.

@rchoquet
Copy link
Contributor

It seems to me that the new approach, when we introduce a view with the new name, is prone to the same problem with inferring PKs. Imagine, the deployment went fine, and the new code has self.table_name= set to the new name. This code will similarly fail to infer PKs from the view, won't it?

Yes you're right, the rationale behind this proposal being safer than the other one was that tests would most likely catch this (as a few basic ActiveRecord operations are broken).

I'll think a bit about it, I feel like the developers still have to do the heavy lifting with this approach (well, renaming tables should not be really frequent, so maybe it's acceptable)

@pirj
Copy link
Contributor Author

pirj commented Dec 12, 2021

It doesn't seem to be worth it to recommend this approach with some use cases that have undesirable side effects.
I really hope Postgres will eventually provide a mechanism to get PKs via a view, or Rails will be able to get PKs of the underlying table from a simple view. Closing until then.

@pirj pirj closed this Dec 12, 2021
@fatkodima
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants