fast_import: put job status to s3 #11284
base: main
Conversation
if std::fs::exists(&status_dir)?.not() {
    std::fs::create_dir(&status_dir).context("create status directory")?;
}
I was thinking about a separate workdir object with some internal folder/file management; maybe that indeed makes sense. For now I just copied it.
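For what it's worth, such a workdir object could look roughly like the sketch below. This is purely hypothetical: the type and method names (WorkDir, status_dir, fast_import_status_file) are illustrative and not part of this PR.

```rust
use std::path::PathBuf;

use anyhow::Context;

/// Hypothetical helper that owns the workdir layout; names are illustrative.
struct WorkDir {
    root: PathBuf,
}

impl WorkDir {
    fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// Ensure the status directory exists and return its path.
    fn status_dir(&self) -> anyhow::Result<PathBuf> {
        let dir = self.root.join("status");
        if !std::fs::exists(&dir)? {
            std::fs::create_dir_all(&dir).context("create status directory")?;
        }
        Ok(dir)
    }

    /// Path of the fast_import status file inside the status directory.
    fn fast_import_status_file(&self) -> anyhow::Result<PathBuf> {
        Ok(self.status_dir()?.join("fast_import"))
    }
}
```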
if std::fs::exists(&status_dir)?.not() {
    std::fs::create_dir(&status_dir).context("create status directory")?;
}
let status_file = status_dir.join("fast_import");
Not sure about the name of the file. I need an opinion on that.
I would use the same name as in the fast import (Command::PgData) flow: $root/status/pgdata/status. Or we can change it for both.
I am not sure I want to break compatibility now without a specific need. I think the current names & files are OK; I can put the command name and args into the file as well.
7986 tests run: 7603 passed, 0 failed, 383 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results.
c4ace56 at 2025-03-18T21:09:06.202Z :recycle:
This requires a cplane-side change as well, right?
Currently the cplane import operation simply waits for the job to complete here.
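For illustration only, the cplane-side change could amount to polling the new status object instead of just waiting for the job. The sketch below is hypothetical Rust using the AWS SDK; the actual control plane lives outside this repo, and the bucket/prefix handling and error taxonomy are assumptions.

```rust
use std::time::Duration;

use aws_sdk_s3::Client;

// Hypothetical polling loop for {s3_prefix}/status/fast_import; not the real
// cplane code, just an illustration of what consuming the status could look like.
async fn poll_fast_import_status(
    client: &Client,
    bucket: &str,
    prefix: &str,
) -> anyhow::Result<serde_json::Value> {
    loop {
        match client
            .get_object()
            .bucket(bucket)
            .key(format!("{prefix}/status/fast_import"))
            .send()
            .await
        {
            Ok(resp) => {
                // The object exists once the job has finished, successfully or not.
                let body = resp.body.collect().await?.into_bytes();
                return Ok(serde_json::from_slice(&body)?);
            }
            // Not uploaded yet: the job is still running, keep waiting.
            Err(err) if err.as_service_error().is_some_and(|e| e.is_no_such_key()) => {
                tokio::time::sleep(Duration::from_secs(10)).await;
            }
            Err(err) => return Err(err.into()),
        }
    }
}
```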
compute_tools/src/bin/fast_import.rs (Outdated)
let res_obj = if res.is_ok() {
    serde_json::json!({"done": true})
} else {
    serde_json::json!({"done": false, "error": res.unwrap_err().to_string()})
};
nit: you could match to avoid the unwrap:
let res_obj = match res {
    Ok(_) => serde_json::json!({"done": true}),
    Err(err) => serde_json::json!({"done": false, "error": err.to_string()}),
};
thank you, fixed!
    serde_json::json!({"done": false, "error": res.unwrap_err().to_string()})
};
std::fs::write(&status_file, res_obj.to_string()).context("write status file")?;
aws_s3_sync::upload_dir_recursive(
I don't think this does any retries. If we face a transient S3 error, the job will fail and the S3 status won't reflect that.
Added a retry config to the clients.
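As a rough sketch of what that could look like with the AWS Rust SDK (the attempt count and where the loader is built are assumptions, not the PR's actual values):

```rust
use aws_config::retry::RetryConfig;
use aws_config::BehaviorVersion;

// Minimal sketch of adding retries to the S3 client config; the max attempts
// value here is illustrative only.
async fn s3_client_with_retries() -> aws_sdk_s3::Client {
    let config = aws_config::defaults(BehaviorVersion::latest())
        // Standard retry mode with a bounded number of attempts, covering
        // transient S3 errors (throttling, 5xx, connection resets).
        .retry_config(RetryConfig::standard().with_max_attempts(5))
        .load()
        .await;
    aws_sdk_s3::Client::new(&config)
}
```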
Problem
The fast_import binary is run inside NeonVMs, and they do not support proper kubectl describe logs right now; there are a bunch of other caveats as well: neondatabase/autoscaling#1320. Anyway, we needed a signal whether the job finished successfully or not, and if not, at least some error message for the cplane operation. After a short discussion, an S3 status object turned out to be the most convenient option at the moment.
Summary of changes
If s3_prefix is provided to the fast_import call, any job run puts a status object file into {s3_prefix}/status/fast_import with contents {"done": true} or {"done": false, "error": "..."}. A test was added as well.
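For reference, here is a minimal sketch of that flow, pieced together from the diff hunks quoted in the conversation above; variable names and the exact upload call are assumptions rather than the final code.

```rust
use std::ops::Not;
use std::path::{Path, PathBuf};

use anyhow::Context;

// Sketch of writing the fast_import status file; `res` is the overall job result.
fn write_status_file(res: &anyhow::Result<()>, status_dir: &Path) -> anyhow::Result<PathBuf> {
    // Create the status directory if it does not exist yet.
    if std::fs::exists(status_dir)?.not() {
        std::fs::create_dir(status_dir).context("create status directory")?;
    }
    // {"done": true} on success, {"done": false, "error": "..."} on failure.
    let res_obj = match res {
        Ok(_) => serde_json::json!({"done": true}),
        Err(err) => serde_json::json!({"done": false, "error": err.to_string()}),
    };
    let status_file = status_dir.join("fast_import");
    std::fs::write(&status_file, res_obj.to_string()).context("write status file")?;
    // The directory is then synced to {s3_prefix}/status/ via the crate's
    // aws_s3_sync::upload_dir_recursive helper (call site omitted here).
    Ok(status_file)
}
```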