-
Notifications
You must be signed in to change notification settings - Fork 1
Add script to convert Meta Tectonic traces into object traces (for use with libCacheSim) #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @shermanjlim, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new C++ utility that facilitates the conversion of Meta Tectonic block storage traces into a Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a C++ script to convert Meta Tectonic traces into a format compatible with libCacheSim, along with corresponding updates to the README. The script is a solid addition, but I've identified some critical robustness issues related to CSV parsing that could cause crashes on malformed input. My review includes suggestions to handle these errors gracefully. I've also provided feedback to improve code maintainability and adhere to standard C++ practices, such as using named constants and directing error output to stderr.
| op_time = std::stoull(cells[0]); | ||
| block_id = cells[1]; | ||
| io_size = std::stoull(cells[3]); | ||
| io_offset = std::stoull(cells[4]); | ||
| rs_shard_id = cells[10]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CSV parsing logic here is not robust and could cause the program to crash if it encounters malformed data.
- Out-of-bounds access: The code accesses
cells[10]without checking if the line has at least 11 columns. This will lead to undefined behavior on shorter lines. - Unhandled exceptions:
std::stoullwill throw an exception if a string is not a valid number, which will terminate the program.
You should add validation to handle these cases gracefully, for instance by wrapping the parsing in a try-catch block and checking cells.size().
if (cells.size() < 11) {
std::cerr << "Skipping malformed line (not enough columns): " << line << std::endl;
continue;
}
try {
op_time = std::stoull(cells[0]);
block_id = cells[1];
io_size = std::stoull(cells[3]);
io_offset = std::stoull(cells[4]);
rs_shard_id = cells[10];
} catch (const std::exception& e) {
std::cerr << "Skipping line due to conversion error: " << e.what() << " in line: " << line << std::endl;
continue;
}| std::cout << "Usage: " << argv[0] | ||
| << " <output_csv> <input_csv1> [<input_csv2> ...]\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's standard practice to print usage information to std::cerr rather than std::cout. This allows users to redirect standard output (e.g., to a file) without capturing usage or error text.
| std::cout << "Usage: " << argv[0] | |
| << " <output_csv> <input_csv1> [<input_csv2> ...]\n"; | |
| std::cerr << "Usage: " << argv[0] | |
| << " <output_csv> <input_csv1> [<input_csv2> ...]\n"; |
| // open files | ||
| std::ofstream output_csv{argv[1]}; | ||
| if (!output_csv) { | ||
| std::cout << "Error: Could not open file " << argv[1] << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for (size_t i = 2; i < argc; ++i) { | ||
| input_csvs.emplace_back(argv[i]); | ||
| if (!input_csvs.back()) { | ||
| std::cout << "Error: Could not open file " << argv[i] << std::endl; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic prints an error but continues if an input file fails to open. This could lead to an incomplete conversion without a clear failure signal. It's more robust to exit immediately upon failing to open a required file. Also, error messages should be directed to std::cerr.
| for (size_t i = 2; i < argc; ++i) { | |
| input_csvs.emplace_back(argv[i]); | |
| if (!input_csvs.back()) { | |
| std::cout << "Error: Could not open file " << argv[i] << std::endl; | |
| } | |
| } | |
| for (size_t i = 2; i < argc; ++i) { | |
| input_csvs.emplace_back(argv[i]); | |
| if (!input_csvs.back()) { | |
| std::cerr << "Error: Could not open file " << argv[i] << std::endl; | |
| return EXIT_FAILURE; | |
| } | |
| } |
| size_t op_time; // 0 | ||
| std::string block_id; // 1 | ||
| size_t io_size; // 3 | ||
| size_t io_offset; // 4 | ||
| std::string rs_shard_id; // 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using magic numbers for column indices, as indicated by the comments, makes the code harder to read and maintain. If the column order of the input trace changes, you'd have to find all occurrences of these numbers. It would be better to define these as named constants.
For example:
constexpr size_t kColOpTime = 0;
constexpr size_t kColBlockId = 1;
// ...etc.
// Then use them in the loop:
op_time = std::stoull(cells[kColOpTime]);
Description
Fixes #7