Fix HTML format fallback to generate both XLSX and CSV as documented#386
Conversation
Co-authored-by: gphuang <13152353+gphuang@users.noreply.github.com>
Co-authored-by: gphuang <13152353+gphuang@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes an inconsistency in the HTML format handler where it warned about generating "xlsx+csv" files but only actually generated XLSX files. The implementation has been updated to match the documented behavior by generating both XLSX and CSV files when HTML format is requested.
Key changes:
- HTML format handler now generates both XLSX and CSV files as documented
- Warning message improved from "using xlsx+csv" to "generating xlsx+csv instead" for clarity
- Documentation updated across all functions to explicitly mention HTML format fallback behavior
Comments suppressed due to low confidence (1)
primus/backends/megatron/training/mlflow_artifacts.py:370
- Variable dfs is not used.
dfs = generate_perf_report_pytorch(trace_file, output_csvs_dir=csv_subdir)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # CSV: Multiple files in a subdirectory | ||
| csv_subdir = os.path.join(output_dir, report_name) | ||
| os.makedirs(csv_subdir, exist_ok=True) | ||
| generate_perf_report_pytorch(trace_file, output_csvs_dir=csv_subdir) |
There was a problem hiding this comment.
Inconsistent variable naming: The return value from generate_perf_report_pytorch is stored in 'xlsx_dfs' for consistency (line 383), but the CSV generation call on line 393 ignores the return value. For consistency with line 370 (the "all"/"csv" branch), this should be stored in a variable like 'csv_dfs' even if not immediately used, to maintain code consistency.
| generate_perf_report_pytorch(trace_file, output_csvs_dir=csv_subdir) | |
| csv_dfs = generate_perf_report_pytorch(trace_file, output_csvs_dir=csv_subdir) |
| if output_format == "html": | ||
| warning_rank_0("[TraceLens] HTML format not yet supported, using xlsx+csv") | ||
| # Fall through to xlsx | ||
| warning_rank_0("[TraceLens] HTML format not yet supported, generating xlsx+csv instead") | ||
| # Generate both XLSX and CSV formats as fallback | ||
| # XLSX: Single file with multiple tabs | ||
| xlsx_path = os.path.join(output_dir, f"{report_name}_analysis.xlsx") | ||
| dfs = generate_perf_report_pytorch(trace_file, output_xlsx_path=xlsx_path) | ||
| xlsx_dfs = generate_perf_report_pytorch(trace_file, output_xlsx_path=xlsx_path) | ||
| if os.path.exists(xlsx_path): | ||
| log_rank_0( | ||
| f"[TraceLens] Generated XLSX report with {len(xlsx_dfs)} tabs: {os.path.basename(xlsx_path)}" | ||
| ) | ||
| generated_files.append(xlsx_path) | ||
|
|
||
| # CSV: Multiple files in a subdirectory | ||
| csv_subdir = os.path.join(output_dir, report_name) | ||
| os.makedirs(csv_subdir, exist_ok=True) | ||
| generate_perf_report_pytorch(trace_file, output_csvs_dir=csv_subdir) | ||
|
|
||
| # Collect all generated CSV files | ||
| csv_files = glob.glob(os.path.join(csv_subdir, "*.csv")) | ||
| if csv_files: | ||
| log_rank_0(f"[TraceLens] Generated {len(csv_files)} CSV files for {report_name}") | ||
| generated_files.extend(csv_files) |
There was a problem hiding this comment.
Code duplication: The HTML format handler duplicates the logic from the "all" branch (lines 356-376). Consider refactoring to avoid duplication - for example, treating output_format == "html" the same as "all" by changing the condition on line 356 to 'if output_format in ("all", "xlsx", "html")' and line 366 to 'if output_format in ("all", "csv", "html")', then only keeping the warning message unique to the HTML case.
The HTML format handler displayed a warning saying "using xlsx+csv" but only generated XLSX files, creating an inconsistency between the warning message and actual behavior.
Changes:
The implementation now generates the expected output:
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.