Skip to content

feat: added a spreadsheet to PDF converter for Excel files with MIME …#940

Open
konradlang wants to merge 9 commits into
masterfrom
939-feature-convert-spreadsheets-excel-to-pdf-and-include-in-merged-pdf
Open

feat: added a spreadsheet to PDF converter for Excel files with MIME …#940
konradlang wants to merge 9 commits into
masterfrom
939-feature-convert-spreadsheets-excel-to-pdf-and-include-in-merged-pdf

Conversation

@konradlang
Copy link
Copy Markdown
Collaborator

@konradlang konradlang commented Oct 13, 2025

Negotiator pull request:

Description:

Added a spreadsheet to PDF converter for Excel files with MIME type "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" (XLSX files) in the XlsxConverter.java file.

Features:

  • Converts XLSX spreadsheets to PDF format
  • Preserves table structure with cell borders
  • Displays sheet names
  • Handles multiple sheets in a workbook
  • Properly extracts cell values of various types
  • A4 page size with appropriate margins

Changes Made:

  • Added dependency in pom.xml:
  • Added poi-ooxml version 5.4.1 to support reading XLSX files
  • Implemented XLSX to PDF conversion in XlsxConverter.java:
  • Uses Apache POI (XSSFWorkbook) to read Excel spreadsheets
  • Uses Apache PDFBox to generate PDF output with table rendering
  • Processes all sheets in the workbook
  • Renders cells with borders and proper formatting
  • Handles different cell types (STRING, NUMERIC, BOOLEAN, FORMULA, BLANK)
  • Includes sheet names as headers
  • Implements text truncation for long cell values
  • Handles pagination (currently stops when page is full - can be enhanced for multi-page support)
  • Code quality:

New tests added:

  • testconvertAttachmentsToPdf_WithXlsxAttachment_ConvertsSuccessfully - Tests successful conversion of a valid XLSX file to PDF, verifies PDF header
  • testconvertAttachmentsToPdf_WithXlsxNullPayload_SkipsAttachment - Tests that null payload is handled correctly
  • testconvertAttachmentsToPdf_WithXlsxEmptyPayload_SkipsAttachment - Tests that empty payload is handled correctly
  • testconvertAttachmentsToPdf_WithInvalidXlsxBytes_SkipsAttachment - Tests that invalid XLSX data is skipped
  • testconvertAttachmentsToPdf_WithCorruptedXlsxFile_SkipsAttachment - Tests that corrupted files are handled gracefully
  • testconvertAttachmentsToPdf_WithMultipleXlsxSheets_ConvertsSuccessfully - Tests conversion of workbooks with multiple sheets
  • testconvertAttachmentsToPdf_WithEmptyXlsxWorkbook_ConvertsSuccessfully - Tests empty workbook handling
  • testconvertAttachmentsToPdf_WithMixedDocxAndXlsx_ProcessesBoth - Tests processing mixed file types (DOCX + XLSX)
  • testconvertAttachmentsToPdf_WithXlsxWithFormulas_ConvertsSuccessfully - Tests XLSX files containing formulas
  • Helper Methods Added:
  • createValidXlsxBytes() - Creates or loads a valid XLSX test file
  • createMinimalValidXlsxBytes() - Creates a minimal XLSX with Apache POI programmatically
  • createEmptyXlsxBytes() - Creates an empty workbook for testing
  • createXlsxWithMultipleSheets() - Creates multi-sheet workbook for testing
  • createXlsxWithFormulas() - Creates workbook with Excel formulas
  • testXlsxConverter_WithNullBytes_ThrowsIllegalArgumentException - Tests lines 41-42 (null byte handling)
  • testXlsxConverter_WithEmptyBytes_ThrowsIllegalArgumentException - Tests lines 41-42 (empty byte handling)
  • testXlsxConverter_WithVariousCellTypes_HandlesAllTypes - Tests lines 150-156 (STRING, NUMERIC, BOOLEAN, FORMULA, BLANK, default cell types)
  • testXlsxConverter_WithLongText_TruncatesCorrectly - Tests lines 163-164 (text truncation logic)
  • testXlsxConverter_WithShortText_DoesNotTruncate - Tests line 166 (text not truncated when short)
  • testXlsxConverter_WithManyRows_HandlesPageBreak - Tests lines 108, 111 (pagination break logic)
  • testXlsxConverter_WithEmptyCells_HandlesCorrectly - Tests line 126 (empty cell value handling)
  • testXlsxConverter_WithSingleColumn_UsesMinWidth - Tests line 102 (minimum column width calculation)
  • testXlsxConverter_WithBooleanAndBlankCells_HandlesCorrectly - Tests lines 153, 155 (BOOLEAN and BLANK cell types)

Checklist:

Make sure you tick all the boxes below if they are true or do not apply before you ask for review

  • I have performed a self-review of my code
  • I have made my code as simple as possible
  • I have added relevant tests for my changes and the code coverage has not dropped substantially
  • I have removed all commented code
  • I have updated the documentation in all relevant places (Javadoc, Swagger, MDs...)
  • I have described the PR and added a meaningful title in the Conventional Commits format

…type application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (XLSX files)
@codecov
Copy link
Copy Markdown

codecov Bot commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 92.77108% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.45%. Comparing base (b5441be) to head (6fb19fc).
⚠️ Report is 190 commits behind head on master.

Files with missing lines Patch % Lines
...bmri_eric/negotiator/attachment/XlsxConverter.java 92.68% 2 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #940      +/-   ##
============================================
+ Coverage     85.34%   85.45%   +0.10%     
- Complexity     2149     2175      +26     
============================================
  Files           264      265       +1     
  Lines          5945     6028      +83     
  Branches        378      390      +12     
============================================
+ Hits           5074     5151      +77     
- Misses          622      624       +2     
- Partials        249      253       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@konradlang konradlang marked this pull request as ready for review October 13, 2025 14:29
@konradlang konradlang assigned konradlang and unassigned konradlang Oct 13, 2025
@konradlang konradlang requested a review from svituz October 20, 2025 14:54
Comment on lines 119 to 124
}
case CONTENT_TYPE_DOCX, CONTENT_TYPE_TIKA_OOXML -> new DocxConverter();
case CONTENT_TYPE_DOC, CONTENT_TYPE_TIKA_MSOFFICE -> new DocConverter();
case CONTENT_TYPE_TIKA_XLSX -> new XlsxConverter();
default -> {
throw new IllegalArgumentException("Unsupported content type");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't noticed this piece of code last time. IMHO this switch is a code smell. Each converter should have the file types it supports: boolean supports(String fileType)
and the AttachmentConversionService would just do something like this:

public FileToPdfConverter getConverter(String fileExtension) {
        return converters.stream()
                .filter(c -> c.supports(fileExtension))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("No converter for: " + fileExtension));
    }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will not be changed in this PR - opened an new CR for this refactoring:
[FEATURE] Refatoring of attachment conversion services to provide supported file types.

PDDocument document = new PDDocument();
ByteArrayOutputStream pdfOutputStream = new ByteArrayOutputStream()) {

// Process each sheet in the workbook
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove all such comments, to explain the logic break down the functionality into appropriately named methods


List<byte[]> result = conversionService.listToPdf(List.of(attachmentId));

// Should convert successfully with boolean and blank cells
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same problem with the comments as above :)


// ============================================
// XLSX Converter Tests
// ============================================
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to create a new test file, instead of separation with comments, this will get outdated really quickly

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Nov 4, 2025

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Nov 4, 2025

Copilot AI review requested due to automatic review settings January 28, 2026 09:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds XLSX to PDF conversion functionality to support Excel spreadsheet attachments. The implementation uses Apache POI to read XLSX files and PDFBox to generate PDF output with formatted tables, properly handling multiple sheets and various cell types.

Changes:

  • Added XLSX to PDF conversion capability for spreadsheet attachments
  • Implemented comprehensive test coverage for various XLSX scenarios
  • Refactored test helper utilities to promote code reuse

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
backend/pom.xml Added Apache POI dependency (version 5.4.1) for XLSX support
backend/src/main/java/eu/bbmri_eric/negotiator/attachment/XlsxConverter.java New converter implementing XLSX to PDF conversion with table rendering
backend/src/main/java/eu/bbmri_eric/negotiator/attachment/AttachmentConversionServiceImpl.java Added XLSX content type mapping to route XLSX files to new converter
backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Comprehensive test suite covering various XLSX conversion scenarios
backend/src/test/java/eu/bbmri_eric/negotiator/attachment/AttachmentTestHelper.java Extracted shared test helper methods for creating test documents
backend/src/test/java/eu/bbmri_eric/negotiator/attachment/AttachmentConversionServiceTest.java Updated to use centralized test helper methods

Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
Comment thread backend/src/test/java/eu/bbmri_eric/negotiator/attachment/XlsxConverterTest.java Outdated
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Feb 2, 2026

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Feb 2, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Convert Spreadsheets (Excel) to PDF and include in merged pdf

4 participants