Trouble with very large spreadsheet file #11

Open
4er4er4er opened this issue Oct 20, 2020 · 1 comment

@4er4er4er

The attached script test5.txt, based on a user's example, produces a 123MB file MOD45.xlsx. When I try to open the file, I get this message,

image

and when I click "yes" I eventually get a message about "unreadable content" in /xl/worksheets/sheet6.xml:

image

The user reports that "read table" aborted with a related message,

Error writing table Pedidos1 with table handler amplxl:
Could not extract sheet xl/worksheets/sheet5.xml

but I did not see that. Anyhow it appears that the size of the spreadsheet file may be the cause of the problem, because I did not encounter errors with similar but smaller examples.

@nfbvs

nfbvs commented Nov 2, 2020

The error message reported by the user does not seem to be related to this issue. That message appears when a table is listed in the relations table but the corresponding sheet does not exist, which means the .xlsx file was somehow damaged.
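To check a file for this kind of damage by hand, one option is to compare the sheets the workbook references against the parts actually stored in the archive. The sketch below is not how amplxl does it. It assumes that the "relations table" corresponds to the `xl/_rels/workbook.xml.rels` part of the .xlsx zip archive, and `missing_parts` is a hypothetical helper name.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Assumption: the workbook-level relationships part is the "relations table".
RELS_PATH = "xl/_rels/workbook.xml.rels"
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def missing_parts(xlsx):
    """Return referenced parts (e.g. sheets) that are absent from the archive.

    `xlsx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read(RELS_PATH))

    missing = []
    for rel in root.iter(REL_TAG):
        if rel.get("TargetMode") == "External":
            continue  # external links are not stored inside the archive
        target = rel.get("Target")
        if target.startswith("/"):
            part = target.lstrip("/")  # absolute within the package
        else:
            # Relative targets are resolved against the xl/ directory.
            part = posixpath.normpath(posixpath.join("xl", target))
        if part not in names:
            missing.append(part)
    return missing
```

Calling `missing_parts("MOD45.xlsx")` on a copy of the file would return an empty list if every referenced part is present.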

There is no standard for spreadsheet dimensions, so each vendor sets its own limits. For example:

  • MS Excel has a limit of 1,048,576 rows by 16,384 columns;
  • LibreOffice Calc has a limit of 1,048,576 rows by 1,024 columns;
  • Google Sheets has a limit of 18,278 columns and 5,000,000 cells for the whole workbook. You can add any number of rows as long as the cell limit is not exceeded.

Table Ventas1, generated by test5.txt, has 2,016,002 rows, which exceeds Excel's limit of 1,048,576 rows, so Excel complains (Excel usually just says that something was repaired, without specifying the reason).
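As a rough illustration, the limits quoted above can be turned into a quick pre-flight check before writing a table. This is only a sketch: `vendors_that_reject` is a hypothetical helper, the limits are the ones stated in this comment, and the column count of Ventas1 is not given in this thread, so the call at the end uses 1 column as a placeholder.

```python
# Per-sheet limits quoted in this thread as (max rows, max columns).
# None means there is no fixed row cap.
LIMITS = {
    "MS Excel": (1_048_576, 16_384),
    "LibreOffice Calc": (1_048_576, 1_024),
    "Google Sheets": (None, 18_278),
}
GOOGLE_SHEETS_MAX_CELLS = 5_000_000  # whole-workbook cell cap quoted above


def vendors_that_reject(rows, cols):
    """Return the vendors whose quoted limits a rows x cols table exceeds."""
    rejected = []
    for vendor, (max_rows, max_cols) in LIMITS.items():
        too_many_rows = max_rows is not None and rows > max_rows
        too_many_cols = cols > max_cols
        # Treat the table as the only content in the workbook.
        too_many_cells = (
            vendor == "Google Sheets" and rows * cols > GOOGLE_SHEETS_MAX_CELLS
        )
        if too_many_rows or too_many_cols or too_many_cells:
            rejected.append(vendor)
    return rejected


# Placeholder column count; the real width of Ventas1 is not stated here.
print(vendors_that_reject(2_016_002, 1))
```

With the row count of Ventas1, both Excel and LibreOffice Calc are flagged regardless of the column count, because the row limit alone is exceeded.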

As an alternative, the user could try the default tab table handler or, in the future, amplcsv; both are faster than amplxl and could scale well enough for the user's needs.
