Skip to content

Extracts sheets from OpenOffice spreadsheets in a streaming fashion

License

Notifications You must be signed in to change notification settings

cheddar-me/ods_extractor

Repository files navigation

ODS Extractor

This gem will help you export multiple sheets from OpenOffice Calc documents. The documents often contain multiple sheets, and even though there is a command on OpenOffice to export a single sheet as CSV it will only export the sheet which is currently selected. You will also need a full install of OpenOffice to do this (which, in a server environment, will mean also a full GUI environment).

There are ways to use the --headless option to extract the sheets that you want, but you need to create a macro and load it into openoffice, and then trigger it from the commandline before exporting. Or you need to drive OO from UNO which you also need to install (see below in "Other solutions").

But instead of using a full OpenOffice/LibreOffice install we can also solve this from the other end and just parse the ODS document ourselves. This is easier than it might seem at first glance, but there are is a pitfall. There are gems which manipulate spreadsheet documents in this way - but they first load the entire document into memory, usually as a DOM. However this quickly breaks down when huge documents are involved.

This gem uses a SAX parser to ingest the spreadsheets in a streaming fashion. An ODS document is just a ZIP with a huge XML inside of it. The gem will open the ZIP file and parse the worksheets as it inflates them, without saving temporary files to disk. For parsing the XML a SAX parser from Nokogiri is used - you are likely to have Nokogiri already installed, as Rails uses it to sanitize HTML. The extraction of the rows happens as the XML is fed to Nokogiri directly from the ZIP file, without having to create any intermediate files on the file system.

Usage

You need access to an IO object with random access which contains your ODS file. Imagine you want to capture all rows from the Expenses sheet in your ODS, and you want to process them inline, as they get parsed:

# The output receives the sheet name and the row contents. `use_header_rows`
# will yield `Hash` objects with headers from the first row as keys.
output = ODSExtractor::RowOutput.new(use_header_row: true) do |sheet_name:, row:|
  LineItem.create!(expenditure: row.fetch("Expenditure"), amount: row.fetch('Amount').to_i)
end

File.open(__dir__ + '/big_spreadsheet.ods') do |f|
  ODSExtractor.extract(input_io: f, output: out, sheet_names: "Expenses")
end

Installation

Add this line to your application's Gemfile:

gem 'ods_extractor'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install ods_extractor

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/julik/ods_extractor. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the ODSExtractor project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Other solutions

About

Extracts sheets from OpenOffice spreadsheets in a streaming fashion

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published