jeffeaton
Collaborator

In read_dhs_flat(), replace iotools::input.file() with vroom::vroom() for reading fixed-width text files.

vroom::vroom() is much faster and more memory-efficient than iotools::input.file(). This speeds up all parsing and makes it easier to read very large datasets (e.g. the India DHS surveys) without exhausting system memory.
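
For reference, a minimal sketch of a fixed-width read with vroom. It assumes vroom's fixed-width entry point `vroom::vroom_fwf()` together with `vroom::fwf_positions()`; the file contents, column names, and positions are invented for illustration and do not come from a DHS dictionary file or from the code in this PR.

```r
library(vroom)

# Hypothetical fixed-width file with three fields at positions 1-3, 4-5, and 6
path <- tempfile(fileext = ".dat")
writeLines(c("101251", "102312", "103470"), path)

# Column layout; in practice this would be derived from the survey's
# dictionary file (names and positions here are illustrative assumptions)
positions <- vroom::fwf_positions(
  start = c(1, 4, 6),
  end = c(3, 5, 6),
  col_names = c("caseid", "age", "sex")
)

# Read every column as integer so no type guessing is needed
dat <- vroom::vroom_fwf(
  path,
  col_positions = positions,
  col_types = vroom::cols(.default = vroom::col_integer())
)
```

Supplying `col_types` explicitly also avoids the column-specification message that vroom prints when it has to guess types.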

Also replaces Map() with a for() loop when assigning variable labels, to avoid a full copy of the dataset in memory.
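
A rough sketch of the two label-assignment patterns, using base R only. The data frame, variable names, and the `"label"` attribute name are assumptions for illustration, not the code in this PR.

```r
# Toy data and labels (hypothetical names, not actual DHS variables)
dat <- data.frame(age = c(25L, 31L), region = c(1L, 2L))
labels <- c(age = "Age in years", region = "Region code")

# Map() pattern: builds a complete new list of labelled columns
# before assigning it back, so a second set of columns exists briefly.
# dat[] <- Map(function(x, lab) { attr(x, "label") <- lab; x },
#              dat, labels[names(dat)])

# for() pattern: attach the label to one column at a time
for (nm in names(labels)) {
  attr(dat[[nm]], "label") <- labels[[nm]]
}

attr(dat$age, "label")
```

The loop touches a single column per iteration, which is the property the change relies on to keep peak memory down for very large surveys.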

Work in progress

This PR works, but two things remain to be done:

  • vroom::vroom() is pretty chatty and throws lots of messages / warnings. Probably want to silence some of these.
  • Run a systematic download of all datasets to ensure no edge cases or unexpected issues.
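
One possible way to quiet the reader, sketched under assumptions: `quiet_read()` is a hypothetical helper, not part of rdhs, and it uses `vroom::vroom_fwf()` with its `show_col_types` and `progress` arguments plus `suppressMessages()`. Warnings are deliberately left visible so that real parsing problems still surface.

```r
library(vroom)

# Hypothetical fixed-width file with two fields at positions 1-3 and 4-5
path <- tempfile(fileext = ".dat")
writeLines(c("10125", "10231"), path)

# Hypothetical helper: read a fixed-width file without the column
# specification message or the progress bar
quiet_read <- function(path, positions) {
  suppressMessages(
    vroom::vroom_fwf(
      path,
      col_positions = positions,
      show_col_types = FALSE,
      progress = FALSE
    )
  )
}

dat <- quiet_read(
  path,
  vroom::fwf_positions(start = c(1, 4), end = c(3, 5),
                       col_names = c("caseid", "age"))
)
```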

@jeffeaton jeffeaton marked this pull request as draft April 21, 2024 18:17
@jeffeaton jeffeaton requested a review from OJWatson April 21, 2024 18:17