A full reprex may not be possible, because multiple GBs of data are involved, but I have a use case that is roughly described below. In essence, I am passing an absolute path to a shapefile (the same shapefile for all workers) and the absolute path for a raster (a different raster for each worker, 1170 total), along with a couple of bookkeeping integers. Each worker then loads the raster and shapefile, and computes some summary statistics of the raster for the area within each polygon. These summaries are written to a csv for the next step of analysis. More specifically...
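In lieu of a full reprex, here is a minimal sketch of the pattern described above. Function names, paths, and the `extract`-based summary are illustrative stand-ins, not the actual code:

```r
# Sketch of the workflow: one shapefile shared by all calls, one raster
# per call, summaries written to csv. Names here are hypothetical.
library(furrr)
library(sf)
library(terra)

plan(multisession, workers = 6)

process_one <- function(raster_path, out_id, shp_path) {
  polys <- sf::st_read(shp_path, quiet = TRUE)   # same shapefile every time
  r     <- terra::rast(raster_path)              # a different raster each time
  # summary statistics per polygon (mean as a placeholder)
  stats <- terra::extract(r, terra::vect(polys), fun = mean, na.rm = TRUE)
  write.csv(stats, sprintf("summary_%04d.csv", out_id), row.names = FALSE)
  invisible(NULL)
}

future_pwalk(
  list(raster_path = raster_paths,              # length-1170 character vector
       out_id      = seq_along(raster_paths)),
  process_one,
  shp_path = shapefile_path,
  .options = furrr_options(scheduling = 1L)
)
```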
This particular implementation of the operation might be memory inefficient - the shapefile could probably be loaded once and shared across workers, but it's not that big (<100 MB) - but I would not expect the memory footprint of the operation to grow over time (or at least not by a substantial amount). However, that's not what I'm observing in practice: the memory requirement grows over time, eventually reaching near-critical levels (usually after a handful to several rasters are processed on each worker). Interestingly, the speed at which things deteriorate seems to depend upon the value of `scheduling` in `furrr_options()`. When `scheduling = 1L`, the system starts to use swap memory on the second set of operations on each process. When `scheduling = 10L` or `20L`, I can get a bit further (as many as 15-18 rasters handled by each process). There's also some apparent dependency on the number of inputs passed, such that feeding `future_pwalk()` more inputs keeps things afloat a bit longer, whereas memory balloons much more quickly when processing only a subset of everything I'd like to process.
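For context on the `scheduling` values mentioned above: `scheduling` controls how the 1170 inputs are divided into chunks per worker, so the two settings I compared look roughly like this (worker count is illustrative):

```r
library(furrr)

plan(multisession, workers = 6)

# scheduling = 1L: one chunk per worker, i.e. 6 chunks of ~195 rasters,
# each dispatched to a worker up front.
opts_coarse <- furrr_options(scheduling = 1L)

# scheduling = 10L: workers * 10 = 60 smaller chunks of ~20 rasters,
# handed out as workers become free.
opts_fine   <- furrr_options(scheduling = 10L)
```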
This is not the first time I've encountered apparent memory leaks in processing large volumes of data, so I am looking to get a better handle on how to troubleshoot and course-correct. I appreciate any suggestions or direction!