Parsl Workflow Notes #8
Comments
Quick comment on the Questions... you only want to shut down the executors when you are finished with all job requests. So, pretty much at the end, unless you are done with parsl and want to free up some memory before you do other things in your program. Note how the
Thank you, Matt! That makes sense. When we get the workflow to the point where it runs all steps in order, we won't shut down the executor in between steps. But as I play with the configuration to troubleshoot errors, I will make sure to shut down the executor before defining it again. |
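To make the lifecycle point concrete, here is a minimal sketch of "one executor for all steps, one shutdown at the end". It deliberately uses the standard library's `ThreadPoolExecutor` as a stand-in rather than parsl's `HighThroughputExecutor`, and the `stage`, `rasterize`, and `make_web_tile` functions are hypothetical placeholders that only rewrite path strings; none of this is the actual workflow code.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the real workflow steps: each one only
# rewrites a path string so the example stays self-contained.
def stage(path):
    return f"staged/{path}"


def rasterize(path):
    return path.replace("staged/", "geotiff/")


def make_web_tile(path):
    return path.replace("geotiff/", "web_tiles/")


def run_workflow(input_files, max_workers=4):
    # Create the executor once and reuse it for every parallel step.
    executor = ThreadPoolExecutor(max_workers=max_workers)
    try:
        staged = list(executor.map(stage, input_files))
        rasters = list(executor.map(rasterize, staged))
        tiles = list(executor.map(make_web_tile, rasters))
    finally:
        # Shut down only after all job requests are finished (or failed).
        executor.shutdown(wait=True)
    return tiles


web_tiles = run_workflow(["a.gpkg", "b.gpkg", "c.gpkg"])
```

The `try`/`finally` also covers the troubleshooting case: if a step raises while the configuration is being adjusted, the executor is still shut down before it gets defined again.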
After meeting with Robyn, here are a few troubleshooting approaches:
Update on error in staging step with lake change sample data. Same error, but more explicit message printed in the log:
Notes:
good sleuthing... seems like we need to differentiate between permissions issues that arise:
In both cases, we probably simply need to ensure files we plan to write to will be set as writable on the node it occurs on, either via a |
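As one possible illustration of "set as writable", the sketch below adds the owner write bit to a file before a worker writes to it. This is an assumption about what a fix could look like, not necessarily the mechanism suggested above, and it only helps when the process owns the file. The helper name `ensure_owner_writable` and the file name `tile.gpkg` are hypothetical.

```python
import os
import stat
import tempfile


def ensure_owner_writable(path):
    """Add the owner write bit if it is missing; return the resulting mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if not mode & stat.S_IWUSR:
        # os.chmod raises PermissionError if this process does not own the file.
        os.chmod(path, mode | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstration on a throwaway file that starts out read-only.
with tempfile.TemporaryDirectory() as tmp:
    tile = os.path.join(tmp, "tile.gpkg")
    with open(tile, "w") as f:
        f.write("placeholder")
    os.chmod(tile, 0o444)
    new_mode = ensure_owner_writable(tile)
    is_writable = bool(new_mode & stat.S_IWUSR)
```

Because the check has to happen on the node where the write occurs, a helper like this would need to run inside the parallel task itself rather than in the submitting process.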
Thanks Matt, I will look into the In the meantime, I was able to stage the 3 original GeoPackage files that Ingmar provided as a sample in parallel with
I then was not able to rasterize the staged files that were created, so that will be my task on Monday. The error reported in
After changing my logging configuration option for
I checked that I was able to read in the tiles separately as geodataframes. Thanks to the print statement at the start and the lack of a print statement that rasterization was complete, I could tell the source of the error (or part of the source) was probably within
While I am unsure of what option 2's code snippet does (to figure out tomorrow), it inspired me to change the |
Successfully created web tiles in parallel from the geotiffs, and no errors occurred according to the log, but I got the same repeated
Resuming workflow
After re-installing
3 erroring files
To determine which files are failing to rasterize, I did some simple string manipulation to trim the file paths from the String trimming
Erroring files are:
Perhaps a more succinct version of this check (with list comprehension) would be a good Turns out this was a little unnecessary since these files did print error messages in |
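For reference, a list-comprehension version of that check could look roughly like the sketch below. The directory roots (`staged`, `geotiff`), the tile path layout, and the helper names `tile_id` and `missing_rasters` are all assumptions made for illustration; the real staged and geotiff paths in the workflow may be structured differently.

```python
from pathlib import PurePosixPath


def tile_id(path, root):
    """Trim the root directory and the file extension from a tile path."""
    return str(PurePosixPath(path).relative_to(root).with_suffix(""))


def missing_rasters(staged_paths, geotiff_paths,
                    staged_root="staged", geotiff_root="geotiff"):
    """Return the staged files that have no geotiff with the same tile ID."""
    rasterized = {tile_id(p, geotiff_root) for p in geotiff_paths}
    return [p for p in staged_paths if tile_id(p, staged_root) not in rasterized]


# Hypothetical paths: three staged tiles, only two of which were rasterized.
staged = [
    "staged/13/100/200.gpkg",
    "staged/13/100/201.gpkg",
    "staged/13/101/200.gpkg",
]
geotiffs = ["geotiff/13/100/200.tif", "geotiff/13/101/200.tif"]
failed = missing_rasters(staged, geotiffs)
```

Comparing trimmed IDs through a set keeps the check fast even with tens of thousands of staged files.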
I did some general exploration of the erroring files (plotted, checked for NaN's in gdf format, etc.). Then I put the 3 filepaths into a list, batched them, and rasterized them in parallel for all z-levels with no errors. It seems they only errored when processing with all other staged files. This leads me to think perhaps we should just try to rasterize the files again here if the criteria for the error message is met, and then if there is still a problem for that second try, then produce the error message and move on. |
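The retry idea can be sketched on its own, separate from the actual `rasterize_vector()` code. In the sketch below, `rasterize_with_retry` and `flaky_rasterize` are hypothetical names, and the simulated failure stands in for whatever error criteria the real rasterization step checks.

```python
def rasterize_with_retry(rasterize_one, path, max_attempts=2):
    """Try to rasterize a file; retry on error, then report and move on.

    Returns (result, None) on success or (None, message) once every
    attempt has failed, so the caller can log the message and continue.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return rasterize_one(path), None
        except Exception as err:  # broad on purpose: this is only a sketch
            last_error = err
    return None, f"{path}: failed after {max_attempts} attempts ({last_error})"


# Simulated rasterizer: every file fails on its first call, and one
# file fails every time.
calls = {}


def flaky_rasterize(path):
    calls[path] = calls.get(path, 0) + 1
    if path == "always_bad.gpkg" or calls[path] == 1:
        raise RuntimeError("simulated rasterization error")
    return path.replace(".gpkg", ".tif")


ok_result, ok_error = rasterize_with_retry(flaky_rasterize, "tile_a.gpkg")
bad_result, bad_error = rasterize_with_retry(flaky_rasterize, "always_bad.gpkg")
```

Returning the error message instead of raising keeps one bad tile from stopping the rest of the batch, which matches the "produce the error message and move on" behavior described above.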
I edited rasterize_vector() with retry integrated
I anticipated that the same files would error as last time, which would be noted in However, there were errors in creating parent geotiffs, so I will continue to try to test the modification I made to |
Closing this issue as the parsl workflow is now functional and is being integrated to use kubernetes and parsl with the Arctic Data Center cluster. |
Sticking points, suggested code, and questions that arise while working through parsl workflow (parsl_breakdown.ipynb in the branch parsl-workflow-breakdown). Processed lake change sample data provided by Ingmar (GeoPackage files separated by UTM zones).
Sticking Points
Result: the staged dir was created and files were being written fine, but 47 minutes into the process (which usually takes ~50 minutes), with 14117 out of 19088 staged files written, an error was returned:
Next, I changed the input file sample size to 10 gpkg files (I downloaded 10 new lake_sample.gpkg files from the Google Drive that Ingmar uploaded last week) and staged in parallel with staging batch size = 2.
Result:
The staged dir was created and files were being written fine (again), but then the run errored with the same message after 183 minutes (a very sad end to the workday):
Also, I tried just jumping to the rasterization step and pointing to the complete staged dir from my lake change sample run through without parsl. Rasterization ran in parallel with no errors, but the geotiff and web_tile folders were not created.
Suggested Code
HighThroughputExecutor configuration for parsl workflow without kubernetes (run locally):
parsl HighThroughputExecutor documentation and compared to the same code chunk used in the Scalable Computing Course HighThroughputExecutor
parsl.clear() is important to reset the parsl config with each run of the script
Questions to investigate:
How often should the following be run? After each parallel operation (as in, after staging in parallel, rasterizing in parallel, creating web tiles in parallel etc.) or just at the end of the script (I think it is the latter)?