topsStack computeCoherence unnecessarily accesses burst_0*.slc.vrt in DataAccessorPy.py, producing HPC problems #247

falkamelung · 2021-03-04T21:51:53Z

We got into trouble using topsStack on HPC (TACC's Stampede2) because running more than ~200 ISCE processes simultaneously put too heavy an I/O load on the shared disk. The admins advised us to copy, for each step, all required input files to a local /tmp disk. This works fine for all processing steps except run_10_filter_coherence, where we get the error

GDAL open (R): /tmp/merged/SLC/20160629/20160629.slc.full.vrt
ERROR 4: /tmp/merged/SLC/20160629/../../../coreg_secondarys/20160629/IW1/burst_01.slc.vrt: No such file or directory
Error. Cannot open the file /tmp/merged/SLC/20160629/20160629.slc.full.vrt in read mode.
Error in file /home/conda/feedstock_root/build_artifacts/isce2_1605839897087/work/build/components/iscesys/ImageApi/InterleavedAccessor/src/GDALAccessor.cpp at line 77 Exiting

This error occurs in runBurstIfg.py, in the slc2.createImage() call; more precisely, it fails in DataAccessorPy.py. I have no solution, but it is similar to #245 (comment).

To be clear, the processing works fine on /scratch; the problem occurs only when two disks are used. Here is our modified config_igram_filt_coh_* file:

cat config_igram_filt_coh_20160605_20160629
##########################
###################################
[Function-1]
FilterAndCoherence :
input : /tmp/merged/interferograms/20160605_20160629/fine.int
filt : /scratch/05861/tg851601/GalapagosSenDT128/merged/interferograms/20160605_20160629/filt_fine.int
coh : /scratch/05861/tg851601/GalapagosSenDT128/merged/interferograms/20160605_20160629/filt_fine.cor
strength : 0.2
slc1 : /tmp/merged/SLC/20160605/20160605.slc.full
slc2 : /tmp/merged/SLC/20160629/20160629.slc.full
complex_coh : /scratch/05861/tg851601/GalapagosSenDT128/merged/interferograms/20160605_20160629/fine.cor
range_looks : 15
azimuth_looks : 5

It reads fine.int from the node-local /tmp but writes all output files to the shared /scratch. Likewise, it reads 20160605.slc.full from /tmp but writes fine.cor to /scratch. The error occurs when it tries to access burst_01.slc.vrt. Copying this file to /tmp is not possible, because ISCE would then also want the burst*.slc and azimuth_*off* files, which require too much space on /tmp.
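A rough way to see why copying only the merged VRT fails: assuming the merged .slc.full.vrt stores its burst references as paths relative to the VRT (the ../../../coreg_secondarys/... path in the error message suggests this), those paths are re-resolved against /tmp once the VRT is copied there. The sketch below uses only the Python standard library to list every file a VRT chain references and report which ones are missing on the local disk. The helper names vrt_source_files and missing_sources are hypothetical, not ISCE functions.

```python
import os
import xml.etree.ElementTree as ET


def vrt_source_files(vrt_path):
    """Return the paths of all files referenced by <SourceFilename> in a GDAL VRT."""
    vrt_dir = os.path.dirname(os.path.abspath(vrt_path))
    sources = []
    for elem in ET.parse(vrt_path).iter("SourceFilename"):
        name = (elem.text or "").strip()
        if elem.get("relativeToVRT", "0") == "1":
            # Relative paths are resolved against the directory holding the VRT,
            # so a VRT copied to /tmp now points to locations under /tmp.
            name = os.path.normpath(os.path.join(vrt_dir, name))
        sources.append(name)
    return sources


def missing_sources(vrt_path, recursive=True):
    """List referenced files that do not exist, following nested .vrt files."""
    missing = []
    for src in vrt_source_files(vrt_path):
        if not os.path.exists(src):
            missing.append(src)
        elif recursive and src.endswith(".vrt"):
            # A burst VRT in turn references the raw burst file (e.g. burst_01.slc).
            missing.extend(missing_sources(src, recursive=True))
    return missing
```

For example, missing_sources('/tmp/merged/SLC/20160629/20160629.slc.full.vrt') would be expected to include /tmp/coreg_secondarys/20160629/IW1/burst_01.slc.vrt, matching the GDAL error above.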

My question:

Does it really need burst_01.slc.vrt (and the other burst VRTs) to calculate the complex coherence? If not, how can we tell ISCE not to open these files?

If this can be resolved, we can run ~5000 processes (or more) simultaneously, so this is important.

Our Python scripts for job submission on HPC and for copying to the local disk are in a public GitHub repository (MinSAR), but they are still messy. We plan to share them as soon as possible.


piyushrpt · 2021-03-04T23:00:45Z

Access to these files is not unnecessary.

The merged VRT points to a collection of individual VRTs. These individual VRTs are the only place where the data type and layout of the bursts in the raw files are described. This lets you mosaic data in any format: flat files, TIFFs, etc.

In your workflow, if you are already paying the extra cost of generating a physical .slc.full file, you can overwrite the slc.vrt file with an image.renderVRT() call after your gdal_translate or gdal.Translate() call. ISCE is seeing a .vrt that still points to pieces of the individual bursts; once you have the full file, update the .vrt to point to the merged physical file.
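As a rough illustration of what "update the .vrt to point to the merged physical file" means, the sketch below writes a single-band GDAL VRTRawRasterBand description of one flat binary file. It is a stand-in for ISCE's image.renderVRT(), not its implementation. The helper name write_flat_vrt is hypothetical, and the CFloat32 data type (8 bytes per pixel) with little-endian byte order is an assumption for a complex SLC.

```python
import os
import xml.etree.ElementTree as ET


def write_flat_vrt(data_path, width, length, vrt_path=None,
                   data_type="CFloat32", bytes_per_pixel=8):
    """Write a VRT that describes a single flat, band-sequential binary file.

    Assumption: little-endian complex64 samples, no header, one band.
    """
    vrt_path = vrt_path or data_path + ".vrt"
    root = ET.Element("VRTDataset", rasterXSize=str(width), rasterYSize=str(length))
    band = ET.SubElement(root, "VRTRasterBand", dataType=data_type, band="1",
                         subClass="VRTRawRasterBand")
    # Reference the merged file relative to the VRT, so both can move together.
    src = ET.SubElement(band, "SourceFilename", relativeToVRT="1")
    src.text = os.path.relpath(data_path, os.path.dirname(os.path.abspath(vrt_path)))
    ET.SubElement(band, "ByteOrder").text = "LSB"
    ET.SubElement(band, "ImageOffset").text = "0"
    ET.SubElement(band, "PixelOffset").text = str(bytes_per_pixel)
    ET.SubElement(band, "LineOffset").text = str(bytes_per_pixel * width)
    ET.ElementTree(root).write(vrt_path)
    return vrt_path
```

Called after gdal_translate has produced 20160629.slc.full, something like write_flat_vrt('20160629.slc.full', width, length) would replace the burst-mosaic VRT with one that references only the merged file, so no burst_*.slc.vrt files are opened afterwards.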

falkamelung · 2021-03-08T22:16:52Z

Thank you. We don't have the *.slc.full files, so what you are suggesting is not possible for us.

MintPy does not use the complex coherence. We would therefore like to make this calculation optional, probably via stackSentinel.py --no_complex_coherence.

Does anybody know what the complex coherence is used for and whether it is used in routine workflows? It was introduced recently in #97. Alternatively, instead of introducing an option to skip it, we could change the default back to what it was before that PR and introduce an option to calculate it: stackSentinel.py --complex_coherence. Any comments/thoughts/objections?

yunjunz · 2021-03-17T03:42:46Z

I like the idea of stackSentinel.py --cpx_coh / --complex_coherence: the default setting, which is probably the more common scenario, would then use less disk space and less computing time.
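A minimal sketch of the proposed opt-in behaviour, using the flag names discussed above. This is not the actual stackSentinel.py code: build_parser and filter_coherence_config are hypothetical helpers, the latter mirroring the config_igram_filt_coh_* layout shown earlier. It also assumes that FilterAndCoherence skips the complex coherence (and therefore the SLC and burst VRT reads) when slc1, slc2 and complex_coh are absent, which the thread does not confirm.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of an opt-in complex coherence flag")
    # Off by default: the complex coherence is computed only when explicitly requested.
    parser.add_argument("--complex_coherence", "--cpx_coh", dest="complex_coherence",
                        action="store_true",
                        help="also compute the complex coherence (fine.cor)")
    return parser


def filter_coherence_config(ifg_dir, slc1, slc2, complex_coherence,
                            strength=0.2, range_looks=15, azimuth_looks=5):
    """Build a config_igram_filt_coh_* style file body (hypothetical helper)."""
    lines = [
        "[Function-1]",
        "FilterAndCoherence :",
        f"input : {ifg_dir}/fine.int",
        f"filt : {ifg_dir}/filt_fine.int",
        f"coh : {ifg_dir}/filt_fine.cor",
        f"strength : {strength}",
    ]
    if complex_coherence:
        # Assumption: slc1/slc2 are only needed for the complex coherence,
        # so they are written only when it is requested.
        lines += [f"slc1 : {slc1}", f"slc2 : {slc2}",
                  f"complex_coh : {ifg_dir}/fine.cor"]
    lines += [f"range_looks : {range_looks}", f"azimuth_looks : {azimuth_looks}"]
    return "\n".join(lines) + "\n"
```

Making the flag opt-in (store_true) keeps the default run free of the full-resolution SLC reads, which is the scenario that matters for running thousands of jobs from node-local disks.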
