CUDA pipeline for computing APR #185

Open: wants to merge 77 commits into base: master

Commits
e6aa9c9
Bspline filters fixed for CUDA pipeline
krzysg Aug 1, 2022
b563da4
Debug messages turned off
krzysg Aug 1, 2022
3db510f
Fixed Inv Bspline in X direction (CUDA pipeline)
krzysg Aug 1, 2022
18fce44
Inverse Bspline pipeline for CUDA fixed
krzysg Aug 2, 2022
ad5f194
Downsample and downsample gradient corrected to match GPU
krzysg Aug 3, 2022
557eff3
GPU pipeline fixes - Full Gradient test is working now
krzysg Aug 9, 2022
57765a7
Merge branch 'develop' into cuda
krzysg Aug 9, 2022
3da13ba
Merge branch 'develop' into cuda
krzysg Aug 9, 2022
d958161
GPU and CPU give same results in Release mode - turned off unsafe opt…
krzysg Aug 10, 2022
4ace238
Quick fix of processOnGpu() - now it gets correct bspline data for ea…
krzysg Aug 10, 2022
b050e07
Added new test file for LIS CUDA, GPU now handles boundary (without p…
krzysg Nov 14, 2022
570ab20
Local Intensity Scale (LIS) now works in X-dir as expected. GPU and C…
krzysg Jan 31, 2023
17e5d8e
Local Intensity Scale (LIS) now works in Z-dir as expected. GPU and C…
krzysg Feb 1, 2023
5ad9865
Updated compareMeshes to show maximum error found
krzysg Feb 17, 2023
af1c3ac
LIS in X-dir redesigned so code is clearer and faster. Also new test …
krzysg Feb 17, 2023
521d826
LIS in Z-dir redesigned so code is clearer and faster. Also new test …
krzysg Feb 24, 2023
b297adf
Local Intensity Scale (LIS) now works in Y-dir as expected. GPU and C…
krzysg Mar 13, 2023
2cdf3fe
Whole LIS pipeline is matching exactly CPU implementation + tests upd…
krzysg Mar 16, 2023
e093c01
Quick fix of linking error
krzysg Mar 16, 2023
053380d
maximum error diff. GPU vs CPU for compute gradient set to 0
krzysg Mar 16, 2023
97cf75e
rescaleAndThreshold is now only rescaling (to reflect changes in CPU …
krzysg Mar 17, 2023
83c2a31
rescaleAndThreshold is now only rescaling (to reflect changes in CPU …
krzysg Mar 17, 2023
5b5a719
constant_intensity_scale handling in LIS added for GPU
krzysg Mar 17, 2023
5d0375a
Removed unused threshold functions
krzysg Mar 20, 2023
53ef94b
FullPipeline test moved to new file
krzysg Mar 20, 2023
ac2c22e
PixelDataDim updated with maximum dimension length and number of dimen…
krzysg Mar 20, 2023
122a96a
GradLisLevels test working now
krzysg Mar 20, 2023
6a5db35
full pipeline tests fixed
krzysg Mar 24, 2023
4088e9d
Changes from old branches added + modified to GenInfo instead of APRA…
krzysg Jul 20, 2023
b8f2504
Added debug printout to GenInfo
krzysg Jul 21, 2023
6400a9a
Moved old CUDA tests to new file
krzysg Aug 11, 2023
4b35b8e
Moved old CUDA tests to new file
krzysg Aug 11, 2023
1ed5d4f
Added CUDA_ARCHITECTURES set to OFF (keep current behaviour) to suppr…
krzysg Oct 30, 2023
93ac120
Temporary test updated to print particles using LinearAccess iterator
krzysg Nov 8, 2023
09bf86a
Merge branch 'master' into cuda
krzysg Nov 8, 2023
b7ae1cb
Merge branch 'master' into cuda
krzysg Nov 10, 2023
6181da6
Merge branch 'master' into cuda
krzysg Nov 10, 2023
ed09686
Merge branch 'master' into cuda
krzysg Nov 13, 2023
70543d2
TODO about some problems with edge case
krzysg Nov 30, 2023
dd3d448
Fixed test where out of range idx was given
krzysg Dec 6, 2023
1a112ec
Pulling Scheme tests (and OVPC on CPU) finished.
krzysg Dec 13, 2023
64ca641
Fixes for tests
krzysg Dec 14, 2023
9f31bfd
Fixed OVPC - clamping values of input levels is necessary
krzysg Jan 9, 2024
2707207
Updated OVPC (PS) for CUDA - now it gives correct and same results as…
krzysg Feb 5, 2024
3cb4529
PullingSchemeCudaTest finished, added init file for LinearAccess test
krzysg Feb 16, 2024
027e52a
Finished LinearAccess tests (for linear structure only), added draft …
krzysg Feb 21, 2024
e83b952
Check also total_number_particles in LinearAccess test
krzysg Feb 23, 2024
2cc5bca
LinearAccessCuda implemented (it is not used yet in CUDA pipeline)
krzysg Aug 2, 2024
e1b63d7
Compiler warnings fixed
krzysg Aug 2, 2024
4c88fae
Removed debug outputs from LinearAccessCuda test.
krzysg Aug 6, 2024
169cd9d
Added two more tests for full pipeline (including PS, and LinearAccess)
krzysg Aug 6, 2024
dadf92f
-ffast-math must be removed - some optimizations still make GPU and C…
krzysg Aug 8, 2024
27a8dc3
(nasty) fix for computeLevels in CUDA - added TODO to make it more re…
krzysg Aug 8, 2024
bb3b3f4
Fix for bsplineYdir for very small input images + test for full pipel…
krzysg Aug 9, 2024
a8c4d77
Fixed Local Intensity Scale (LIS) for super small inputs
krzysg Aug 14, 2024
e6e4327
ParticleCellTreeCuda is now main stuff for CUDA
krzysg Aug 19, 2024
00aac97
computeOvpcCuda now using 'stream' instead of hardcoded values
krzysg Aug 20, 2024
1fba1bc
ParticleCellTreeCuda moved and handle now cpu2gpu transfer
krzysg Aug 20, 2024
3474250
LinearAccessCuda is now using ParticleCellTreeCuda
krzysg Aug 20, 2024
1d4e549
OVPC added to GpuTask
krzysg Aug 21, 2024
9ff0580
Full GPU pipeline works!
krzysg Aug 21, 2024
c10225d
Some debug prints removed
krzysg Aug 21, 2024
6b7a87d
Test for full pipeline cleaned up
krzysg Aug 21, 2024
3c601be
doAll() removed from Gpu pipeline
krzysg Aug 21, 2024
d2fd1d0
GPU pipeline now works for APRConverter!
krzysg Aug 22, 2024
9604c63
Linear access now is using correct cuda stream, bspline params are com…
krzysg Mar 17, 2025
9572e10
added error handling for bspline y-dir
krzysg Mar 19, 2025
514c03c
Removed (most of the) warnings
krzysg Mar 19, 2025
005a4ba
added error handling for bspline x-dir, other steps temporarily blocked
krzysg Jun 25, 2025
5534860
added error handling for bspline z-dir
krzysg Jun 25, 2025
1b54cd0
Stream operations on GPU are working now as expected. All tests are f…
krzysg Aug 1, 2025
7f6e2d3
Reverting APRConverter type to previous value
krzysg Aug 5, 2025
b01df31
Fixed CUDA-streams sync issues when copying back to CPU
krzysg Aug 5, 2025
bf4cdb0
Little bit cleanup of CUDA in APRConverter, Added move assignment ope…
krzysg Aug 6, 2025
cd7d594
Fixed move constructor/assignment - pinned memory is now also moved.
krzysg Aug 11, 2025
0c702f3
Fixes needed by CUDA 13.0 - now code compiles
krzysg Aug 19, 2025
1f27876
Initial impl. of CUDA multistreams, it takes many images but STILL on…
krzysg Aug 21, 2025
19 changes: 11 additions & 8 deletions CMakeLists.txt
@@ -6,7 +6,7 @@ project(APR DESCRIPTION "Adaptive Particle Representation library")

message(STATUS "CMAKE VERSION ${CMAKE_VERSION}")

-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

if(POLICY CMP0135)
@@ -171,17 +171,17 @@ if(WIN32)


else()
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14 ")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17 ")

if(CMAKE_COMPILER_IS_GNUCC)
-set(CMAKE_CXX_FLAGS_RELEASE "-O4 -ffast-math")
+set(CMAKE_CXX_FLAGS_RELEASE "-O4")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g -Wall -pedantic")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Bdynamic")
if(NOT WIN32)
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -ldl -lz")
endif()
elseif (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
-set(CMAKE_CXX_FLAGS_RELEASE "-O3 -ffast-math")
+set(CMAKE_CXX_FLAGS_RELEASE "-O3")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g -Wall -pedantic")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -lz")
endif()
@@ -209,10 +209,11 @@ set_property(TARGET aprObjLib PROPERTY POSITION_INDEPENDENT_CODE ON)

if(APR_USE_CUDA)
message(STATUS "APR: Building CUDA for APR")
-set(CMAKE_CUDA_STANDARD 14)
+set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")
+set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_RUNTIME_LIBRARY "Static")
-set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --default-stream per-thread -Xptxas -v -DAPR_USE_CUDA")
-set(CMAKE_CUDA_FLAGS_RELEASE "-O3 --use_fast_math") # -lineinfo for profiling
+set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -rdc=true --fmad=false --default-stream per-thread -Wno-deprecated-gpu-targets -Xptxas -v -DAPR_USE_CUDA")
+set(CMAKE_CUDA_FLAGS_RELEASE "-O3") # -lineinfo for profiling
set(CMAKE_CUDA_FLAGS_DEBUG "-O0 -g -G")
if(APR_BENCHMARK)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -DAPR_BENCHMARK")
@@ -226,6 +227,7 @@ if(APR_USE_CUDA)
src/algorithm/LocalIntensityScale.cu
src/algorithm/OVPC.cu
src/data_structures/APR/access/GPUAccess.cu
+src/data_structures/APR/access/LinearAccessCuda.cu
src/numerics/miscCuda.cu
src/numerics/APRDownsampleGPU.cu
src/numerics/PixelNumericsGPU.cu
@@ -241,6 +243,7 @@ if(APR_BUILD_STATIC_LIB)
# generate static library used as a intermediate step in generating fat lib
set(STATIC_TARGET_NAME staticLib)
add_library(${STATIC_TARGET_NAME} STATIC $<TARGET_OBJECTS:aprObjLib> ${APR_CUDA_SOURCE_FILES})
+set_property(TARGET ${STATIC_TARGET_NAME} PROPERTY CUDA_ARCHITECTURES OFF)
target_compile_features(${STATIC_TARGET_NAME} PUBLIC cxx_std_14)
set_target_properties(${STATIC_TARGET_NAME} PROPERTIES OUTPUT_NAME ${LIBRARY_NAME})
set_target_properties(${STATIC_TARGET_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION OFF)
@@ -262,7 +265,7 @@ if(APR_BUILD_SHARED_LIB)
# generate fat shared library
set(SHARED_TARGET_NAME sharedLib)
add_library(${SHARED_TARGET_NAME} SHARED $<TARGET_OBJECTS:aprObjLib> ${APR_CUDA_SOURCE_FILES})

+set_property(TARGET ${SHARED_TARGET_NAME} PROPERTY CUDA_ARCHITECTURES OFF)
target_include_directories(${SHARED_TARGET_NAME} PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src> $<BUILD_INTERFACE:${PROJECT_BINARY_DIR}>)
set_target_properties(${SHARED_TARGET_NAME} PROPERTIES OUTPUT_NAME ${LIBRARY_NAME})
set_target_properties(${SHARED_TARGET_NAME} PROPERTIES LIBRARY_OUTPUT_NAME ${LIBRARY_NAME})

1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -1,5 +1,6 @@
macro(buildTarget TARGET)
add_executable(${TARGET} ${TARGET}.cpp)
+set_property(TARGET ${TARGET} PROPERTY CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(${TARGET} ${HDF5_LIBRARIES} ${TIFF_LIBRARIES} ${APR_BUILD_LIBRARY} Threads::Threads ${OPENMP_LINK})
endmacro(buildTarget)


2 changes: 1 addition & 1 deletion examples/Example_get_apr.cpp
@@ -48,7 +48,7 @@ int runAPR(cmdLineOptions options) {
//the apr datastructure
APR apr;

-APRConverter<float> aprConverter;
+APRConverter<uint16_t> aprConverter;

//read in the command line options into the parameters file
aprConverter.par.Ip_th = options.Ip_th;

2 changes: 1 addition & 1 deletion examples/Example_get_apr.h
@@ -30,7 +30,7 @@ struct cmdLineOptions{
bool auto_parameters = false;

float Ip_th = 0;
-float lambda = -1;
+float lambda = 3.0;
float sigma_th = 0;
float rel_error = 0.1;
float grad_th = 1;