-
-
Notifications
You must be signed in to change notification settings - Fork 68
Add CUSOLVERRF.jl integration for GPU-accelerated sparse LU factorization #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1cdb1be
e40ad85
6c1633d
0e2e254
e61de2f
eee3ff4
cc7911b
235e333
f784d42
0ac5d28
d7f1f8c
0a075fe
1c1e917
b92906c
e88bad8
82fbc55
7a8dac7
288d382
62bc9ae
d559e8b
5175137
f1f3bb8
6db7c55
b8ca961
6a96db1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -121,6 +121,30 @@ sol = LS.solve(prob, LS.LUFactorization()) | |||||||||||||
|
||||||||||||||
For now, CUDSS only supports CuSparseMatrixCSR type matrices. | ||||||||||||||
|
||||||||||||||
For high-performance sparse LU factorization on GPUs, you can also use CUSOLVERRF.jl: | ||||||||||||||
|
||||||||||||||
```julia | ||||||||||||||
using CUSOLVERRF | ||||||||||||||
sol = LS.solve(prob, LS.CUSOLVERRFFactorization()) | ||||||||||||||
``` | ||||||||||||||
|
||||||||||||||
CUSOLVERRF provides access to NVIDIA's cusolverRF library, which offers significant | ||||||||||||||
performance improvements for sparse LU factorization on GPUs. It supports both | ||||||||||||||
`:RF` (default) and `:KLU` symbolic factorization methods, and can reuse symbolic | ||||||||||||||
Comment on lines
+131
to
+133
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||||
factorization for matrices with the same sparsity pattern: | ||||||||||||||
|
||||||||||||||
```julia | ||||||||||||||
# Use KLU for symbolic factorization | ||||||||||||||
sol = LS.solve(prob, LS.CUSOLVERRFFactorization(symbolic = :KLU)) | ||||||||||||||
|
||||||||||||||
# Reuse symbolic factorization for better performance | ||||||||||||||
sol = LS.solve(prob, LS.CUSOLVERRFFactorization(reuse_symbolic = true)) | ||||||||||||||
``` | ||||||||||||||
|
||||||||||||||
!!! note | ||||||||||||||
|
||||||||||||||
CUSOLVERRF only supports `Float64` element types with `Int32` indices. | ||||||||||||||
|
||||||||||||||
Note that `KrylovJL` methods also work with sparse GPU arrays: | ||||||||||||||
|
||||||||||||||
```julia | ||||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,89 @@ | ||||||||||||
module LinearSolveCUSOLVERRFExt | ||||||||||||
|
||||||||||||
using LinearSolve: LinearSolve, @get_cacheval, pattern_changed, OperatorAssumptions | ||||||||||||
using CUSOLVERRF: CUSOLVERRF, RFLU, CUDA | ||||||||||||
using SparseArrays: SparseArrays, SparseMatrixCSC, nnz | ||||||||||||
using CUSOLVERRF.CUDA.CUSPARSE: CuSparseMatrixCSR | ||||||||||||
using LinearAlgebra: LinearAlgebra, Adjoint, ldiv!, lu! | ||||||||||||
using SciMLBase: SciMLBase, LinearProblem, ReturnCode | ||||||||||||
|
||||||||||||
function LinearSolve.init_cacheval(alg::LinearSolve.CUSOLVERRFFactorization, | ||||||||||||
A, b, u, Pl, Pr, | ||||||||||||
maxiters::Int, abstol, reltol, | ||||||||||||
verbose::Bool, assumptions::OperatorAssumptions) | ||||||||||||
nothing | ||||||||||||
end | ||||||||||||
|
||||||||||||
function LinearSolve.init_cacheval(alg::LinearSolve.CUSOLVERRFFactorization, | ||||||||||||
A::Union{CuSparseMatrixCSR{Float64, Int32}, SparseMatrixCSC{Float64, <:Integer}}, | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
b, u, Pl, Pr, | ||||||||||||
maxiters::Int, abstol, reltol, | ||||||||||||
verbose::Bool, assumptions::OperatorAssumptions) | ||||||||||||
# Create initial factorization with appropriate options | ||||||||||||
nrhs = b isa AbstractMatrix ? size(b, 2) : 1 | ||||||||||||
symbolic = alg.symbolic | ||||||||||||
# Convert to CuSparseMatrixCSR if needed | ||||||||||||
A_gpu = A isa CuSparseMatrixCSR ? A : CuSparseMatrixCSR(A) | ||||||||||||
RFLU(A_gpu; nrhs=nrhs, symbolic=symbolic) | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
end | ||||||||||||
|
||||||||||||
function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::LinearSolve.CUSOLVERRFFactorization; kwargs...) | ||||||||||||
A = cache.A | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
# Convert to appropriate GPU format if needed | ||||||||||||
if A isa SparseMatrixCSC | ||||||||||||
A_gpu = CuSparseMatrixCSR(A) | ||||||||||||
elseif A isa CuSparseMatrixCSR | ||||||||||||
A_gpu = A | ||||||||||||
else | ||||||||||||
error("CUSOLVERRFFactorization only supports SparseMatrixCSC or CuSparseMatrixCSR matrices") | ||||||||||||
end | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
if cache.isfresh | ||||||||||||
cacheval = @get_cacheval(cache, :CUSOLVERRFFactorization) | ||||||||||||
if cacheval === nothing | ||||||||||||
# Create new factorization | ||||||||||||
nrhs = cache.b isa AbstractMatrix ? size(cache.b, 2) : 1 | ||||||||||||
fact = RFLU(A_gpu; nrhs=nrhs, symbolic=alg.symbolic) | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
else | ||||||||||||
# Reuse symbolic factorization if pattern hasn't changed | ||||||||||||
if alg.reuse_symbolic && !pattern_changed(cacheval, A_gpu) | ||||||||||||
fact = cacheval | ||||||||||||
lu!(fact, A_gpu) | ||||||||||||
else | ||||||||||||
# Create new factorization if pattern changed | ||||||||||||
nrhs = cache.b isa AbstractMatrix ? size(cache.b, 2) : 1 | ||||||||||||
fact = RFLU(A_gpu; nrhs=nrhs, symbolic=alg.symbolic) | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
end | ||||||||||||
end | ||||||||||||
cache.cacheval = fact | ||||||||||||
cache.isfresh = false | ||||||||||||
end | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
F = @get_cacheval(cache, :CUSOLVERRFFactorization) | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
# Ensure b and u are on GPU | ||||||||||||
b_gpu = cache.b isa CUDA.CuArray ? cache.b : CUDA.CuArray(cache.b) | ||||||||||||
u_gpu = cache.u isa CUDA.CuArray ? cache.u : CUDA.CuArray(cache.u) | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
# Solve | ||||||||||||
copyto!(u_gpu, b_gpu) | ||||||||||||
ldiv!(F, u_gpu) | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
# Copy back to CPU if needed | ||||||||||||
if !(cache.u isa CUDA.CuArray) | ||||||||||||
copyto!(cache.u, u_gpu) | ||||||||||||
end | ||||||||||||
|
||||||||||||
SciMLBase.build_linear_solution(alg, cache.u, nothing, cache; retcode = ReturnCode.Success) | ||||||||||||
Comment on lines
+77
to
+78
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||||||||
end | ||||||||||||
|
||||||||||||
# Helper function for pattern checking | ||||||||||||
function LinearSolve.pattern_changed(rf::RFLU, A::CuSparseMatrixCSR) | ||||||||||||
# For CUSOLVERRF, we need to check if the sparsity pattern has changed | ||||||||||||
# This is a simplified check - you might need a more sophisticated approach | ||||||||||||
size(rf) != size(A) || nnz(rf.M) != nnz(A) | ||||||||||||
end | ||||||||||||
|
||||||||||||
|
||||||||||||
end | ||||||||||||
Comment on lines
+88
to
+89
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,67 @@ | ||||||
using LinearSolve | ||||||
using CUSOLVERRF | ||||||
using CUDA | ||||||
using SparseArrays | ||||||
using LinearAlgebra | ||||||
using Test | ||||||
|
||||||
@testset "CUSOLVERRFFactorization" begin | ||||||
# Skip tests if CUDA is not available | ||||||
if !CUDA.functional() | ||||||
@info "CUDA not available, skipping CUSOLVERRF tests" | ||||||
return | ||||||
end | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test with a small sparse matrix | ||||||
n = 100 | ||||||
A = sprand(n, n, 0.1) + I | ||||||
b = rand(n) | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test with CPU sparse matrix (should auto-convert to GPU) | ||||||
@testset "CPU Sparse Matrix" begin | ||||||
prob = LinearProblem(A, b) | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test with default symbolic (:RF) | ||||||
sol = solve(prob, CUSOLVERRFFactorization()) | ||||||
@test norm(A * sol.u - b) / norm(b) < 1e-10 | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test with KLU symbolic | ||||||
sol_klu = solve(prob, CUSOLVERRFFactorization(symbolic = :KLU)) | ||||||
@test norm(A * sol_klu.u - b) / norm(b) < 1e-10 | ||||||
end | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test with GPU sparse matrix | ||||||
@testset "GPU Sparse Matrix" begin | ||||||
A_gpu = CUDA.CUSPARSE.CuSparseMatrixCSR(A) | ||||||
b_gpu = CuArray(b) | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
prob_gpu = LinearProblem(A_gpu, b_gpu) | ||||||
sol_gpu = solve(prob_gpu, CUSOLVERRFFactorization()) | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Check residual on GPU | ||||||
res_gpu = A_gpu * sol_gpu.u - b_gpu | ||||||
@test norm(res_gpu) / norm(b_gpu) < 1e-10 | ||||||
end | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test matrix update with same sparsity pattern | ||||||
@testset "Matrix Update" begin | ||||||
# Create a new matrix with same pattern but different values | ||||||
A2 = A + 0.1 * sprand(n, n, 0.01) | ||||||
b2 = rand(n) | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
prob2 = LinearProblem(A2, b2) | ||||||
sol2 = solve(prob2, CUSOLVERRFFactorization(reuse_symbolic = true)) | ||||||
@test norm(A2 * sol2.u - b2) / norm(b2) < 1e-10 | ||||||
end | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# Test error handling for unsupported types | ||||||
@testset "Error Handling" begin | ||||||
# Test with Float32 (not supported) | ||||||
A_f32 = Float32.(A) | ||||||
b_f32 = Float32.(b) | ||||||
prob_f32 = LinearProblem(A_f32, b_f32) | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
||||||
# This should error since CUSOLVERRF only supports Float64 | ||||||
@test_throws Exception solve(prob_f32, CUSOLVERRFFactorization()) | ||||||
end | ||||||
end | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [JuliaFormatter] reported by reviewdog 🐶
Suggested change
|
Uh oh!
There was an error while loading. Please reload this page.