use per-task (not per-thread) flags, rounding mode, etcetera? #153

Closed
stevengj opened this issue Mar 22, 2022 · 3 comments · Fixed by #185

Comments

@stevengj
Member

Right now we keep an array of these (flags, rounding mode, etc.), with one entry per thread. This may no longer be safe, since tasks can now migrate between threads.

Would it be better to store this state per task?

@jmkuhn
Collaborator

jmkuhn commented Jan 28, 2025

After Julia JuliaLang/julia#57087, tests fail on arm64-apple-darwin22.6.0.

julia> versioninfo()
Julia Version 1.12.0-DEV.1942
Commit fbe8656579 (2025-01-28 03:43 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.6.0)
  CPU: 12 × Apple M2 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
  GC: Built with stock GC
Threads: 8 default, 1 interactive, 8 GC (on 8 virtual cores)

(@v1.12) pkg> test DecFP
     Testing DecFP
...
     Testing Running tests...
[ Info: TESTING Dec32    nthreads = 8 ...
[ Info: TESTING Dec64    nthreads = 8 ...
Error During Test at /Users/john.m.kuhn/.julia/packages/DecFP/up2RW/test/runtests.jl:17
  Test threw exception
  Expression: #= /Users/john.m.kuhn/.julia/packages/DecFP/up2RW/test/runtests.jl:17 =# @sprintf("%.0f", T(i)) == string(i)
  BoundsError: attempt to access 8-element Vector{DecFP.DecFPRoundingMode} at index [9]
  Stacktrace:
   [1] throw_boundserror(A::Vector{DecFP.DecFPRoundingMode}, I::Tuple{Int64})
     @ Base ./essentials.jl:15
   [2] getindex
     @ ./essentials.jl:916 [inlined]
   [3] Dec64
     @ ~/.julia/packages/DecFP/up2RW/src/DecFP.jl:628 [inlined]
   [4] macro expansion
     @ ~/opt/julia/1.12-DEV-thread/julia/usr/share/julia/stdlib/v1.12/Test/src/Test.jl:676 [inlined]
   [5] testthreads(T::Type{Dec64}, i::Int64, mode::RoundingMode{:Down})
     @ Main ~/.julia/packages/DecFP/up2RW/test/runtests.jl:17
ERROR: LoadError: TaskFailedException

    nested task error: There was an error during testing
Stacktrace:
 [1] threading_run(fun::var"#9#10"{var"#11#12"{Type{Dec64}, Vector{Tuple{Int64, RoundingMode}}}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:196
 [2] macro expansion
   @ ./threadingconstructs.jl:213 [inlined]
 [3] top-level scope
   @ ~/.julia/packages/DecFP/up2RW/test/runtests.jl:454
 [4] include(mapexpr::Function, mod::Module, _path::String)
   @ Base ./Base.jl:304
 [5] top-level scope
   @ none:6
 [6] eval(m::Module, e::Any)
   @ Core ./boot.jl:486
 [7] exec_options(opts::Base.JLOptions)
   @ Base ./client.jl:295
 [8] _start()
   @ Base ./client.jl:557
in expression starting at /Users/john.m.kuhn/.julia/packages/DecFP/up2RW/test/runtests.jl:32
ERROR: Package DecFP errored during testing
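
The index `[9]` into an 8-element vector is consistent with the `versioninfo()` output above, which reports 8 default threads plus 1 interactive thread: a table sized by `Threads.nthreads()` covers only the default pool, while `Threads.threadid()` can be larger. A minimal sketch of that mismatch, assuming Julia 1.9 or later for `Threads.maxthreadid()`; the name `per_thread_table` is hypothetical and is not DecFP's actual code:

```julia
# Hypothetical per-thread table, sized by the default thread pool only.
per_thread_table = fill(:nearest, Threads.nthreads())

# maxthreadid() is an upper bound on threadid() across all thread pools,
# so it can exceed nthreads() when interactive threads are present.
max_id = Threads.maxthreadid()
println("nthreads() = ", Threads.nthreads(), ", maxthreadid() = ", max_id)

# Indexing by threadid() is only in bounds if the table covers every id.
table_is_safe = length(per_thread_table) >= max_id
println("table covers all thread ids: ", table_is_safe)
```

Sizing the table by `maxthreadid()` would likely avoid the `BoundsError`, but it would not address the original concern in this issue: a task that migrates between threads could still see a different entry mid-computation.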

@stevengj
Member Author

stevengj commented Jan 28, 2025

e.g. instead of looking up roundingmode[Threads.threadid()], use

get!(task_local_storage(), ROUNDINGMODE, DecFPRoundNearest)

where const ROUNDINGMODE = :DecFP_roundingmode_abb78e082af23329 is a unique key symbol (since this is an IdDict), e.g. tagged with the last 64 bits 0xabb78e082af23329 of the package UUID to eliminate accidental collisions.
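
A minimal sketch of this lookup, using a stand-in enum so that it runs without DecFP. `SketchRounding`, `current_rounding`, and `set_rounding!` are hypothetical names for illustration; only `task_local_storage`, `get!`, and the `ROUNDINGMODE` key come from the suggestion above:

```julia
# Stand-in for DecFP's rounding-mode type (hypothetical, for illustration only).
@enum SketchRounding SketchNearest SketchDown SketchUp

# Unique key symbol, tagged with bits of the package UUID as suggested above.
const ROUNDINGMODE = :DecFP_roundingmode_abb78e082af23329

# Read the current task's mode, installing the default on first use.
# task_local_storage() is an IdDict{Any,Any}, hence the type assertion.
current_rounding() =
    get!(task_local_storage(), ROUNDINGMODE, SketchNearest)::SketchRounding

# Set the mode for the current task only.
set_rounding!(mode::SketchRounding) = (task_local_storage(ROUNDINGMODE, mode); mode)

set_rounding!(SketchDown)

# A newly spawned task starts with its own empty storage, so it sees the default.
child_mode = fetch(Threads.@spawn current_rounding())
parent_mode = current_rounding()
println("parent: ", parent_mode, ", child: ", child_mode)
```

One consequence of this design is that a rounding mode set in a parent task is not inherited by tasks it spawns, which differs from how a per-thread table behaves.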

Unfortunately, roundingmode[Threads.threadid()] is 2.5ns on my machine while get!(task_local_storage(), ROUNDINGMODE, ...) is 15ns. In contrast, a Dec64 multiplication takes 20ns, so this will lead to almost a 2x slowdown in some operations. 😢

The only alternative I see would be to make it a global constant, and tell people to be wary of changing the rounding mode in multi-threaded programs.

That doesn't work for the exception flags, though. In principle we could allocate a new Ref{UInt32}() object on each call, but wouldn't that make all DecFP math code allocate like crazy? Update: no, it seems Julia stack-allocates a local Ref these days, as long as escape analysis succeeds.
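
The per-call flags idea might look roughly like the sketch below. Everything here is a hypothetical stand-in: `sketch_div` plays the role of a library routine that ORs exception bits into a caller-supplied flags word, and `SKETCH_INEXACT` is an arbitrary bit value, not a constant from the underlying decimal library.

```julia
# Arbitrary flag bit chosen for this sketch (not a real library constant).
const SKETCH_INEXACT = 0x00000020

# Stand-in for a library routine that reports exceptions through a flags word.
function sketch_div(x::Int, y::Int, flags::Ref{UInt32})
    q, r = divrem(x, y)
    r == 0 || (flags[] |= SKETCH_INEXACT)  # record that the result was inexact
    return q
end

function checked_div(x::Int, y::Int)
    flags = Ref{UInt32}(0)  # fresh flags word, private to this call
    q = sketch_div(x, y, flags)
    # Turn a raised flag into a Julia exception.
    (flags[] & SKETCH_INEXACT) == 0 || throw(InexactError(:checked_div, Int, x / y))
    return q
end

println(checked_div(6, 3))
```

Because the `Ref` never leaves `checked_div`, no state is shared between threads or tasks. Whether it is actually stack-allocated depends on escape analysis in the Julia version at hand, which can be checked with `@allocated`.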

@stevengj
Member Author

stevengj commented Jan 28, 2025

Good news: in benchmarks of actual calculations, the overhead of a task-local rounding mode seems much lower than I feared.
