-
Notifications
You must be signed in to change notification settings - Fork 0
Implement unboxed_ary
to avoid repeated access to packed data
#1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
byroot
wants to merge
2
commits into
master
Choose a base branch
from
ary-batch-info
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
7e84d79
to
29e6af0
Compare
Ruby objects use a lot of bit packing to save on memory, and also have multiple layouts, it's particularly true for Arrays which can be embedded, heap allocated, or shared. This saves memory, but can cause a lot of execution overhead when the same piece of information is packed and unpacked many times. Instead if we primarily work with a "normalized" representation of the object, that we keep in sync with the actual RArray, we can save quite a bit of overhead as showcase in the benchmark. Of course applying this to more than a couple functions is a lot of work and it's unclear whether it's worth it or not. ``` $ make -j benchmark ITEM="rb_ary_push" BUILT_RUBY="./miniruby" COMPARE_RUBY="./miniruby-baseline" /opt/rubies/3.3.4/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \ --executables="compare-ruby::./miniruby-baseline -I.ext/common --disable-gem" \ --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \ --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'rb_ary_push' -o -name '*rb_ary_push*.yml' -o -name '*rb_ary_push*.rb' | sort) compare-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info e9407cf) +PRISM [arm64-darwin23] built-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info 46757e76db) +PRISM [arm64-darwin23] warming up.. | |compare-ruby|built-ruby| |:-------|-----------:|---------:| |64*2k | 848.804| 1.151k| | | -| 1.36x| |128*1k | 1.651k| 2.217k| | | -| 1.34x| ```
29e6af0
to
3ac5702
Compare
I think I got bit again by
It's still substantial, but probably not quite enough to justify such a huge refactoring. |
byroot
pushed a commit
that referenced
this pull request
Dec 30, 2024
When searching for native extensions, if the name does not end in ".so" then we create a new string and append ".so" so it. If the native extension is in static_ext_inits, then we could trigger a GC in the rb_filesystem_str_new_cstr. This could cuase the GC to free lookup_name since we don't use the local variable anymore. This bug was caught in this ASAN build: http://ci.rvm.jp/results/trunk_asan@ruby-sp1/5479182 ==435614==ERROR: AddressSanitizer: use-after-poison on address 0x715a63022da0 at pc 0x5e7463873e4e bp 0x7fff383c8b00 sp 0x7fff383c82c0 READ of size 14 at 0x715a63022da0 thread T0 #0 0x5e7463873e4d in __asan_memcpy (/tmp/ruby/build/trunk_asan/ruby+0x214e4d) (BuildId: 607411c0626a2f66b4c20c02179b346aace20898) #1 0x5e7463b50a82 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10 ruby#2 0x5e7463b50a82 in ruby_nonempty_memcpy /tmp/ruby/src/trunk_asan/include/ruby/internal/memory.h:671:16 ruby#3 0x5e7463b50a82 in str_enc_new /tmp/ruby/src/trunk_asan/string.c:1035:9 ruby#4 0x5e74639b97dd in search_required /tmp/ruby/src/trunk_asan/load.c:1126:21 ruby#5 0x5e74639b97dd in require_internal /tmp/ruby/src/trunk_asan/load.c:1274:17 ruby#6 0x5e74639b83c1 in rb_require_string_internal /tmp/ruby/src/trunk_asan/load.c:1401:22 ruby#7 0x5e74639b83c1 in rb_require_string /tmp/ruby/src/trunk_asan/load.c:1387:12
byroot
pushed a commit
that referenced
this pull request
May 4, 2025
…uby#13231) This change addresses the following ASAN error: ``` ==36597==ERROR: AddressSanitizer: heap-use-after-free on address 0x512000396ba8 at pc 0x7fcad5cbad9f bp 0x7fff19739af0 sp 0x7fff19739ae8 WRITE of size 8 at 0x512000396ba8 thread T0 [643/756] 36600=optparse/test_summary #0 0x7fcad5cbad9e in free_fast_fallback_getaddrinfo_entry /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/raddrinfo.c:3046:22 #1 0x7fcad5c9fb48 in fast_fallback_inetsock_cleanup /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1179:17 ruby#2 0x7fcadf3b611a in rb_ensure /home/runner/work/ruby-dev-builder/ruby-dev-builder/eval.c:1081:5 ruby#3 0x7fcad5c9b44b in rsock_init_inetsock /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1289:20 ruby#4 0x7fcad5ca22b8 in tcp_init /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/tcpsocket.c:76:12 ruby#5 0x7fcadf83ba70 in vm_call0_cfunc_with_frame /home/runner/work/ruby-dev-builder/ruby-dev-builder/./vm_eval.c:164:15 ... ``` A `struct fast_fallback_getaddrinfo_shared` is shared between the main thread and two child threads. This struct contains an array of `fast_fallback_getaddrinfo_entry`. `fast_fallback_getaddrinfo_entry` and `fast_fallback_getaddrinfo_shared` were freed separately, and if `fast_fallback_getaddrinfo_shared` was freed first and then an attempt was made to free a `fast_fallback_getaddrinfo_entry`, a `heap-use-after-free` could occur. This change avoids that possibility by separating the deallocation of the addrinfo memory held by `fast_fallback_getaddrinfo_entry` from the access and lifecycle of the `fast_fallback_getaddrinfo_entry` itself.
byroot
pushed a commit
that referenced
this pull request
May 9, 2025
[Bug #18119] When we create classes, it pushes the class to the subclass list of the superclass. This access needs to be synchronized because multiple Ractors may be creating classes with the same superclass, which would cause race conditions and cause the linked list to be corrupted. For example, we can reproduce with this script crashing: workers = (0...8).map do Ractor.new do loop do 100.times.map { Class.new } Ractor.yield nil end end end 100.times { Ractor.select(*workers) } With ASAN enabled, we can see that there are use-after-free errors: ==176013==ERROR: AddressSanitizer: heap-use-after-free on address 0x5030000974f0 at pc 0x62f9e56f892d bp 0x7a503f1ffd90 sp 0x7a503f1ffd88 WRITE of size 8 at 0x5030000974f0 thread T4 #0 0x62f9e56f892c in rb_class_remove_from_super_subclasses class.c:149:24 #1 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9 ruby#2 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21 ruby#3 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13 ruby#4 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9 ruby#5 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13 ruby#6 0x62f9e58fac93 in gc_start gc/default/default.c:6402:13 ruby#7 0x62f9e58e8b69 in heap_prepare gc/default/default.c:2032:13 ruby#8 0x62f9e58e8b69 in heap_next_free_page gc/default/default.c:2255:9 ruby#9 0x62f9e58e8b69 in newobj_cache_miss gc/default/default.c:2362:38 ... 0x5030000974f0 is located 16 bytes inside of 24-byte region [0x5030000974e0,0x5030000974f8) freed by thread T4 here: #0 0x62f9e562f28a in free (miniruby+0x1fd28a) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233) #1 0x62f9e58ca2ab in rb_gc_impl_free gc/default/default.c:8102:9 ruby#2 0x62f9e58ca2ab in ruby_sized_xfree gc.c:5029:13 ruby#3 0x62f9e58ca2ab in ruby_xfree gc.c:5040:5 ruby#4 0x62f9e56f88e6 in rb_class_remove_from_super_subclasses class.c:152:9 ruby#5 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9 ruby#6 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21 ruby#7 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13 ruby#8 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9 ruby#9 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13 ... previously allocated by thread T5 here: #0 0x62f9e562f70d in calloc (miniruby+0x1fd70d) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233) #1 0x62f9e58c8e1a in calloc1 gc/default/default.c:1472:12 ruby#2 0x62f9e58c8e1a in rb_gc_impl_calloc gc/default/default.c:8138:5 ruby#3 0x62f9e58c8e1a in ruby_xcalloc_body gc.c:4964:12 ruby#4 0x62f9e58c8e1a in ruby_xcalloc gc.c:4958:34 ruby#5 0x62f9e56f906e in push_subclass_entry_to_list class.c:88:13 ruby#6 0x62f9e56f906e in rb_class_subclass_add class.c:111:38 ruby#7 0x62f9e56f906e in RCLASS_SET_SUPER internal/class.h:257:9 ruby#8 0x62f9e56fca7a in make_metaclass class.c:786:5 ruby#9 0x62f9e59db982 in rb_class_initialize object.c:2101:5
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Ruby objects use a lot of bit packing to save on memory, and also have multiple layouts, it's particularly true for Arrays which can be embedded, heap allocated, or shared.
This saves memory, but can cause a lot of execution overhead when the same piece of information is packed and unpacked many times. As functions often unpack data for themselves, but then call other sub routines that need to do the same unpacking again.
Instead if we primarily work with a "normalized" representation of the object, that we keep in sync with the actual RArray, we can save quite a bit of overhead as showcase in the benchmark.
Of course applying this to more than a couple functions is a lot of work and it's unclear whether it's worth it or not.
Also, whenever code using this yield or call into arbitrary Ruby code, we have to assume the unboxed representation may have fallen out of sync. So it may not work so well in some cases.
Profile before the change: https://share.firefox.dev/3ZIgQsw
We can see lots of
FL_*
calls, several of them repeated.Whereas after the change: https://share.firefox.dev/3Zq7nVn
Most costly
FL_*
calls have been eliminated, as most of the frequent checks have been batched in theuary_*
methods.