-
Notifications
You must be signed in to change notification settings - Fork 0
Implement unboxed_ary
to avoid repeated access to packed data
#1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
byroot
wants to merge
2
commits into
master
Choose a base branch
from
ary-batch-info
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
7e84d79
to
29e6af0
Compare
Ruby objects use a lot of bit packing to save on memory, and also have multiple layouts, it's particularly true for Arrays which can be embedded, heap allocated, or shared. This saves memory, but can cause a lot of execution overhead when the same piece of information is packed and unpacked many times. Instead if we primarily work with a "normalized" representation of the object, that we keep in sync with the actual RArray, we can save quite a bit of overhead as showcase in the benchmark. Of course applying this to more than a couple functions is a lot of work and it's unclear whether it's worth it or not. ``` $ make -j benchmark ITEM="rb_ary_push" BUILT_RUBY="./miniruby" COMPARE_RUBY="./miniruby-baseline" /opt/rubies/3.3.4/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \ --executables="compare-ruby::./miniruby-baseline -I.ext/common --disable-gem" \ --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \ --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'rb_ary_push' -o -name '*rb_ary_push*.yml' -o -name '*rb_ary_push*.rb' | sort) compare-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info e9407cf) +PRISM [arm64-darwin23] built-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info 46757e76db) +PRISM [arm64-darwin23] warming up.. | |compare-ruby|built-ruby| |:-------|-----------:|---------:| |64*2k | 848.804| 1.151k| | | -| 1.36x| |128*1k | 1.651k| 2.217k| | | -| 1.34x| ```
29e6af0
to
3ac5702
Compare
I think I got bit again by
It's still substantial, but probably not quite enough to justify such a huge refactoring. |
byroot
pushed a commit
that referenced
this pull request
Dec 30, 2024
When searching for native extensions, if the name does not end in ".so" then we create a new string and append ".so" so it. If the native extension is in static_ext_inits, then we could trigger a GC in the rb_filesystem_str_new_cstr. This could cuase the GC to free lookup_name since we don't use the local variable anymore. This bug was caught in this ASAN build: http://ci.rvm.jp/results/trunk_asan@ruby-sp1/5479182 ==435614==ERROR: AddressSanitizer: use-after-poison on address 0x715a63022da0 at pc 0x5e7463873e4e bp 0x7fff383c8b00 sp 0x7fff383c82c0 READ of size 14 at 0x715a63022da0 thread T0 #0 0x5e7463873e4d in __asan_memcpy (/tmp/ruby/build/trunk_asan/ruby+0x214e4d) (BuildId: 607411c0626a2f66b4c20c02179b346aace20898) #1 0x5e7463b50a82 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10 ruby#2 0x5e7463b50a82 in ruby_nonempty_memcpy /tmp/ruby/src/trunk_asan/include/ruby/internal/memory.h:671:16 ruby#3 0x5e7463b50a82 in str_enc_new /tmp/ruby/src/trunk_asan/string.c:1035:9 ruby#4 0x5e74639b97dd in search_required /tmp/ruby/src/trunk_asan/load.c:1126:21 ruby#5 0x5e74639b97dd in require_internal /tmp/ruby/src/trunk_asan/load.c:1274:17 ruby#6 0x5e74639b83c1 in rb_require_string_internal /tmp/ruby/src/trunk_asan/load.c:1401:22 ruby#7 0x5e74639b83c1 in rb_require_string /tmp/ruby/src/trunk_asan/load.c:1387:12
byroot
pushed a commit
that referenced
this pull request
May 4, 2025
…uby#13231) This change addresses the following ASAN error: ``` ==36597==ERROR: AddressSanitizer: heap-use-after-free on address 0x512000396ba8 at pc 0x7fcad5cbad9f bp 0x7fff19739af0 sp 0x7fff19739ae8 WRITE of size 8 at 0x512000396ba8 thread T0 [643/756] 36600=optparse/test_summary #0 0x7fcad5cbad9e in free_fast_fallback_getaddrinfo_entry /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/raddrinfo.c:3046:22 #1 0x7fcad5c9fb48 in fast_fallback_inetsock_cleanup /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1179:17 ruby#2 0x7fcadf3b611a in rb_ensure /home/runner/work/ruby-dev-builder/ruby-dev-builder/eval.c:1081:5 ruby#3 0x7fcad5c9b44b in rsock_init_inetsock /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1289:20 ruby#4 0x7fcad5ca22b8 in tcp_init /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/tcpsocket.c:76:12 ruby#5 0x7fcadf83ba70 in vm_call0_cfunc_with_frame /home/runner/work/ruby-dev-builder/ruby-dev-builder/./vm_eval.c:164:15 ... ``` A `struct fast_fallback_getaddrinfo_shared` is shared between the main thread and two child threads. This struct contains an array of `fast_fallback_getaddrinfo_entry`. `fast_fallback_getaddrinfo_entry` and `fast_fallback_getaddrinfo_shared` were freed separately, and if `fast_fallback_getaddrinfo_shared` was freed first and then an attempt was made to free a `fast_fallback_getaddrinfo_entry`, a `heap-use-after-free` could occur. This change avoids that possibility by separating the deallocation of the addrinfo memory held by `fast_fallback_getaddrinfo_entry` from the access and lifecycle of the `fast_fallback_getaddrinfo_entry` itself.
byroot
pushed a commit
that referenced
this pull request
May 9, 2025
[Bug #18119] When we create classes, it pushes the class to the subclass list of the superclass. This access needs to be synchronized because multiple Ractors may be creating classes with the same superclass, which would cause race conditions and cause the linked list to be corrupted. For example, we can reproduce with this script crashing: workers = (0...8).map do Ractor.new do loop do 100.times.map { Class.new } Ractor.yield nil end end end 100.times { Ractor.select(*workers) } With ASAN enabled, we can see that there are use-after-free errors: ==176013==ERROR: AddressSanitizer: heap-use-after-free on address 0x5030000974f0 at pc 0x62f9e56f892d bp 0x7a503f1ffd90 sp 0x7a503f1ffd88 WRITE of size 8 at 0x5030000974f0 thread T4 #0 0x62f9e56f892c in rb_class_remove_from_super_subclasses class.c:149:24 #1 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9 ruby#2 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21 ruby#3 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13 ruby#4 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9 ruby#5 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13 ruby#6 0x62f9e58fac93 in gc_start gc/default/default.c:6402:13 ruby#7 0x62f9e58e8b69 in heap_prepare gc/default/default.c:2032:13 ruby#8 0x62f9e58e8b69 in heap_next_free_page gc/default/default.c:2255:9 ruby#9 0x62f9e58e8b69 in newobj_cache_miss gc/default/default.c:2362:38 ... 0x5030000974f0 is located 16 bytes inside of 24-byte region [0x5030000974e0,0x5030000974f8) freed by thread T4 here: #0 0x62f9e562f28a in free (miniruby+0x1fd28a) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233) #1 0x62f9e58ca2ab in rb_gc_impl_free gc/default/default.c:8102:9 ruby#2 0x62f9e58ca2ab in ruby_sized_xfree gc.c:5029:13 ruby#3 0x62f9e58ca2ab in ruby_xfree gc.c:5040:5 ruby#4 0x62f9e56f88e6 in rb_class_remove_from_super_subclasses class.c:152:9 ruby#5 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9 ruby#6 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21 ruby#7 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13 ruby#8 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9 ruby#9 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13 ... previously allocated by thread T5 here: #0 0x62f9e562f70d in calloc (miniruby+0x1fd70d) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233) #1 0x62f9e58c8e1a in calloc1 gc/default/default.c:1472:12 ruby#2 0x62f9e58c8e1a in rb_gc_impl_calloc gc/default/default.c:8138:5 ruby#3 0x62f9e58c8e1a in ruby_xcalloc_body gc.c:4964:12 ruby#4 0x62f9e58c8e1a in ruby_xcalloc gc.c:4958:34 ruby#5 0x62f9e56f906e in push_subclass_entry_to_list class.c:88:13 ruby#6 0x62f9e56f906e in rb_class_subclass_add class.c:111:38 ruby#7 0x62f9e56f906e in RCLASS_SET_SUPER internal/class.h:257:9 ruby#8 0x62f9e56fca7a in make_metaclass class.c:786:5 ruby#9 0x62f9e59db982 in rb_class_initialize object.c:2101:5
byroot
pushed a commit
that referenced
this pull request
Jun 17, 2025
Fix use-after-free in constant cache [Bug #20921] When we create a cache entry for a constant, the following sequence of events could happen: - vm_track_constant_cache is called to insert a constant cache. - In vm_track_constant_cache, we first look up the ST table for the ID of the constant. Assume the ST table exists because another iseq also holds a cache entry for this ID. - We then insert into this ST table with the iseq_inline_constant_cache. - However, while inserting into this ST table, it allocates memory, which could trigger a GC. Assume that it does trigger a GC. - The GC frees the one and only other iseq that holds a cache entry for this ID. - In remove_from_constant_cache, it will appear that the ST table is now empty because there are no more iseq with cache entries for this ID, so we free the ST table. - We complete GC and continue our st_insert. However, this ST table has been freed so we now have a use-after-free. This issue is very hard to reproduce, because it requires that the GC runs at a very specific time. However, we can make it show up by applying this patch which runs GC right before the st_insert to mimic the st_insert triggering a GC: diff --git a/vm_insnhelper.c b/vm_insnhelper.c index 3cb23f0..a93998136a 100644 --- a/vm_insnhelper.c +++ b/vm_insnhelper.c @@ -6338,6 +6338,10 @@ vm_track_constant_cache(ID id, void *ic) rb_id_table_insert(const_cache, id, (VALUE)ics); } + if (id == rb_intern("MyConstant")) rb_gc(); + st_insert(ics, (st_data_t) ic, (st_data_t) Qtrue); } And if we run this script: Object.const_set("MyConstant", "Hello!") my_proc = eval("-> { MyConstant }") my_proc.call my_proc = eval("-> { MyConstant }") my_proc.call We can see that ASAN outputs a use-after-free error: ==36540==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000049528 at pc 0x000102f3ceac bp 0x00016d607a70 sp 0x00016d607a68 READ of size 8 at 0x606000049528 thread T0 #0 0x102f3cea8 in do_hash st.c:321 #1 0x102f3ddd0 in rb_st_insert st.c:1132 ruby#2 0x103140700 in vm_track_constant_cache vm_insnhelper.c:6345 ruby#3 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356 ruby#4 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424 ruby#5 0x1030bc1e0 in vm_exec_core insns.def:263 ruby#6 0x1030b55fc in rb_vm_exec vm.c:2585 ruby#7 0x1030fe0ac in rb_iseq_eval_main vm.c:2851 ruby#8 0x102a82588 in rb_ec_exec_node eval.c:281 ruby#9 0x102a81fe0 in ruby_run_node eval.c:319 ruby#10 0x1027f3db4 in rb_main main.c:43 ruby#11 0x1027f3bd4 in main main.c:68 ruby#12 0x183900270 (<unknown module>) 0x606000049528 is located 8 bytes inside of 56-byte region [0x606000049520,0x606000049558) freed by thread T0 here: #0 0x104174d40 in free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54d40) #1 0x102ada89c in rb_gc_impl_free default.c:8183 ruby#2 0x102ada7dc in ruby_sized_xfree gc.c:4507 ruby#3 0x102ac4d34 in ruby_xfree gc.c:4518 ruby#4 0x102f3cb34 in rb_st_free_table st.c:663 ruby#5 0x102bd52d8 in remove_from_constant_cache iseq.c:119 ruby#6 0x102bbe2cc in iseq_clear_ic_references iseq.c:153 ruby#7 0x102bbd2a0 in rb_iseq_free iseq.c:166 ruby#8 0x102b32ed0 in rb_imemo_free imemo.c:564 ruby#9 0x102ac4b44 in rb_gc_obj_free gc.c:1407 ruby#10 0x102af4290 in gc_sweep_plane default.c:3546 ruby#11 0x102af3bdc in gc_sweep_page default.c:3634 ruby#12 0x102aeb140 in gc_sweep_step default.c:3906 ruby#13 0x102aeadf0 in gc_sweep_rest default.c:3978 ruby#14 0x102ae4714 in gc_sweep default.c:4155 ruby#15 0x102af8474 in gc_start default.c:6484 ruby#16 0x102afbe30 in garbage_collect default.c:6363 ruby#17 0x102ad37f0 in rb_gc_impl_start default.c:6816 ruby#18 0x102ad3634 in rb_gc gc.c:3624 ruby#19 0x1031406ec in vm_track_constant_cache vm_insnhelper.c:6342 #20 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356 ruby#21 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424 ruby#22 0x1030bc1e0 in vm_exec_core insns.def:263 ruby#23 0x1030b55fc in rb_vm_exec vm.c:2585 ruby#24 0x1030fe0ac in rb_iseq_eval_main vm.c:2851 ruby#25 0x102a82588 in rb_ec_exec_node eval.c:281 ruby#26 0x102a81fe0 in ruby_run_node eval.c:319 ruby#27 0x1027f3db4 in rb_main main.c:43 ruby#28 0x1027f3bd4 in main main.c:68 ruby#29 0x183900270 (<unknown module>) previously allocated by thread T0 here: #0 0x104174c04 in malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x54c04) #1 0x102ada0ec in rb_gc_impl_malloc default.c:8198 ruby#2 0x102acee44 in ruby_xmalloc gc.c:4438 ruby#3 0x102f3c85c in rb_st_init_table_with_size st.c:571 ruby#4 0x102f3c900 in rb_st_init_table st.c:600 ruby#5 0x102f3c920 in rb_st_init_numtable st.c:608 ruby#6 0x103140698 in vm_track_constant_cache vm_insnhelper.c:6337 ruby#7 0x1030b91d8 in vm_ic_track_const_chain vm_insnhelper.c:6356 ruby#8 0x1030b8cf8 in rb_vm_opt_getconstant_path vm_insnhelper.c:6424 ruby#9 0x1030bc1e0 in vm_exec_core insns.def:263 ruby#10 0x1030b55fc in rb_vm_exec vm.c:2585 ruby#11 0x1030fe0ac in rb_iseq_eval_main vm.c:2851 ruby#12 0x102a82588 in rb_ec_exec_node eval.c:281 ruby#13 0x102a81fe0 in ruby_run_node eval.c:319 ruby#14 0x1027f3db4 in rb_main main.c:43 ruby#15 0x1027f3bd4 in main main.c:68 ruby#16 0x183900270 (<unknown module>) This commit fixes this bug by adding a inserting_constant_cache_id field to the VM, which stores the ID that is currently being inserted and, in remove_from_constant_cache, we don't free the ST table for ID equal to this one. Co-Authored-By: Alan Wu <[email protected]>
byroot
pushed a commit
that referenced
this pull request
Jun 18, 2025
In commit d42b9ff, an optimization was introduced that can speed up Regexp#match by 15% when it matches with strings of different encodings. This optimization, however, does not work across ractors. To fix this, we only use the optimization if no ractors have been started. In the future, we could use atomics for the reference counting if we find it's needed and if it's more performant. The backtrace of the misbehaving native thread: ``` * frame #0: 0x0000000189c94388 libsystem_kernel.dylib`__pthread_kill + 8 frame #1: 0x0000000189ccd88c libsystem_pthread.dylib`pthread_kill + 296 frame ruby#2: 0x0000000189bd6c60 libsystem_c.dylib`abort + 124 frame ruby#3: 0x0000000189adb174 libsystem_malloc.dylib`malloc_vreport + 892 frame ruby#4: 0x0000000189adec90 libsystem_malloc.dylib`malloc_report + 64 frame ruby#5: 0x0000000189ae321c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32 frame ruby#6: 0x00000001001c3be4 ruby`onig_free_body(reg=0x000000012d84b660) at regcomp.c:5663:5 frame ruby#7: 0x00000001001ba828 ruby`rb_reg_prepare_re(re=4748462304, str=4748451168) at re.c:1680:13 frame ruby#8: 0x00000001001bac58 ruby`rb_reg_onig_match(re=4748462304, str=4748451168, match=(ruby`reg_onig_search [inlined] rbimpl_RB_TYPE_P_fastpath at value_type.h:349:14 ruby`reg_onig_search [inlined] rbimpl_rstring_getmem at rstring.h:391:5 ruby`reg_onig_search at re.c:1781:5), args=0x000000013824b168, regs=0x000000013824b150) at re.c:1708:20 frame ruby#9: 0x00000001001baefc ruby`rb_reg_search_set_match(re=4748462304, str=4748451168, pos=<unavailable>, reverse=0, set_backref_str=1, set_match=0x0000000000000000) at re.c:1809:27 frame ruby#10: 0x00000001001bae80 ruby`rb_reg_search0(re=<unavailable>, str=<unavailable>, pos=<unavailable>, reverse=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at re.c:1861:12 [artificial] frame ruby#11: 0x0000000100230b90 ruby`rb_pat_search0(pat=<unavailable>, str=<unavailable>, pos=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at string.c:6619:16 [artificial] frame ruby#12: 0x00000001002287f4 ruby`rb_str_sub_bang [inlined] rb_pat_search(pat=4748462304, str=4748451168, pos=0, set_backref_str=1) at string.c:6626:12 frame ruby#13: 0x00000001002287dc ruby`rb_str_sub_bang(argc=1, argv=0x00000001381280d0, str=4748451168) at string.c:6668:11 frame ruby#14: 0x000000010022826c ruby`rb_str_sub ``` You can reproduce this by running: ``` RUBY_TESTOPTS="--name=/test_str_capitalize/" make test-all TESTS=test/ruby/test_m17n.comb ``` However, you need to run it with multiple ractors at once. Co-authored-by: jhawthorn <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Ruby objects use a lot of bit packing to save on memory, and also have multiple layouts, it's particularly true for Arrays which can be embedded, heap allocated, or shared.
This saves memory, but can cause a lot of execution overhead when the same piece of information is packed and unpacked many times. As functions often unpack data for themselves, but then call other sub routines that need to do the same unpacking again.
Instead if we primarily work with a "normalized" representation of the object, that we keep in sync with the actual RArray, we can save quite a bit of overhead as showcase in the benchmark.
Of course applying this to more than a couple functions is a lot of work and it's unclear whether it's worth it or not.
Also, whenever code using this yield or call into arbitrary Ruby code, we have to assume the unboxed representation may have fallen out of sync. So it may not work so well in some cases.
Profile before the change: https://share.firefox.dev/3ZIgQsw
We can see lots of
FL_*
calls, several of them repeated.Whereas after the change: https://share.firefox.dev/3Zq7nVn
Most costly
FL_*
calls have been eliminated, as most of the frequent checks have been batched in theuary_*
methods.