Skip to content

Implement unboxed_ary to avoid repeated access to packed data #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

byroot
Copy link
Owner

@byroot byroot commented Dec 5, 2024

Ruby objects use a lot of bit packing to save on memory, and also have multiple layouts, it's particularly true for Arrays which can be embedded, heap allocated, or shared.

This saves memory, but can cause a lot of execution overhead when the same piece of information is packed and unpacked many times. As functions often unpack data for themselves, but then call other sub routines that need to do the same unpacking again.

Instead if we primarily work with a "normalized" representation of the object, that we keep in sync with the actual RArray, we can save quite a bit of overhead as showcase in the benchmark.

Of course applying this to more than a couple functions is a lot of work and it's unclear whether it's worth it or not.

Also, whenever code using this yield or call into arbitrary Ruby code, we have to assume the unboxed representation may have fallen out of sync. So it may not work so well in some cases.

$ make -j benchmark ITEM="rb_ary_push" BUILT_RUBY="./miniruby" COMPARE_RUBY="./miniruby-baseline"
/opt/rubies/3.3.4/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::./miniruby-baseline -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'rb_ary_push' -o -name '*rb_ary_push*.yml' -o -name '*rb_ary_push*.rb' | sort)
compare-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info https://github.com/byroot/ruby/commit/e9407cf40626ed55e3a4642f7d7dc083a0f97225) +PRISM [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info 46757e76db) +PRISM [arm64-darwin23]
warming up..

|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|64*2k   |     848.804|    1.151k|
|        |           -|     1.36x|
|128*1k  |      1.651k|    2.217k|
|        |           -|     1.34x|

Profile before the change: https://share.firefox.dev/3ZIgQsw

Capture d’écran 2024-12-05 à 20 32 13

We can see lots of FL_* calls, several of them repeated.

Whereas after the change: https://share.firefox.dev/3Zq7nVn

Capture d’écran 2024-12-05 à 20 33 37

Most costly FL_* calls have been eliminated, as most of the frequent checks have been batched in the uary_* methods.

@byroot byroot force-pushed the ary-batch-info branch 3 times, most recently from 7e84d79 to 29e6af0 Compare December 5, 2024 19:37
Ruby objects use a lot of bit packing to save on memory, and also
have multiple layouts, it's particularly true for Arrays which can be
embedded, heap allocated, or shared.

This saves memory, but can cause a lot of execution overhead when
the same piece of information is packed and unpacked many times.

Instead if we primarily work with a "normalized" representation
of the object, that we keep in sync with the actual RArray, we
can save quite a bit of overhead as showcase in the benchmark.

Of course applying this to more than a couple functions is a lot
of work and it's unclear whether it's worth it or not.

```
$ make -j benchmark ITEM="rb_ary_push" BUILT_RUBY="./miniruby" COMPARE_RUBY="./miniruby-baseline"
/opt/rubies/3.3.4/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::./miniruby-baseline -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'rb_ary_push' -o -name '*rb_ary_push*.yml' -o -name '*rb_ary_push*.rb' | sort)
compare-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info e9407cf) +PRISM [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info 46757e76db) +PRISM [arm64-darwin23]
warming up..

|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|64*2k   |     848.804|    1.151k|
|        |           -|     1.36x|
|128*1k  |      1.651k|    2.217k|
|        |           -|     1.34x|
```
@byroot
Copy link
Owner Author

byroot commented Dec 6, 2024

I think I got bit again by make benchmark, the gains aren't that good:

/opt/rubies/3.3.4/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::./miniruby-baseline -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby --disable-gem" \
	            --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'rb_ary_push' -o -name '*rb_ary_push*.yml' -o -name '*rb_ary_push*.rb' | sort) 
compare-ruby: ruby 3.4.0dev (2024-12-05T19:11:39Z ary-batch-info e9407cf406) +PRISM [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-12-06T08:47:22Z ary-batch-info 3ac5702064) +PRISM [arm64-darwin23]
warming up..
# Iteration per second (i/s)

|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|64*2k   |     845.438|   949.870|
|        |           -|     1.12x|
|128*1k  |      1.697k|    1.847k|
|        |           -|     1.09x|

It's still substantial, but probably not quite enough to justify such a huge refactoring.

byroot pushed a commit that referenced this pull request Dec 30, 2024
When searching for native extensions, if the name does not end in ".so"
then we create a new string and append ".so" so it. If the native extension
is in static_ext_inits, then we could trigger a GC in the rb_filesystem_str_new_cstr.
This could cuase the GC to free lookup_name since we don't use the local
variable anymore.

This bug was caught in this ASAN build:
http://ci.rvm.jp/results/trunk_asan@ruby-sp1/5479182

    ==435614==ERROR: AddressSanitizer: use-after-poison on address 0x715a63022da0 at pc 0x5e7463873e4e bp 0x7fff383c8b00 sp 0x7fff383c82c0
    READ of size 14 at 0x715a63022da0 thread T0
        #0 0x5e7463873e4d in __asan_memcpy (/tmp/ruby/build/trunk_asan/ruby+0x214e4d) (BuildId: 607411c0626a2f66b4c20c02179b346aace20898)
        #1 0x5e7463b50a82 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10
        ruby#2 0x5e7463b50a82 in ruby_nonempty_memcpy /tmp/ruby/src/trunk_asan/include/ruby/internal/memory.h:671:16
        ruby#3 0x5e7463b50a82 in str_enc_new /tmp/ruby/src/trunk_asan/string.c:1035:9
        ruby#4 0x5e74639b97dd in search_required /tmp/ruby/src/trunk_asan/load.c:1126:21
        ruby#5 0x5e74639b97dd in require_internal /tmp/ruby/src/trunk_asan/load.c:1274:17
        ruby#6 0x5e74639b83c1 in rb_require_string_internal /tmp/ruby/src/trunk_asan/load.c:1401:22
        ruby#7 0x5e74639b83c1 in rb_require_string /tmp/ruby/src/trunk_asan/load.c:1387:12
byroot pushed a commit that referenced this pull request May 4, 2025
…uby#13231)

This change addresses the following ASAN error:

```
==36597==ERROR: AddressSanitizer: heap-use-after-free on address 0x512000396ba8 at pc 0x7fcad5cbad9f bp 0x7fff19739af0 sp 0x7fff19739ae8
  WRITE of size 8 at 0x512000396ba8 thread T0
  [643/756] 36600=optparse/test_summary
      #0 0x7fcad5cbad9e in free_fast_fallback_getaddrinfo_entry /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/raddrinfo.c:3046:22
      #1 0x7fcad5c9fb48 in fast_fallback_inetsock_cleanup /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1179:17
      ruby#2 0x7fcadf3b611a in rb_ensure /home/runner/work/ruby-dev-builder/ruby-dev-builder/eval.c:1081:5
      ruby#3 0x7fcad5c9b44b in rsock_init_inetsock /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1289:20
      ruby#4 0x7fcad5ca22b8 in tcp_init /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/tcpsocket.c:76:12
      ruby#5 0x7fcadf83ba70 in vm_call0_cfunc_with_frame /home/runner/work/ruby-dev-builder/ruby-dev-builder/./vm_eval.c:164:15
...
```

A `struct fast_fallback_getaddrinfo_shared` is shared between the main thread and two child threads.
This struct contains an array of `fast_fallback_getaddrinfo_entry`.

`fast_fallback_getaddrinfo_entry` and `fast_fallback_getaddrinfo_shared` were freed separately, and if `fast_fallback_getaddrinfo_shared` was freed first and then an attempt was made to free a `fast_fallback_getaddrinfo_entry`, a `heap-use-after-free` could occur.

This change avoids that possibility by separating the deallocation of the addrinfo memory held by `fast_fallback_getaddrinfo_entry` from the access and lifecycle of the `fast_fallback_getaddrinfo_entry` itself.
byroot pushed a commit that referenced this pull request May 9, 2025
[Bug #18119]

When we create classes, it pushes the class to the subclass list of the
superclass. This access needs to be synchronized because multiple Ractors
may be creating classes with the same superclass, which would cause race
conditions and cause the linked list to be corrupted.

For example, we can reproduce with this script crashing:

    workers = (0...8).map do
      Ractor.new do
        loop do
          100.times.map { Class.new }
          Ractor.yield nil
        end
      end
    end

    100.times { Ractor.select(*workers) }

With ASAN enabled, we can see that there are use-after-free errors:

    ==176013==ERROR: AddressSanitizer: heap-use-after-free on address 0x5030000974f0 at pc 0x62f9e56f892d bp 0x7a503f1ffd90 sp 0x7a503f1ffd88
    WRITE of size 8 at 0x5030000974f0 thread T4
        #0 0x62f9e56f892c in rb_class_remove_from_super_subclasses class.c:149:24
        #1 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
        ruby#2 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
        ruby#3 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
        ruby#4 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
        ruby#5 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
        ruby#6 0x62f9e58fac93 in gc_start gc/default/default.c:6402:13
        ruby#7 0x62f9e58e8b69 in heap_prepare gc/default/default.c:2032:13
        ruby#8 0x62f9e58e8b69 in heap_next_free_page gc/default/default.c:2255:9
        ruby#9 0x62f9e58e8b69 in newobj_cache_miss gc/default/default.c:2362:38
    ...
    0x5030000974f0 is located 16 bytes inside of 24-byte region [0x5030000974e0,0x5030000974f8)
    freed by thread T4 here:
        #0 0x62f9e562f28a in free (miniruby+0x1fd28a) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
        #1 0x62f9e58ca2ab in rb_gc_impl_free gc/default/default.c:8102:9
        ruby#2 0x62f9e58ca2ab in ruby_sized_xfree gc.c:5029:13
        ruby#3 0x62f9e58ca2ab in ruby_xfree gc.c:5040:5
        ruby#4 0x62f9e56f88e6 in rb_class_remove_from_super_subclasses class.c:152:9
        ruby#5 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
        ruby#6 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
        ruby#7 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
        ruby#8 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
        ruby#9 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
    ...
    previously allocated by thread T5 here:
        #0 0x62f9e562f70d in calloc (miniruby+0x1fd70d) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
        #1 0x62f9e58c8e1a in calloc1 gc/default/default.c:1472:12
        ruby#2 0x62f9e58c8e1a in rb_gc_impl_calloc gc/default/default.c:8138:5
        ruby#3 0x62f9e58c8e1a in ruby_xcalloc_body gc.c:4964:12
        ruby#4 0x62f9e58c8e1a in ruby_xcalloc gc.c:4958:34
        ruby#5 0x62f9e56f906e in push_subclass_entry_to_list class.c:88:13
        ruby#6 0x62f9e56f906e in rb_class_subclass_add class.c:111:38
        ruby#7 0x62f9e56f906e in RCLASS_SET_SUPER internal/class.h:257:9
        ruby#8 0x62f9e56fca7a in make_metaclass class.c:786:5
        ruby#9 0x62f9e59db982 in rb_class_initialize object.c:2101:5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant