title: Ruby 3.2 changes
prev: 3.3
next: 3.1
description: Ruby 3.2 full and annotated changelog

Ruby 3.2

  • Released at: Dec 25, 2022 (NEWS.md file)
  • Status (as of Jan 05, 2025): 3.2.3 is current stable
  • This document first published: Feb 4, 2022
  • Last change to this document: Jan 05, 2025

🇺🇦 🇺🇦 Before you start reading the changelog: A full-scale Russian invasion into my home country continues. The only reason I am alive and able to work on the changelog is the Armed Forces of Ukraine, and international support with weaponry, funds and information. I am in my home city Kharkiv, preparing to join the army. Please care to read two of my appeals to Ruby community before proceeding: first, second.
We need all the support we can get to push the invaders out and to bring peace to our land. Please spread information, lobby our cause and donate.🇺🇦 🇺🇦

The post dedicated to my war year and work on Ruby 3.2 is published on Feb 7.

Note: As already explained in Introduction, this site is dedicated to changes in the language, not the implementation, therefore the list below lacks mentions of lots of important optimizations introduced in 3.2, including YJIT improvements and object shapes. That's not because they are not important, just because this site's goals are different.

In preparation of this entry, Cookpad's article on notable changes in Ruby 3.2 written by core developers Koichi Sasada (ko1) and Yusuke Endoh (mame) provided invaluable insight. Thank you!

Highlights

Language changes

Anonymous arguments passing improvements

If the method declaration includes anonymous positional or keyword arguments (* or ** without associated name), those arguments can now be passed to the next method with some_method(*) or some_method(**) syntax.

  • Reason: While it can be considered too cryptic a shortcut by some, the new feature is consistent with passing of all arguments with ....
  • Discussion: Feature #18351
  • Documentation: Methods: Array/Hash Argument
  • Code:
    def only_keywords(**) # accept keyword arguments
      p(**) # and pass them to the next method
    end
    
    def only_positional(*) # accept positional arguments
      p(*) # and pass them to the next method
    end
    
    def both(*, **) # effectively the same as ...
      p(*, **)
    end
    
    only_keywords(a: 1, b: 2)
    # prints "{:a=>1, :b=>2}"
    only_positional(1, 2, 3)
    # prints
    #  1
    #  2
    #  3
    both(1, 2, 3, a: :b)
    # prints
    #   1
    #   2
    #   3
    #   {:a=>:b}
    
    # Realistic usage: a small wrapper method, just "fall through" to the next one
    def get(url, **) = send_request(:get, url, **)
    
    # Named and unnamed could be freely mixed:
    def mixed_naming(*a, **)
      p a
      p(**)
    end
    
    mixed_naming(1, 2, 3, a: :b)
    # prints:
    #  [1, 2, 3]
    #  {:a=>:b}
    
    # But using anonymous forwarding with named arguments is an error
    def forward(*args) = p(*)
    # no anonymous rest parameter (SyntaxError)
    
    # Interestingly enough, not only calling methods, but also "repacking" values
    # into variables works:
    def repack(*, **)
      # x = *     # a bare splat like this would be a syntax error
      x = [*]     # but this will work and put [1, 2] in x
      x, y = [*]  # and this will work and put 1 in x and 2 in y
      z = {**}    # this will put {a: :b} into z
    end
    repack(1, 2, a: :b)
    
    # While the latter example might seem just a curiosity, it could help with
    # quick debugging of pass-through code.
    # Imagine the `get` method above fails in one particular case.
    # We can adjust it this way, temporarily:
    def get(url, **)
      binding.irb if {**}.dig(:headers, :content_type) == 'application/json'
      send_request(:get, url, **)
    end
    
    # Parentheses are important for correct parsing.
    # Imagine this:
    def test(*)
      # this, depending on further code, will be either a SyntaxError,
      # or interpreted as call_something() * next_statement
      call_something *
      # ...
    end
    
    # Procs don't support anonymous arguments:
    proc { |*| p(*) }
    # Somewhat confusingly, this definition raises:
    #   no anonymous rest parameter (SyntaxError)
    # ...meaning surrounding method doesn't have them.
    
    # And if it does, they would be used, not proc's arguments:
    def test(*)
      proc { |*| p(*) }.call(1)
    end
    
    test(2)
    # prints 2 -- even inside proc, method's arguments are used for forwarding
  • Note: Whether anonymous arguments should be supported in procs is discussed.
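
The wrapper pattern mentioned above can be tried out in isolation. This is a minimal sketch, assuming a hypothetical `send_request` that stands in for a real HTTP client call; the `get`, `post` and `sum_all` names are illustrative only.

```ruby
# Hypothetical low-level method, standing in for a real HTTP client call.
def send_request(verb, url, headers: {}, timeout: 10)
  { verb: verb, url: url, headers: headers, timeout: timeout }
end

# Thin wrappers: accept any keyword arguments and pass them along untouched.
def get(url, **) = send_request(:get, url, **)
def post(url, **) = send_request(:post, url, **)

# Anonymous positional arguments can be repacked into an array the same way.
def sum_all(*) = [*].sum

get('https://example.com', timeout: 5)
# the timeout is forwarded, headers keep their default value
sum_all(1, 2, 3)
```

The wrappers never need to know which keywords `send_request` accepts, so adding a keyword there does not require touching them.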

Constant assignment evaluation order changed

For a long time, statements like module_expression::CONST_NAME = value_expression first evaluated value_expression and then module_expression. This was changed to calculate module_expression first.

  • Reason: Just making it consistent with other assignment expressions, which tend to calculate the left part before the right.
  • Discussion: Bug #15928
  • Documentation:
  • Code:
    # synthetic demonstrational example:
    def make_a_class
      puts "Making a class"
      Class.new
    end
    
    make_a_class::CONST = 42.tap { puts "Calculating the value" }
    # Prints:
    #  In Ruby 3.1:
    #    Calculating the value
    #    Making a class
    #  In Ruby 3.2:
    #    Making a class
    #    Calculating the value
    
    # Even simpler:
    NonExistentModule::CONST = 42.tap { puts "Calculating the value" }
    # Ruby 3.1:
    #   Prints "Calculating the value"
    #   raises "uninitialized constant NonExistentModule" (NameError)
    # Ruby 3.2:
    #   just raises "uninitialized constant NonExistentModule" (NameError)
  • Note: The problem is rarely relevant, but might eventually manifest itself in complicated metaprogramming or autoloading. Or, like in the last example, some effectful value might be calculated before discovering there is nowhere to put it in. According to discussion on the tracker, the old behavior was never intentional, it was just too hard to fix.

Behavior of module reopening/redefinition with included modules changed

When some module/class name is available at the top level context from the included modules, and a new class/module is defined, previously it was considered a reopening of existing module; since Ruby 3.2, it is a creation of a new module.

  • Reason: As one file's code has no control over what is included in other files (which may be non-obvious to the code's author), cryptic behaviors might've emerged by treating included modules as reopenable on a top level.
  • Discussion: Feature #18832
  • Documentation:
  • Code:
    require 'net/http'
    
    # ...might've happened in some of required files
    include Net
    
    p HTTP
    #=> Net::HTTP -- from included Net module
    
    # plan to define some of our app-specific HTTP services here
    module HTTP
      # ...
    end
    # Ruby 3.1: HTTP is not a module (TypeError)
    #   Because it assumes you are reopening HTTP class from included Net
    #   The error is hard to understand and even harder to bypass
    # Ruby 3.2: Successfully defines a new empty module, unrelated to Net::HTTP
    
    p HTTP
    #=> HTTP -- now it is a new module,
    #           and Net::HTTP is available only by fully-qualified name

Keyword argument separation leftovers

A few edge cases after big keyword argument separation were fixed:

  • Erroneous autosplatting of positional arguments in procs. Discussion: Bug #18633
    test = proc { |arg, **keywords| p(arg:, keywords:) }
    test.call(1, 2)
    # Prints: {:arg=>1, :keywords=>{}}, 2 is lost as an extra argument, as expected
    # But...
    test.call([1, 2])
    # Ruby 3.1: prints {:arg=>1, :keywords=>{}}, extra unexpected splatting & loss of 2
    # Ruby 3.2: prints {:arg=>[1, 2], :keywords=>{}}, as expected
  • The methods that splat arguments were fixed to consistently treat keyword arguments according to ruby2_keywords tag. Discussion: Bug #18625, Bug #16466
    def method_with_keywords(**kw) = p kw
    
    # This should never work: method is not marked with ruby2_keywords,
    # so it shouldn't ever unpack positional arguments into keyword arguments.
    def method_with_positional(*args) = method_with_keywords(*args)
    
    method_with_positional(a: 1)
    # Ruby 3.1 and 3.2: behaves as expected:
    #   wrong number of arguments (given 1, expected 0) (ArgumentError)
    
    # This should work: method is marked with ruby2_keywords, so it
    # is able to repack hash as keyword_args
    ruby2_keywords def old_method(*args) = method_with_keywords(*args)
    
    old_method(a: 1)
    # Ruby 3.1 and 3.2: behaves as expected:
    #   prints {:a => 1}
    
    # This shouldn't work, but erroneously worked in 3.1: after `bad_old_method`
    # delegated the hash, it preserved "I am Ruby 2 keywords" through other
    # methods (even if they aren't marked to be compatible).
    ruby2_keywords def bad_old_method(*args) = method_with_positional(*args)
    
    bad_old_method(a: 1)
    # Ruby 3.1: prints {:a => 1}
    # Ruby 3.2: wrong number of arguments (given 1, expected 0) (ArgumentError)

Removals

  • Constants:
    • Fixnum and Bignum (deprecated since unification into Integer in 2.4)
    • Random::DEFAULT (deprecated in favor of per-Ractor random generator since 3.0)
  • Methods:
    • Dir.exists?, File.exists? (deprecated since 2.1 as a general rule of having "bare verb" as a method name)
    • Object#=~ (deprecated since 2.6)
    • Object#taint, #untaint, #tainted?, #trust, #untrust, #untrusted? (deprecated together with a general concept of "safety" since 2.7)

Core classes and modules

Kernel#binding raises if accessed not from Ruby

binding is a method that returns the "current context" (Binding) object, allowing access to local variables and self, evaluating code in that context, etc. The problem was that it was accessible from C methods too, but as C method calls don't push new "execution frames," the binding returned was that of the last calling Ruby method, which was useless and misleading. Since Ruby 3.2, the method raises an exception in such situations.

  • Discussion: Bug #18487
  • Documentation: Kernel#binding (no mention for the behavior in non-Ruby frame)
  • Code: To demonstrate the practical implications, the C code should be written, but to get the gist of what's happening, we can do this:
    # The callable binding object, that will "bind" itself to argument, and call it in the context
    binding_caller = Kernel.instance_method(:binding).method(:bind_call)
    
    binding_caller.call(nil).local_variables
    # [:binding_caller] -- it is performed in the current context
    
    # method just accepts block and just calls it
    def test1(&)
      local_val = 'test'
      yield(nil)
    end
    
    test1(&binding_caller).local_variables
    #=> [:local_val] -- we got the binding of test1, as expected
    
    def test2(&)
      local_val = 'test'
      # Expectations: we pass the block further, so it will be performed
      # inside #map, and the binding will be that of #map's "insides"
      [1].map(&)
    end
    
    # Reality: #map is a method defined in C, it doesn't have its own
    # "context frame",
    test2(&binding_caller).first.local_variables
    # So, in Ruby 3.1:
    #   => [:local_val] -- we still received binding of the previous method
    #                      in call chain
    # In Ruby 3.2:
    #   Cannot create Binding object for non-Ruby caller (RuntimeError)
  • Note: TracePoint#binding was also adjusted for C methods, see below.

Class and Module

Class#attached_object

For singleton classes, returns the object this class is for; otherwise, raises TypeError.

  • Reason: The "which object is this singleton class attached to" question is useful in metaprogramming, introspection and code analysis, especially when some class methods are added via class << self.
  • Discussion: Feature #12084
  • Documentation: Class#attached_object
  • Code:
    String.attached_object
    # raises `String' is not a singleton class (TypeError)
    "foo".singleton_class.attached_object
    #=> "foo"
    
    # or
    class A
      class << self
        # here we are inside singleton class
        p attached_object
        #=> A
      end
    end
    
    # Usage for advanced metaprogramming:
    
    module MyCoolPlugin
      def self.prepended(mod)
        if mod.singleton_class? && mod.attached_object < Enumerable
          mod.include MyCoolPlugin::ServicesForEnumerable
        end
      end
    end
    
    class Simple
      class << self
        # prepends only simple version of MyCoolPlugin
        prepend MyCoolPlugin
      end
    end
    
    class MyArray < Array
      class << self
        # Prepend MyCoolPlugin, and includes MyCoolPlugin::ServicesForEnumerable
        prepend MyCoolPlugin
      end
    end
    
    # Usage for documentation/code analysis purposes:
    
    require 'active_support/all'
    Time.zone #=> nil, defined by ActiveSupport
    
    # Application-specific Time extensions
    class MyTime < Time
    end
    
    m = MyTime.method(:zone)
    # => #<Method: MyTime(Time).zone() /...path to implementation>
    
    # Now, if we want to pass just this method to some
    # documentation or introspection system, it has enough information
    # to tell who it belongs to
    m.owner
    #=> #<Class:Time>
    m.receiver
    #=> MyTime
    
    # But before 3.2, there was no way to programmatically go from
    # singleton class #<Class:Time>, to the regular class Time, which
    # the method is defined in, from the human point of view.
    
    # Now, there is:
    m.owner.attached_object
    # => Time
    # ...and our documentation/introspection system can properly describe
    # it as a class method of Time.
  • Note: The method will not work with "special" Ruby objects (nil, true, and false) which have their singleton_class implementations redefined to return regular class:
    nil.singleton_class #=> NilClass, not #<Class:nil>
    class << nil
      attached_object # raises `NilClass' is not a singleton class (TypeError)
    end
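
The documentation use case above can be condensed into a small helper. This is a sketch only: `describe_method`, `Shape` and `Circle` are hypothetical names, and the "Owner.name" / "Owner#name" notation is just the usual documentation convention.

```ruby
# Hypothetical helper: renders a Method or UnboundMethod as "Owner.name"
# for class-level methods and as "Owner#name" for instance methods.
def describe_method(meth)
  owner = meth.owner
  if owner.singleton_class?
    # Go from the singleton class back to the object it is attached to
    "#{owner.attached_object}.#{meth.name}"
  else
    "#{owner}##{meth.name}"
  end
end

class Shape
  def self.unit = new
  def area = 0
end

class Circle < Shape
end

describe_method(Circle.method(:unit))          # class method defined on Shape
describe_method(Circle.instance_method(:area)) # instance method defined on Shape
```

Checking `singleton_class?` first avoids the TypeError that `attached_object` raises for regular classes.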

Module#const_added

A "hook" method, called after the constant was defined in a module.

  • Reason: The method was proposed as helpful for autoloader libraries (like zeitwerk), but it also can be useful in metaprogramming, like "store a registry of nested classes of a particular type." Or "validate that some parent constant is redefined in a particular way."
  • Discussion: Feature #17881
  • Documentation: Module#const_added
  • Code:
    module Test
      def self.const_added(name)
        puts "const_added: #{name} = #{const_get(name)}"
      end
    
      # The method is called AFTER the constant is actually
      # defined, so its value is already available.
      FOO = 1
      # Prints:
      #   const_added: FOO = 1
    
      # Each constant override triggers a method again:
      FOO = 2
      # Prints:
      #  const_added: FOO = 2
    
      # Nested class definition:
      class Nested
        puts "Class definition body"
      end
      # Prints:
      #   const_added: Nested = Test::Nested
      #   Class definition body
    
      # Note that const_added is invoked at the BEGINNING of class being defined,
      # not after its body is processed.
    end
  • Note: To understand the last example—why const_added was called before class definition is fully finished—you should consider that the class/module name becomes immediately known to Ruby after its definition is opened, allowing things like this:
    module Test
      class Nested
        # outdated way of writing def self.class_method, but it works,
        # because Nested is already known name here.
        def Nested.class_method
        end
    
        # or this:
        weird_class_method
        # raises nice "undefined local variable or method `weird_class_method' for Test::Nested"
        # ...because the name `Test::Nested` is already associated with current class
      end
    end
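
The "registry of nested classes" idea from the Reason above could look roughly like this. It is a sketch under assumptions: the `Plugins` module, its `REGISTRY` constant and the nested class names are all made up for illustration.

```ruby
module Plugins
  # Defined before the hook below exists, so it is not registered itself
  REGISTRY = []

  def self.const_added(name)
    super
    value = const_get(name)
    # Remember only nested classes, ignore other constants
    REGISTRY << value if value.is_a?(Class)
  end

  class Markdown; end
  class Textile; end

  VERSION = '1.0' # a String, so the hook skips it
end

Plugins::REGISTRY # contains Plugins::Markdown and Plugins::Textile
```

As the hook fires when the class definition is opened, the registered class has no methods of its own yet at that moment; the registry should only be consulted later.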

Module#undefined_instance_methods

Lists methods that some module or class removed explicitly with undef.

  • Reason: Honestly, I have no idea. The discussion ticket doesn't clarify this either! I assume it is just for completeness, to make everything that Module can do with method definitions accessible programmatically. Or, it might help with debugging of some evil/buggy code that undefines the wrong methods. --zverok
  • Discussion: Feature #12655
  • Documentation: Module#undefined_instance_methods
  • Code:
    class ImmutableArray < Array
      undef :select!, :reject! #, ... etc
    end
    
    class UserArray < ImmutableArray
      undef :map
    end
    
    ImmutableArray.undefined_instance_methods #=> [:select!, :reject!]
    UserArray.undefined_instance_methods #=> [:map] -- only methods undefined by this module, not ancestors
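
As the method reports only the receiver's own undefs, a debugging helper would have to walk the ancestor chain. A possible sketch, where `explicitly_undefined?`, `ReadOnlyList` and `AuditedList` are hypothetical names:

```ruby
class ReadOnlyList < Array
  undef_method :push, :<<
end

class AuditedList < ReadOnlyList
end

# Hypothetical helper: true if the class or any of its ancestors
# removed the method explicitly with undef/undef_method.
def explicitly_undefined?(klass, name)
  klass.ancestors.any? { |mod| mod.undefined_instance_methods.include?(name) }
end

explicitly_undefined?(AuditedList, :push) # undefined by an ancestor
explicitly_undefined?(AuditedList, :pop)  # still available
```

This makes it possible to tell "the method was deliberately removed" from "the method never existed" when investigating a NoMethodError.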

Refinements

There are several new methods that improve the discoverability and debuggability of complicated code that uses refinements. These methods help answer questions like "what methods are available in the current context and why?", "what methods will be available if I use that module?", etc.

Module#refinements

Returns list of refinements the module defines.

  • Discussion: Feature #12737
  • Documentation: Module#refinements
  • Code:
    module MathShortcuts
      refine Numeric do
        def sqrt = Math.sqrt(self)
        # ...
      end
    
      refine String do
        def calculate(binding) = "#{self} = #{eval(self, binding)}"
      end
    end
    
    module NoRefinements
    end
    
    MathShortcuts.refinements #=> [#<refinement:Numeric@MathShortcuts>, #<refinement:String@MathShortcuts>]
    NoRefinements.refinements #=> []
    
    # Introspection: what would this refinement refine?..
    # The method also added in 3.2, see below
    MathShortcuts.refinements[0].refined_class #=> Numeric
    # Introspection: what methods would it add?
    # (false means "don't include methods defined in ancestors")
    MathShortcuts.refinements[0].instance_methods(false) #=> [:sqrt]

Refinement#refined_class

  • Discussion: Feature #12737
  • Documentation: Refinement#refined_class
  • Code: See example above that demonstrates usage of #refined_class together with Module#refinements.
  • Follow-ups: 3.3: Renamed to #target, because not only classes can be refined, modules too. 3.4: old name is removed.
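
Because of the rename, introspection code that has to run on several Ruby versions may want to pick the accessor at runtime. A minimal sketch, assuming hypothetical helper names `refinement_target` and `refinement_summary` and a made-up `StringExtras` module:

```ruby
module StringExtras
  refine String do
    def shout = upcase + '!'
  end

  refine Integer do
    def double = self * 2
  end
end

# Hypothetical helper: uses whichever accessor the running Ruby provides.
def refinement_target(refinement)
  if refinement.respond_to?(:target)
    refinement.target        # Ruby 3.3 and later
  else
    refinement.refined_class # Ruby 3.2
  end
end

# Maps every refined class/module to the list of methods added to it.
def refinement_summary(mod)
  mod.refinements.to_h { |r| [refinement_target(r), r.instance_methods(false)] }
end

refinement_summary(StringExtras)
# a hash with String and Integer as keys and the added method names as values
```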

Module.used_refinements

Returns instances of Refinement used in the current context.

  • Discussion: Feature #14332
  • Documentation: Module.used_refinements
  • Code:
    # See MathShortcuts module definition above
    
    class Calculator
      using MathShortcuts
    
      # Works inside refined module...
      p Module.used_refinements #=> [#<refinement:Numeric@MathShortcuts>, #<refinement:String@MathShortcuts>]
    
      def hypotenuse(c1, c2)
        # ...and inside its methods
        p Module.used_refinements #=> [#<refinement:Numeric@MathShortcuts>, #<refinement:String@MathShortcuts>]
    
        "(c1**2 + c2**2).sqrt".calculate(binding)
      end
    end
    
    # Use method with refinements, triggering all the debug print
    puts Calculator.new.hypotenuse(5, 6) #=> (c1**2 + c2**2).sqrt = 7.810249675906654
    
    # Outside of refined class, refinements are empty
    p Module.used_refinements #=> []
  • Notes:
    • Note that used_refinements is a class method of a Module, and put there just for organizational purposes, while returning refinements list of the current context. There is no way to ask arbitrary module which refinements it uses (e.g., there is no Calculator.used_refinements).
    • Just as a point of interest, a method with this name was proposed shortly after the introduction of the concept of refinements and was discussed even before the 2.0 release. It eventually became Module.used_modules, introduced in 2.4: that method just returned a list of modules with refinements, enabled in the current scope via using. The result of this method is not very fine-grained (as one refining module can refine many objects at once, and it is impossible to inspect which exactly and what methods were added). After the introduction of the Refinement class in 3.1, it became reasonable to give easier access to particular refinements available in the current context.

Integer#ceildiv

The integer division that always rounds up.

  • Reason: There are many simple use cases like pagination (when "21 items / 10 per page" should yield "3 pages"). It seems that the method is a direct equivalent of a.fdiv(b).ceil, and as such, annoyingly unnecessary, but fdiv, due to floating point imprecision, might produce surprising results in edge cases:
    99999999999999999.fdiv(1).ceil
    # => 100000000000000000
    99999999999999999.ceildiv(1)
    # => 99999999999999999
  • Discussion: Feature #18809
  • Documentation: Integer#ceildiv
  • Code:
    9.ceildiv(3) #=> 3
    10.ceildiv(3) #=> 4
    -10.ceildiv(3) #=> -3 -- always rounds up, regardless of the sign
    # If the divisor is not an integer, the exact quotient is rounded up:
    10.ceildiv(2.1) #=> 5 -- 10 / 2.1 is about 4.76
    10.ceildiv(2.6) #=> 4 -- 10 / 2.6 is about 3.85
  • Note: Unlike most of other operations, #ceildiv ignores numeric coercion protocols:
    class StringNumber
      def initialize(val) = @val = val.to_s
      def coerce(other) = [other, @val.to_i]
    end
    
    10 / StringNumber.new('3') #=> 3, argument is first converted with #coerce if possible
    10.fdiv StringNumber.new('3') #=> 3.3333333333333335, same
    10.ceildiv StringNumber.new('3') # ArgumentError
    This is already fixed in the master branch, and the method will behave as expected in Ruby 3.2.1.
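
The pagination use case from the Reason above can be sketched like this; `page_count` and `pages` are hypothetical helper names, not part of any library.

```ruby
# How many pages are needed to show all items?
def page_count(total_items, per_page) = total_items.ceildiv(per_page)

# Splits items into pages, using page_count to size the result.
def pages(items, per_page)
  Array.new(page_count(items.size, per_page)) do |index|
    items[index * per_page, per_page]
  end
end

page_count(21, 10)      #=> 3
page_count(20, 10)      #=> 2
pages((1..5).to_a, 2)   #=> [[1, 2], [3, 4], [5]]
```

For actual slicing, Enumerable#each_slice is usually simpler; the second helper is here only to show the page count being used.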

Strings and regexps

Byte-oriented methods

Several methods were added that operate on multibyte strings at the byte-offset level, regardless of the encoding.

  • Reason: Low-level processing of strings (like network middleware, efficient search algorithms, or packing/unpacking) might need the ability to operate on the level of single bytes, regardless of the original string's encoding. It is especially important while handling variable-length encodings like UTF-8. Before these methods were introduced, the only way to perform byte-level processing was to force the string's encoding into ASCII-8BIT, process it, and then force the encoding back.
  • Discussion: Feature #13110 (String#byteindex, String#byterindex, MatchData#byteoffset), Feature #18598 (String#bytesplice)
  • Documentation: String#byteindex, String#byterindex, String#bytesplice, MatchData#byteoffset
  • Code:
    str = 'Слава Україні'
    
    str.index('а') #=> 2, character index
    str.byteindex('а') #=> 4, byte index, because Cyrillic letters in UTF-8 take 2 bytes each
    
    str.rindex('а') #=> 9: character index of the last occurrence of character 'а'
    str.byterindex('а') #=> 17: byte index
    
    match = str.match(/Слава\s+(?<name>.+)/)
    match.offset(1)     #=> [6, 13]
    match.byteoffset(1) #=> [11, 25]
    match.offset(:name)     #=> [6, 13]
    match.byteoffset(:name) #=> [11, 25]
    
    str = 'війна'
    str.bytesplice(2..5, '...') #=> "..." -- returns replacement string
    str #=> "в...на" -- original string's bytes 2-3, 4-5 (i.e. chars 1 and 2) are replaced
    
    # Unlike byteslice getter, bytesplice setter checks character boundaries:
    str = 'війна'
    str.byteslice(1..3) #=> "\xB2і" -- works, even if the slice is mid-character
    str.bytesplice(1..3, '...') # offset 1 does not land on character boundary (IndexError)
  • Follow-ups:
    • After 3.2 release, bytesplice behavior had changed to return self instead of replacement string.
    • 3.3: parameters added to bytesplice to allow partial copy of the buffer.
    • 3.4: MatchData#bytebegin and #byteend added.
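
A typical low-level task is collecting the byte offsets of every occurrence of a substring, for example to pass them to a byte-oriented API. A minimal sketch, where `byte_offsets` is a hypothetical helper built on String#byteindex and its optional start offset:

```ruby
# Returns byte offsets of all non-overlapping occurrences of needle.
def byte_offsets(haystack, needle)
  offsets = []
  position = 0
  while (found = haystack.byteindex(needle, position))
    offsets << found
    # Continue right after the match; this always lands on a character boundary
    position = found + needle.bytesize
  end
  offsets
end

byte_offsets('Слава Україні', 'а') #=> [4, 8, 17]
byte_offsets('banana', 'an')       #=> [1, 3]
```

The search resumes from a byte position, so no character-to-byte conversion is needed between iterations.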

String#dedup as an alias for -"string"

The method produces frozen and deduplicated string without changing the receiver.

  • Reason: Since Ruby 2.5, -"string" produces a frozen and deduplicated copy: all instances with the same content take the same place in memory. But it is a less-known fact, that is also hard to guess from the code and quick look into the docs. At the same time, it became a useful idiom for reducing a memory footprint of long-running applications. The #dedup alias is focused on the behavior, and also more chainable than unary -.
  • Discussion: Feature #18595
  • Documentation: String#dedup
  • Code:
    protocols = %w[http https]
    domains = %w[company.com api.company.com]
    
    # if in various places of the program we are constructing the same URLs many times...
    # ...there might be many similar strings sitting everywhere and taking memory
    urls = 100.times.map { protocols.sample + '://' + domains.sample }
    urls.uniq.count
    # => 4 -- we store the same 4 strings again and again
    urls.map(&:object_id).uniq.count
    # => 100 -- but it is 100 different objects
    
    urls.map!(&:dedup)
    urls.map(&:object_id).uniq.count
    # => 4
    
    # The `map!(&:dedup)` above could previously have been written as
    urls.map!(&:-@)
    # ...which calls unary minus on each element
    # But it is both uglier, and shows the intention worse.

Regexp.new: passing flags as a string is supported

  • Reason: Most of those working with regexps are used to short flag names like /foo/i or /bar/m. At the same time, when a regexp was constructed dynamically with Regexp.new, it was necessary to use numeric flags like Regexp::IGNORECASE | Regexp::MULTILINE. They are more formal (and can be thought of as more obvious), but the string ones are those most Rubyists remember.
  • Discussion: Feature #18788
  • Documentation: Regexp.new (options argument)
  • Code:
    Regexp.new('username', 'i') #=> /username/i
    # All known options work:
    Regexp.new(<<~'HTML', 'imx')
      <(\w+) .*?>
        [^<]+
      </\1>
    HTML
    #=> same as
    #  %r{<(\w+) .*?>
    #    [^<]+
    #  </\1>}imx
    
    # Unknown option raises
    Regexp.new('foo', 'g') # unknown regexp option: g (ArgumentError)
  • Notes: One quirk that might be surprising with a wrong use of the new feature is that Regexp.new treats any truthy value of unrecognized type as "ignore case". So,
    # This might erroneously be thought to "work":
    Regexp.new('foo', %w[i]) #=> /foo/i, looks like array of options is also acceptable?..
    # ...but actually it is that any truthy value is treated as "ignorecase = true":
    Regexp.new('foo', %w[abc]) #=> /foo/i
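
String flags are convenient when the pattern comes from configuration, where flags are naturally stored as text. A sketch under assumptions: `pattern_from_config` and the `:source`/`:flags` keys are hypothetical, not any library's format.

```ruby
# Hypothetical helper: builds a Regexp from a config entry like
# {source: 'user(name)?', flags: 'i'}; flags are optional.
def pattern_from_config(config)
  Regexp.new(config.fetch(:source), config.fetch(:flags, nil))
end

pattern_from_config(source: 'user(name)?', flags: 'i').match?('UserName') #=> true
pattern_from_config(source: 'a.b', flags: 'm').match?("a\nb")             #=> true
pattern_from_config(source: 'abc').casefold?                              #=> false

# An unknown flag raises ArgumentError, so a typo in config fails loudly
# instead of silently turning into "ignore case"
```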

Regexp: ReDoS vulnerability prevention

  • Reason: The ReDoS attack is overloading the system by providing malformed regexp or string to match. The possibility for this attack is mostly theoretical, but still reported as a security vulnerability in some contexts. New Ruby version introduces several features that might mitigate the attack (or, at least, a vulnerability report):
    • Setting explicit timeout for Regexp execution;
    • Regexp.linear_time? analysis method;
    • (CRuby-specific) Cache-based optimization: many Regexps now perform in linear time even on very long strings (at the cost of increased memory consumption);
  • Discussion: Feature #17837 (timeout), Feature #19104 (cache-based optimization), Feature #19194 (.linear_time?)
  • Documentation: Regexp.timeout, Regexp.timeout=, Regexp.new (timeout: keyword argument), Regexp.linear_time?
  • Code:
    Regexp.linear_time?(/a+$/)      #=> true
    Regexp.linear_time?(/(a+)\1*$/) #=> false, backtracking is complicated
    
    Regexp.timeout = 0.005
    # Just a demo: simple yet very ambiguous regexp applied to very large string
    /(a+)\1*$/.match?('a' * 1_000_000)
    # Depending on your machine's performance, might raise:
    #   `match?': regexp match timeout (Regexp::TimeoutError)
    
    # When applied to a smaller string
    /(a+)\1*$/.match?('a' * 1_000)
    #=> true
    
    # This works, too:
    Regexp.new(/(a+)*$/, timeout: 0.005).match?('a' * 1_000_000)
    # Might raise:
    #   `match?': regexp match timeout (Regexp::TimeoutError)
  • Note: While Regexp.linear_time? is part of the official language API, its results for the same regexps might change between versions and implementations.
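
When a pattern or its input comes from an untrusted source, the per-regexp timeout and the linearity check can be combined into small guards. This is only a sketch: `match_with_timeout` and `risky?` are hypothetical helpers, and returning nil on timeout is a design choice made here to keep "took too long" distinguishable from "did not match".

```ruby
# Hypothetical helper: matches with a per-regexp timeout.
# Returns true/false for a finished match and nil if it timed out.
def match_with_timeout(pattern, input, seconds: 1.0)
  guarded = Regexp.new(pattern.source, pattern.options, timeout: seconds)
  guarded.match?(input)
rescue Regexp::TimeoutError
  nil
end

# Hypothetical helper: flags patterns that are not guaranteed to run in linear time.
def risky?(pattern) = !Regexp.linear_time?(pattern)

match_with_timeout(/a+$/, 'aaa') #=> true
match_with_timeout(/b/, 'aaa')   #=> false
risky?(/a+$/)                    #=> false
risky?(/(a+)\1*$/)               #=> true, because of the backreference
```

A per-regexp timeout leaves the global Regexp.timeout untouched, which matters when the rest of the application relies on its own setting.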

Time.new can parse a string

A new protocol for Time.new is introduced, which parses a Time from a string.

  • Reason: Before Ruby 3.2, the core class Time provided no way to get back a Time value from any serialization, not even from simple Time#inspect or #to_s. Time.parse is provided by the standard library time (it is not core functionality and doesn't work without an explicit require 'time'), and it tries to parse every imaginable format, while Time.new with a string is stricter.
  • Discussion: Feature #18033
  • Documentation: Time.new
  • Code:
    Time.new('2023-01-29 00:29:30')
    # => 2023-01-29 00:29:30 +0200
    
    # Desired timezone can be provided as part of a string:
    Time.new('2023-01-29 00:29:30 +08:00')
    #=> 2023-01-29 00:29:30 +0800
    # ...or like with other .new protocols, as a separate in: argument:
    Time.new('2023-01-29 00:29:30', in: '+08:00')
    #=> 2023-01-29 00:29:30 +0800
    
    # The accepted format is much stricter than Time.parse:
    require 'time'
    Time.parse('Jan 29, 2023')
    #=> 2023-01-29 00:00:00 +0200
    Time.new('Jan 29, 2023')
    # in `initialize': can't parse: "Jan 29, 2023" (ArgumentError)
    
    # Even incomplete time is considered an error (but see Notes below):
    Time.new('2023-01-29 00:29')
    # in `initialize': missing sec part: 00:29 (ArgumentError)
  • Notes:
    • A few improvements are planned to be made to the parser strictness and robustness in 3.2.1 (see Bug #19296, Bug #19293), for example:
    # This works, but is considered a bug, the method should allow
    # only fully-specified time
    Time.new("2023-01-29")
    #=> 2023-01-29 00:00:00 +0200
    • Time.new('2023') works, too, but it is a feature that worked before (force-conversion of singular year argument to integer), see Bug #19293. It will probably be deprecated, but can't be quickly removed due to backward compatibility.
  • Follow-ups: 3.3: Time.new became stricter, accepting only a fully-specified date-time.
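
The strictness makes Time.new suitable for validating machine-generated timestamps. A minimal sketch, assuming a hypothetical `parse_timestamp` helper that treats input without an offset as UTC and returns nil for anything it can't parse:

```ruby
# Hypothetical helper: strict parsing of "YYYY-MM-DD HH:MM:SS" timestamps.
def parse_timestamp(string)
  Time.new(string, in: '+00:00')
rescue ArgumentError
  nil
end

parse_timestamp('2023-01-29 00:29:30') # a Time with zero UTC offset
parse_timestamp('Jan 29, 2023')        #=> nil, free-form dates are rejected
parse_timestamp('2023-01-29 00:29')    #=> nil, the seconds part is missing
```

Only fully-specified timestamps are used here, so the helper does not depend on the date-only leniency discussed in the notes above.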

Struct and Data

Struct can be initialized by keyword arguments by default

The default behavior of Struct since 3.2 is to accept both positional and keyword arguments in constructor.

  • Reason: Since the introduction of Struct.new(<members>, keyword_init: true) in 2.5, it was frequently criticized as clumsy.
  • Discussion: Feature #16806
  • Documentation: Struct.new
  • Code:
    User = Struct.new(:id, :name)
    # This works:
    User.new(1, 'Joan') #=> #<struct User id=1, name="Joan">
    # Since 3.2, this works too:
    User.new(id: 1, name: 'Joan') #=> #<struct User id=1, name="Joan">
    
    # keyword_init: true/false still can be provided to make the behavior stricter:
    
    User = Struct.new(:id, :name, keyword_init: true)
    User.new(id: 1, name: 'Joan') #=> #<struct User id=1, name="Joan">
    User.new(1, 'Joan')
    # in `initialize': wrong number of arguments (given 2, expected 0) (ArgumentError)
    
    User = Struct.new(:id, :name, keyword_init: false)
    User.new(1, 'Joan') #=> #<struct User id=1, name="Joan">
    User.new(id: 1, name: 'Joan')
    # => #<struct User id={:id=>1, :name=>"Joan"}, name=nil>
    # Note it is not ArgumentError, but interpreting all keyword args as one positional hash
  • Notes:
    • The incompatibility might be introduced by code that expected singular hash as an argument for a Struct initialization:
      Wrapper = Struct.new(:json_data)
      Wrapper.new(user: {name: 'Joan'})
      # Ruby 3.0: works
      #   #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
      # Ruby 3.1: warns, yet works
      #   warning: Passing only keyword arguments to Struct#initialize will behave differently from Ruby 3.2. Please use a Hash literal like .new({k: v}) instead of .new(k: v).
      #   #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
      # Ruby 3.2: breaks
      #   in `initialize': unknown keywords: user (ArgumentError)
      
      # Fixed by explicitly setting `keyword_init: false` in struct definition:
      Wrapper = Struct.new(:json_data, keyword_init: false)
      Wrapper.new(user: {name: 'Joan'})
      # => #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
      # ...on Ruby 2.5-3.2, without any warnings
      
      # or, alternatively, by always wrapping hashes in {} explicitly,
      # as 3.1's warning suggested:
      Wrapper.new({user: {name: 'Joan'}})
      # => #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
    • While the new behavior is convenient, one should be especially careful when redefining #initialize for Structs to not break it:
      User = Struct.new(:id, :name) do
        # suppose we want to convert id to Integer before initializing.
        # Note that it could be in `args.first`, or in `kwargs[:id]` now, so it is either this:
        def initialize(*args, **kwargs)
          if !args.empty?
            args[0] = args[0].to_i
          elsif kwargs.key?(:id)
            kwargs[:id] = kwargs[:id].to_i
          end
          super(*args, **kwargs)
        end
      
        # or just post-processing...
        def initialize(...)
          super(...)
          self.id = self.id.to_i
        end
      end

Data: new immutable value object class

A new class for containing value objects: it is somewhat similar to Struct (and reuses some of the implementation internally), but is intended to be immutable, and to have a more modern and cleaner API.

  • Reason: Before 3.2, Struct was a ubiquitous data holder class in Ruby, but, being designed a long time ago, it has drawbacks that make it unsuitable for some situations: it is mutable by design (it has attribute setters), and it has APIs of both "value-alike" and "container-alike" types. But there is a lot of code using Struct in various ways (and for good reasons), so it can't be just redesigned. Several approaches were considered (including adding a "configure" API to Struct, allowing to specify "should it be mutable, should it be iterable, should it be hash-alike"), but in the end, a new class with a smaller and stricter API was designed.
  • Discussion: Feature #16122
  • Documentation: Data
  • Code: Data is a completely new, well-documented class, so we won't try to demonstrate all details of its behavior, just give a brief overview.
    Point = Data.define(:x, :y)
    
    # Both positional and keyword arguments can be used
    p1 = Point.new(1, 0)        #=> #<data Point x=1, y=0>
    p2 = Point.new(x: 0, y: 1)  #=> #<data Point x=0, y=1>
    
    # all arguments are mandatory
    Point.new(1) # missing keyword: :y (ArgumentError)
    
    # #initialize might be redefined to provide default arguments or argument conversions
    Point3D = Data.define(:x, :y, :z) do
      def initialize(x:, y:, z: 0) = super
    end
    
    Point3D.new(x: 1, y: 2)
    # => #<data Point3D x=1, y=2, z=0>
    
    # the redefinition above is enough to handle keyword AND positional arguments:
    Point3D.new(1, 2)
    # => #<data Point3D x=1, y=2, z=0>
    
    # there are no setters or any other way to change an already created object
    p1.x = 5 # undefined method `x=' for #<data Point x=1, y=0> (NoMethodError)
    p1.instance_variable_set('@z', 100) # can't modify frozen Point: #<data Point x=1, y=0> (FrozenError)
    
    # #with method can be used to construct new instances,
    # replacing only parts of the data:
    p1.with(y: 100) #=> #<data Point x=1, y=100>
  • Notes:
    • The class with the same name (Data) existed before for internal purposes—as a recommended empty base class for classes defined in C extensions. It was deprecated since Ruby 2.5, and removed in Ruby 3.0.
    • On Data immutability: note that only the Data-derived object itself is frozen, but there is no deep freezing of instance variables. So this is still possible (and up to user code to prevent, if undesirable):
      Result = Data.define(:array)
      res = Result.new([1, 2, 3])
      res.instance_variable_set('@size', 3) #=> can't modify frozen Result, as expected
      # but...
      res.array << 4 # works
      res
      #=> #<data Result array=[1, 2, 3, 4]>
      
      # Can shoot yourself in the foot in code doing something like...
      case res
      in Result(array:) # unpack into local variable
        array.reverse!    # process it inplace, considering it independent local variable...
        # ...pass processed somewhere else...
      end
      
      # ...but actually data WAS changed:
      res
      #=> #<data Result array=[4, 3, 2, 1]>
  • Follow-ups:
    • The #with method in Ruby 3.2.0 was naive and just copied all old and new attributes to the new instance, without invoking any custom initialization methods. It was fixed to call #initialize in 3.2.2:
      Point = Data.define(:x, :y) do
        def initialize(x:, y:) = super(x: x.to_i, y: y.to_i)
      end
      
      p = Point.new('1', '2')
      # => #<data Point x=1, y=2> -- conversion performed through #initialize
      p.with(y: '3')
      # => #<data Point x=1, y="3"> -- Ruby 3.2.0: #initialize is bypassed
      
      # After the fix:
      p.with(y: '3')
      # => #<data Point x=1, y=3>
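
Regarding the note on Data immutability above: one possible way to guard against in-place changes is to store a frozen private copy of mutable members in #initialize. This is only a sketch under that assumption; `FrozenResult` is a hypothetical name, and the approach protects one level of nesting only (elements of the array are not frozen).

```ruby
FrozenResult = Data.define(:array) do
  # A keyword-only signature is enough here: Data.new converts positional
  # arguments to keywords before calling #initialize.
  def initialize(array:)
    super(array: array.dup.freeze)
  end
end

source = [1, 2, 3]
res = FrozenResult.new(source)

source << 4 # the caller's array stays mutable...
res.array   # ...but the copy stored in the Data instance is not affected

begin
  res.array << 5 # in-place modification of the stored copy now fails
rescue FrozenError => e
  e.class # FrozenError
end
```

The cost is one extra array copy per instance, which might or might not be acceptable depending on the data size.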

Pattern matching

  • "Find pattern" value in [*, pattern, *] is no longer experimental. Feature #18585
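
A minimal sketch of the find pattern in use. The `first_error` helper and the layout of the log entries are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper: returns the message of the first entry with level :error.
def first_error(entries)
  case entries
  in [*, {level: :error, message:}, *]
    message
  else
    nil
  end
end

log = [
  {level: :info,  message: 'started'},
  {level: :error, message: 'disk full'},
  {level: :error, message: 'retry failed'}
]

first_error(log)                             #=> "disk full"
first_error([{level: :info, message: 'ok'}]) #=> nil
```

On Ruby 3.0 and 3.1 the same code works, but prints a warning that the find pattern is experimental.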

MatchData: added #deconstruct and #deconstruct_keys

As a part of the effort to make core classes more pattern matching friendly, MatchData (the result of regexp matching) now can be deconstructed.

  • Discussion: Feature #18821
  • Documentation: MatchData#deconstruct, MatchData#deconstruct_keys
  • Code:
    case connection_string.match(%r{postgres://(\w+):(\w+)@(.+)})
    in 'admin', password, server
      # do connection with admin rights
    in user, _, 'dev-server.local' if DEV_USERS.include?(user)
      # connect to dev server with any password
    in user, password, server
      # do regular connection
    end
    
    # Might be used just for quick and expressive unpacking of match results
    connection_string = 'postgres://admin:secret@foo.amazonaws.com'
    connection_string.match(%r{postgres://(\w+):(\w+)@(.+)}) => user, password, server
    user     #=> "admin"
    password #=> "secret"
    server   #=> "foo.amazonaws.com"
    
    # When named capture group is used, MatchData also provides hash unpacking:
    connection_string.match(%r{postgres://(?<user>\w+):(?<password>\w+)@(?<server>.+)}) => user:, password:, server:
    user     #=> "admin"
    password #=> "secret"
    server   #=> "foo.amazonaws.com"

Time#deconstruct_keys

Time now can be used in pattern matching too.

  • Discussion: Feature #19071
  • Documentation: Time#deconstruct_keys
  • Code:
    # `deconstruct_keys(nil)` shows all available keys:
    Time.now.deconstruct_keys(nil)
    # => {:year=>2023, :month=>1, :day=>15, :yday=>15, :wday=>0, :hour=>17, :min=>5, :sec=>56, :subsec=>(148452241/200000000), :dst=>false, :zone=>"EET"}
    
    # Usage in pattern-matching:
    case timestamp
    in year: ...2022
      puts "Far past!"
    in year: 2022, month: 1..3
      puts "Last year's first quarter"
    in year: 2023, month:, day:
      puts "#{day} of #{month}th month!"
    # ...
    end
    
    # Check if it is the first Thursday of the current month:
    if Time.now in wday: 4, day: ..7
      # ...
    end
  • Notes:
    • It was decided that #deconstruct method for Time doesn't make much sense, because the reasonable order for all of the time components is hard to define.
    • Standard library classes Date and DateTime also receive similar implementations (Date#deconstruct_keys, DateTime#deconstruct_keys):
      require 'date'
      
      Date.today.deconstruct_keys(nil)
      #=> {:year=>2023, :month=>1, :day=>15, :yday=>15, :wday=>0}
      DateTime.now.deconstruct_keys(nil)
      # => {:year=>2023, :month=>1, :day=>15, :yday=>15, :wday=>0, :hour=>17, :min=>19, :sec=>15, :sec_fraction=>(478525469/500000000), :zone=>"+02:00"}
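
As a small illustration of the Date support mentioned in the note above, here is a hypothetical `weekend?` helper. It assumes Ruby 3.2+ and the `date` standard library that ships with it.

```ruby
require 'date'

# Hypothetical helper: true for Saturdays and Sundays.
def weekend?(date)
  case date
  in {wday: 0 | 6} then true
  else false
  end
end

weekend?(Date.new(2023, 1, 15)) #=> true, it was a Sunday
weekend?(Date.new(2023, 1, 16)) #=> false, it was a Monday
```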

Enumerables and collections

Enumerator.product

Generates an enumerator from several others, yielding all possible combinations of their elements.

  • Discussion: Feature #18685
  • Documentation: Enumerator.product, Enumerator::Product
  • Code:
    enumerator = Enumerator.product(1.., %w[test me])
    # => #<Enumerator::Product: ...>
    
    enumerator.take(6)
    # => [[1, "test"], [1, "me"], [2, "test"], [2, "me"], [3, "test"], [3, "me"]]
    
    # The arguments can be any objects responding to `each_entry`,
    # not necessarily enumerators/enumerables
    class ThreeBears
      def each_entry
        yield 'Papa Bear'
        yield 'Mama Bear'
        yield 'Little Bear'
      end
    end
    Enumerator.product([1, 2], ThreeBears.new).to_a
    # => [[1, "Papa Bear"], [1, "Mama Bear"], [1, "Little Bear"],
    #     [2, "Papa Bear"], [2, "Mama Bear"], [2, "Little Bear"]]
  • Notes:
    • There is an ongoing discussion about the protocol of Enumerator.product being unlike that of Array#product (which is a method of the first argument of the expression).
    • If one of the enumerators is effectful (can be iterated through only once), the current implementation would exhaust it on the first go:
      require 'stringio'
      
      # This will work as expected
      io = StringIO.new('abc')
      Enumerator.product(io.each_char, [1, 2, 3]).to_a
      # => [["a", 1], ["a", 2], ["a", 3], ["b", 1], ["b", 2], ["b", 3], ["c", 1], ["c", 2], ["c", 3]]
      
      # But this will produce less data than the full cross-product
      io = StringIO.new('abc')
      Enumerator.product([1, 2, 3], io.each_char).to_a
      #=> [[1, "a"], [1, "b"], [1, "c"]]
    This is probably a bug.

Hash#shift always returns nil if the hash is empty

There was a bug/inconsistency with returning the default value if it is defined.

  • Discussion: Bug #16908
  • Documentation: Hash#shift
  • Code:
    h = {a: 1}
    h.shift #=> [:a, 1]
    h.shift #=> nil, as expected
    # but if the default for hash is defined...
    h.default = :foo
    h.shift
    # 3.1: => :foo -- hard to explain, it isn't even [key, value] pair
    # 3.2: => nil

Set became a built-in class

Previously a part of the standard library, Set (a collection of unique elements) was promoted to a core class. There is no longer a need to require 'set' to use the class.

  • Discussion: Feature #16989
  • Documentation: Set (still mentions require 'set', though)
  • Notes: As of 3.2, the only change is making the library auto-required without changing the implementation. Set is still not as integrated in Ruby as other collections, like Hash and Array: Set is implemented in Ruby, uses Hash as its internal storage (by creating a hash of set_element => true pairs), and doesn't have its own literal. There are distant plans to improve it, but with no particular schedule.
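
A brief sketch of what this means in practice on Ruby 3.2+; the `duplicates` helper is hypothetical.

```ruby
# Note: there is no `require 'set'` anywhere in this snippet.

# Hypothetical helper: returns the items that were already seen earlier in the list.
def duplicates(items)
  seen = Set.new
  # Set#add? returns nil when the element is already present,
  # so only repeated items survive the `reject`.
  items.reject { |item| seen.add?(item) }
end

duplicates(%w[a b a c b]) #=> ["a", "b"]
```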

Thread::Queue: timeouts for pop and push

timeout: <number> parameter was added to methods Queue#pop, SizedQueue#push, SizedQueue#pop.

  • Reason: As a thread queue is meant as a means of inter-thread communication, it is useful to provide a way to avoid hanging a thread forever while waiting for input from another thread (or waiting for a free place in the queue in case of SizedQueue#push).
  • Discussion: Feature #18774, Feature #18944
  • Documentation: Thread::Queue#pop, Thread::SizedQueue#pop, Thread::SizedQueue#push
  • Code:
    queue = Thread::Queue.new
    sender = Thread.new do
      queue.push(1)
      queue.push(2)
    end
    
    # Expects 3 values from sender
    receiver = Thread.new do
      # This will print 1, 2, and then make receiver sleep forever:
      #   3.times.each { p queue.pop }
    
      # But this prints 1, 2, waits for 0.5 seconds and then prints `nil`
      3.times.map { p queue.pop(timeout: 0.5) }
    end
    [sender, receiver].each(&:join)
    
    sized = Thread::SizedQueue.new(2)
    sized.push(1, timeout: 0.5) #=> success, returns the queue object
    sized.push(2, timeout: 0.5) #=> success, returns the queue object
    sized.push(3, timeout: 0.5) #=> waits 0.5 seconds, returns nil
    sized.size #=> 2, only 1 and 2 were pushed successfully
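
One pattern the timeout enables is draining a queue until the producers go quiet. The `drain` helper below is a hypothetical sketch, not a part of the Queue API.

```ruby
# Hypothetical helper: collects items until the queue stays empty for `idle_timeout` seconds.
def drain(queue, idle_timeout:)
  items = []
  # pop(timeout:) returns nil when the wait times out, which ends the loop.
  # Caveat: a legitimately pushed nil or false would end it, too.
  while (item = queue.pop(timeout: idle_timeout))
    items << item
  end
  items
end

queue = Thread::Queue.new
producer = Thread.new { 3.times { |i| queue.push(i + 1) } }
producer.join

drain(queue, idle_timeout: 0.1) #=> [1, 2, 3], returned after 0.1 seconds of silence
```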

Procs and methods

Proc#dup returns an instance of subclass

  • Reason: Just for consistency with other core classes behavior.
  • Discussion: Bug #17545
  • Documentation:
  • Code:
    class MyProc < Proc
      # some additional custom methods...
    end
    
    MyProc.new { }.dup
    # 3.1: => #<Proc:...>
    # 3.2: => #<MyProc:...>
  • Notes:
    • In general, inheriting from core classes is a questionable practice, and you probably should avoid it;
    • Despite producing an instance of a subclass now, #dup doesn't call #initialize_dup constructor, so custom data that you've associated with a subclass instance can't be preserved:
      class TaggedProc < Proc
        attr_reader :tag
      
        def initialize(tag, &block)
          @tag = tag
          super(&block)
        end
      
        def initialize_dup(other) # this will NOT be invoked
          @tag = other.tag
          super
        end
      end
      
      t = TaggedProc.new('test') { }
      t.tag #=> 'test'
      t.dup.tag #=> nil
    This is a bug.
  • Follow-ups: 3.3: #dup properly invokes #initialize_dup.

Proc#parameters: new keyword argument lambda: true/false

parameters(lambda: true) returns the Proc's parameters description as if the proc were a lambda (i.e., the parameters without defaults are reported as required), regardless of the Proc's real "lambdiness."

  • Reason: A regular (non-lambda) proc always reports its positional arguments as optional. This corresponds to its behavior, but loses the information about which of them have default values defined. It might be inconvenient when using procs in metaprogramming, like building wrapper objects, or defining methods based on procs.
  • Discussion: Feature #15357
  • Documentation: Proc#parameters
  • Code:
    prc = proc { |x, y=0| p(x:, y:) }
    prc.parameters
    # => [[:opt, :x], [:opt, :y]] -- for proc, all parameters are optional
    # Which corresponds to how it actually behaves: all params can be skipped:
    prc.call
    #=> {:x=>nil, :y=>0}
    
    prc.parameters(lambda: true)
    # => [[:req, :x], [:opt, :y]] -- in stricter lambda protocol, first parameter is required
    # Which corresponds to how the corresponding lambda would treat
    # its parameters:
    lambda { |x, y=0| p(x:, y:) }.call
    # wrong number of arguments (given 0, expected 1..2) (ArgumentError)
    
    # The `lambda: false` call works, too, although arguably less useful:
    l = ->(x, y=0) { }
    l.parameters
    # => [[:req, :x], [:opt, :y]]
    l.parameters(lambda: false)
    # => [[:opt, :x], [:opt, :y]]

Method#public?, #protected?, and #private? are removed

Predicates to check method visibility added in Ruby 3.1 were reverted.

  • Reason: The new feature implementation has led to several bugs with Method class behavior; while investigating the root cause for those bugs, Matz has decided that a method's visibility is not its inherent property, but rather a property of the module/object that owns the method, and as such, is already present in the form of Module#{private,public,protected}_instance_methods and Object#{private,public,protected}_methods.
  • Discussion: Feature #11689#note-24
  • Notes: The discussion whether the feature should be un-reverted is still ongoing!
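
A sketch of how visibility can be checked on the owner's side instead. It uses the long-standing Module#public_method_defined?, #protected_method_defined?, and #private_method_defined? predicates, which are close relatives of the methods listed in the reason above; the `Account` class and the `method_visibility` helper are hypothetical.

```ruby
class Account
  def balance = 100

  private

  def audit_log = []
end

# Hypothetical helper: asks the module, not the Method object, about visibility.
def method_visibility(mod, name)
  if mod.public_method_defined?(name) then :public
  elsif mod.protected_method_defined?(name) then :protected
  elsif mod.private_method_defined?(name) then :private
  end
end

method_visibility(Account, :balance)   #=> :public
method_visibility(Account, :audit_log) #=> :private
method_visibility(Account, :missing)   #=> nil
```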

UnboundMethod: more consistent reporting on what module it belongs to

Since 3.2, UnboundMethod's #inspect and comparison with other UnboundMethod instances only consider the module it is defined in, not the actual module it was unbound from.

  • Reason: The change just aligns auxiliary methods with the main UnboundMethod implementation. No usage of unbound method is affected by what was the original class or object it was unbound from, only by the place of definition.
  • Discussion: Feature #18798
  • Affected methods: UnboundMethod#==, #inspect (documentation not updated)
  • Code:
    tally = Array.instance_method(:tally)
    p tally
    # 3.1: => #<UnboundMethod: Array(Enumerable)#tally(*)>
    # 3.2: => #<UnboundMethod: Enumerable#tally(*)>
    # The former reports "it was defined in Enumerable, but unbound from Array"
    
    orig_tally = Enumerable.instance_method(:tally)
    tally == orig_tally
    # 3.1: false -- because it was unbound from different class
    # 3.2: true
    
    # In reality, both are the same, and can be rebound to any object whose class includes Enumerable:
    orig_tally.bind("test".each_char).call
    # => {"t"=>2, "e"=>1, "s"=>1}
    tally.bind("test".each_char).call
    # => {"t"=>2, "e"=>1, "s"=>1} -- on 3.1, this worked, even if tally was "unbound from Array"
    
    # Therefore, reporting `tally` as belonging to Array and unequal to `orig_tally` was misleading
  • Note: While it might seem like a weird unnecessary quirk, unbinding methods and then rebinding them to different objects is a useful metaprogramming technique when redefining some core methods to preserve and reuse the initial implementation.
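
A sketch of the technique from the note above. To keep the example harmless, it redefines a method of a hypothetical `Greeter` class rather than a core one; the `ORIGINAL_GREET` constant is likewise made up for illustration.

```ruby
class Greeter
  def greet(name) = "Hello, #{name}"
end

# Keep a handle on the original implementation before redefining the method
ORIGINAL_GREET = Greeter.instance_method(:greet)

class Greeter
  def greet(name)
    # Rebind the saved implementation to the current object and reuse its result
    ORIGINAL_GREET.bind_call(self, name) + '!'
  end
end

Greeter.new.greet('Joan') #=> "Hello, Joan!"
```

UnboundMethod#bind_call (available since Ruby 2.7) is a shortcut for `bind(self).call(name)` that avoids creating an intermediate Method object.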

IO and network

IO: support for timeouts for blocking IO

IO#timeout getter and setter were added to the base class, and are respected on blocking operations.

  • Discussion: Feature #18630
  • Documentation: IO#timeout, IO#timeout=
  • Code:
    STDIN.timeout = 5
    print "Tell me what: "
    answer = gets
    # If you don't type anything for 5 seconds, this raises:
    # in `gets': Blocking operation timed out! (IO::TimeoutError)
    
    STDIN.timeout = nil # to remove the timeout
    answer = gets
    # will wait until input appears or the process is killed
    
    STDIN.timeout = 0
    answer = gets
    # Will raise IO::TimeoutError immediately,
    # useful for quick "take something from input buffer if it isn't empty"
  • Note: IO#timeout in general affects reading and writing operations (including network ones, defined on Socket). Operations like IO.open and IO#close are not affected.
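
A self-contained variant of the same idea that doesn't need an interactive terminal. It is a sketch assuming that a pipe honors IO#timeout the same way STDIN does in the example above; `gets_with_timeout` is a hypothetical helper.

```ruby
# Hypothetical helper: reads one line, giving up after `seconds`,
# and restores the previous timeout afterwards.
def gets_with_timeout(io, seconds)
  previous = io.timeout
  io.timeout = seconds
  io.gets
rescue IO::TimeoutError
  nil
ensure
  io.timeout = previous
end

reader, writer = IO.pipe
gets_with_timeout(reader, 0.1) # nothing was written yet, so the read gives up and nil is returned

writer.puts 'hello'
gets_with_timeout(reader, 0.1) # the line is already available, so it is returned immediately
```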

IO#path

Any IO object can be constructed with additional argument path:, which will be available as a path attribute.

  • Reason: IO object could be created from low-level file descriptor (for example, returned by some C extension), but there was no way to specify it corresponds to some specific filesystem path.
  • Discussion: Feature #19036
  • Documentation: IO#Open options, IO#path
  • Code:
    # Always worked:
    f = File.open('README.md')
    f.path #=> 'README.md'
    
    # IO created from system-level file descriptor (which might've been returned by a C library)
    io = IO.new(f.fileno)
    # => #<IO:fd 5>
    io.path
    # 3.1: NoMethodError (undefined method `path')
    # 3.2: => nil
    
    # IO can't guess file path from the descriptor, but path can be provided explicitly:
    io = IO.new(f.fileno, path: 'README.md')
    # => #<IO:README.md>
    io.path
    # => "README.md"
    
    # One generalization of the new feature is the introspection of standard IO streams:
    STDOUT.path
    # 3.1: NoMethodError (undefined method `path')
    # 3.2: => "<STDOUT>"

Exceptions

Exception#detailed_message

The method can be redefined for providing custom "decoration" of exception messages, without redefining the main #message.

  • Reason: Standard libraries like did_you_mean (adds "did you mean other name" to NoMethodError) or error_highlight (prints the failed part of code and highlights the problematic part) previously adjusted the Exception#message method. It might not always be convenient: say, if an application wants to benefit from those gems, but also needs to report "clear" error messages to a monitoring system, it required a workaround. Starting from Ruby 3.2, there is a clear distinction:
    • #message is an original message with which the exception was raised;
    • #detailed_message might be redefined by some libraries or user's code for convenience and better reporting (most probably);
    • #full_message (introduced in 2.5) is what the interpreter prints: detailed message + error backtrace.
  • Discussion: Feature #18564
  • Documentation: Exception#detailed_message
  • Code:
    # Default implementation:
    begin
      raise RuntimeError, 'test'
    rescue => e
      puts e.message
      # test
    
      puts e.detailed_message # adds error class
      # test (RuntimeError)
    
      puts e.full_message # adds backtrace
      # test.rb:3:in `<main>': test (RuntimeError)
    end
    
    # NoMethodError employs did_you_mean to look up the right name,
    # and error_highlight to show where exactly the error happened:
    begin
      'foo'.lenthg
    rescue => e
      puts e.message
      # undefined method `lenthg' for "foo":String
    
      puts e.detailed_message # class name + highlighted part of code + "Did you mean?"
      # undefined method `lenthg' for "foo":String (NoMethodError)
      #
      #   'foo'.lenthg
      #        ^^^^^^^
      # Did you mean?  length
    
      puts e.full_message # all of the above + "where it happened"
      # test.rb:2:in `<main>': undefined method `lenthg' for "foo":String (NoMethodError)
      #
      #   'foo'.lenthg
      #        ^^^^^^^
      # Did you mean?  length
    end
    
    # Implement the custom one:
    class LoadError
      def detailed_message(highlight: false, **)
        res = super # invoke the default implementation which will produce message + class name
        return res unless path.start_with?('vendor/')
    
        # Provide custom value. Ideally, the code should consider to add some
        # markup with escape codes for expressiveness if `highlight: true` is passed
        res + "\n"\
              "  Vendor library `#{path.delete_prefix('vendor/')}' not loaded\n"\
              "  Check our instructions in VENDOR.md"
      end
    end
    
    require 'vendor/tricky'
    # This will now raise an error which would be printed as...
    #
    # in `require': cannot load such file -- vendor/tricky (LoadError)
    #   Vendor library `tricky' not loaded
    #   Check our instructions in VENDOR.md

SyntaxError#path

Returns the path of the file where the syntax error happened.

  • Reason: The feature was introduced at the request of the new SyntaxSuggest core library. It makes post-processing of SyntaxError easier for this and third-party libraries, say, when it is necessary to analyze the code that errored.
  • Discussion: Feature #19138
  • Documentation: SyntaxError
  • Code:
    # Consider there is 'test.rb' such that:
    x = 5
    y = 6
    z = x**
    #----
    
    begin
      load 'test.rb'
    rescue SyntaxError => e
      p e #=> #<SyntaxError:"tmp/test.rb:3: syntax error, unexpected end-of-input\n  z = x**\n         ^\n">
      puts e.path #=> test.rb
    end
  • Note: As of 3.2, there is no way to set path (unlike other additional exception data like KeyError#key that can be set in #initialize). As SyntaxError is mostly meant to be generated by the Ruby parser and not by custom code, that might not be a big problem.

Concurrency

Fiber storage

Per-fiber hash-alike storage interface is introduced. It can be set up on Fiber creation, and accessed as a whole via #storage accessors, or key-by-key with Fiber[] accessors. By default, it is inherited on fiber creation, but can be overridden.

  • Reason: The official explanation from NEWS is the best: "You should generally consider Fiber storage for any state which you want to be shared implicitly between all fibers and threads created in a given context, e.g. a connection pool, a request id, a logger level, environment variables, configuration, etc."
  • Discussion: Feature #19078
  • Documentation: Fiber.[], Fiber.[]=, Fiber#storage, Fiber#storage=, Fiber.new
  • Code:
    Fiber[:user] = 'admin'
    Fiber[:user] #=> "admin"
    Fiber.current.storage #=> {user: 'admin'}
    # This will have no effect, storage returns a copy of internal storage
    Fiber.current.storage[:user] = 'John'
    # Still the same:
    Fiber[:user] #=> "admin"
    
    Fiber.current.storage = {user: 'Jane'}
    # warning: Fiber#storage= is experimental and may be removed in the future!
    Fiber[:user] #=> "Jane"
    
    # Cleaning up the storage
    Fiber.current.storage = nil
    Fiber.current.storage #=> {}
    
    Fiber[:user] = 'admin'
    f = Fiber.new { puts Fiber[:user] }
    f.resume # prints "admin", by default the storage is inherited
    f.storage
    # raises "Fiber storage can only be accessed from the Fiber it belongs to" (ArgumentError)
    
    # The storage can be overwritten on creation:
    Fiber.new(storage: {user: 'Jane'}) { puts Fiber[:user] }.resume
    # prints "Jane"
    # or...
    Fiber.new(storage: nil) { puts Fiber[:user] }.resume
    # prints an empty line
    
    # The same as default: inherit from the creating fiber:
    Fiber.new(storage: true) { puts Fiber[:user] }.resume
    # prints "admin"
    
    # Even if inherited, fiber storage is isolated between fibers:
    f = Fiber.new {
      puts Fiber[:user]
      Fiber[:user] = 'Amy'
    }
    Fiber[:user] = 'Jane'
    f.resume
    # prints "admin" from fiber, change in the main fiber didn't affect inherited
    puts Fiber[:user]
    # prints "Jane", change in inherited fiber didn't affect the main one
  • Notes:
    • Only Fiber#storage= is considered experimental; the rest of API is considered stable;
    • There is an API discrepancy, currently discussed, between Fiber[] (which is a class method, reading/writing the current fiber's storage) and Fiber.current.storage (an instance method, but usable only on the current fiber);

Fiber::Scheduler#io_select

Implements non-blocking IO.select

  • Discussion: Feature #19060
  • Documentation: Fiber::Scheduler#io_select
  • Notes:
    • See code examples in 3.0 changelog for general demo of using Fiber Scheduler. As no simple implementation is available, it is complicated to show an example of new hooks in play.
    • Just to remind: Ruby does not include the default implementation of Fiber Scheduler, but the maintainer of the feature, Samuel Williams, provides one in his gem Async which is Ruby 3.2-compatible already.

Internals

Thread.each_caller_location

A way for enumerating backtrace entries without instantiating them all.

  • Reason: There are many contexts when only a small chunk of the backtrace is necessary, but to find this chunk, the whole backtrace needs to be materialized with #caller_locations. For example, consider "send to monitoring system the first line in the app/ that called this (library) query code." In large apps under high load, the call stack might be really large, and the cost of its materialization into Ruby objects on frequent calls might be significant. The new method allows going through stack frames one by one, and breaking as soon as the necessary one(s) is reached.
  • Discussion: Feature #16663
  • Documentation: Thread.each_caller_location
  • Code:
    # test.rb
    def inner
      Thread.each_caller_location {
        p [_1, _1.class]
      }
    end
    
    def outer
      inner
    end
    
    outer
    # prints:
    #   ["test.rb:8:in `outer'", Thread::Backtrace::Location]
    #   ["test.rb:11:in `<main>'", Thread::Backtrace::Location]
    
    # More realistic usage:
    def method_to_debug
      # ...
      app_frame = nil
      Thread.each_caller_location {
        if _1.path.match?('/app')
          app_frame = _1
          break
        end
      }
      Monitoring.notify "Method was invoked by #{app_frame}"
      # ...
    end
  • Notes:
    • Note that while each item is printed as a regular string, they are actually instances of a utility class Thread::Backtrace::Location.
    • The method intentionally doesn't have a block-less version (which would have returned an Enumerator, as Enumerable's methods like #each or #map do): this would defeat the purpose of efficient backtrace analysis at the current frame, adding more frames of Enumerable/Enumerator implementations;
    • For the reason of efficiency, each_caller_location returns nothing (again, to avoid materializing unnecessary objects), so if the goal is to find one location, as in example above, or select some part of the call stack, the only way to do it is non-idiomatic code:
      lib = []
      
      # Goal: take the first caller locations while they are inside our app's lib/ folder:
      Thread.each_caller_location {
        break unless _1.path.start_with?('lib/')
        lib << _1
      }
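
For the "find one location" case, a slightly more compact variant might rely on the general Ruby rule that `break value` makes the interrupted method call evaluate to `value`. This is only a sketch; the `first_caller_matching` helper and the method names around it are hypothetical.

```ruby
# Hypothetical helper: returns the first caller frame whose label matches `pattern`,
# or nil when nothing matches (each_caller_location itself returns nil).
def first_caller_matching(pattern)
  Thread.each_caller_location do |location|
    break location if location.label.match?(pattern)
  end
end

def handle_request = run_query
def run_query = first_caller_matching(/handle_request/)

frame = handle_request
frame.class # Thread::Backtrace::Location
frame.label # contains "handle_request"; the exact label format depends on the Ruby version
```

Note that the frames seen inside the helper start from the helper's caller, so the wrapper itself never shows up in the results.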

GC.latest_gc_info: add need_major_by: key

  • Reason: The information (whether the next garbage collection would be minor or major) might be useful for highload systems, where it might make sense to trigger garbage-collection preemptively if the next one would be major, before entering the performance-critical part of the code.
  • Discussion: GH-6791
  • Documentation: GC.latest_gc_info (the possible keys aren't documented)
  • Code:
    GC.latest_gc_info
    # 3.1:
    #   => {:major_by=>nil, :gc_by=>:newobj, :have_finalizer=>false, :immediate_sweep=>false, :state=>:sweeping}
    # 3.2:
    #   => {:major_by=>nil, :need_major_by=>nil, :gc_by=>:newobj, :have_finalizer=>false, :immediate_sweep=>false, :state=>:none}
    
    # Or:
    GC.latest_gc_info(:need_major_by) #=> nil
  • Notes: The author of this changelog is not a GC expert, and the matter is not very well documented, so I only can say that the possible values (besides nil), according to feature's code, are :nofree, :oldgen, :shady, :force, and they are the same as major_by: possible values. As far as I can guess, major_by: describes the latest major GC method, while need_major_by: describes the upcoming one; if it is nil, the major GC is not upcoming probably?..
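
The use case from the reason above could be sketched as follows. This assumes Ruby 3.2+; the `gc_before_critical_section` helper is hypothetical, and whether a preemptive GC actually pays off depends on the workload.

```ruby
# Hypothetical helper: runs a full GC up front when a major GC is already pending.
# Returns the pending reason (a Symbol), or nil if no major GC was pending.
def gc_before_critical_section
  pending = GC.latest_gc_info(:need_major_by)
  GC.start if pending
  pending
end

gc_before_critical_section
# ... the performance-critical part of the code follows ...
```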

ObjectSpace: dumping object shapes

Object Shapes is a large and interesting new approach to internal object structuring, which we (being focused on language API) won't explain here. The explanation and discussion can be found at Feature #18776. The only way the Ruby-level API is affected by the change is a new parameter for the ObjectSpace.dump_all method, which allows dumping the shapes defined so far.

  • Discussion: GH-6868
  • Documentation: ObjectSpace.dump_all (docs not fully updated, though)
  • Code:
    require 'objspace'
    
    # To only output what would be put into ObjectSpace since this point
    gc_generation = GC.count
    since_id = RubyVM.stat(:next_shape_id)
    
    # New shapes are defined when instance vars for objects are set, so let's make one!
    class User
      def initialize(id, name)
        @id = id
        @name = name
      end
    end
    
    User.new(1, 'Yuki')
    
    ObjectSpace.dump_all(output: :stdout, since: gc_generation, shapes: since_id)
    # {"address":"0x7f6490e00da0", "type":"SHAPE", "id":237, "parent_id":5, "depth":3, "shape_type":"IVAR","edge_name":"@id", "edges":1, "memsize":120}
    # {"address":"0x7f6490e00dc0", "type":"SHAPE", "id":238, "parent_id":237, "depth":4, "shape_type":"IVAR","edge_name":"@name", "edges":0, "memsize":32}
    This reads: setting @id creates a new shape (with "id":237), and setting @name creates the next one, inherited from that ("id":238, "parent_id":237). To understand the deep meaning and consequences of this behavior, though, we'll refer to the original discussion.

TracePoint#binding returns nil for c_call/c_return

  • Reason: See Kernel#binding explanations above: C methods don't have their own binding, so before Ruby 3.2, TracePoint#binding for their call confusingly returned the binding of the first Ruby caller in the call stack.
  • Discussion: Bug #18487
  • Documentation: TracePoint#binding (still has docs for old behavior, though)
  • Code:
    TracePoint.new(:c_call) do |tp|
      p [tp.method_id, tp.binding, tp.binding&.local_variables]
    end.enable {
      x = [5]
      x.map { }
    }
    # In Ruby 3.1, this prints:
    #   [:map, #<Binding:0x00007fbfb9b36d40>, [:x]] -- so, we have a binding of surrounding block, not insides of `map`
    # In Ruby 3.2:
    #   [:map, nil, nil]

TracePoint for block defaults to tracing the current thread

  • Reason: In block form, the intention of the developer is to trace what's happening in the specified block. In complicated applications, though, other threads might work at the same time and pollute the tracing with unrelated occurrences.
  • Discussion: Bug #16889
  • Documentation: TracePoint#enable.
  • Code:
    def test = nil
    
    other = Thread.start {
      sleep(0.1) # to give TracePoint time to start
      test
    }
    
    Thread.current.name = 'main'
    other.name = 'other'
    
    # Note: each example below needs to restart the "other" thread.
    
    TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end.enable do
      test
      other.join
    end
    # Ruby 3.1:
    #   Called from #<Thread:...@main run>
    #   Called from #<Thread:...@other run>
    # Ruby 3.2:
    #   Called from #<Thread:...@main run>
    
    # The desired thread to trace can be specified explicitly:
    TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end.enable(target_thread: other) do
      test
      other.join
    end
    # Ruby 3.1 and 3.2:
    #   Called from #<Thread:...@other run>
    
    # Only block form is affected:
    tp = TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end
    tp.enable
    
    test
    other.join
    # Ruby 3.1 & 3.2:
    #   Called from #<Thread:...@main run>
    #   Called from #<Thread:...@other run>
    
    # If target for tracing is explicitly specified, all threads are traced:
    TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end.enable(target: method(:test)) do
      test
      other.join
    end
    # Ruby 3.1 & 3.2:
    #   Called from #<Thread:...@main run>
    #   Called from #<Thread:...@other run>
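Because the snippets above need the "other" thread restarted before each run, a self-contained variant may be easier to experiment with. The sketch below assumes Ruby 3.2 or newer; `traced_thread_names` is a hypothetical helper (not part of any library) that uses queues instead of `sleep` for synchronization and returns the names of the threads in which the block-form TracePoint saw `test` being called.

```ruby
def test = nil

# Hypothetical helper: runs a block-form TracePoint and returns the names
# of the threads in which a call to `test` was observed.
def traced_thread_names
  seen = Queue.new # thread-safe collector for observed thread names
  go   = Queue.new # used to release the other thread once tracing is on

  other = Thread.new do
    go.pop # wait until the main thread is inside the `enable` block
    test
  end
  other.name = 'other'
  Thread.current.name = 'main'

  TracePoint.new(:call) do |tp|
    seen << Thread.current.name if tp.method_id == :test
  end.enable do
    test         # called from the current thread
    go << :start # let the other thread call `test` too
    other.join   # ...and wait for it while tracing is still enabled
  end

  Array.new(seen.size) { seen.pop }
end

p traced_thread_names
# On Ruby 3.2+, only the current thread is reported: ["main"]
```

On Ruby 3.1, the same script would be expected to report both threads, matching the first example above.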

RubyVM::AbstractSyntaxTree

error_tolerant: true option for parsing

With this option, parsing can be performed even on incomplete and syntactically incorrect scripts, with unparseable parts replaced by ERROR nodes.

  • Reason: The new option opens the road to using Ruby's native parser in various language tools that work on the fly, while the code is being written (like LSP), or that provide advice and possible fixes for erroneous code. It is important that the "official" language parser supports such cases out of the box.
  • Discussion: Feature #19013
  • Documentation: AbstractSyntaxTree.parse
  • Code:
    src = <<~RUBY
      def test
    RUBY
    
    RubyVM::AbstractSyntaxTree.parse(src)
    # in `parse': syntax error, unexpected end-of-input (SyntaxError)
    
    root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)
    pp root
    # Shortening for the sake of this changelog, the structure of the tree would be:
    #
    #   (SCOPE@1:0-1:8
    #    body:
    #      (DEFN@1:0-1:8
    #       mid: :test
    #       body:
    #         (SCOPE@1:0-1:8
    #          args:
    #            (ARGS@1:8-1:8 ...)
    #          body: nil)))
    #
    # I.e. the code is correctly parsed as "a beginning of a method `test`
    # without a body"
    
    # The parser also tries to recover from errors in the middle of the script:
    src = <<~RUBY
      def bad
        x +
      end
    
      def good
        puts 'test'
      end
    RUBY
    
    root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)
    pp root
    # Shortened output again...
    #
    #   (SCOPE@1:0-7:3
    #    body:
    #      (BLOCK@1:0-7:3
    #         (DEFN@1:0-3:3
    #          mid: :bad
    #          body:
    #            (SCOPE@1:0-3:3
    #             body: (ERROR@2:2-3:3)))
    #         (DEFN@5:0-7:3
    #          mid: :good
    #          body:
    #            (SCOPE@5:0-7:3
    #             body: (FCALL@6:2-6:13 :puts (LIST@6:7-6:13 (STR@6:7-6:13 "test") nil))))))
    #
    # Note the ERROR node in the middle of method :bad, followed by the properly
    # parsed body of method :good
  • Notes:
    • Recovery is not guaranteed: not every broken script can be parsed into a useful tree.
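As an illustration of how a tool might consume such a tree, the sketch below walks the result of an error-tolerant parse and collects the ERROR nodes. It assumes Ruby 3.2 or newer; `error_nodes` is a hypothetical helper written for this example, built only on `Node#type`, `Node#children`, `Node#first_lineno` and `Node#last_lineno`.

```ruby
# Hypothetical helper: recursively collects all ERROR nodes of a syntax tree.
def error_nodes(node, found = [])
  # Children can also be symbols, nils or arrays of local names; skip those.
  return found unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  found << node if node.type == :ERROR
  node.children.each { |child| error_nodes(child, found) }
  found
end

src = <<~RUBY
  def bad
    x +
  end

  def good
    puts 'test'
  end
RUBY

root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)

# Report the source ranges the parser could not make sense of.
error_nodes(root).each do |node|
  puts "Unparseable code at lines #{node.first_lineno}-#{node.last_lineno}"
end
```

An editor plugin could use those ranges to underline the broken fragment while still offering navigation for the methods that were parsed successfully.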

keep_tokens: true option for parsing

With the keep_tokens: true option provided, AbstractSyntaxTree.parse attaches an array of the corresponding code tokens to each node of the syntax tree.

  • Reason: Like the previous feature, this one is useful for implementing code analysis tools: there are several ways to write code that will produce exactly the same syntax tree; and while this doesn't affect interpretation, it does affect style checking, suggestions, etc.
  • Discussion: Feature #19070
  • Documentation: AbstractSyntaxTree.parse, Node#tokens, Node#all_tokens
  • Code:
    RubyVM::AbstractSyntaxTree.parse("puts 'test'", keep_tokens: true).tokens
    # =>
    # [[0, :tIDENTIFIER, "puts", [1, 0, 1, 4]],
    #  [1, :tSP, " ", [1, 4, 1, 5]],
    #  [2, :tSTRING_BEG, "'", [1, 5, 1, 6]],
    #  [3, :tSTRING_CONTENT, "test", [1, 6, 1, 10]],
    #  [4, :tSTRING_END, "'", [1, 10, 1, 11]]]
    RubyVM::AbstractSyntaxTree.parse("puts('test')", keep_tokens: true).tokens
    # =>
    # [[0, :tIDENTIFIER, "puts", [1, 0, 1, 4]],
    #  [1, :"(", "(", [1, 4, 1, 5]],
    #  [2, :tSTRING_BEG, "'", [1, 5, 1, 6]],
    #  [3, :tSTRING_CONTENT, "test", [1, 6, 1, 10]],
    #  [4, :tSTRING_END, "'", [1, 10, 1, 11]],
    #  [5, :")", ")", [1, 11, 1, 12]]]
    RubyVM::AbstractSyntaxTree.parse("puts('test', )", keep_tokens: true).tokens
    # =>
    # [[0, :tIDENTIFIER, "puts", [1, 0, 1, 4]],
    #  [1, :"(", "(", [1, 4, 1, 5]],
    #  [2, :tSTRING_BEG, "'", [1, 5, 1, 6]],
    #  [3, :tSTRING_CONTENT, "test", [1, 6, 1, 10]],
    #  [4, :tSTRING_END, "'", [1, 10, 1, 11]],
    #  [5, :",", ",", [1, 11, 1, 12]],
    #  [6, :tSP, " ", [1, 12, 1, 13]],
    #  [7, :")", ")", [1, 13, 1, 14]]]
    Note that all three scripts are exactly equivalent execution-wise and will produce the same syntax tree; but from the point of view of a code analysis tool, they are different. For example, the first one might cause a suggestion to add parentheses (if that's the preferred style setting), and the last one might imply that the user is waiting for suggestions of possible local variables to add to the output.

Standard library

By the Ruby 3.1 release, most of the standard library had been extracted to either default or bundled gems; their development happens in separate repositories, and changelogs are either maintained there or absent altogether. Either way, their changes aren't mentioned in the combined Ruby changelog, and I won't try to follow all of them.

The stdgems.org project has nice explanations of the default and bundled gems concepts, as well as a list of currently gemified libraries and links to their docs.

"For the rest of us" this means that library development has been extracted into separate GitHub repositories, and the libraries are just packaged with the main Ruby before release. It means you can file an issue or PR for any of them independently, without going through the tougher development process of core Ruby.

A few changes to mention, though:

  • Pathname#lutime.
  • FileUtils.ln_sr and relative: option for FileUtils.ln_s. Discussion: Feature #18925.
  • CGI.escapeURIComponent and CGI.unescapeURIComponent are added. This is an attempt to mitigate the discrepancies between various helper methods throughout standard libraries like URI, ERB and CGI. Discussion: Feature #18822
    • The only difference from CGI.escape/unescape is in encoding and decoding the ' ' (space) character: escape follows application/x-www-form-urlencoded, which converts it to +, while escapeURIComponent follows RFC 3986 and converts it to %20.
    • Previously, the goal could have been achieved with URI.escape, but it had been deprecated since 1.9 and was removed in 3.0 for being too vague and generic (it was actually meant to replace all "unsafe" characters on URI construction).
    • The method names, unusual for Ruby, mimic well-known JS ones like encodeURIComponent.
  • Coverage:
  • There are many awesome changes in Ruby's console IRB, see the gem author's article What's new in Ruby 3.2's IRB?.
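To make the difference between the CGI escaping helpers from the list above concrete, here is a short comparison; it assumes Ruby 3.2 or newer, and the sample strings are made up for illustration.

```ruby
require 'cgi'

query = 'ruby 3.2 & shapes'

# application/x-www-form-urlencoded style: space becomes "+"
CGI.escape(query)             # => "ruby+3.2+%26+shapes"
# RFC 3986 style: space becomes "%20"
CGI.escapeURIComponent(query) # => "ruby%203.2%20%26%20shapes"

# Decoding differs in the same way: only `unescape` treats "+" as a space
CGI.unescape('a+b%20c')             # => "a b c"
CGI.unescapeURIComponent('a+b%20c') # => "a+b c"
```

The second form is the one to reach for when the value goes into a URL path segment, where a literal + would not be read back as a space.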

Version updates

Default gems

Bundled gems

Standard library content changes

New libraries

  • syntax_suggest (formerly dead_end) gem added. It provides helpful error messages for wrong syntax, trying to guess the place of the error. For example, assuming this test.rb:
    def foo
      [1, 2, 3].each {
    end
    an attempt to run it with Ruby 3.1 produces:
    test.rb:3: syntax error, unexpected `end'
    while Ruby 3.2 produces:
    Unmatched `{', missing `}' ?
      1  def foo
    > 2    [1, 2, 3].each {
      3  end
    test.rb:3: syntax error, unexpected `end' (SyntaxError)