Performance regression: typing.cast() with Union types causes 15-60x slowdown in hot code paths #137138

@hyfjjjj

Bug report

Bug description:

Summary

Calling typing.cast() with a Union or Literal type written inline is far slower than direct assignment: roughly 15x for a simple Union and up to about 188x for nested Literal/Union types in the measurements below. This hurts performance-critical code that uses cast() in hot paths.

Performance Impact

Based on benchmarking with 1,000,000 iterations on Python 3.12.11 (macOS ARM64):

Method                                         Time (seconds)   Slowdown factor
Direct assignment                              0.0129           1.0x (baseline)
Type annotation                                0.0125           1.0x
cast(str, value)                               0.0307           2.4x
cast(Union[str, int], value)                   0.1919           14.9x
cast(Union[str, int, float, bool], value)      0.2057           16.0x
cast(Literal[6 values], value)                 0.3342           26.0x
cast(Union[Literal[...], Union[...]], value)   2.4168           187.8x

Reproduction Code

import time
from typing import Union, Literal, cast

def benchmark_cast_performance():
    test_value = '<i2'
    iterations = 1000000

    # Baseline: direct assignment
    start = time.perf_counter()
    for _ in range(iterations):
        result = test_value
    end = time.perf_counter()
    baseline = end - start
    print(f"Baseline (direct assignment): {baseline:.4f}s (1.00x)")

    # Simple cast
    start = time.perf_counter()
    for _ in range(iterations):
        result = cast(str, test_value)
    end = time.perf_counter()
    simple_cast_time = end - start
    print(f"Simple cast: {simple_cast_time:.4f}s ({simple_cast_time/baseline:.2f}x)")

    # Union cast
    start = time.perf_counter()
    for _ in range(iterations):
        result = cast(Union[str, int], test_value)
    end = time.perf_counter()
    union_cast_time = end - start
    print(f"Union cast: {union_cast_time:.4f}s ({union_cast_time/baseline:.2f}x)")

    # Complex Union cast
    LiteralType1 = Literal['<i1', '<u1', '<i2', '<u2', '<i4', '<u4', '<i8', '<u8', '<f4', '<f8']
    UnionType1 = Union[LiteralType1, Literal[0]]

    start = time.perf_counter()
    for _ in range(iterations):
        result = cast(Union[LiteralType1, UnionType1], test_value)
    end = time.perf_counter()
    complex_cast_time = end - start
    print(f"Complex Union cast: {complex_cast_time:.4f}s ({complex_cast_time/baseline:.2f}x)")

if __name__ == "__main__":
    benchmark_cast_performance()

Output:

Direct assignment (baseline)                       0.0129       1.0x
Type annotation                                    0.0125       1.0x
cast(str, value)                                   0.0307       2.4x
cast(Union[str, int], value)                       0.1919       14.9x
cast(Union[str, int, float, bool], value)          0.2057       16.0x
cast(Literal[6 values], value)                     0.3342       26.0x
cast(Complex Union[Literal[...], Union[...]], value) 2.4168       187.8x

Root Cause Analysis

The performance degradation appears to stem from:

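  1. Runtime type construction overhead: an inline Union[...] or Literal[...] expression is evaluated on every call, so the type object is looked up or rebuilt on each iteration (the BINARY_SUBSCR step in the bytecode below)
  2. Function call overhead: unlike annotations on local variables, which the interpreter does not evaluate, cast() is a real function call
  3. Type complexity: cast() itself performs no validation and returns the value unchanged, so the extra cost of more complex types comes from evaluating the type expression, not from the cast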
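The first two factors can be timed separately. The following is a minimal sketch, not part of the original benchmark: the names StrOrInt, build_union_only, cast_prebuilt, cast_inline, and measure are made up for illustration, and absolute numbers will vary by machine and Python version.

```python
import timeit
from typing import Union, cast

# Hypothetical module-level alias: the Union is built once at import time.
StrOrInt = Union[str, int]

def build_union_only(value):
    Union[str, int]                      # evaluate the type expression, no cast() call
    return value

def cast_prebuilt(value):
    return cast(StrOrInt, value)         # cast() call only; the type already exists

def cast_inline(value):
    return cast(Union[str, int], value)  # type expression plus cast() call

def measure(number=200_000):
    """Return total seconds per variant for `number` calls each."""
    variants = (build_union_only, cast_prebuilt, cast_inline)
    return {f.__name__: timeit.timeit(lambda: f("<i2"), number=number)
            for f in variants}

if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name:<18} {seconds:.4f}s")
```

If the time for cast_inline is close to the sum of the other two, that would suggest the cost sits in evaluating the subscript rather than in cast() itself.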

Bytecode Analysis

Direct assignment bytecode:

LOAD_FAST    0 (test_value)
STORE_FAST   1 (result)

Simple cast bytecode:

LOAD_GLOBAL  1 (NULL + cast)
LOAD_GLOBAL  2 (str)
LOAD_FAST    0 (test_value)
CALL         2
STORE_FAST   1 (result)

Complex Union cast bytecode:

LOAD_GLOBAL              1 (NULL + cast)
LOAD_GLOBAL              2 (Union)
LOAD_GLOBAL              4 (LiteralType1)
LOAD_GLOBAL              6 (UnionType1)
BUILD_TUPLE              2
BINARY_SUBSCR
LOAD_FAST                0 (test_value)
CALL                     2
STORE_FAST               1 (result)
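A listing like the ones above can be regenerated with the standard dis module. This sketch uses hypothetical helper names (StrOrInt, inline_cast, hoisted_cast, loaded_globals); the exact opcodes differ between CPython versions, so it only compares which global names each function loads.

```python
import dis
from typing import Union, cast

# Hypothetical alias, built once at import time.
StrOrInt = Union[str, int]

def inline_cast(value):
    return cast(Union[str, int], value)

def hoisted_cast(value):
    return cast(StrOrInt, value)

def loaded_globals(func):
    """Names fetched with LOAD_GLOBAL each time `func` runs."""
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname == "LOAD_GLOBAL"]

if __name__ == "__main__":
    print(loaded_globals(inline_cast))   # includes Union, str, int
    print(loaded_globals(hoisted_cast))  # only cast and the alias
    dis.dis(inline_cast)                 # full listing for this interpreter
```

The inline version loads Union and its arguments and subscripts them on every call, while the hoisted version only loads a ready-made object.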

Impact on Real Applications

This performance issue significantly affects:

  1. Performance-critical libraries that use type hints extensively
  2. Data processing pipelines with frequent type casting
  3. Scientific computing where every microsecond counts
  4. Hot code paths in production applications

Expected Behavior

Since cast() is documented as doing nothing at runtime ("This returns the value unchanged"), users expect minimal performance overhead. The slowdown measured above, 15x and up for Union types, is unexpected and problematic.

Suggested Solutions

  1. Optimize cast() implementation for common Union patterns
  2. Cache type construction for frequently used Union types
  3. Add performance warnings in documentation for complex Union casts
  4. Consider compile-time optimization similar to type annotations
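Suggestions 2 and 4 can be approximated in user code today. The sketch below shows two options under assumed names (DTypeCode, DTypeOrZero, parse_hoisted, parse_quoted are illustrative): hoisting the type into a module-level alias, and passing the type as a string, which type checkers interpret as a type expression but which is only a constant at runtime.

```python
from typing import Literal, Union, cast

# Hypothetical aliases modeled on the reproduction code; built once at import.
DTypeCode = Literal["<i2", "<u2", "<f8"]
DTypeOrZero = Union[DTypeCode, Literal[0]]

def parse_hoisted(value: object) -> DTypeOrZero:
    # The alias is a plain global lookup; no subscript runs per call.
    return cast(DTypeOrZero, value)

def parse_quoted(value: object) -> DTypeOrZero:
    # The quoted type expression is a string constant at runtime,
    # so nothing is constructed when this line executes.
    return cast("Union[DTypeCode, Literal[0]]", value)

if __name__ == "__main__":
    print(parse_hoisted("<i2"), parse_quoted(0))
```

Both forms keep the static type information while removing the per-call type construction; which one reads better depends on the codebase.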

Environment

  • Python Version: 3.12.x
  • Operating System: macOS/Linux/Windows (reproduced across platforms)
  • Architecture: ARM64/x86_64 (both affected)

Workaround

For performance-critical code, use type annotations instead of cast():

# Slow (up to ~188x slower in the benchmark above)
result = cast(Union[ComplexType1, ComplexType2], value)

# Fast (no performance penalty)
result: Union[ComplexType1, ComplexType2] = value  # type: ignore

This issue affects any codebase that relies heavily on typing.cast() with Union types in performance-critical sections.
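The annotation workaround is fast because annotations on local variables inside a function body are not evaluated. A small sketch to illustrate this, with hypothetical function names and a deliberately undefined name in the annotation:

```python
from typing import Union, cast

def with_cast(value):
    # The Union[...] expression is evaluated on every call.
    result = cast(Union[str, int], value)
    return result

def with_annotation(value):
    # NotDefinedAnywhere is deliberately undefined: if the annotation
    # were evaluated, this line would raise NameError.
    result: Union[NotDefinedAnywhere, int] = value
    return result

if __name__ == "__main__":
    print(with_cast("<i2"), with_annotation("<i2"))
```

Note that this applies to function bodies. On Python 3.12, annotations at module or class level are evaluated unless `from __future__ import annotations` is in effect, so the same line at module scope would still build the Union.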

CPython versions tested on:

3.12

Operating systems tested on:

macOS
