[AMD ROCm] _validate_bnb_multi_backend_availability() incorrectly tries to alter a frozenset. #1573

Closed
anadon opened this issue Mar 25, 2025 · 6 comments

anadon commented Mar 25, 2025

System Info

After building for the ROCm/HIP backend and importing the Python module, the following traceback is printed:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/anadon/Documents/code/Kiwi-LLaMA/kiwillama/__main__.py", line 20, in <module>
    train()
  File "/home/anadon/Documents/code/Kiwi-LLaMA/kiwillama/train.py", line 45, in train
    base_model = AutoModelForCausalLM.from_pretrained(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/6va0jj7wpj3xb3vkxkwz78apvsd2ckyx-python3.12-transformers-4.49.0/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/6va0jj7wpj3xb3vkxkwz78apvsd2ckyx-python3.12-transformers-4.49.0/lib/python3.12/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/6va0jj7wpj3xb3vkxkwz78apvsd2ckyx-python3.12-transformers-4.49.0/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3698, in from_pretrained
    hf_quantizer.validate_environment(
  File "/nix/store/6va0jj7wpj3xb3vkxkwz78apvsd2ckyx-python3.12-transformers-4.49.0/lib/python3.12/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 83, in validate_environment
    validate_bnb_backend_availability(raise_exception=True)
  File "/nix/store/6va0jj7wpj3xb3vkxkwz78apvsd2ckyx-python3.12-transformers-4.49.0/lib/python3.12/site-packages/transformers/integrations/bitsandbytes.py", line 558, in validate_bnb_backend_availability
    return _validate_bnb_multi_backend_availability(raise_exception)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/6va0jj7wpj3xb3vkxkwz78apvsd2ckyx-python3.12-transformers-4.49.0/lib/python3.12/site-packages/transformers/integrations/bitsandbytes.py", line 499, in _validate_bnb_multi_backend_availability
    available_devices.discard("cpu")  # Only Intel CPU is supported by BNB at the moment
    ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'frozenset' object has no attribute 'discard'
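
The last frame fails because a `frozenset` is immutable and has no `discard` method. Below is a minimal sketch that reproduces the same error in isolation. The device names and the helper `discard_cpu_in_place` are hypothetical stand-ins chosen for illustration; they are not the values bitsandbytes reports or the code in transformers.

```python
# Hypothetical stand-in for the device collection (assumed values).
available_devices = frozenset({"cpu", "cuda"})


def discard_cpu_in_place(devices):
    """Mimic the failing call pattern: mutate the collection in place."""
    devices.discard("cpu")  # works on set, raises AttributeError on frozenset
    return devices


try:
    discard_cpu_in_place(available_devices)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Passing a plain `set` to the same helper succeeds, which is consistent with the error depending only on the type of the collection.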

Reproduction

On a system with an AMD CPU and an RX 7900 XTX GPU, build the following Nix flake, then import bitsandbytes.

{
  description = "Define development dependencies.";

  inputs = {
    # Which Nix upstream package branch to track
    nixpkgs.url = "nixpkgs/nixos-unstable";
    process-compose-flake.url = "github:Platonic-Systems/process-compose-flake";
    services-flake.url = "github:juspay/services-flake";
  };

  # What results we're going to expose
  outputs = { nixpkgs, process-compose-flake, services-flake, ... }:
    let

      supportedSystems = [ "x86_64-linux" "aarch64-linux" "aarch64-darwin" ];
      forAllSystems = f: nixpkgs.lib.genAttrs supportedSystems (system: f rec {

        # Configure package settings
        pkgs = import nixpkgs { 
          inherit system; 
          # Accept the following un-free licenses 
          config.allowUnfreePredicate = pkg: builtins.elem (nixpkgs.lib.getName pkg) [
            "cuda_nvcc"
            "cudnn"
            "libcublas"
            "cuda_cudart"
            "cuda_cccl"
            "libcufile"
            "libcurand" # because of bitsandbytes
            "libcusolver" # because of bitsandbytes
            "libnvjitlink" # because of bitsandbytes
            "libcusparse" # because of bitsandbytes
          ];

          # Some packages are reported broken but we need them to even build, so enable them anyways.
          config.allowBroken=true;

          overlays = [
            (final: prev: { 

              python312Packages = prev.python312Packages // {
                trl = pkgs.python312Packages.buildPythonPackage rec {
                  pname = "trl";
                  version = "v0.16.0";
                  # Declare repos which are then later used to build packages
                  # See https://ryantm.github.io/nixpkgs/builders/fetchers/ for more details.
                  src = pkgs.fetchFromGitHub {
                    owner = "huggingface";
                    repo = "trl";
                    rev = "${version}";
                    sha256 = "sha256-+ab952LXUM3nSpsil/xH2PrqTA9uNdt82m1dLN1iEQg=";
                  };
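                  # A flat list of packages; wrapping the `with` expression in
                  # another list would produce a nested list.
                  propagatedBuildInputs = with pkgs.python312Packages; [
                    datasets
                    rich
                    accelerate
                    transformers
                  ];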
                };

                bitsandbytes-hip = pkgs.python312Packages.buildPythonPackage rec {
                  pname = "bitsandbytes";
                  version = "0.45.1";
                  format = "other";
                
                  # Directly fetch the wheel file instead of using pip during the build
                  wheel = pkgs.fetchurl {
                    url = "https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.45.1.dev0-py3-none-manylinux_2_24_x86_64.whl";
                    hash = "sha256-Z/7V+LU8XNXXh/WKwVKNHalSarRQLjjGijI+iGPY3K4="; # Your original hash looked correct
                  };

                  src = pkgs.fetchFromGitHub {
                    owner = "bitsandbytes-foundation";
                    repo = "${pname}";
                    rev = "multi-backend-refactor";
                    sha256 = "sha256-WWNhrhQYaauvhW2xylZ0ROoOfGxqpUUWoD2d9YLWFUE="; # Your updated hash
                  };
                
                  dontBuild = true;
                
                  # Dependencies
                  nativeBuildInputs = with pkgs; [
                    #python312Packages.pip
                    #python312Packages.wheel
                    #python312Packages.setuptools
                    unzip
                    #git
                    patchelf
                    makeWrapper
                  ];
                
                  buildInputs = with pkgs; [
                    rocmPackages.clr
                    rocmPackages.hipblas
                    rocmPackages.rocblas
                    rocmPackages.rocrand
                    rocmPackages.hipcub
                    rocmPackages.miopen
                  ];
                
                  propagatedBuildInputs = with pkgs.python312Packages; [
                    torch
                    scipy
                    numpy
                  ];
                
                  # Custom install phase
                  installPhase = ''
                    # Create Python package directory
                    mkdir -p $out/${pkgs.python312.sitePackages}
                    
                    # Extract the wheel into a temporary directory
                    unzip ${wheel} -d $TMPDIR/wheel_extract
                    
                    # Copy the package content
                    cp -r $TMPDIR/wheel_extract/bitsandbytes $out/${pkgs.python312.sitePackages}/
                    cp -r $TMPDIR/wheel_extract/bitsandbytes-*.dist-info $out/${pkgs.python312.sitePackages}/
                    
                    # Fix RPATH in the shared libraries - specify all relevant ROCm .so files
                    find $out/${pkgs.python312.sitePackages}/bitsandbytes -name "*.so" | while read sofile; do
                      echo "Patching RPATH for $sofile"
                      patchelf --set-rpath "${pkgs.lib.makeLibraryPath buildInputs}" "$sofile"
                    done
                    
                    # Create bin directory and wrapper script
                    mkdir -p $out/bin
                    makeWrapper ${pkgs.python312}/bin/python3 $out/bin/python3-bnb \
                      --set PYTHONPATH $out/${pkgs.python312.sitePackages}:$PYTHONPATH \
                      --set BNB_COMPUTE_BACKEND "HIP" \
                      --set HIP_VISIBLE_DEVICES "0" \
                      --set LD_LIBRARY_PATH "${pkgs.lib.makeLibraryPath buildInputs}"
                  '';
                
                  # Skip tests for now
                  doCheck = false;
                
                  # Simple import check to verify installation
                  pythonImportsCheck = [ "bitsandbytes" ];
                
                  meta = with pkgs.lib; {
                    description = "8-bit optimizers and matrix multiplication with ROCm support";
                    homepage = "https://github.com/bitsandbytes-foundation/bitsandbytes";
                    license = licenses.mit;
                    platforms = platforms.linux;
                  };
                };
              };
            })
          ];
        };

        # Specify service processes which should be made available to run via `nix run ...`.
        servicesMod = (import process-compose-flake.lib { inherit pkgs; }).evalModules {
          modules = [
            services-flake.processComposeModules.default
            {
              services.ollama."ollama1" = {
                enable = true;
                acceleration = "rocm";
              };
            }
          ];
        };
      });

    in {
      packages = forAllSystems ({ servicesMod, ... }: {
        default = servicesMod.config.outputs.package;
      });

      # Declare what packages we need as a record. The use as a record is
      # needed because, without it, the data contained within can't be
      # referenced in other parts of this file.
      devShells = forAllSystems ({pkgs, servicesMod}: {
        default = pkgs.mkShell rec {
          packages = with pkgs; [
            python312Full 
            python312Packages.distlib
            python312Packages.cython
            python312Packages.setuptools
            python312Packages.setuptoolsBuildHook
            python312Packages.wheel
            python312Packages.vllm
            python312Packages.beautifulsoup4
            python312Packages.types-beautifulsoup4
            python312Packages.keyring
            python312Packages.peft
            python312Packages.trl
            python312Packages.bitsandbytes-hip
            cudaPackages.cudnn
            cudaPackages.libcublas
            cudaPackages.cuda_cudart
            python312Packages.torchWithoutCuda
            direnv
            pkg-config
            cmake
            blas
            lapack
            gcc_multi 
            gccMultiStdenv
            gcc-unwrapped
            ruff
            ninja
            gfortran
            meson
            glibc_multi
            ollama-rocm
            openblas
            cudaPackages.cuda_nvcc
            zlib
            # niv
            # NOTE: Put additional packages you need in this array. Packages may be found by looking them up in
            # https://search.nixos.org/packages
          ];

          # Getting the library paths needed for Python to be put into
          # LD_LIBRARY_PATH
          pythonldlibpath = "${pkgs.stdenv.cc.cc.lib}/lib:${pkgs.stdenv.cc.cc.lib.outPath}/lib:${pkgs.lib.makeLibraryPath packages}:$NIX_LD_LIBRARY_PATH";

          shellHook = ''
            export LD_LIBRARY_PATH="${pythonldlibpath}"
            export BNB_COMPUTE_BACKEND="HIP"
            export HIP_VISIBLE_DEVICES="0"
            export ROCM_PATH=${pkgs.rocmPackages.clr}
            export HIP_PATH=${pkgs.rocmPackages.clr}
          '';
        };
      });
    };
}

Expected behavior

When imported, no error should be encountered under these operating conditions.
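
A small smoke test can make this expectation checkable. The sketch below uses a hypothetical helper, `try_import`, that attempts an import and reports the exception instead of raising it. Note that it only covers import-time failures; the traceback above is raised later, inside `from_pretrained`, so a passing import check alone would not rule out this bug.

```python
import importlib


def try_import(module_name):
    """Return (True, None) on success, or (False, "<ErrorType>: <message>")."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # report any import-time failure to the caller
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


ok, error = try_import("bitsandbytes")
print("import ok" if ok else f"import failed: {error}")
```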

anadon commented Mar 25, 2025

This is another issue related to #1271.

anadon commented Mar 25, 2025

This is caused by the transformers module, but impacts usage here as well. I'm not sure how you guys want to treat this.

anadon commented Mar 25, 2025

Related bug created at transformers

anadon added a commit to anadon/transformers that referenced this issue Mar 25, 2025
…ROCm support.

Related to bitsandbytes-foundation/bitsandbytes#1573 and huggingface#36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.

anadon commented Mar 25, 2025

I need someone with all the appropriate hardware to test anadon/transformers#1. I have a sneaking suspicion that this wasn't caught earlier because available_devices wasn't a frozenset before.
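
If `available_devices` can be either a `set` or a `frozenset` depending on the bitsandbytes version, one way to stay robust is to build a new set rather than mutate the input. This is a sketch only: `without_cpu` is a hypothetical helper, the device names are made up, and it is not necessarily the change made in the linked PR.

```python
def without_cpu(devices):
    """Return a new set of device names with "cpu" removed.

    Accepts any iterable of strings (set, frozenset, list, ...) and never
    mutates the input, so the caller's collection type does not matter.
    """
    return set(devices) - {"cpu"}


# Both input types give the same result.
from_frozen = without_cpu(frozenset({"cpu", "hip"}))
from_mutable = without_cpu({"cpu", "hip"})
print(from_frozen == from_mutable)
```

Returning a fresh `set` also leaves the original collection untouched, which avoids surprising other code that still holds a reference to it.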

anadon added a commit to anadon/transformers that referenced this issue Mar 25, 2025
…36949 , this resolves a bug in the biteandbytes integration, allowing ROCm/HIP support in bitsandbytes.

anadon commented Mar 25, 2025

@matthewdouglas
Member

@anadon Thanks for looking into this! I've commented and given my approval on the transformers PR.

As an FYI, now that we have merged #1544 to main, we're going to start to move the AMD integration out of the multiplatform branch. In the very near future we're going to push toward mainlining with a new interface. Stay tuned!

matthewdouglas self-assigned this Mar 26, 2025
SunMarc pushed a commit to huggingface/transformers that referenced this issue Mar 26, 2025
…ROCm support. (#36975)

* Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.

Related to bitsandbytes-foundation/bitsandbytes#1573 and #36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.

* Related to bitsandbytes-foundation/bitsandbytes#1573 and #36949 , this resolves a bug in the biteandbytes integration, allowing ROCm/HIP support in bitsandbytes.

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>