
Relax deps, transformers 5.x compat, comet as optional extra #446

Merged
ArtemVazh merged 29 commits into main from fix/relax-spacy-optional-comet
Apr 9, 2026

Conversation

smirnovlad (Collaborator) commented Apr 8, 2026

Summary

  • Remove spacy<3.8.0 upper bound → spacy>=3.4.0

    • Allows newer thinc/numpy versions needed for vLLM compatibility
  • Remove transformers upper bound → transformers>=4.50.0

    • Added forward-compatibility code for transformers 5.x breaking changes:
      • BeamScorer removal → None fallback
      • Output class renames → try/except imports
      • batch_encode_plus removal → replaced with tokenizer()
      • AutoModelForVision2Seq rename → fallback to AutoModelForImageTextToText
    • Added _SanitizeLogitsProcessor to handle inf/nan logits edge cases
  • Move unbabel-comet to optional extras → pip install lm-polygraph[comet]

    • unbabel-comet pins numpy<2.0 which conflicts with vLLM (numpy>=2.0)
    • Comet import guarded with try/except ImportError in __init__.py
    • Lazy evaluate import in comet.py
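The guarded import described above can be sketched as follows. This is an illustrative pattern, not the actual `__init__.py`; the `load_from_checkpoint` symbol is the one unbabel-comet exposes, used here as an example.

```python
# Illustrative sketch of the try/except ImportError guard described above;
# the real lm_polygraph generation_metrics/__init__.py may differ in detail.
try:
    from comet import load_from_checkpoint  # provided by unbabel-comet
    COMET_AVAILABLE = True
except ImportError:
    # Without the [comet] extra the symbol degrades to None instead of
    # breaking the whole generation_metrics import at module load time.
    load_from_checkpoint = None
    COMET_AVAILABLE = False
```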

Install paths

| Use case | Command | numpy | transformers |
| --- | --- | --- | --- |
| Full (with comet) | pip install lm-polygraph[comet] | <2.0 | <5.0 |
| Without comet (for vLLM) | pip install lm-polygraph | any | any |
| With comet + numpy 2.x | pip install lm-polygraph, then pip install unbabel-comet --no-deps | 2.x | any |

Test plan

  • CI passes (single job, no transformers version matrix)
  • from lm_polygraph.generation_metrics import Comet returns None when comet not installed
  • pip install lm-polygraph[comet] installs comet correctly
  • Existing tests pass with both transformers 4.x and 5.x

- Remove spacy<3.8.0 upper bound: spacy 3.8+ uses thinc 8.3+/9.x
  which is compatible with numpy 2.x (required by vLLM and other
  modern ML packages). The old pin forced thinc 8.2.x → numpy 1.x,
  creating unresolvable conflicts with vLLM/torch/cupy.

- Make unbabel-comet optional: comment out from requirements.txt
  and guard the import in generation_metrics/__init__.py. The Comet
  metric class is only used for translation evaluation and is not
  needed by most users. Users who need it can install separately
  with `pip install unbabel-comet --no-deps`.

- Move `from evaluate import load` to lazy import inside Comet.__init__
  so the module can be imported without unbabel-comet installed.

Run tests against both default transformers (from requirements.txt)
and transformers 5.x to catch compatibility issues early.
Lint runs only once (on default version).

Relates to #445.

transformers 5.0 removed/renamed several classes:
- beam_search submodule removed: BeamScorer no longer exists
- Output classes renamed:
  - BeamSearchOutput → GenerateBeamEncoderDecoderOutput
  - BeamSearchDecoderOnlyOutput → GenerateBeamDecoderOnlyOutput
  - SampleOutput → GenerateNonBeamOutput
  - SampleDecoderOnlyOutput → GenerateDecoderOnlyOutput
  - GreedySearchOutput → GenerateNonBeamOutput
  - GreedySearchDecoderOnlyOutput → GenerateDecoderOnlyOutput
- AutoModelForVision2Seq removed

All imports now use try/except with aliases to support both 4.x and 5.x.
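The aliasing pattern can be sketched like this, showing one of the renames listed above; the outer fallback to None is only there so the snippet runs even in an environment without transformers installed.

```python
# Try the transformers 4.x name first, then the 5.x name, aliasing the
# new class back to the old identifier so downstream code is unchanged.
try:  # transformers 4.x
    from transformers.generation.utils import SampleDecoderOnlyOutput
except ImportError:  # transformers 5.x renamed the class
    try:
        from transformers.generation.utils import (
            GenerateDecoderOnlyOutput as SampleDecoderOnlyOutput,
        )
    except ImportError:
        SampleDecoderOnlyOutput = None  # transformers not installed at all
```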

Relates to #445.

batch_encode_plus was removed from newer transformers tokenizers.
The direct __call__ (tokenizer(...)) is equivalent and works on
all versions.
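The replacement is mechanical, as a minimal sketch shows. A stand-in callable is used here so the snippet runs without downloading a tokenizer; with a real `AutoTokenizer` the call is simply `tok(texts, padding=True, return_tensors="pt")`.

```python
def encode_batch(tokenizer, texts, **kwargs):
    # tokenizer(...) is the documented equivalent of the removed
    # batch_encode_plus: __call__ accepts a list of texts and the same
    # padding / truncation / return_tensors keyword arguments.
    return tokenizer(texts, **kwargs)

# Stand-in for a HuggingFace tokenizer, just for demonstration.
fake_tokenizer = lambda texts, **kw: {"input_ids": [[101, 102] for _ in texts]}
enc = encode_batch(fake_tokenizer, ["hello world", "hi"])
```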

The test_all_seq_ue test uses do_sample=True for sampling-based
estimators (MonteCarloSequenceEntropy, PTrueSampling, etc.).
With seed=null, torch.multinomial occasionally fails with
"probability tensor contains inf, nan or element < 0" due to
non-deterministic logit values from bloomz-560m on CPU.
Setting a fixed seed makes the test deterministic and reproducible.

Different numpy versions (1.x vs 2.x) can cause bloomz-560m to
produce inf/nan logits on CPU, crashing torch.multinomial.
Add _SanitizeLogitsProcessor that clamps inf/nan to finite values
before scoring and sampling. Runs first in the logits processor chain.
smirnovlad (Collaborator, Author) commented Apr 8, 2026

Note on _SanitizeLogitsProcessor and numpy 2.x

Why relax spacy<3.8.0?

vLLM (and other modern ML packages) requires numpy>=2.0. But:

  • spacy<3.8.0 → installs thinc 8.2.x → compiled against numpy 1.x
  • numpy 1.x and numpy 2.x are binary incompatible (ValueError: numpy.dtype size changed)

So lm-polygraph with spacy<3.8.0 is uninstallable alongside vLLM. Relaxing to spacy>=3.4.0 allows spacy 3.8+ → thinc 8.3+ which is built for numpy 2.x.

Why the logits sanitizer?

The numpy 1.x → 2.x change subtly affects floating point behavior in PyTorch CPU operations. Specifically, bloomz-560m running on CPU with numpy 2.x occasionally produces inf/nan logits during generation, which crashes torch.multinomial in the sampling path:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

The same model + code + seed works fine with numpy 1.x (the main branch) — the numerical difference is purely from the numpy version.

The _SanitizeLogitsProcessor added in this PR runs first in the logits processor chain and clamps any inf/nan values to finite numbers via torch.nan_to_num(). This is a defensive fix that makes the code robust regardless of the numpy version or hardware.

The previous sanitizer replaced +inf with 1e4, which completely
dominated softmax and caused the model to generate the same token
repeatedly, never hitting stop_strings. This made test_just_works
take 22+ minutes on CI (vs 3 min on main) because generations
ran to max_new_tokens instead of stopping at "\n".

Now replaces inf values with the max/min finite value from the same
row, preserving the original distribution shape.
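The row-wise replacement described here can be sketched as follows (assumed shape: `scores` is `(batch, vocab)`). In the PR this logic lives in a `_SanitizeLogitsProcessor` subclassing transformers' `LogitsProcessor`; the wrapper is omitted so the snippet stays self-contained.

```python
import torch

def sanitize_logits(scores: torch.Tensor) -> torch.Tensor:
    """Replace inf/nan per row with that row's finite max/min, preserving
    the softmax shape instead of letting a constant dominate it."""
    finite = torch.isfinite(scores)
    # Per-row extrema computed over finite entries only.
    row_max = scores.masked_fill(~finite, float("-inf")).amax(-1, keepdim=True)
    row_min = scores.masked_fill(~finite, float("inf")).amin(-1, keepdim=True)
    scores = torch.where(torch.isposinf(scores), row_max, scores)
    scores = torch.where(torch.isneginf(scores) | torch.isnan(scores), row_min, scores)
    return scores
```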

Also add per-step timeout to pytest to prevent future hangs.

Stream subprocess output in tests and add per-stage timing to
polygraph_eval to identify which step is slow on CI.

In transformers 5.x, parameters like temperature, top_k, top_p must
be passed via GenerationConfig object, not as loose kwargs to
model.generate(). Without this, temperature was silently ignored,
causing the model to generate with default temperature=1.0 instead
of the configured value. This made generations much longer (never
hitting stop conditions early), causing NLI calculators to process
far more tokens and making CI tests take 25+ minutes instead of ~7.
smirnovlad (Collaborator, Author) commented:

Root cause of slow CI tests

Found the issue. When we made unbabel-comet optional (commented out in requirements.txt), we removed an implicit transformers<5 pin — comet was the dependency that constrained transformers to 4.x.

Without that constraint, pip install . now resolves to transformers 5.x even for the "default" CI job.

In transformers 5.x, generation parameters like temperature, top_k, top_p must be passed via a GenerationConfig object — passing them as loose kwargs to model.generate() results in:

The following generation flags are not valid and may be ignored: ['temperature']

This meant temperature=0.7 from test configs was silently ignored → model used default temperature=1.0 → generated much longer outputs → NLI calculator had to process far more tokens (310s/batch instead of ~30s) → CI tests took 25+ minutes instead of ~7 minutes → runner killed with SIGTERM.

Fix: wrap generation params in GenerationConfig before calling model.generate(). This is backwards-compatible with transformers 4.x too.
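The fix can be sketched like this; the parameter values are the ones mentioned in the thread, and the `model.generate` call is left as a comment since it needs a loaded model.

```python
from transformers import GenerationConfig

# Collect sampling parameters in a GenerationConfig instead of passing
# them as loose kwargs: transformers 5.x ignores loose generation kwargs,
# while 4.x accepts either form, so this is backwards-compatible.
gen_cfg = GenerationConfig(do_sample=True, temperature=0.7, top_k=50, top_p=0.95)

# out = model.generate(**inputs, generation_config=gen_cfg)
```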

Revert all debug logging and failed fix attempts (sanitizer,
GenerationConfig wrapping, test config changes). Keep only the
core changes: relaxed spacy bound, optional comet, transformers
5.x import compat, and batch_encode_plus replacement.

CI now tests both transformers <5 and >=5 via matrix strategy.
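Such a matrix might be sketched in the workflow file like this; the key name and install commands are assumptions for illustration, not the repo's actual workflow.

```yaml
strategy:
  matrix:
    transformers_pin: ["transformers<5", "transformers>=5"]
steps:
  - run: pip install . "${{ matrix.transformers_pin }}"
  - run: pytest
```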
@smirnovlad smirnovlad changed the title Relax spacy upper bound, make unbabel-comet optional Relax dependency bounds, make comet optional, add transformers 5.x forward-compat Apr 9, 2026
@smirnovlad smirnovlad changed the title Relax dependency bounds, make comet optional, add transformers 5.x forward-compat Add transformers 5.x forward-compat Apr 9, 2026

- Remove transformers upper bound (compat code handles both 4.x and 5.x)
- Move unbabel-comet to [comet] extra in pyproject.toml
- Update README with two install paths (with/without comet)
- Fix black formatting in generation_metrics __init__
@smirnovlad smirnovlad changed the title Add transformers 5.x forward-compat Relax deps, transformers 5.x compat, comet as optional extra Apr 9, 2026
smirnovlad (Collaborator, Author) commented:

Summary

Goal: Make lm-polygraph installable without any post-install patches (sed hacks), so we can release ThinkBooster as a standalone PyPI package. Currently ThinkBooster's setup.sh patches lm-polygraph's requirements.txt after cloning — this won't work for a PyPI release where users just do pip install thinkbooster.

What we did:

  1. Removed spacy<3.8.0 upper bound — the old pin forced thinc 8.2.x → numpy 1.x, which conflicts with vLLM (numpy>=2.0)

  2. Removed transformers upper bound (was <4.52.0) — added forward-compatibility code for transformers 5.x (try/except imports for removed/renamed classes, batch_encode_plus replacement, logits sanitizer for inf/nan edge cases)

  3. Moved unbabel-comet to optional [comet] extra: unbabel-comet pins numpy<2.0, and there's no version of comet that supports numpy 2.x. Since vLLM requires numpy>=2.0, they can't coexist. By making comet optional:

    • pip install lm-polygraph — works with vLLM/numpy 2.x (what ThinkBooster needs)
    • pip install lm-polygraph[comet] — full install with comet for standalone usage
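Declaring the extra might look like this in pyproject.toml (illustrative fragment; the actual file may pin versions differently):

```toml
[project.optional-dependencies]
comet = ["unbabel-comet"]
```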

Result: After this PR is merged, ThinkBooster can simply list lm-polygraph as a dependency in pyproject.toml — no patches, no sed, clean PyPI-compatible install.

@ArtemVazh ArtemVazh merged commit 71a59ad into main Apr 9, 2026
1 check passed

Development

Successfully merging this pull request may close these issues:

  • Dependency conflict between unbabel-comet and transformers>=5.0.0/sentencepiece>=0.2.1