Relax deps, transformers 5.x compat, comet as optional extra #446
- Remove the `spacy<3.8.0` upper bound: spacy 3.8+ uses thinc 8.3+/9.x, which is compatible with numpy 2.x (required by vLLM and other modern ML packages). The old pin forced thinc 8.2.x → numpy 1.x, creating unresolvable conflicts with vLLM/torch/cupy.
- Make `unbabel-comet` optional: comment it out of requirements.txt and guard the import in `generation_metrics/__init__.py`. The `Comet` metric class is only used for translation evaluation and is not needed by most users. Users who need it can install it separately with `pip install unbabel-comet --no-deps`.
- Move `from evaluate import load` to a lazy import inside `Comet.__init__` so the module can be imported without unbabel-comet installed.
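The guarded-import pattern can be sketched as follows. This is an illustration, not the package's actual code: the module name below is a hypothetical stand-in for the optional comet backend.

```python
# Sketch of the guarded import described above. In the real package the
# guard lives in generation_metrics/__init__.py; the module name here is a
# hypothetical stand-in for the optional dependency.
try:
    from hypothetical_comet_backend import Comet  # optional dependency
except ImportError:
    # unbabel-comet is not installed: expose Comet as None so the rest of
    # the package still imports cleanly.
    Comet = None
```

Callers then check for `None` before instantiating the metric, which is exactly what the test plan below verifies.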
Run tests against both default transformers (from requirements.txt) and transformers 5.x to catch compatibility issues early. Lint runs only once (on default version). Relates to #445.
transformers 5.0 removed or renamed several classes:

- `beam_search` submodule removed: `BeamScorer` no longer exists
- Output classes renamed:
  - `BeamSearchOutput` → `GenerateBeamEncoderDecoderOutput`
  - `BeamSearchDecoderOnlyOutput` → `GenerateBeamDecoderOnlyOutput`
  - `SampleOutput` → `GenerateNonBeamOutput`
  - `SampleDecoderOnlyOutput` → `GenerateDecoderOnlyOutput`
  - `GreedySearchOutput` → `GenerateNonBeamOutput`
  - `GreedySearchDecoderOnlyOutput` → `GenerateDecoderOnlyOutput`
- `AutoModelForVision2Seq` removed

All imports now use try/except with aliases to support both 4.x and 5.x. Relates to #445.
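The try/except alias pattern looks roughly like this (a sketch; the import path follows the submodule named in this commit message, and it degrades gracefully even when transformers is absent entirely):

```python
# Compatibility alias sketch for the removed BeamScorer. On transformers 5.x
# (or in an environment without transformers at all) the import fails and
# the name falls back to None, which downstream code must check for.
try:
    from transformers.generation.beam_search import BeamScorer  # transformers < 5
except ImportError:
    BeamScorer = None  # removed in transformers >= 5
```

The renamed output classes follow the same pattern, importing the 5.x name first and aliasing it to the old 4.x name on failure.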
`batch_encode_plus` was removed from newer transformers tokenizers. The direct `__call__` (`tokenizer(...)`) is equivalent and works on all versions.
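The equivalence can be made concrete with a small wrapper (the helper name and the fake tokenizer are illustrative, not part of the codebase):

```python
def encode_batch(tokenizer, texts, **kwargs):
    # batch_encode_plus was removed in newer transformers; calling the
    # tokenizer object directly produces the same result on all versions,
    # so this helper simply forwards to __call__.
    return tokenizer(texts, **kwargs)

# Minimal fake tokenizer to show the call shape without transformers installed.
class FakeTokenizer:
    def __call__(self, texts, **kwargs):
        return {"input_ids": [[len(t)] for t in texts]}

enc = encode_batch(FakeTokenizer(), ["a", "bb"])
```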
The `test_all_seq_ue` test uses `do_sample=True` for sampling-based estimators (MonteCarloSequenceEntropy, PTrueSampling, etc.). With `seed=null`, `torch.multinomial` occasionally fails with "probability tensor contains inf, nan or element < 0" due to non-deterministic logit values from bloomz-560m on CPU. Setting a fixed seed makes the test deterministic and reproducible.
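Why a fixed seed makes sampling reproducible can be illustrated without torch, using Python's `random` as a stand-in (the test itself seeds torch's RNG before `torch.multinomial` draws):

```python
import random

def sample_token_ids(seed, weights, n=5):
    # Stand-in for sampled generation: with a fixed seed the draws are
    # identical from run to run; with seed=None they are not guaranteed to be.
    rng = random.Random(seed)
    return [rng.choices(range(len(weights)), weights=weights)[0] for _ in range(n)]

a = sample_token_ids(42, [0.1, 0.7, 0.2])
b = sample_token_ids(42, [0.1, 0.7, 0.2])
```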
Different numpy versions (1.x vs 2.x) can cause bloomz-560m to produce inf/nan logits on CPU, crashing `torch.multinomial`. Add `_SanitizeLogitsProcessor`, which clamps inf/nan to finite values before scoring and sampling and runs first in the logits processor chain.
The previous sanitizer replaced +inf with `1e4`, which completely dominated the softmax and caused the model to generate the same token repeatedly, never hitting `stop_strings`. This made `test_just_works` take 22+ minutes on CI (vs ~3 min on main) because generations ran to `max_new_tokens` instead of stopping at `"\n"`. Now inf values are replaced with the max/min finite value from the same row, preserving the original distribution shape. Also adds a per-step timeout to pytest to prevent future hangs.
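The row-wise replacement can be sketched in pure Python (the real processor operates on torch logit tensors; treating NaN like -inf here is an assumption, since the commit message only specifies how ±inf are handled):

```python
import math

def sanitize_row(logits):
    # Replace non-finite logits with the max/min finite value from the same
    # row, preserving the shape of the distribution instead of letting a
    # single clamped +inf dominate the softmax.
    finite = [x for x in logits if math.isfinite(x)]
    hi = max(finite) if finite else 0.0
    lo = min(finite) if finite else 0.0
    out = []
    for x in logits:
        if math.isnan(x):
            out.append(lo)  # assumption: treat NaN like -inf
        elif x == math.inf:
            out.append(hi)
        elif x == -math.inf:
            out.append(lo)
        else:
            out.append(x)
    return out
```

Mapping +inf to the row's finite maximum (rather than an absolute constant like `1e4`) keeps the sanitized token merely "most likely" instead of overwhelmingly dominant after softmax.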
Stream subprocess output in tests and add per-stage timing to polygraph_eval to identify which step is slow on CI.
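Streaming a subprocess's output instead of buffering it can be sketched like this (a hypothetical helper, not the test's actual code):

```python
import subprocess
import sys
import time

def run_streamed(cmd):
    # Start the child, echo each stdout line as it arrives (so CI logs show
    # progress in real time), and report the wall-clock duration of the run.
    t0 = time.monotonic()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        print(line, end="")
    proc.wait()
    return proc.returncode, time.monotonic() - t0

rc, elapsed = run_streamed([sys.executable, "-c", "print('stage done')"])
```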
In transformers 5.x, parameters like `temperature`, `top_k`, and `top_p` must be passed via a `GenerationConfig` object, not as loose kwargs to `model.generate()`. Without this, `temperature` was silently ignored and the model generated with the default `temperature=1.0` instead of the configured value. This made generations much longer (never hitting stop conditions early), causing NLI calculators to process far more tokens and making CI tests take 25+ minutes instead of ~7.
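Under 5.x, the fix described here would look roughly like this (a sketch with placeholder `model`/`inputs`; the parameter values are illustrative, not the project's configured ones):

```python
from transformers import GenerationConfig

# Sampling parameters must travel inside a GenerationConfig in 5.x; as loose
# kwargs they are silently dropped and defaults (temperature=1.0) apply.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,  # illustrative value
    top_p=0.9,
)
# out = model.generate(**inputs, generation_config=gen_config)
```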
**Root cause of slow CI tests**

Found the issue. When we removed the `transformers` upper bound, CI started installing transformers 5.x. In transformers 5.x, generation parameters like `temperature` passed as loose kwargs to `model.generate()` are silently ignored. This meant generations ran with default settings and never hit stop conditions early. Fix: wrap generation params in a `GenerationConfig`.
Revert "…5.x". This reverts commit 5e27656.
Revert all debug logging and failed fix attempts (sanitizer, GenerationConfig wrapping, test config changes). Keep only the core changes: relaxed spacy bound, optional comet, transformers 5.x import compat, and batch_encode_plus replacement. CI now tests both transformers <5 and >=5 via matrix strategy.
- Remove transformers upper bound (compat code handles both 4.x and 5.x)
- Move unbabel-comet to a `[comet]` extra in pyproject.toml
- Update README with two install paths (with/without comet)
- Fix black formatting in `generation_metrics/__init__`
**Summary**

Goal: make lm-polygraph installable without any post-install patches (sed hacks), so we can release ThinkBooster as a standalone PyPI package.

Result: after this PR is merged, ThinkBooster can simply list lm-polygraph as a regular dependency.
**Summary**

- Remove `spacy<3.8.0` upper bound → `spacy>=3.4.0`
- Remove `transformers` upper bound → `transformers>=4.50.0`
  - `BeamScorer` removal → `None` fallback
  - `batch_encode_plus` removal → replaced with `tokenizer()`
  - `AutoModelForVision2Seq` rename → fallback to `AutoModelForImageTextToText`
  - `_SanitizeLogitsProcessor` to handle inf/nan logits edge cases
- Move `unbabel-comet` to optional extras → `pip install lm-polygraph[comet]`
  - `unbabel-comet` pins `numpy<2.0`, which conflicts with vLLM (`numpy>=2.0`)
  - `try/except ImportError` in `__init__.py`
  - lazy `evaluate` import in `comet.py`

**Install paths**

- With comet: `pip install lm-polygraph[comet]`
- Without comet: `pip install lm-polygraph`
- Comet installed separately: `pip install lm-polygraph`, then `pip install unbabel-comet --no-deps`

**Test plan**

- `from lm_polygraph.generation_metrics import Comet` returns `None` when comet is not installed
- `pip install lm-polygraph[comet]` installs comet correctly