Relax deps, transformers 5.x compat, comet as optional extra #446
- Remove the `spacy<3.8.0` upper bound: spacy 3.8+ uses thinc 8.3+/9.x, which is compatible with numpy 2.x (required by vLLM and other modern ML packages). The old pin forced thinc 8.2.x → numpy 1.x, creating unresolvable conflicts with vLLM/torch/cupy.
- Make `unbabel-comet` optional: comment it out of requirements.txt and guard the import in `generation_metrics/__init__.py`. The `Comet` metric class is only used for translation evaluation and is not needed by most users. Users who need it can install it separately with `pip install unbabel-comet --no-deps`.
- Move `from evaluate import load` to a lazy import inside `Comet.__init__` so the module can be imported without unbabel-comet installed.
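The guarded-import pattern can be sketched as follows. This is an illustration, not the package's actual code: the module name below is a hypothetical stand-in for the optional comet backend.

```python
# Sketch of the guarded import described above. In the real package the
# guard lives in generation_metrics/__init__.py; the module name here is a
# hypothetical stand-in for the optional dependency.
try:
    from hypothetical_comet_backend import Comet  # optional dependency
except ImportError:
    # unbabel-comet is not installed: expose Comet as None so the rest of
    # the package still imports cleanly.
    Comet = None
```

Callers then check for `None` before instantiating the metric, which is exactly what the test plan below verifies.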
Run tests against both default transformers (from requirements.txt) and transformers 5.x to catch compatibility issues early. Lint runs only once (on default version). Relates to #445.
transformers 5.0 removed or renamed several classes:

- `beam_search` submodule removed: `BeamScorer` no longer exists
- Output classes renamed:
  - `BeamSearchOutput` → `GenerateBeamEncoderDecoderOutput`
  - `BeamSearchDecoderOnlyOutput` → `GenerateBeamDecoderOnlyOutput`
  - `SampleOutput` → `GenerateNonBeamOutput`
  - `SampleDecoderOnlyOutput` → `GenerateDecoderOnlyOutput`
  - `GreedySearchOutput` → `GenerateNonBeamOutput`
  - `GreedySearchDecoderOnlyOutput` → `GenerateDecoderOnlyOutput`
- `AutoModelForVision2Seq` removed

All imports now use try/except with aliases to support both 4.x and 5.x. Relates to #445.
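The try/except alias pattern looks roughly like this (a sketch; the import path follows the submodule named in this commit message, and it degrades gracefully even when transformers is absent entirely):

```python
# Compatibility alias sketch for the removed BeamScorer. On transformers 5.x
# (or in an environment without transformers at all) the import fails and
# the name falls back to None, which downstream code must check for.
try:
    from transformers.generation.beam_search import BeamScorer  # transformers < 5
except ImportError:
    BeamScorer = None  # removed in transformers >= 5
```

The renamed output classes follow the same pattern, importing the 5.x name first and aliasing it to the old 4.x name on failure.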
`batch_encode_plus` was removed from newer transformers tokenizers. The direct `__call__` (`tokenizer(...)`) is equivalent and works on all versions.
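The equivalence can be made concrete with a small wrapper (the helper name and the fake tokenizer are illustrative, not part of the codebase):

```python
def encode_batch(tokenizer, texts, **kwargs):
    # batch_encode_plus was removed in newer transformers; calling the
    # tokenizer object directly produces the same result on all versions,
    # so this helper simply forwards to __call__.
    return tokenizer(texts, **kwargs)

# Minimal fake tokenizer to show the call shape without transformers installed.
class FakeTokenizer:
    def __call__(self, texts, **kwargs):
        return {"input_ids": [[len(t)] for t in texts]}

enc = encode_batch(FakeTokenizer(), ["a", "bb"])
```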
The `test_all_seq_ue` test uses `do_sample=True` for sampling-based estimators (MonteCarloSequenceEntropy, PTrueSampling, etc.). With `seed=null`, `torch.multinomial` occasionally fails with "probability tensor contains inf, nan or element < 0" due to non-deterministic logit values from bloomz-560m on CPU. Setting a fixed seed makes the test deterministic and reproducible.
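Why a fixed seed makes sampling reproducible can be illustrated without torch, using Python's `random` as a stand-in (the test itself seeds torch's RNG before `torch.multinomial` draws):

```python
import random

def sample_token_ids(seed, weights, n=5):
    # Stand-in for sampled generation: with a fixed seed the draws are
    # identical from run to run; with seed=None they are not guaranteed to be.
    rng = random.Random(seed)
    return [rng.choices(range(len(weights)), weights=weights)[0] for _ in range(n)]

a = sample_token_ids(42, [0.1, 0.7, 0.2])
b = sample_token_ids(42, [0.1, 0.7, 0.2])
```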
Different numpy versions (1.x vs 2.x) can cause bloomz-560m to produce inf/nan logits on CPU, crashing `torch.multinomial`. Add `_SanitizeLogitsProcessor`, which clamps inf/nan to finite values before scoring and sampling and runs first in the logits processor chain.
The previous sanitizer replaced +inf with `1e4`, which completely dominated the softmax and caused the model to generate the same token repeatedly, never hitting `stop_strings`. This made `test_just_works` take 22+ minutes on CI (vs ~3 min on main) because generations ran to `max_new_tokens` instead of stopping at `"\n"`. Now inf values are replaced with the max/min finite value from the same row, preserving the original distribution shape. Also adds a per-step timeout to pytest to prevent future hangs.
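The row-wise replacement can be sketched in pure Python (the real processor operates on torch logit tensors; treating NaN like -inf here is an assumption, since the commit message only specifies how ±inf are handled):

```python
import math

def sanitize_row(logits):
    # Replace non-finite logits with the max/min finite value from the same
    # row, preserving the shape of the distribution instead of letting a
    # single clamped +inf dominate the softmax.
    finite = [x for x in logits if math.isfinite(x)]
    hi = max(finite) if finite else 0.0
    lo = min(finite) if finite else 0.0
    out = []
    for x in logits:
        if math.isnan(x):
            out.append(lo)  # assumption: treat NaN like -inf
        elif x == math.inf:
            out.append(hi)
        elif x == -math.inf:
            out.append(lo)
        else:
            out.append(x)
    return out
```

Mapping +inf to the row's finite maximum (rather than an absolute constant like `1e4`) keeps the sanitized token merely "most likely" instead of overwhelmingly dominant after softmax.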
Stream subprocess output in tests and add per-stage timing to polygraph_eval to identify which step is slow on CI.
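Streaming a subprocess's output instead of buffering it can be sketched like this (a hypothetical helper, not the test's actual code):

```python
import subprocess
import sys
import time

def run_streamed(cmd):
    # Start the child, echo each stdout line as it arrives (so CI logs show
    # progress in real time), and report the wall-clock duration of the run.
    t0 = time.monotonic()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        print(line, end="")
    proc.wait()
    return proc.returncode, time.monotonic() - t0

rc, elapsed = run_streamed([sys.executable, "-c", "print('stage done')"])
```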
In transformers 5.x, parameters like `temperature`, `top_k`, and `top_p` must be passed via a `GenerationConfig` object, not as loose kwargs to `model.generate()`. Without this, `temperature` was silently ignored and the model generated with the default `temperature=1.0` instead of the configured value. This made generations much longer (never hitting stop conditions early), causing NLI calculators to process far more tokens and making CI tests take 25+ minutes instead of ~7.
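Under 5.x, the fix described here would look roughly like this (a sketch with placeholder `model`/`inputs`; the parameter values are illustrative, not the project's configured ones):

```python
from transformers import GenerationConfig

# Sampling parameters must travel inside a GenerationConfig in 5.x; as loose
# kwargs they are silently dropped and defaults (temperature=1.0) apply.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,  # illustrative value
    top_p=0.9,
)
# out = model.generate(**inputs, generation_config=gen_config)
```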
**Root cause of slow CI tests**

Found the issue. When we removed the `transformers` upper bound, CI started installing transformers 5.x. In transformers 5.x, generation parameters like `temperature` passed as loose kwargs to `model.generate()` are silently ignored. This meant generations ran with default settings and never hit stop conditions early. Fix: wrap generation params in a `GenerationConfig`.
Revert "…5.x". This reverts commit 5e27656.
Revert all debug logging and failed fix attempts (sanitizer, GenerationConfig wrapping, test config changes). Keep only the core changes: relaxed spacy bound, optional comet, transformers 5.x import compat, and batch_encode_plus replacement. CI now tests both transformers <5 and >=5 via matrix strategy.
- Remove transformers upper bound (compat code handles both 4.x and 5.x)
- Move unbabel-comet to a `[comet]` extra in pyproject.toml
- Update README with two install paths (with/without comet)
- Fix black formatting in `generation_metrics/__init__`
**Summary**

Goal: make lm-polygraph installable without any post-install patches (sed hacks), so we can release ThinkBooster as a standalone PyPI package.

Result: after this PR is merged, ThinkBooster can simply list lm-polygraph as a regular dependency.
**Summary**

- Remove `spacy<3.8.0` upper bound → `spacy>=3.4.0`
- Remove `transformers` upper bound → `transformers>=4.50.0`
  - `BeamScorer` removal → `None` fallback
  - `batch_encode_plus` removal → replaced with `tokenizer()`
  - `AutoModelForVision2Seq` rename → fallback to `AutoModelForImageTextToText`
  - `_SanitizeLogitsProcessor` to handle inf/nan logits edge cases
- Move `unbabel-comet` to optional extras → `pip install lm-polygraph[comet]`
  - `unbabel-comet` pins `numpy<2.0`, which conflicts with vLLM (`numpy>=2.0`)
  - `try/except ImportError` in `__init__.py`
  - lazy `evaluate` import in `comet.py`

**Install paths**

- With comet: `pip install lm-polygraph[comet]`
- Without comet: `pip install lm-polygraph`
- Comet installed separately: `pip install lm-polygraph`, then `pip install unbabel-comet --no-deps`

**Test plan**

- `from lm_polygraph.generation_metrics import Comet` returns `None` when comet is not installed
- `pip install lm-polygraph[comet]` installs comet correctly