Support ModernBERT #53
Hey. ModernBERT-based rerankers are already supported as long as your …
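For context, loading a ModernBERT-based cross-encoder through the usual rerankers entry point would look roughly like this (the checkpoint name below is a placeholder, not a specific recommendation):

```python
from rerankers import Reranker

# Placeholder checkpoint name; substitute an actual ModernBERT-based reranker.
ranker = Reranker("your-org/modernbert-reranker", model_type="cross-encoder")

results = ranker.rank(
    query="what is modernbert?",
    docs=["ModernBERT is an encoder-only transformer.", "Unrelated text."],
    doc_ids=[0, 1],
)
```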
Hi @bclavie,

```python
import torch
from typing import List, Optional, Union

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# BaseRanker, Document, Result, RankedResults, prep_docs, get_device, get_dtype
# and vprint are the rerankers package's own helpers (imports omitted here).


class DisableCompileContextManager:
    def __init__(self):
        self._original_compile = torch.compile

    def __enter__(self):
        # Turn torch.compile into a no-op
        torch.compile = lambda *args, **kwargs: lambda x: x  # type: ignore

    def __exit__(self, exc_type, exc_val, exc_tb):
        torch.compile = self._original_compile


class TransformerRanker(BaseRanker):
    def __init__(
        self,
        model_name_or_path: str,
        dtype: Optional[Union[str, torch.dtype]] = None,
        device: Optional[Union[str, torch.device]] = None,
        batch_size: int = 16,
        verbose: int = 1,
        max_length: int = 0,
        **kwargs,
    ):
        self.verbose = verbose
        self.device = get_device(device, verbose=self.verbose)
        self.dtype = get_dtype(dtype, self.device, self.verbose)
        self.max_length = max_length
        model_kwargs = kwargs.get("model_kwargs", {})
        with DisableCompileContextManager():
            self.model = AutoModelForSequenceClassification.from_pretrained(
                model_name_or_path,
                torch_dtype=self.dtype,
                **model_kwargs,
            ).to(self.device)
        vprint(f"Loaded model {model_name_or_path}", self.verbose)
        vprint(f"Using device {self.device}.", self.verbose)
        vprint(f"Using dtype {self.dtype}.", self.verbose)
        self.model.eval()
        tokenizer_kwargs = kwargs.get("tokenizer_kwargs", {})
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name_or_path,
            **tokenizer_kwargs,
        )
        self.ranking_type = "pointwise"
        self.batch_size = batch_size

    @torch.inference_mode()
    def rank(
        self,
        query: str,
        docs: Union[str, List[str], Document, List[Document]],
        doc_ids: Optional[Union[List[str], List[int]]] = None,
        metadata: Optional[List[dict]] = None,
        batch_size: Optional[int] = None,
    ) -> RankedResults:
        docs = prep_docs(docs, doc_ids, metadata)
        inputs = [(query, doc.text) for doc in docs]
        # Override self.batch_size if explicitly set
        if batch_size is None:
            batch_size = self.batch_size
        batched_inputs = [
            inputs[i : i + batch_size] for i in range(0, len(inputs), batch_size)
        ]
        scores = []
        for batch in batched_inputs:
            # tokenized_inputs = self.tokenize(batch)
            with torch.no_grad():
                if self.max_length:
                    tokenized_inputs = self.tokenizer(
                        batch,
                        max_length=self.max_length,
                        return_tensors="pt",
                        padding=True,
                        truncation=True,
                    ).to(self.device)
                else:
                    tokenized_inputs = self.tokenizer(
                        batch,
                        return_tensors="pt",
                        padding=True,
                        truncation=True,
                    ).to(self.device)
            batch_scores = self.model(**tokenized_inputs).logits.squeeze()
            batch_scores = batch_scores.detach().cpu().numpy().tolist()
            if isinstance(batch_scores, float):  # Handling the case of a single score
                scores.append(batch_scores)
            else:
                scores.extend(batch_scores)
        if len(scores) == 1:
            return Result(document=docs[0], score=scores[0])
        else:
            ranked_results = [
                Result(document=doc, score=score, rank=idx + 1)
                for idx, (doc, score) in enumerate(
                    sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
                )
            ]
            return RankedResults(results=ranked_results, query=query, has_scores=True)

    @torch.inference_mode()
    def score(self, query: str, doc: str) -> float:
        # NOTE: `tokenize` is defined in the original rerankers TransformerRanker;
        # it is not shown in this snippet.
        inputs = self.tokenize((query, doc))  # type: ignore
        outputs = self.model(**inputs)
        score = outputs.logits.squeeze().detach().cpu().numpy().astype(float)
        return score
```
Ooh this might be a broader ModernBERT issue rather than transformers itself here! Could you please share the exact error message? I'd be happy to look into what caused it.
@bclavie, I will try to recreate the issue next week, will need to revert my code and get the error back.
Sure! I'm curious what's causing this for you -- I suspect it might be something wrong with how we set up modernbert loading. I've been unable to reproduce the issue on either 4090 or Mac MPS 🤔
@bclavie, it was easy to get the error back:
Thank you! Could you try modifying the model init call to pass `reference_compile=False` to `from_pretrained`?
(the `reference_compile` flag controls whether ModernBERT uses its `torch.compile`-d code paths, which is what the workaround above is disabling)
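Concretely, the change amounts to something like this (a sketch mirroring the snippet posted later in this thread):

```python
self.model = AutoModelForSequenceClassification.from_pretrained(
    model_name_or_path,
    torch_dtype=self.dtype,
    reference_compile=False,  # skip ModernBERT's torch.compile-d code paths
    **model_kwargs,
).to(self.device)
```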
@bclavie, looks like it worked. My code did not get the exceptions it got before.
Hi @bclavie, two more nits.

```python
inputs = self.tokenize((query, doc))  # type: ignore
```

I think it should be:

```python
inputs = self.tokenize([(query, doc)])
```
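A quick sketch of why the wrapping matters (this reflects standard Hugging Face tokenizer `__call__` behaviour; the checkpoint name is just an example): a bare `(query, doc)` tuple is treated as a batch of two independent texts, while a list containing one tuple is encoded as a single query-document pair.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# One (query, document) pair -> a single joined sequence, shape (1, seq_len)
pair = tok([("what is modernbert?", "ModernBERT is an encoder-only model.")],
           return_tensors="pt", padding=True, truncation=True)
print(pair["input_ids"].shape)

# Bare tuple -> two separate texts, shape (2, seq_len); the model would then
# emit two logits instead of one relevance score
split = tok(("what is modernbert?", "ModernBERT is an encoder-only model."),
            return_tensors="pt", padding=True, truncation=True)
print(split["input_ids"].shape)
```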
Hey, my bad for letting this thread go quiet, I had to go on a med leave for a bit! I'll be updating the library once I'm recovered to fix this issue, as well as go with uncompiled ModernBERT by default. Thanks again!
I moved to another computer and am trying to run the code as per this discussion. Here is the code I am using:

```python
import torch
from typing import List, Optional, Tuple, Union

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BatchEncoding,
)

# BaseRanker, Document, Result, RankedResults, prep_docs, get_device, get_dtype
# and vprint are the rerankers package's own helpers (imports omitted here).


class TransformerRanker(BaseRanker):
    def __init__(
        self,
        model_name_or_path: str,
        dtype: Optional[Union[str, torch.dtype]] = None,
        device: Optional[Union[str, torch.device]] = None,
        batch_size: int = 16,
        verbose: int = 1,
        **kwargs,
    ):
        self.verbose = verbose
        self.device = get_device(device, verbose=self.verbose)
        self.dtype = get_dtype(dtype, self.device, self.verbose)
        self.is_monobert = "monobert" in model_name_or_path.lower()
        model_kwargs = kwargs.get("model_kwargs", {})
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name_or_path,
            torch_dtype=self.dtype,
            reference_compile=False,
            **model_kwargs,
        ).to(self.device)
        vprint(f"Loaded model {model_name_or_path}", self.verbose)
        vprint(f"Using device {self.device}.", self.verbose)
        vprint(f"Using dtype {self.dtype}.", self.verbose)
        self.model.eval()
        tokenizer_kwargs = kwargs.get("tokenizer_kwargs", {})
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name_or_path,
            **tokenizer_kwargs,
        )
        self.ranking_type = "pointwise"
        self.batch_size = batch_size

    # added Tuple[str, str] as this is what you pass in the score function,
    # inputs = self.tokenize((query, doc)) below
    def tokenize(
        self, inputs: Union[str, List[str], Tuple[str, str], List[Tuple[str, str]]]
    ) -> BatchEncoding:
        return self.tokenizer(
            inputs, return_tensors="pt", padding=True, truncation=True
        ).to(self.device)

    @torch.inference_mode()
    def rank(
        self,
        query: str,
        docs: Union[str, List[str], Document, List[Document]],
        doc_ids: Optional[Union[List[str], List[int]]] = None,
        metadata: Optional[List[dict]] = None,
        batch_size: Optional[int] = None,
    ) -> RankedResults:
        docs = prep_docs(docs, doc_ids, metadata)
        inputs = [(query, doc.text) for doc in docs]
        # Override self.batch_size if explicitly set
        if batch_size is None:
            batch_size = self.batch_size
        batched_inputs = [
            inputs[i : i + batch_size] for i in range(0, len(inputs), batch_size)
        ]
        scores: List[Union[float, List[float]]] = []
        for batch in batched_inputs:
            tokenized_inputs = self.tokenize(batch)
            batch_scores = self.model(**tokenized_inputs).logits.squeeze()
            if self.dtype != torch.float32:
                batch_scores = batch_scores.float()
            batch_scores = batch_scores.detach().cpu().numpy().tolist()
            if isinstance(batch_scores, float):  # Handling the case of a single score
                scores.append(batch_scores)
            else:
                scores.extend(batch_scores)
        if self.is_monobert:
            scores = [x[1] - x[0] for x in scores]  # type: ignore
        if len(scores) == 1:  # TODO - this is different than the original code
            # return Result(document=docs[0], score=scores[0])
            return RankedResults(
                results=[Result(document=docs[0], score=scores[0])],
                query=query,
                has_scores=True,
            )
        else:
            ranked_results = [
                Result(document=doc, score=score, rank=idx + 1)
                for idx, (doc, score) in enumerate(
                    sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
                )
            ]
            return RankedResults(results=ranked_results, query=query, has_scores=True)

    @torch.inference_mode()
    def score(self, query: str, doc: str) -> float:
        inputs = self.tokenize((query, doc))
        outputs = self.model(**inputs)
        score = outputs.logits.squeeze().detach().cpu().numpy().astype(float)
        return score
```
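For completeness, a minimal usage sketch of the class above (the checkpoint name is a placeholder; any ModernBERT-based sequence-classification reranker should slot in):

```python
# Placeholder checkpoint name; substitute the ModernBERT-based reranker being tested.
ranker = TransformerRanker("your-org/modernbert-reranker", dtype="float32", device="cuda")

results = ranker.rank(
    query="what is modernbert?",
    docs=["ModernBERT is an encoder-only transformer.", "Unrelated text."],
    doc_ids=[0, 1],
)
for r in results.results:
    print(r.rank, r.score, r.document.text)
```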
Hey, this seems to be another issue that is specifically related to ModernBERT's implementation in HF Transformers 🤔 Let me look into this further so I can see if I can find a more generalised solution for rerankers.
adding support for using ModernBERT