
Unintuitive Logic in masked_fill Function of DistilBERT Model Implementation #2721

Open

sondalex opened this issue Jan 16, 2025 · 0 comments
The masked_fill function in the DistilBERT model implementation currently has unintuitive logic:

fn masked_fill(on_false: &Tensor, mask: &Tensor, on_true: f32) -> Result<Tensor> {
    let shape = mask.shape();
    let on_true = Tensor::new(on_true, on_false.device())?.broadcast_as(shape.dims())?;
    let m = mask.where_cond(&on_true, on_false)?;
    Ok(m)
}
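The effect of this signature can be illustrated with plain Rust slices (a hypothetical masked_fill_slice helper standing in for the tensor version; candle's where_cond applies the same selection element-wise): positions where the mask is non-zero receive the fill value, so padding positions must be flagged with 1 — the opposite of the tokenizer's convention.

```rust
// Minimal sketch of the CURRENT selection logic on plain slices.
// Hypothetical helper for illustration, not candle's API.
fn masked_fill_slice(on_false: &[f32], mask: &[u32], on_true: f32) -> Vec<f32> {
    mask.iter()
        .zip(on_false)
        // Where the mask is non-zero, emit the fill value; otherwise keep the input.
        .map(|(&m, &v)| if m != 0 { on_true } else { v })
        .collect()
}

fn main() {
    // Attention scores for [token, token, pad]. With the current signature
    // the pad position must be flagged with 1, i.e. the tokenizer's
    // mask [1, 1, 0] has to be inverted first.
    let scores = [0.5_f32, 0.2, 0.9];
    let inverted_mask = [0_u32, 0, 1];
    let filled = masked_fill_slice(&scores, &inverted_mask, f32::NEG_INFINITY);
    println!("{:?}", filled); // [0.5, 0.2, -inf]
}
```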

With this setup, the user must invert the attention mask obtained from the tokenizer before passing it to model.forward. This is confusing because it differs from the Hugging Face transformers implementation, which accepts the tokenizer's mask as-is (1 for real tokens, 0 for padding).

...
let text: Vec<&str> = vec![...];
let encoded = tokenizer.encode_batch(text.clone(), true)?;
let input_ids = encoded.iter().map(|v| v.get_ids().to_vec()).collect::<Vec<_>>();
let input_ids = Tensor::new(input_ids, &device)?;
let attention_mask = encoded
    .iter()
    .map(|encoding| encoding.get_attention_mask().to_vec())
    .collect::<Vec<_>>();
let attention_mask = Tensor::new(attention_mask, &device)?;

let (batch_size, feature_size) = input_ids.dims2()?;

// Invert the attention mask for correct behavior --> counterintuitive
let attention_mask = attention_mask.eq(0u32)?.reshape((batch_size, 1, 1, feature_size))?;

let output = model.forward(&input_ids, &attention_mask)?;
...

Proposal:

Replace the masked_fill function with:

fn masked_fill(on_true: &Tensor, mask: &Tensor, on_false: f32) -> Result<Tensor> {
    let shape = mask.shape();
    let on_false = Tensor::new(on_false, on_true.device())?.broadcast_as(shape.dims())?;
    let m = mask.where_cond(on_true, &on_false)?;
    Ok(m)
}
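With the proposed signature the semantics flip: non-zero mask positions keep the input, and zeros receive the fill value, so the tokenizer's attention mask can be passed through unmodified. A plain-slice sketch of that behavior (again a hypothetical helper mirroring where_cond element-wise):

```rust
// Sketch of the PROPOSED selection logic on plain slices.
// Hypothetical helper for illustration; the real change operates on candle tensors.
fn masked_fill_slice(on_true: &[f32], mask: &[u32], on_false: f32) -> Vec<f32> {
    mask.iter()
        .zip(on_true)
        // Where the mask is non-zero, keep the input; zeros get the fill value.
        .map(|(&m, &v)| if m != 0 { v } else { on_false })
        .collect()
}

fn main() {
    // The tokenizer's mask [1, 1, 0] (from get_attention_mask()) now
    // masks the pad position directly, with no inversion step.
    let scores = [0.5_f32, 0.2, 0.9];
    let tokenizer_mask = [1_u32, 1, 0];
    let filled = masked_fill_slice(&scores, &tokenizer_mask, f32::NEG_INFINITY);
    println!("{:?}", filled); // [0.5, 0.2, -inf]
}
```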