Member

@besaleli commented Dec 3, 2025

  • Adds compile-time checks for transforms during the encoderfile build stage
  • DRYs up the mean pooling code a little
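As a rough illustration of what a build-time transform check can look like, the sketch below runs a candidate transform on a dummy tensor and rejects it if the output shape changes or contains non-finite values. Everything here is an assumption for illustration: the `Tensor` struct, the xorshift generator, the constant values, and the shape-preserving rule are hypothetical stand-ins, not the types or checks used in this PR (the excerpted `random_tensor` in the PR returns a `Result`, for instance).

```rust
/// Deterministic xorshift32 generator so the sketch needs no external crates.
struct XorShift32(u32);

impl XorShift32 {
    fn next_f32(&mut self, lo: f32, hi: f32) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        // Map the 32-bit state into the closed range [lo, hi].
        lo + (hi - lo) * (x as f32 / u32::MAX as f32)
    }
}

// Assumed dummy dimensions; the real values are not shown in the excerpt.
const BATCH_SIZE: usize = 2;
const SEQ_LEN: usize = 4;
const HIDDEN_DIM: usize = 8;

/// Hypothetical flat, row-major tensor with an explicit shape.
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Stand-in for a random tensor helper: values drawn from [lo, hi].
fn random_tensor(shape: &[usize], (lo, hi): (f32, f32)) -> Tensor {
    let mut rng = XorShift32(0x9E37_79B9);
    let len: usize = shape.iter().product();
    let data = (0..len).map(|_| rng.next_f32(lo, hi)).collect();
    Tensor { shape: shape.to_vec(), data }
}

/// Runs `transform` once on dummy hidden states and validates the result.
fn dry_run<F>(transform: F) -> Result<(), String>
where
    F: Fn(&Tensor) -> Result<Tensor, String>,
{
    let input = random_tensor(&[BATCH_SIZE, SEQ_LEN, HIDDEN_DIM], (-1.0, 1.0));
    let output = transform(&input)?;

    // Assumed rule: a per-token embedding transform must preserve the shape.
    if output.shape != input.shape {
        return Err(format!(
            "expected shape {:?}, got {:?}",
            input.shape, output.shape
        ));
    }
    let expected_len: usize = output.shape.iter().product();
    if output.data.len() != expected_len {
        return Err("output data length does not match its shape".to_string());
    }
    if output.data.iter().any(|v| !v.is_finite()) {
        return Err("output contains NaN or infinite values".to_string());
    }
    Ok(())
}
```

A build step could call `dry_run` with the user-supplied transform and abort with the returned message before any artifact is produced.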

Scope creep:

  • Removes old config docs generation
  • Adds new Makefile commands

@besaleli linked an issue Dec 3, 2025 that may be closed by this pull request

codecov-commenter commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 85.04578% with 147 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| encoderfile-core/src/cli.rs | 0.00% | 26 Missing ⚠️ |
| ...e-core/src/transforms/engine/sentence_embedding.rs | 87.27% | 14 Missing ⚠️ |
| encoderfile/src/transforms/validation/mod.rs | 88.07% | 13 Missing ⚠️ |
| encoderfile/src/transforms/validation/embedding.rs | 83.33% | 12 Missing ⚠️ |
| ...le/src/transforms/validation/sentence_embedding.rs | 84.21% | 12 Missing ⚠️ |
| encoderfile-core/src/transforms/engine/mod.rs | 88.42% | 11 Missing ⚠️ |
| ...c/transforms/validation/sequence_classification.rs | 86.11% | 10 Missing ⚠️ |
| .../src/transforms/validation/token_classification.rs | 86.11% | 10 Missing ⚠️ |
| ...e/src/transforms/engine/sequence_classification.rs | 89.61% | 8 Missing ⚠️ |
| ...core/src/transforms/engine/token_classification.rs | 89.61% | 8 Missing ⚠️ |
| ... and 6 more | | |

| Files with missing lines | Coverage Δ |
|---|---|
| encoderfile-core/src/dev_utils/mod.rs | 100.00% <100.00%> (ø) |
| encoderfile-core/src/inference/embedding.rs | 100.00% <100.00%> (ø) |
| ...coderfile-core/src/inference/sentence_embedding.rs | 100.00% <100.00%> (ø) |
| ...file-core/src/inference/sequence_classification.rs | 100.00% <100.00%> (ø) |
| ...derfile-core/src/inference/token_classification.rs | 96.55% <100.00%> (ø) |
| encoderfile-core/src/runtime/config.rs | 0.00% <ø> (-12.00%) ⬇️ |
| encoderfile-core/src/runtime/tokenizer.rs | 73.68% <ø> (ø) |
| encoderfile-core/src/transforms/tensor/mod.rs | 98.06% <100.00%> (+0.02%) ⬆️ |
| encoderfile-core/src/runtime/state.rs | 80.00% <75.00%> (-20.00%) ⬇️ |
| encoderfile/src/cli.rs | 0.00% <0.00%> (ø) |
| ... and 14 more | |

codspeed-hq bot commented Dec 3, 2025

CodSpeed Performance Report

Merging #151 will not alter performance

Comparing 122-validate-transforms-on-compile (1cb9583) with main (ef6c587)

Summary

✅ 20 untouched
⏩ 20 skipped [1]

Footnotes

  1. 20 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@besaleli marked this pull request as ready for review December 5, 2025 07:27
pub struct ModelConfig {
    pub model_type: String,
    pub pad_token_id: u32,
    pub num_labels: Option<usize>,

Contributor

I suspect this is done for classification? Does it make sense to group class-related items under a classification key?

Member Author

@javiermtorres This is taken from the ModelConfig schema from Hugging Face, so unfortunately we can't change it :')
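To make the trade-off concrete: keeping `num_labels` as a flat optional field still lets classification-specific code demand it explicitly. The sketch below is only an assumption about how that could look. It reproduces just the three fields visible in the excerpt, and `require_num_labels` is a hypothetical helper, not a function from this PR.

```rust
/// Only the fields visible in the reviewed excerpt; the real struct may have more.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelConfig {
    pub model_type: String,
    pub pad_token_id: u32,
    pub num_labels: Option<usize>,
}

/// Hypothetical helper: classification code paths ask for `num_labels`
/// and get a descriptive error when the config does not provide it.
pub fn require_num_labels(config: &ModelConfig) -> Result<usize, String> {
    match config.num_labels {
        Some(n) if n > 0 => Ok(n),
        Some(_) => Err("num_labels must be greater than zero".to_string()),
        None => Err(format!(
            "model_type `{}` has no num_labels, which classification requires",
            config.model_type
        )),
    }
}
```

Embedding-style models would simply never call the helper, so the field can stay at the top level and mirror the upstream schema.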

Contributor

@javiermtorres left a comment

Stronger types and dry runs are both great ideas 👍 Some minor remarks, but totally mergeable.

Did you take a look at https://luau.org/sandbox? Would it be worth it?

.into_owned();

outputs = state.transform().postprocess(outputs)?;
outputs = EmbeddingTransform::new(state.transform_str())?.postprocess(outputs)?;

Contributor

This would allow changing the transform at runtime. Do we aim to do that at some point?

Member Author

@besaleli commented Dec 5, 2025

This was basically already the case. We can improve on it, though; I think this is a good flag.
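One possible direction for that improvement, sketched under heavy assumptions, is to build the transform once from its source string and reuse it for every request, so it cannot silently change mid-run. `ScaleTransform`, `AppState`, and `BUILD_COUNT` below are hypothetical stand-ins (the "script" is just a scale factor), not the types in this PR; only `std::sync::OnceLock` is a real standard-library API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times a transform is built, to make the caching visible.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical compiled transform: the source string is parsed as a scale factor.
struct ScaleTransform {
    factor: f32,
}

impl ScaleTransform {
    fn new(source: &str) -> Result<Self, String> {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        let factor = source
            .trim()
            .parse::<f32>()
            .map_err(|e| format!("invalid transform: {e}"))?;
        Ok(Self { factor })
    }

    fn postprocess(&self, outputs: Vec<f32>) -> Vec<f32> {
        outputs.into_iter().map(|v| v * self.factor).collect()
    }
}

/// Hypothetical application state holding the source and a lazily built transform.
struct AppState {
    transform_str: String,
    compiled: OnceLock<ScaleTransform>,
}

impl AppState {
    fn new(transform_str: &str) -> Self {
        Self {
            transform_str: transform_str.to_string(),
            compiled: OnceLock::new(),
        }
    }

    /// Returns the cached transform, building it on first use only.
    fn transform(&self) -> Result<&ScaleTransform, String> {
        if let Some(existing) = self.compiled.get() {
            return Ok(existing);
        }
        let built = ScaleTransform::new(&self.transform_str)?;
        Ok(self.compiled.get_or_init(|| built))
    }
}
```

With this shape, a failed build is reported on first use and nothing is cached, while a successful build is shared by all later calls.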

fn postprocess(&self, (data, mask): Self::Input) -> Result<Self::Output, ApiError> {
    let func = match &self.postprocessor {
        Some(p) => p,
        None => {

Contributor

I'd even hope for something like None => default_pool here, but let me know if that would not quite fit.
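A minimal sketch of that suggestion, assuming the fallback is masked mean pooling. The `Pooler` alias, `default_pool`, `cls_pool`, and this simplified `postprocess` signature are all hypothetical, and plain nested `Vec`s stand in for whatever tensor type the project uses.

```rust
/// Hypothetical pooling function: per-token hidden states plus an attention mask
/// in, one pooled vector out.
type Pooler = fn(&[Vec<f32>], &[u8]) -> Vec<f32>;

/// Masked mean pooling: averages only the rows whose mask entry is non-zero.
fn default_pool(hidden: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let dim = hidden.first().map_or(0, |row| row.len());
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (row, &m) in hidden.iter().zip(mask) {
        if m != 0 {
            for (acc, v) in sum.iter_mut().zip(row) {
                *acc += *v;
            }
            count += 1.0;
        }
    }
    // Guard against an all-zero mask so the result stays finite.
    let denom = count.max(1.0);
    sum.iter().map(|s| s / denom).collect()
}

/// Example of a user-supplied pooler: take the first token and ignore the mask.
fn cls_pool(hidden: &[Vec<f32>], _mask: &[u8]) -> Vec<f32> {
    hidden.first().cloned().unwrap_or_default()
}

/// Uses the custom pooler when present and falls back to `default_pool` otherwise.
fn postprocess(custom: Option<Pooler>, hidden: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let func: Pooler = match custom {
        Some(p) => p,
        None => default_pool,
    };
    func(hidden, mask)
}
```

Whether a silent default is desirable depends on the project: it removes the error branch, but it also hides a missing postprocessor that the build-time check might otherwise report.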

impl TransformValidatorExt for EmbeddingTransform {
    fn dry_run(&self, _model_config: &ModelConfig) -> Result<()> {
        // create dummy hidden states with shape [batch_size, seq_len, hidden_dim]
        let dummy_hidden_states = random_tensor(&[BATCH_SIZE, SEQ_LEN, HIDDEN_DIM], (-1.0, 1.0))?;

Contributor

Would this still be a valid test if we used zeroed vectors?
Also, what about naming these TEST_BATCH_SIZE and so on?

Member Author

@javiermtorres With all zeros? We could, but we might run into issues when unit testing things like softmax.
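One plausible reading of that concern, shown as a small sketch: all-zero inputs are a degenerate case for common postprocessing steps. Softmax over zeros is always uniform, so it says little about whether the transform is wired correctly, and a naive L2 normalization divides by zero. Both functions below are illustrative stand-ins written for this note, not code from the PR.

```rust
/// Numerically stable softmax over a slice.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// L2 normalization without an epsilon guard (deliberately naive).
fn l2_normalize(xs: &[f32]) -> Vec<f32> {
    let norm = xs.iter().map(|x| x * x).sum::<f32>().sqrt();
    xs.iter().map(|x| x / norm).collect()
}
```

With zeros, `softmax` returns the same uniform vector regardless of which axis or scaling a transform applies, and `l2_normalize` yields NaN even though the transform itself is valid for real embeddings. Non-zero dummy inputs in a range such as (-1.0, 1.0) avoid both cases.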

@besaleli merged commit dedca60 into main Dec 5, 2025
5 checks passed
@besaleli deleted the 122-validate-transforms-on-compile branch December 5, 2025 11:47

Development

Successfully merging this pull request may close these issues.

Validate transforms on compile

4 participants