feat: validate transforms on compile #151
Conversation
CodSpeed Performance Report: merging #151 will not alter performance.
```rust
pub struct ModelConfig {
    pub model_type: String,
    pub pad_token_id: u32,
    pub num_labels: Option<usize>,
```
I suspect this is done for classification? Does it make sense to group class-related items under a classification key?
@javiermtorres this is taken from the ModelConfig schema from Hugging Face; unfortunately we can't change it :')
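For illustration only, the grouping javiermtorres suggests might look like the sketch below; `ClassificationConfig` and `GroupedModelConfig` are hypothetical names, and the real struct has to stay flat to mirror the Hugging Face schema:

```rust
// Hypothetical sketch: classification-related fields grouped under one
// key. Not applicable here, since the actual struct must match the flat
// Hugging Face ModelConfig schema field-for-field.
pub struct ClassificationConfig {
    pub num_labels: usize,
}

pub struct GroupedModelConfig {
    pub model_type: String,
    pub pad_token_id: u32,
    // Present only for classification models.
    pub classification: Option<ClassificationConfig>,
}
```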
javiermtorres left a comment:
Stronger types and dry runs are both great ideas 👍 Some minor remarks, but totally mergeable.
Did you take a look at https://luau.org/sandbox? Would it be worth it?
```diff
      .into_owned();

- outputs = state.transform().postprocess(outputs)?;
+ outputs = EmbeddingTransform::new(state.transform_str())?.postprocess(outputs)?;
```
This would allow changing the transform at runtime. Do we aim to do that at some point?
This was basically already the case. We can improve, though; I think this is a good thing to flag.
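As a rough sketch of what deliberate runtime swapping could look like (not part of this PR): the compiled transform could sit behind a lock in shared state. `AppState` and `set_transform` are hypothetical; `EmbeddingTransform::new` and `ApiError` are taken from the snippets in this thread, and the constructor is assumed to take a `&str`.

```rust
use std::sync::RwLock;

// Hypothetical shared state holding the currently active transform.
struct AppState {
    transform: RwLock<EmbeddingTransform>,
}

impl AppState {
    // Compile and validate the new transform first, then swap it in,
    // so in-flight requests never observe a broken transform.
    fn set_transform(&self, src: &str) -> Result<(), ApiError> {
        let compiled = EmbeddingTransform::new(src)?;
        *self.transform.write().unwrap() = compiled;
        Ok(())
    }
}
```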
```rust
fn postprocess(&self, (data, mask): Self::Input) -> Result<Self::Output, ApiError> {
    let func = match &self.postprocessor {
        Some(p) => p,
        None => {
```
I'd even hope for something like `None => default_pool`, but let me know if that wouldn't be a good fit.
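For concreteness, the suggestion might look like the sketch below; `default_pool` and the `apply` call are assumptions about this crate's API, not its actual shape:

```rust
fn postprocess(&self, (data, mask): Self::Input) -> Result<Self::Output, ApiError> {
    match &self.postprocessor {
        Some(p) => p.apply(data, mask),
        // Suggested fallback: apply a sensible default pooling strategy
        // instead of treating a missing postprocessor as an error.
        None => default_pool(data, mask),
    }
}
```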
```rust
impl TransformValidatorExt for EmbeddingTransform {
    fn dry_run(&self, _model_config: &ModelConfig) -> Result<()> {
        // create dummy hidden states with shape [batch_size, seq_len, hidden_dim]
        let dummy_hidden_states = random_tensor(&[BATCH_SIZE, SEQ_LEN, HIDDEN_DIM], (-1.0, 1.0))?;
```
Would it still be valid testing if we used zeroed vectors?
Also, what about calling these TEST_BATCH_SIZE and so on?
@javiermtorres with all zeros? We could, but we might hit issues when unit testing things like softmax.
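The softmax concern is easy to see in a standalone example (plain Rust, independent of the tensor library used here): an all-zero input maps to a uniform distribution, so bugs that permute or rescale the logits would pass unnoticed.

```rust
fn softmax(xs: &[f32]) -> Vec<f32> {
    // Subtract the max for numerical stability before exponentiating.
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // All zeros: every output is 1/3, regardless of element order.
    println!("{:?}", softmax(&[0.0, 0.0, 0.0]));
    // Varied values: outputs are distinct and order-sensitive.
    println!("{:?}", softmax(&[0.5, -1.2, 0.3]));
}
```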
Scope creep: