Releases: tracel-ai/burn
v0.21.0-pre.4
What's Changed
- Update cubek: tile matmul refactor (#4888) @louisfd
- Add ctc_loss backend trait hook + tch and cubecl impls (#4819) @antimora
- Centralize internal burn-* deps in [workspace.dependencies] (#4876) @antimora
- Update cubecl + cubek: fix matmul, reduce WASM and vector size check on strided tensors (#4874) @laggui
- Split Associated Types from Backend into BackendTypes (#4868) @skewballfox
- All reduce backward (#4873) @Charles23R
- Update/cubecl to client (#4866) @Charles23R
- Fix select_assign OOB units (#4870) @laggui
- Add linear op to ModuleOps for fused matmul+bias (#4747) @antimora
- Add burn-std::config runtime configuration with fusion logging and search optimization (#4864) @nathanielsimard
- Fix Typo in One Hot encoding class size error (#4869) @Baseng0815
- Fix fusion reduce broadcasted when multi block local might be a view (#4867) @laggui
- Add STFT/ISTFT and thread n through FFT backend trait (#4835) @antimora
- Fix burn-flex argmax NaN ordering; tighten expand; precise erf (#4859) @antimora
- Fix burn-flex sum_dim reading contiguous storage on transposed input (#4861) @antimora
- Fix rustls-webpki audit (#4863) @laggui
- Add det (determinant) tensor operation (#4813) @softmaximalist
- Add Blackman window function to signal module (#4842) @softmaximalist
- Display FlexDevice as Cpu (#4857) @antimora
- Update cubecl: refactor toml config, fix autotune priority and fix persistent memory pool reset (#4858) @nathanielsimard
- Migrate default test backend from NdArray to Flex (#4854) @antimora
- Use burn-flex in docs and examples (#4841) @antimora
- Fix burn-flex to_contiguous fast path for prefix views (#4856) @antimora
- Migrate benchmarks from burn-flex to burn-backend-tests (#4853) @antimora
- Fix autotune context, remove unsafe code (#4781) @ArthurBrussee
- Override `float_mean` in cubecl backends (#4840) @laggui
- Device service usage (#4839) @nathanielsimard
- Fusion all reduce + refactor collective (#4803) @Charles23R
- Add missing dispatch overrides and native tch ops for softmax, layer_norm (#4834) @antimora
- Fix `CrossEntropyLoss` with probabilities (#4829) @laggui
- Move tensor tests from burn-flex to burn-backend-tests (#4812) @antimora
- Remove unused M param from SimpleOptimizerMapper. (#4823) @crutcher
- Forward gemm perf features and fix burn-flex SIMD flag cascade (#4826) @antimora
- Add Record<(R0,)> 1-Tuple (#4825) @crutcher
- Cleanup OptimizerAdaptor / GradAdaptor API. (#4822) @crutcher
- Prep for Group Multi Optimizers (#4818) @crutcher
- Fix clippy lints (#4820) @laggui
- Matmul selection (#4773) @nathanielsimard
- Fix conv x-backward padding_out bug (#4806) @antimora
- burn-flex: implement softmax and layer_norm backend op (#4805) @antimora
- Add `FloatInfo` for dtype-aware precision info (#4721) @antimora
- Add softmax and layer_norm backend trait hooks (#4797) @antimora
- Update bitstream-io & rustls-webpki (yanked + audit) (#4801) @laggui
- feat(burn-nn): add native LocalResponseNorm module (#4765) @jcwal1516
- Fix: make module cloning efficient for CPU devices (#4703) @antimora
- burn-flex: enable f16 tests and fix mean overflow, grid_sample and quantization (#4769) @antimora
- Seed CubeCL normal distribution test (#4791) @leohenon
- Drop burn-flex I64 debug_asserts (#4780) @antimora
- fix(vision): propagate backend features to burn-vision (#4753) @jcwal1516
- Optimize and update LU decomposition function (#4738) @softmaximalist
- Fix burn-flex attention rejecting broadcasted mask/bias (#4777) @antimora
- Fix burn-flex bool binary ops to broadcast operands (#4775) @antimora
- Add burn-flex CPU backend (#4761) @antimora
- Fix flaky initializer_normal_init test (#4766) @leohenon
- Fix unsqueeze_dims panic on duplicate sorted axes (#4764) @antimora
- fix(ndarray): grouped conv SIMD clamp + regressions (#4727) @dnvt
- Fix xtask CI renamed feature (#4763) @laggui
- Fix/fusion autotune context (#4759) @nathanielsimard
Full Changelog: v0.21.0-pre.3...v0.21.0-pre.4
v0.21.0-pre.3
What's Changed
- Fix select_assign OOB (#4760) @nathanielsimard
- Fix `unsqueeze_dims` panic (#4755) @softmaximalist
- Fix quantization tests and flaky tolerance (#4743) @laggui
- Fix fusion scalar broadcasting in `write_output_aligned` (#4741) @laggui
- Feat/implement fusion for irfft (#4736) @Sublime12
- Fix cubecl cuda all-reduce + remove useless check in distributed server (#4720) @Charles23R
- Feat/implement fusion for rfft (#4735) @Sublime12
- update cubek & fix gemv autotune (#4726) @louisfd
- Add more checks for quantized tensor reshape (#4704) @laggui
- feat: support cross-kind tensor casting via .cast() (#4713) @antimora
- chore: Fix some clippy errors, fix quant tests (#4708) @wingertge
- Update cubek (#4714) @louisfd
- Feat/add irfft (#4719) @Sublime12
- Feat/add rfft (#4707) @Sublime12
- Make Param Sync for parallel model inference (#4701) @antimora
- Perf/burn fusion overhead (#4645) @nathanielsimard
- Split `TrainingStrategy` to decouple the `DistributedBackend` requirement (#4710) @laggui
- fix: use integer arithmetic for nearest-neighbor coordinate scaling (#4687) @wkrettek
- All reduce in backward (#4650) @Charles23R
- fix output in attention tuner (#4702) @louisfd
- Fix attention_fallback NaN for fully-masked rows (#4697) @antimora
- Add HammingWindow operator to burn-tensor (#4698) @RunjiaChen
- update cubek and cubecl (#4699) @louisfd
- Fix fusion consistency checks and binding estimation in burn-cubecl-fusion (#4695) @nathanielsimard
- Update cubek and fix vecmat autotune (#4682) @louisfd
- Ignore local tests with pre-trained weights (#4676) @laggui
- Fix dispatch when only wgpu is enabled (maps to webgpu) (#4678) @laggui
- update cubek (#4677) @louisfd
- Fix fusion kernel vector_size mismatch on f16 output writes (#4675) @AdrianEddy
- Include new vec2mat routine in matmul autotune (#4673) @louisfd
- Update cubecl & cubek revs (#4672) @laggui
- feat: add categorical sampling for tensors (#4655) @majiayu000
- chore: Update to upstream changes in cubecl (#4670) @wingertge
- Refactor backend tests to set device settings at initialization + use `Dispatch` (#4666) @laggui
- Add HannWindow operator to burn-tensor (#4631) @walkinggo
- fixup:(burn-ndarray) fix comment and tidy imports (#4668) @TsaoLun
- Fix tch int_zeros dtype in sync (#4664) @laggui
- [Breaking] Use device settings to provide output dtype (#4653) @laggui
- feat: add FID vision metric (#4644) @cong-or
- Add Adan optimizer implementation with tests (#4651) @sepcnt
- [Breaking] Add bool store dtype + remove bool elem from fusion (#4649) @laggui
- Selector/attention (#4648) @louisfd
- fix(burn-ndarray): use owned storage for native heap allocations in from_data (#4647) @TsaoLun
- add utilities fn to FusionServer (#4640) @Charles23R
- Remove int powf and make powi numeric op (#4646) @laggui
- refactor: View launch (#4639) @wingertge
- chore: Update to cubecl changes (#4630) @wingertge
- Dispatch autodiff checkpointing strategy support (#4629) @laggui
- Implement RNNT loss (#4623) @cong-or
- Remove named tensor (#4628) @laggui
- Perf: Improve fusion score (#4511) @nathanielsimard
- refactor: Vector size generic (#4624) @wingertge
- Fix function arg name inconsistencies (#4626) @softmaximalist
- Update building-blocks chapter (#4625) @softmaximalist
- Refactor/device handle (#4593) @nathanielsimard
- feat: Introduce Lanczos3 interpolation method (#4601) @ovr
- Add Gram Matrix Loss for vision tasks (#4595) @softmaximalist
- Fix fusion cumulative op inputs (#4621) @laggui
- fix: replace ValidStep with InferenceStep in training.md (#4620) @TsaoLun
- Update documentation link for burn-store (#4619) @softmaximalist
- Improve module derive + add `#[module(skip)]` attribute (#4618) @laggui
- Add HalfPrecisionAdapter for F32/F16 mixed-precision storage (#4594) @antimora
- Fix cosine scheduler record in composed scheduler (#4617) @laggui
- Update ONNX import docs for LoadStrategy and from_bytes (#4607) @antimora
- Use shape in `TensorData` (#4603) @laggui
- Update SSIM float types to f32 (#4602) @softmaximalist
- Fix `conv2d_weight_backward` w/ strided channels and unit spatial dims (via `conv_im2col_1x1`) (#4591) @laggui
- Add multi-scale SSIM for image quality assessment (#4555) @softmaximalist
- Remove `Clone` bound from `WindowsDataset` item (#4597) @laggui
- Add contributing guidelines with AI-assisted contributions policy (#4569) @antimora
- feat: Implements DISTS metric (#4574) @koreaygj
- Fix dispatch autodiff feature propagation (#4592) @laggui
Full Changelog: v0.21.0-pre.2...v0.21.0-pre.3
v0.21.0-pre.2
What's Changed
- Fix: create multiple elemwise fused block by @nathanielsimard in #4497
- Upgrade to rand 0.10 by @laggui in #4500
- fix overflow in int_abs_elem for i64 min value by @Olexandr88 in #4486
- Implements: LPIPS metrics for Image quality by @koreaygj in #4403
- Fix quantization non-contiguous input by @laggui in #4498
- Add SequenceOutput struct for sequence prediction outputs by @softmaximalist in #4474
- feat: enhance attention() with scale, attn_bias, softcap, and is_causal by @antimora in #4476
- Fix too many kernels by @nathanielsimard in #4505
- feat: Enable 64-bit indexing for kernels by @wingertge in #4502
- feat: support padding on arbitrary dimensions by @antimora in #4507
- allow flash attention with causal by @louisfd in #4509
- Remove getrandom w/ wasm_js backend by @laggui in #4515
- Bump polars to 0.53.0 by @laggui in #4514
- perf: Make backing storage of `Shape` more flexible by @wingertge in #4516
- Combined PRs by @github-actions[bot] in #4528
- feat: add align_corners support to InterpolateOptions by @antimora in #4518
- fix: OptimSharded strategy validation device mismatch by @Dreaming-Codes in #4527
- Add native sign unary ops for CubeCL float and int by @yash27-lab in #4513
- Bump zip to 8.1.0 by @laggui in #4533
- Fix image-classification-web links by @laggui in #4536
- Fix zip yanked downstream dep by @laggui in #4540
- add LBFGS optimizer by @donjuanplatinum in #4471
- Replace Vec-based TransitionBuffer with tensor-backed storage by @arferreira in #4504
- Implement CTC loss by @softmaximalist in #4529
- refactor: Metadata type/strides refactor by @wingertge in #4534
- Attention: remove default impl and implement for all backends by @louisfd in #4544
- fix: resolve macOS build and test failures by @antimora in #4545
- fix: Bool from_data_dtype panics on GPU backends by @antimora in #4551
- Attention autotune by @louisfd in #4552
- Attention: add autotune gate by @louisfd in #4554
- Combined PRs by @github-actions[bot] in #4565
- Optional Ordering for NdArrayElement by @skewballfox in #4559
- Add Smooth L1 loss by @softmaximalist in #4547
- Implement HardShrink, SoftShrink and Shrink Activations by @aditya0by0 in #4556
- doc(notebook) : add more basic operations and some examples by @Tyooughtul in #4542
- Update cubecl/cubek revs by @laggui in #4568
- Fix(lpips): load ImageNet backbone weights for pretrained models by @koreaygj in #4557
- [Feat] Global backend `Dispatch` by @laggui in #4508
- fix(burn-candle): move wildcard match arm to end of dtype match by @holg in #4571
- move sign back to mathOps by @skewballfox in #4573
- refactor: Move from `CubeOption` to `Option` by @wingertge in #4543
- update attention cubek autotune by @louisfd in #4579
- Add evaluator summary by @laggui in #4578
- Move `burn-nn` module name checks in `burn-store` adapter to the test section by @softmaximalist in #4580
- Expose `BurnpackError` by @AdrianEddy in #4585
- Combined PRs by @github-actions[bot] in #4588
- Bump versions by @nathanielsimard in #4589
- Add burn-dispatch publish by @laggui in #4590
Full Changelog: v0.21.0-pre.1...v0.21.0-pre.2
v0.21.0-pre.1
What's Changed
- Bump burn version 0.21 by @laggui in #4333
- Use NodeType to point to unimplemented node by @laggui in #4334
- burn-train: include GPU power draw in CudaMetric by @StanByriukov02 in #4322
- Fix book guide training changes by @laggui in #4340
- Combined PRs by @github-actions[bot] in #4352
- ensure that tensor is owned on iter_dim call by @tzemanovic in #4309
- docs: add DataframeDataset example using Polars by @SameerVers3 in #4298
- Add evaluation name `as_str` + display by @laggui in #4354
- Fix memory growth: use GraphLocator::remove_entry for orphan cleanup by @jnamika in #4342
- Bump ratatui from 0.29.0 to 0.30.0 by @dependabot[bot] in #4305
- Performance tweaks to the lp_norm code. by @crutcher in #4318
- Add `Scalar` runtime literal by @laggui in #4337
- Add compile errors for module derive by @laggui in #4356
- Make `ElementComparison` optional for dtypes by @skewballfox in #4255
- fix: Actually implement conv backwards ops for `burn-fusion`/`burn-router` by @wingertge in #4360
- Update for cubecl `try_cast_unchecked` -> `downcast` rename by @adolago in #4335
- fix: Fix interpolate with NHWC input by @wingertge in #4363
- Move ONNX import to `burn-onnx` crate by @laggui in #4361
- Update cubek by @laggui in #4365
- Implement Mean (L(P) Norm Error) Loss by @softmaximalist in #4341
- Fix clippy rust 1.93 by @laggui in #4371
- Use `cache_dir()` instead of hardcoded `~/.cache` path by @antimora in #4372
- Combined PRs by @github-actions[bot] in #4386
- Fix typo in dataset.md in Burn Book by @softmaximalist in #4380
- chore: Enable macos CI by @dcvz in #4389
- add AMSgrad support for Adam/AdamW by @donjuanplatinum in #4388
- Implement the PSNR vision metric by @softmaximalist in #4379
- Update cubecl wgpu v28 by @laggui in #4244
- Bump tracel-ai/github-actions from 6 to 7 by @dependabot[bot] in #4394
- Bump tracel-ai/github-actions/.github/workflows/publish-crate.yml from 6 to 7 by @dependabot[bot] in #4395
- chore: enable metal backend tests on ci by @dcvz in #4390
- Feat/device policy by @laggui in #4373
- More explicit global dtype support by @laggui in #4400
- Move ONNX crates to burn-onnx repository by @antimora in #4393
- opt(burn-cubecl): Optimized tensors by default by @wingertge in #4402
- chore: fix typos caught by xtask by @huahuadeliaoliao in #4406
- Add field docs to generated methods by @swfsql in #4408
- Make transformer layer APIs public for cross-crate usage by @antimora in #4409
- Implement SSIM vision metric by @softmaximalist in #4396
- Combined PRs by @github-actions[bot] in #4425
- move sort functions to orderable trait by @skewballfox in #4419
- [BREAKING] Add asymmetric padding support for conv and pool operations by @antimora in #4263
- Update Burn Book: metrics and trig functions by @softmaximalist in #4413
- Add device dtype usage by @laggui in #4404
- add KLDivLoss and batch_mean in reduction by @donjuanplatinum in #4399
- feat(burn-store): add ModuleAdapter chaining by @huahuadeliaoliao in #4407
- Fix cubek matmul stage size by @laggui in #4435
- Bump tracel-ai/github-actions/.github/workflows/publish-crate.yml from 7 to 8 by @dependabot[bot] in #4443
- chore: deprecate burn-candle backend by @antimora in #4416
- Add configurable activation and layer_norm_eps to transformer layers by @antimora in #4410
- Add Softsign activation function by @antimora in #4437
- chore: update workflows by @syl20bnr in #4446
- Add ThresholdedRelu activation function by @antimora in #4440
- Combined PRs by @github-actions[bot] in #4453
- Add check for wasm-bindgen installation by @zhoukekestar in #4358
- Add BiGru (bidirectional GRU) module by @antimora in #4442
- Fix: `SupervisedTraining` should use the model device by default by @laggui in #4456
- Add Elu activation function by @antimora in #4438
- chore: update workflows to use Tracel GitHub actions v9 by @syl20bnr in #4457
- Add CELU activation function by @antimora in #4441
- Add Selu activation function by @antimora in #4439
- Burn rl by @Charles23R in #4447
- Perf/fusion/reduce broadcasted by @nathanielsimard in #4338
- Implement median tensor operation by @softmaximalist in #4454
- Add deg2rad and rad2deg by @softmaximalist in #4462
- fix: use all dilation entries in `max_pool2d_with_indices_backward` by @fcasal in #4466
- Update zip + time by @laggui in #4468
- Implement basic RNN module by @aditya0by0 in #4460
- fix: default to single device strat when only 1 device by @Charles23R in #4463
- Combined PRs by @github-actions[bot] in #4485
- Add `module.train()` to move a module back to the autodiff backend by @laggui in #3975
- chore: Update cubecl to runtime config refactor by @wingertge in #4489
- Feature flag + Tests for RL in burn-rl and burn-train by @Charles23R in #4470
- Fix reduce line size parallel and mean accumulator precision by @laggui in #4467
- Chore: Pre-Release 0.21.0-pre.1 by @nathanielsimard in #4494
- Fix pre-release by @nathanielsimard in #4495
v0.20.1
v0.20.0
Summary
This release marks a major turning point for the ecosystem with the introduction of CubeK. Our goal was to solve a classic challenge in deep learning: achieving peak performance on diverse hardware without maintaining fragmented codebases.
By unifying CPU and GPU kernels through CubeCL, we've managed to squeeze maximum efficiency out of everything from NVIDIA Blackwell GPUs to standard consumer CPUs.
Beyond performance, this release makes the library more robust, flexible, and significantly easier to debug.
This release also features a complete overhaul of the ONNX import system, providing broader support for a wide range of ONNX models. In addition, various bug fixes and new tensor operations enhance stability and usability.
For more details, check out the release post on our website.
Changelog
Breaking
We've introduced several breaking API changes with this release. The affected interfaces are detailed in the sections below.
Training
We refactored burn-train to better support different abstractions and custom training strategies. As part of this,
the `LearnerBuilder` has been replaced by the `LearningParadigm` flow:
```diff
- let learner = LearnerBuilder::new(ARTIFACT_DIR)
+ let training = SupervisedTraining::new(ARTIFACT_DIR, dataloader_train, dataloader_valid)
      .metrics((AccuracyMetric::new(), LossMetric::new()))
      .num_epochs(config.num_epochs)
-     .learning_strategy(burn::train::LearningStrategy::SingleDevice(device))
-     .build(model, config.optimizer.init(), lr_scheduler.init().unwrap());
+     .summary();

- let result = learner.fit(dataloader_train, dataloader_valid);
+ let result = training.launch(Learner::new(
+     model,
+     config.optimizer.init(),
+     lr_scheduler.init().unwrap(),
+ ));
```
Interface Changes
The `scatter` and `select_assign` operations now require an `IndexingUpdateOp` to specify the update behavior.

```diff
- let output = tensor.scatter(0, indices, values);
+ let output = tensor.scatter(0, indices, values, IndexingUpdateOp::Add);
```
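The same note applies to `select_assign`; the sketch below is only an illustration and assumes the new `IndexingUpdateOp` argument sits in the same trailing position as it does for `scatter` above (`tensor`, `indices`, and `values` stand for the same bindings as in that snippet):

```rust
// Hedged sketch: mirrors the scatter migration shown above.
// `IndexingUpdateOp::Add` reproduces the previous accumulate-on-index behavior.
let output = tensor.select_assign(0, indices, values, IndexingUpdateOp::Add);
```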
API calls for `slice`, `slice_assign`, and `slice_fill` no longer require const generics for dimensions, which cleans up the syntax quite a bit:

```diff
- let prev_slice = tensor.slice::<[Range<usize>; D]>(slices.try_into().unwrap());
+ let prev_slice = tensor.slice(slices.as_slice());
```
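As a small illustration of the new form, ranges can now be built at runtime and passed as a plain slice without a turbofish. This is a sketch only; it assumes a `&[Range<usize>]` still satisfies the slicing argument trait, as in the migration line above:

```rust
use std::ops::Range;

// Hedged sketch: collect the ranges dynamically, then pass them as a slice.
let ranges: Vec<Range<usize>> = vec![0..2, 1..3];
let view = tensor.slice(ranges.as_slice());
```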
The `grid_sample_2d` operation now supports different options. To preserve the previous behavior, make sure to specify the matching options:

```diff
- let output = tensor.grid_sample_2d(grid, InterpolateMode::Bilinear);
+ let options = GridSampleOptions::new(InterpolateMode::Bilinear)
+     .with_padding_mode(GridSamplePaddingMode::Border)
+     .with_align_corners(true);
+ let output = tensor.grid_sample_2d(grid, options);
```
The `QuantStore` variants used in `QuantScheme` have been updated to support a packing dimension.
```diff
  pub enum QuantStore {
      /// Native quantization doesn't require packing and unpacking.
      Native,
+     /// Store packed quantized values in a natively supported packing format (i.e. e2m1x2).
+     PackedNative(usize),
      /// Store packed quantized values in a 4-byte unsigned integer.
-     U32,
+     PackedU32(usize),
  }
```
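Code that previously matched on `QuantStore::U32` needs its match arms updated. A minimal sketch, assuming the `usize` payload is the packing dimension described above (the helper name is ours, not part of the API):

```rust
// Hedged sketch: exhaustively match the new variants from the enum above.
fn describe(store: &QuantStore) -> String {
    match store {
        QuantStore::Native => "native, no packing".to_string(),
        QuantStore::PackedNative(dim) => format!("packed natively along dim {dim}"),
        QuantStore::PackedU32(dim) => format!("packed into u32 words along dim {dim}"),
    }
}
```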
Finally, `Shape` no longer implements `IntoIterator`. If you need to iterate by-value over dimensions, access the `dims` field directly.

```diff
- for s in shape {
+ for s in shape.dims {
```
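A minimal sketch of working with the `dims` field directly, assuming it exposes the dimension sizes as an iterable collection as shown above:

```rust
// Hedged sketch: go through `dims` instead of iterating the Shape itself.
let shape = tensor.shape();
let num_elems: usize = shape.dims.iter().product();
for size in &shape.dims {
    println!("dimension of size {size}");
}
```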
Module & Tensor
- Generalize linalg::outer semantics; add linalg::outer_dim (#3923) @crutcher
- Use square() where appropriate. (#3900) @crutcher
- Add linalg matvec (#3967) @huy209vn
- Add GaussianNoise layer (#4022) @kul-sudo
- Make TransformerEncoderLayer fields public (#4053) @Mnwa
- Workaround MPS embedding allocation error in LibTorch (#4073) @antimora
- Fix Slice operation to handle empty ranges (#4083) @antimora
- Handle empty tensors in cat and slice_assign ops (#4095) @antimora
- [Breaking] Add `IndexingUpdateOp` to `scatter` and `select_assign` (#4070) @laggui
- Add CrossAttention module to burn-nn (#4101) @huy209vn
- Add reflect and edge padding modes to tensor.pad (#4105 #) @antimora
- Fix GLU and quiet softmax activations (#4121) @laggui
- Add ceil_mode support to pooling operations (MaxPool, AvgPool) (#4112) @antimora
- [Breaking] Remove D2 const generic from slice / SliceArg (#4127) @crutcher
- Add backend supports_dtype (#4155) @laggui
- Fix repeat 0 times (#4216) @laggui
- feat: add hardswish activation (#4209) @mertalev
- Add more trig ops (#4282) @laggui
- Add empty/zeros/ones/full `TensorCreationOptions` (#4285) @laggui
- feat: nms op (#4246) @mertalev
Datasets & Training
- Refactor metric loggers (#3895 #4017) @Charles23R
- Add support for custom learning strategy (#3921) @Charles23R
- Feat/optim/distributed (#4018) @nathanielsimard
- Refactor MetricEntry (#4031) @Charles23R
- Feature muon (#3925) @NewBornRustacean
- Add warmup epochs to `MetricEarlyStoppingStrategy` (#4041) @crutcher
- Log running values (#4199) @Charles23R
- Fix checkpoint and summary log level (#4201) @J-F-Liu
- [Breaking] Burn train api refactor (#4223 #4283) @Charles23R
- Fix checkpointer interrupt (#4268) @Charles23R
Backends
- Add candle device seeding (#3959) @laggui
- feat: Enable tuning for MMA matmul (#3961) @wingertge
- feat: TMA autotuning (#3986) @wingertge
- feat: Enable tuning specialized matmul (#4026) @wingertge
- Add CubeCL Flash Attention module (#4089 #4192) @louisfd
- Zero-copy tensor loading for NdArray backend (#4178) @antimora
- feat: Implicit GEMM weight gradients for convolution (#4182) @wingertge
- Perf/reduce cpu + Fix OOB (#4197 #4204) @nathanielsimard
- feat: Accelerated convolution data gradient (#4220) @wingertge
- Remove linux-only constraint for cpu (#4233) @louisfd
- Perf/into contiguous (#4257) @nathanielsimard
- fix: grid sample using excessive memory (#4236 #4242) @mertalev
- Add fast-path for batched vector–matrix matmul (#4300) @louisfd
Bug Fixes
- Fix async barrier & TMA checks (#4007) @nathanielsimard
- Fix fusion reduce local already registered as output (#4014) @laggui
- Fix remainder int (#4015) @laggui
- Fix cuda mem error (#4020) @nathanielsimard
- Cleanup autodiff unused roots (#4039) @laggui
- Fix autotuner (#4049) @nathanielsimard
- Fix scatter values backward (#4064) @khoek
- More correctness fixes in autodiff ops (#4069) @khoek
- Fix transaction read (#4074) @laggui
- Fix tch bf16 kind (#4088 #4142 #4203) @laggui
- Fix cubecl cuda compilation error/typo (#4092) @BjornTheProgrammer
- Fix output dtype for argmin / argmax (#4195) @tzemanovic
- Return slice for each dimension in shape (#4152) @laggui
Documentation & Examples
- Update raspberry pi pico example (#4034 #4132) @BjornTheProgrammer
- Contributor Book: Update the "ONNX to Burn" Page (#4229) @softmaximalist
- docs: add examples for bool tensor operations (#4248) @qburke
- Update the "Adding New Operation" guide in the contributor book (#4284) @softmaximalist
- Refactor dop_timer for multiple trials (for warmup). (#4288) @crutcher
- Added documentation examples for more boolean tensor operations in burn-tensor (#4289) @qburke
Fixes
- Fix book (#3942) @laggui
- remove repetitive words in comment (#4029) @black5box
- Include katex header as symlink (#4118) @laggui
- Fix quantization docs (make it clear that only PTQ is currently supported) (#4316) @laggui
ONNX Support
- ONNX IR and import refactor to better support complex graphs (#3872 #4019 #4033 #4094) @antimora
- Add ONNX control flow operators: `If`, `Loop`, and `Scan` (#3936) @antimora
- Silero VAD ONNX model verification (#3999) @antimora
- Add support for yolo12x model variant (#4048) @antimora
- Remove burn-import abstraction layer and use onnx-ir types directly (#4033) @antimora
- Fix ConstantOfShape output size determination (#4085) @antimora
- Specify output rank in squeeze_dims for type inference (#4086) @antimora
- Fix Expand operation to use ONNX max-semantics (#4082) @antimora
- [Breaking] Add ONNX GridSample op support and tests (#4084) @antimora
- Add RF-DETR model check for burn-import (#4087) @antimora
- Add LSTM operator support with configurable activations (#4106) @antimora
- Add memory-mapped ONNX loading with tensor data ref (#4097) @antimora
- Fix outer-scope variable references in ONNX subgraphs (If/Loop/Scan) (#4119) @antimora
- Add Reshape scalar optimization and Gather scalar input support (#4146) @antimora
- Update GELU ONNX test to use native op and fix expected values (#4161) @antimora
- Add ONNX CumSum operator support (#4162) @antimora
- Remove global ONNX opset version restriction, recommend opset 16 (#4168) @antimora
- Handle 1D slope when importing prelu from onnx (#4205) @mertalev
- Fix handling scalar scan outputs in ONNX loop nodes (#4210) @antimora
- Add ONNX external data support for models >2GB (#4158) @antimora
- fix: handle negative indices in onnx gather op (#4207) @mertalev
- Split backend tensor ops tests (#4232) @laggui
- Do not use alloc import in burn-import codegen (#4286) @laggui
- Fix ONNX where broadcasted dims (#4315) @laggui
Enhancements
- Feat/pinned memory staging (#4016) @nathanielsimard
- burn-store enhancements for troubleshooting and new enum skip flag (#4051) @antimora
- Feat/runtime error (#4079 #4110) @nathanielsimard
- Perf/improve reduce autotuning + plane non uniform control flow check (#4208) @nathanielsimard
- Packed quantized matmul with `QuantStore` changes (#4310 #4323) @wingertge
Refactoring
- chore: Update to batch caching PR for `cubecl` (#3948) @wingertge
- Refactor IR to define outputs as a function of the operation (#3877) ...
v0.20.0-pre.6
What's Changed
- doc warning fix by @crutcher in #4130
- Fix tch bf16 into_data by @laggui in #4142
- Update raspberry-pi-pico example to use the Pico 2, and burnpack by @BjornTheProgrammer in #4132
- Unify all_reduce `LocalCollectiveClient` operation handling. by @crutcher in #4125
- Add direct tensor snapshot retrieval API to ModuleStore by @antimora in #4131
- Fix outer-scope variable references in ONNX subgraphs (If/Loop/Scan) by @antimora in #4119
- Add removed docs for tensor equal_elem by @laggui in #4145
- Add ceil_mode support to pooling operations (MaxPool, AvgPool) by @antimora in #4112
- chore: Update cubecl by @wingertge in #4134
- Implement Slice iterator and utility methods. by @crutcher in #4042
- Bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #4148
- Add slice_dyn, slice_assign_dyn, and slice_fill_dyn variants. by @crutcher in #4127
- Add Reshape scalar optimization and Gather scalar input support by @antimora in #4146
- Shape FromStr/ToString by @crutcher in #4143
- Add contiguous reindexing for non-contiguous layer indices by @antimora in #4150
- Add warmup epochs to `MetricEarlyStoppingStrategy`. (#3970) by @crutcher in #4041
- fix(onnx): Use activation function for GELU codegen instead of non-existent tensor method by @antimora in #4161
- Refactor more basic ops by @laggui in #4156
- Refactor `LocalCollectiveServer` for improved clarity and error handling by @crutcher in #4126
- Fix typo in comment for logger_task function by @crutcher in #4159
- Refactor configurable backend tests (no more testgen macros) by @laggui in #4129
- Zero-copy loading for embedded burnpack weights by @antimora in #4154
- Fix candle cuda imports by @laggui in #4171
- Backends no longer depend on `burn-tensor`, but strictly `burn-backend` by @laggui in #4169
- Chore/update cubek cubecl by @nathanielsimard in #4172
- Add ONNX CumSum operator support by @antimora in #4162
- Add backend supports_dtype by @laggui in #4155
- Fix attention shapes and out rank by @laggui in #4192
- Fix matmul & reduce execute fuse no autotune by @laggui in #4193
- Fix output dtype for argmin / argmax by @laggui in #4195
- Add `flatten_dims` method to `Shape` and refactor tensor flattening API by @crutcher in #4189
- Return slice for each dimension in shape by @laggui in #4152
- Make xtask validate run no-std checks first. by @crutcher in #4198
- Fix: CubeCL Reduce by @nathanielsimard in #4197
- Reorganize and tracing::instrument collective operations. by @crutcher in #4157
- Log running values by @Charles23R in #4199
- Remove global ONNX opset version restriction, recommend opset 16 by @antimora in #4168
- Fix dtype preservation when loading tensors in burn-store by @antimora in #4194
- Fix TchTensor::from_data bf16 by @laggui in #4203
- Perf/reduce cpu + Fix OOB by @nathanielsimard in #4204
- feat: Implicit GEMM weight gradients for convolution by @wingertge in #4182
- Fix checkpoint and summary log level by @J-F-Liu in #4201
- fix: handle 1D slope when importing prelu from onnx by @mertalev in #4205
- Zero-copy tensor loading for NdArray backend by @antimora in #4178
- Fix quantized tensor storage data length calculation by @antimora in #4180
- Fix handling scalar scan outputs in ONNX loop nodes by @antimora in #4210
- Perf/improve reduce autotuning + plane non uniform control flow check by @nathanielsimard in #4208
- Add ONNX external data support for models >2GB by @antimora in #4158
- Update/cubek by @louisfd in #4214
- Refactor: Replace `canonicalize_dim` with `expect_dim` by @crutcher in #4196
- fix: handle negative indices in onnx gather op by @mertalev in #4207
- Refactor/cube dim by @nathanielsimard in #4217
- Refactor: Consolidate shape and slice error handling into `ExpressionError` by @crutcher in #4218
- Update: CubeK by @louisfd in #4222
- feat: Accelerated convolution data gradient by @wingertge in #4220
- Fix repeat 0 times by @laggui in #4216
- Burn train api refactor by @Charles23R in #4223
- Chore/pre release 6 by @nathanielsimard in #4224
v0.20.0-pre.5
What's Changed
- Bump version by @nathanielsimard in #4102
- Handle empty tensors in cat and slice_assign ops by @antimora in #4095
- Add network utilities to `burn-std` by @laggui in #4104
- Remove RefCell from onnx-ir Arguments by @antimora in #4094
- Fix raspberry pi pico example not compiling by @BjornTheProgrammer in #4034
- Flash Attention module by @louisfd in #4089
- [Breaking] Add `IndexingUpdateOp` to `scatter` and `select_assign` by @laggui in #4070
- Feat/improve errors by @nathanielsimard in #4110
- Add 256-byte tensor alignment to burnpack format for mmap zero-copy support by @antimora in #4100
- Add CrossAttention module to burn-nn by @huy209vn in #4101
- Add reflect and edge padding modes to tensor.pad by @antimora in #4105
- Add LSTM operator support with configurable activations by @antimora in #4106
- Add memory-mapped ONNX loading with lazy tensor data by @antimora in #4097
- Refactor `RemoteDevice` to use a thread-safe global address registry. by @crutcher in #4113
- Partial cleanup of RemoteSender api. by @crutcher in #4108
- Move backend traits and types to `burn-backend` by @laggui in #4111
- Fix remote sync error by @laggui in #4117
- Small LSTM clean up of unused variable by @antimora in #4116
- Fix/autotune checks by @nathanielsimard in #4114
- Include katex header as symlink by @laggui in #4118
- chore: Update cubecl by @wingertge in #4120
- Fix GLU and quiet softmax activations by @laggui in #4121
- Migrate ONNX import to burnpack format (removing Record type) by @antimora in #4122
- Combined PRs by @github-actions[bot] in #4140
- Chore/pre release 5 by @nathanielsimard in #4141
v0.20.0-pre.4
What's Changed
- Make TransformerEncoderLayer fields public by @Mnwa in #4053
- Feature muon by @NewBornRustacean in #3925
- Implement `FromStr` for `Slice` with parsing and error handling by @crutcher in #3983
- chore: Update to cubecl scalar refactor by @wingertge in #4062
- refactor: cubecl Runtime trait by @wingertge in #4065
- Fix scatter values backward by @khoek in #4064
- Refactor/autotuner by @nathanielsimard in #4068
- Fix MPS "Placeholder storage has not been allocated" error for embedding operations by @antimora in #4073
- Remove burn-import abstraction layer and use onnx-ir types directly by @antimora in #4033
- More correctness fixes in autodiff ops by @khoek in #4069
- Fix transaction read by @laggui in #4074
- Feat/error handling cubecl by @nathanielsimard in #4076
- Move types from `burn-tensor` by @laggui in #4050
- burn-store enhancements for troubleshooting and new enum skip flag by @antimora in #4051
- Re-enabled no-std support for safetensors store by @antimora in #4071
- Fix tch bf16 kind by @laggui in #4088
- Feat/runtime error by @nathanielsimard in #4079
- Fix ConstantOfShape output size determination by @antimora in #4085
- Fix reduce codegen to use turbofish for squeeze_dims by @antimora in #4086
- Fix Expand operation to use ONNX max-semantics by @antimora in #4082
- Add ONNX GridSample op support and tests by @antimora in #4084
- Fix Slice operation to handle empty ranges by @antimora in #4083
- Add RF-DETR model check for burn-import by @antimora in #4087
- Fix cubecl by @BjornTheProgrammer in #4092
v0.20.0-pre.3
What's Changed
- Node to Enum-based design for type-safe IR by @antimora in #4019
- Ignore number_prefix advisory from tokenizers by @laggui in #4037
- BUG: Fixed burn version by @Marc-AnthonyG in #4035
- Refactor/dtype cubecl by @nathanielsimard in #4032
- Fix parallel spelling error. by @crutcher in #4046
- Refactor MetricEntry by @Charles23R in #4031
- Bump actions/checkout from 5 to 6 by @dependabot[bot] in #4047
- Refactor of burn fusion and burn cubecl fusion by @nathanielsimard in #4044
- update cubecl by @louisfd in #4045
- Cleanup autodiff unused roots by @laggui in #4039
- Fix autotuner by @nathanielsimard in #4049
- Combined PRs by @github-actions[bot] in #4059
- Fix floating point norm test tolerance by @laggui in #4061
- Add support for yolo12x model variant check by @antimora in #4048
- Chore: Prepare pre-release 3 by @nathanielsimard in #4060