Releases: pykeio/ort
v2.0.0-rc.11
💖 If you find ort useful, please consider sponsoring us on Open Collective 💖
🤔 Need help upgrading? Ask questions in GitHub Discussions or in the pyke.io Discord server!
I'm sorry it took so long to get to this point, but the next big release of ort should be, finally, 2.0.0 🎉. I know I said that about one of the old alpha releases (if you can even remember those), but I mean it this time! Also, I would really like to not have to do another major release right after, so if you have any concerns about any APIs, please speak now or forever hold your peace!
A huge thank you to all the individuals who have contributed to the Collective over the years: Marius, Urban Pistek, Phu Tran, Haagen, Yunho Cho, Laco Skokan, Noah, Matouš Kučera, mush42, Thomas, Bartek, Kevin Lacker, & Okabintaro. You guys have made these past rc releases possible.
If you are a business using ort, please consider sponsoring me. Egress bandwidth from pyke.io has quadrupled in the last 4 months, and 90% of that comes from just a handful of businesses. I'm lucky enough that I don't have to pay for egress right now, but I don't expect that arrangement to last forever. pyke & ort have been funded entirely from my own personal savings for years, and (as I'm sure you're well aware 😂) everything is getting more expensive, so that definitely isn't sustainable.
Seeing companies that raise tens of millions in funding build large parts of their business on ort, ask for support, and then not give anything back just... seems kind of unfair, no?
ort-web
ort-web allows you to use the fully-featured ONNX Runtime on the Web! This time, it's hack-free and thus here to stay (it won't be removed, and then added back, and then removed again like last time!)
See the crate docs for info on how to port your application to ort-web; there is a little bit of work involved. For a very barebones sample application, see ort-web-sample.
Documentation for ort-web, like the rest of ort, will improve by the time 2.0.0 comes around. If you ever have any questions, you can always reach out via GitHub Discussions or Discord!
Features
- 5d85209 Add WebNN & WASM execution providers for `ort-web`.
- #430 (💖 @jhonboy121) Support statically linking to iOS frameworks.
- #433 (💖 @rMazeiks) Implement more traits for `GraphOptimizationLevel`.
- 6727c98 Make `PrepackedWeights` `Send + Sync`.
- 15bd15c Make the TLS backend configurable with new `tls-*` Cargo features.
- f3cd995 Allow overriding the cache dir with the `ORT_CACHE_DIR` environment variable.
- 🚨 8b3a1ed Load the dylib immediately when using `ort::init_from`.
  - You can now detect errors from dylib loading and let your program react accordingly.
- 🚨 #484 (💖 @michael-p) Update `ndarray` to v0.17.
  - This means you'll need to upgrade your `ndarray` dependency to v0.17, too.
- 0084d08 New `ort::lifetime` tracing target tracks when objects are allocated/freed to aid in debugging leaks.
Fixes
- 2ee17aa Fix a memory leak in `IoBinding`.
- 317be20 Don't store `Environment` as a static.
  - This fixes a `mutex lock failed: Invalid argument` crash on macOS when exiting the process.
- 466025c Fix unexpected CPU usage when copying GPU tensors.
- ecca246 Fix UB when extracting empty tensors.
- 22f71ba Gate the `ArrayExtensions` trait behind the `std` feature, fixing `#![no_std]` builds.
- af63cea Fix an illegal memory access on `no_std` builds.
- #444 (💖 @pembem22) Fix Android link.
- 1585268 Don't allow sessions to be created with non-CPU allocators.
- #485 (💖 @mayocream) Fix load order when using `cuda::preload_dylibs`.
- c5b68a1 Fix `AsyncInferenceFut` drop behavior.
Misc
- Update ONNX Runtime to v1.23.2.
- The MSRV is now Rust 1.88.
- Binaries are now compressed using LZMA2, which reduces bandwidth by 30% compared to gzip but may double the time it takes to download binaries for the first time.
  - If you use `ort` in CI, please cache the `~/.cache/ort.pyke.io` directory between runs.
- `ort`'s dependency tree has shrunk a little bit, so it should build a little faster!
- b68c928 Overhaul `build.rs`.
  - Warnings should now appear when binaries aren't available, and errors should look a lot nicer.
  - `pkg-config` support now requires the `pkg-config` feature.
- 🚨 d269461 Make `Metadata` methods return `Option<T>` instead of `Result<T>`.
- 🚨 47e5667 Gate `preload_dylib` and `cuda::preload_dylibs` behind a new `preload-dylibs` feature flag instead of `load-dynamic`.
- 🚨 3b408b1 Shorten `execution_providers` to `ep` and `XXXExecutionProvider` to `XXX`.
  - They are still re-exported under their old names to avoid breakage, but these re-exports will be removed (and thus broken) in 2.0.0, so it's a good idea to change them now.
- 🚨 38573e0 Simplify the `ThreadManager` trait.
ONNX Runtime binary changes
- Now shipping iOS & Android builds!!! Thank you Raphael Menges!!!
- Support for Intel macOS (`x86_64-apple-darwin`) has been dropped following upstream changes to ONNX Runtime & Rust.
  - Additionally, the minimum macOS target has been raised to 13.4.
  - This means I can't debug macOS issues in my Hackintosh VM anymore, so expect little to no macOS support in general from now on. If you know where I can get a used 16GB Apple Silicon Mac Mini for cheap, please let me know!
- ONNX Runtime is now compiled with `--client_package_build`, meaning default options will optimize for low-resource edge inference rather than high throughput.
  - This currently only disables spinning by default. For server deployments, re-enable inter- and intra-op spinning for best throughput.
- Now shipping TensorRT RTX builds on Windows & Linux!
- x86_64 builds now target `x86-64-v3`, aka Intel Haswell/Broadwell and AMD Zen (any Ryzen) or later.
- Linux builds are now built with Clang instead of GCC.
- Various CUDA changes:
- Kernels are now shipped compressed; this saves bandwidth & file size, but may slightly increase first-run latency. It will have no effect on subsequent runs.
- Recently-added float/int matrix multiplication kernels aren't enabled. Quantized models will miss out on a bit of performance, but it was impossible to compile these kernels within the limitations of free GitHub Actions runners.
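For server deployments that want spinning back, a minimal sketch using `SessionBuilder::with_config_entry` might look like this. The config keys are ONNX Runtime's session option keys; verify them against the ONNX Runtime version you're shipping with.

```rust
// Sketch: restore inter-/intra-op spinning for throughput-oriented servers.
// Keys are assumed from ONNX Runtime's session options config keys.
let session = Session::builder()?
	.with_config_entry("session.intra_op.allow_spinning", "1")?
	.with_config_entry("session.inter_op.allow_spinning", "1")?
	.commit_from_file("model.onnx")?;
```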
ort-tract
- Update `tract` to 0.22.
- 2d40e05 `ort-tract` no longer claims it is `ort-candle` in `ort::info()`.
ort-candle
- Update `candle` to 0.9.
❤️🧡💛💚💙💜
v2.0.0-rc.10
💖 If you find ort useful, please consider sponsoring us on Open Collective 💖
🤔 Need help upgrading? Ask questions in GitHub Discussions or in the pyke.io Discord server!
🔗 Tensor Array Views
You can now create a TensorRef directly from an ArrayView. Previously, tensors could only be created via Tensor::from_array (which, in many cases, performed a copy if borrowed data was provided). The new TensorRef::from_array_view (and the complementary TensorRefMut::from_array_view_mut) methods allow zero-copy creation of tensors directly from an ArrayView.
Tensor::from_array now only accepts owned data, so you should either refactor your code to use TensorRefs or pass ownership of the array to the Tensor.
⚠️ ndarrays must be in standard/contiguous memory layout to be converted to a `TensorRef(Mut)`; see `.as_standard_layout()`.
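A minimal sketch of the zero-copy path (assuming an `ndarray` input and an existing `session`):

```rust
// Borrow the ndarray's data instead of copying it into a Tensor.
let array = ndarray::Array4::<f32>::zeros((1, 3, 224, 224));
let input = TensorRef::from_array_view(array.view())?;
let outputs = session.run(ort::inputs![input]?)?;
```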
↔️ Copy Tensors
rc.10 now allows you to manually copy tensors between devices using Tensor::to!
```rust
// Create our tensor in CUDA memory
let cuda_allocator = Allocator::new(
	&session,
	MemoryInfo::new(AllocationDevice::CUDA, 0, AllocatorType::Device, MemoryType::Default)?
)?;
let cuda_tensor = Tensor::<f32>::new(&cuda_allocator, [1_usize, 3, 224, 224])?;

// Copy it back to CPU
let cpu_tensor = cuda_tensor.to(AllocationDevice::CPU, 0)?;
```
There's also Tensor::to_async, which replicates the functionality of PyTorch's non_blocking=True. Additionally, Tensors now implement Clone.
⚙️ Alternative Backends
ort is no longer just a wrapper for ONNX Runtime; it's a one-stop shop for inferencing ONNX models in Rust thanks to the addition of the alternative backend API.
Alternative backends wrap other inference engines behind ONNX Runtime's API, which can simply be dropped in and used in ort - all it takes is one line of code:
```rust
fn main() {
	ort::set_api(ort_tract::api()); // <- magic!

	let session = Session::builder()?
	...
}
```
Two alternative backends are shipping alongside rc.10 - ort-tract, powered by tract, and ort-candle, powered by candle, with more to come in the future.
Outside of the Rust ecosystem, these alternative backends can also be compiled as standalone libraries that can be directly dropped in to applications as a replacement for libonnxruntime. 🦀🦠
✏️ Model Editor
Models can be created entirely programmatically, or edited from an existing ONNX model via the new Model Editor API.
See src/editor/tests.rs for an example of how an ONNX model can be created programmatically. You can combine the Model Editor API with SessionBuilder::with_optimized_model_path to export the model outside Rust.
⚛️ Compiler
Many execution providers internally convert ONNX graphs to a framework-specific graph representation, like CoreML networks/TensorRT engines. This process can take a long time, especially for larger and more complex models. Since these generated artifacts aren't persisted between runs, they have to be created every time a session is loaded.
The new Compiler API allows you to compile an optimized, EP-ready graph ahead of time, so subsequent loads are lightning fast! ⚡
```rust
ModelCompiler::new(
	Session::builder()?
		.with_execution_providers([
			TensorRTExecutionProvider::default().build()
		])?
)?
.with_model_from_file("model.onnx")?
.compile_to_file("compiled_trt_model.onnx")?;
```
🪶 #![no_std]
🚨 BREAKING: If you previously used `ort` with `default-features = false`...
That will now disable `ort`'s `std` feature, which means you don't get to use APIs that interact with the operating system, like `SessionBuilder::commit_from_file` - APIs you probably need! To minimize breakage, manually enable the `std` feature:
```toml
[dependencies]
ort = { version = "=2.0.0-rc.10", default-features = false, features = [ "std", ... ] }
```
ort no longer depends on std (but does still depend on alloc) - default-features = false will enable #![no_std] for ort.
⚡ Execution Providers
🚨 BREAKING: Boolean options for ArmNN, CANN, CoreML, CPU, CUDA, MIGraphX, NNAPI, OpenVINO, & ROCm...
If you previously used an option setter on one of these EPs that took no parameters (i.e. a boolean option that was `false` by default), note that these functions now do take a boolean parameter to align with Rust idiom. Migrating is as simple as passing `true` to these functions. Affected functions include:
- `ArmNNExecutionProvider::with_arena_allocator`
- `CANNExecutionProvider::with_dump_graphs`
- `CPUExecutionProvider::with_arena_allocator`
- `CUDAExecutionProvider::with_cuda_graph`
- `CUDAExecutionProvider::with_skip_layer_norm_strict_mode`
- `CUDAExecutionProvider::with_prefer_nhwc`
- `MIGraphXExecutionProvider::with_fp16`
- `MIGraphXExecutionProvider::with_int8`
- `NNAPIExecutionProvider::with_fp16`
- `NNAPIExecutionProvider::with_nchw`
- `NNAPIExecutionProvider::with_disable_cpu`
- `NNAPIExecutionProvider::with_cpu_only`
- `OpenVINOExecutionProvider::with_opencl_throttling`
- `OpenVINOExecutionProvider::with_dynamic_shapes`
- `OpenVINOExecutionProvider::with_npu_fast_compile`
- `ROCmExecutionProvider::with_exhaustive_conv_search`
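As a concrete sketch of the migration, using `with_cuda_graph` from the list above:

```diff
 let ep = CUDAExecutionProvider::default()
-	.with_cuda_graph()
+	.with_cuda_graph(true)
 	.build();
```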
🚨 BREAKING: Renamed enum options for CANN, CUDA, QNN...
The following EP option enums have been renamed to reduce verbosity:
- `CANNExecutionProviderPrecisionMode` -> `CANNPrecisionMode`
- `CANNExecutionProviderImplementationMode` -> `CANNImplementationMode`
- `CUDAExecutionProviderAttentionBackend` -> `CUDAAttentionBackend`
- `CUDAExecutionProviderCuDNNConvAlgoSearch` -> `CuDNNConvAlgorithmSearch`
- `QNNExecutionProviderPerformanceMode` -> `QNNPerformanceMode`
- `QNNExecutionProviderProfilingLevel` -> `QNNProfilingLevel`
- `QNNExecutionProviderContextPriority` -> `QNNContextPriority`
🚨 BREAKING: Updated CoreML options...
`CoreMLExecutionProvider` has been updated to use a new registration API, unlocking more options. To migrate old options:
- `.with_cpu_only()` -> `.with_compute_units(CoreMLComputeUnits::CPUOnly)`
- `.with_ane_only()` -> `.with_compute_units(CoreMLComputeUnits::CPUAndNeuralEngine)`
- `.with_subgraphs()` -> `.with_subgraphs(true)`
rc.10 adds support for 3 execution providers:
- Azure allows you to call Azure AI models like GPT-4 directly from `ort`.
- WebGPU is powered by Dawn, an implementation of the WebGPU standard, allowing accelerated inference with almost any D3D12/Metal/Vulkan/OpenGL-supported GPU. Binaries with the WebGPU EP are available on Windows & Linux, so you can start testing it straight away!
- NV TensorRT RTX is a new execution provider purpose-built for NVIDIA RTX GPUs running with ONNX Runtime on Windows. It's powered by TensorRT for RTX, a specially-optimized inference library built upon TensorRT releasing in June.
All binaries are now statically linked! This means the cuda and tensorrt features no longer use onnxruntime.dll/libonnxruntime.so. The EPs themselves do still require separate DLLs - like libonnxruntime_providers_cuda - but this change should make it significantly easier to set up and use ort with CUDA/TRT.
🧩 Custom Operator Improvements
🚨 BREAKING: Migrating your custom operators...
- All methods under `Operator` now take `&self`.
- The operator's kernel is no longer an associated type - `create_kernel` is instead expected to return a `Box<dyn Kernel>` (which can now be created directly from a function!)

```diff
 impl Operator for MyCustomOp {
-	type Kernel = MyCustomOpKernel;
-
-	fn name() -> &'static str {
+	fn name(&self) -> &str {
 		"MyCustomOp"
 	}

-	fn inputs() -> Vec<OperatorInput> {
+	fn inputs(&self) -> Vec<OperatorInput> {
 		vec![OperatorInput::required(TensorElementType::Float32)]
 	}

-	fn outputs() -> Vec<OperatorOutput> {
+	fn outputs(&self) -> Vec<OperatorOutput> {
 		vec![OperatorOutput::required(TensorElementType::Float32)]
 	}

-	fn create_kernel(_: &KernelAttributes) -> ort::Result<Self::Kernel> {
-		Ok(MyCustomOpKernel)
-	}
+	fn create_kernel(&self, _: &KernelAttributes) -> ort::Result<Box<dyn Kernel>> {
+		Ok(Box::new(|ctx: &KernelContext| {
+			...
+		}))
 	...
```
v2.0.0-rc.9
🌴 Undo The Flattening (d4f82fc)
A previous ort release 'flattened' all exports, such that everything was exported at the crate root - ort::{TensorElementType, Session, Value}. This was done at a time when ort didn't export much, but now it exports a lot, so this was leading to some big, ugly use blocks.
rc.9 now has most exports behind their respective modules - Session is now imported as ort::session::Session, Tensor as ort::value::Tensor, etc. rust-analyzer and some quick searches on docs.rs can help you find the right paths to import.
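For example, imports that previously came from the crate root now look like this (exact module locations per the notes above; check docs.rs for anything not shown):

```rust
// rc.8 and earlier:
// use ort::{Session, Tensor};

// rc.9:
use ort::session::Session;
use ort::value::Tensor;
```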
📦 Tensor extract optimization (1dbad54)
Previously, calling any of the extract_tensor_* methods would have to call back to ONNX Runtime to determine the value's ValueType to ensure it was OK to extract. This involved a lot of FFI calls and a few allocations which could have a notable performance impact in hot loops.
Since a value's type never changes after it is created, the ValueType is now created when the Value is constructed (i.e. via Tensor::from_array or returned from a session). This makes extract_tensor_* a lot cheaper!
Note that this does come with some breaking changes:
- Raw tensor extract methods return `&[i64]` for their dimensions instead of `Vec<i64>`.
- `Value::dtype()` and `Tensor::memory_info()` now return `&ValueType` and `&MemoryInfo` respectively, instead of their non-borrowed counterparts.
- `ValueType::Tensor` now has an extra field for symbolic dimensions, `dimension_symbols`, so you might have to update `match`es on `ValueType`.
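If you were matching on `ValueType`, the new field can be skipped with `..` - a sketch, assuming `ValueType::Tensor` is a struct variant and that it has a `ty` field (the latter is an assumption):

```rust
match dtype {
	// `..` ignores `dimension_symbols` (and any other fields) so the
	// match keeps compiling as fields are added.
	ValueType::Tensor { ty, .. } => println!("tensor of {ty:?}"),
	other => println!("{other:?}")
}
```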
🚥 Threading management (87577ef)
2.0.0-rc.9 introduces a new trait: ThreadManager. This allows you to define custom thread create & join functions for session & environment thread pools! See the thread_manager.rs test for an example of how to create your own ThreadManager and apply it to a session, or an environment's GlobalThreadPoolOptions (previously EnvironmentGlobalThreadPoolOptions).
Additionally, sessions may now opt out of the environment's global thread pool if one is configured.
🧠 Shape inference for custom operators (87577ef)
ort now provides ShapeInferenceContext, an interface for custom operators to provide a hint to ONNX Runtime about the shape of the operator's output tensors based on its inputs, which may open the doors to memory optimizations.
See the updated custom_operators.rs example to see how it works.
📃 Session output refactor (8a16adb)
SessionOutputs has been slightly refactored to reduce memory usage and slightly increase performance. Most notably, it no longer derefs to a &BTreeMap.
The new SessionOutputs interface closely mirrors BTreeMap's API, so most applications require no changes unless you were explicitly dereferencing to a &BTreeMap.
🛠️ LoRA Adapters (d877fb3)
ONNX Runtime v1.20.0 introduces a new Adapter format for supporting LoRA-like weight adapters, and now ort has it too!
An Adapter essentially functions as a map of tensors, loaded from disk or memory and copied to a device (typically whichever device the session resides on). When you add an Adapter to RunOptions, those tensors are automatically added as inputs (except faster, because they don't need to be copied anywhere!)
With some modification to your ONNX graph, you can add LoRA layers using optional inputs which Adapter can then override. (Hopefully ONNX Runtime will provide some documentation on how this can be done soon, but until then, it's ready to use in ort!)
```rust
let model = Session::builder()?.commit_from_file("tests/data/lora_model.onnx")?;
let lora = Adapter::from_file("tests/data/adapter.orl", None)?;

let mut run_options = RunOptions::new()?;
run_options.add_adapter(&lora)?;

let outputs = model.run_with_options(ort::inputs![Tensor::<f32>::from_array(([4, 4], vec![1.0; 16]))?]?, &run_options)?;
```
🗂️ Prepacked weights (87577ef)
PrepackedWeights allows the same weights to be shared across multiple sessions. If you create multiple Sessions from one model file, they can all share the same memory!
Currently, ONNX Runtime only supports prepacked weights for the CPU execution provider.
‼️ Dynamic dimension overrides (87577ef)
You can now override dynamic dimensions in a graph using SessionBuilder::with_dimension_override, allowing ONNX Runtime to do more optimizations.
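A sketch of overriding a dynamic dimension; the exact signature of `with_dimension_override` is an assumption, and the dimension name `"batch"` is illustrative:

```rust
// Pin the symbolic "batch" dimension to 1 so the optimizer can specialize.
let session = Session::builder()?
	.with_dimension_override("batch", 1)?
	.commit_from_file("model.onnx")?;
```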
🪶 Customizable workload type (87577ef)
Not all workloads need full performance all the time! If you're using ort to perform background tasks, you can now set a session's workload type to prioritize either efficiency (by lowering scheduling priority or utilizing more efficient CPU cores on some architectures), or performance (the default).
```rust
let session = Session::builder()?.commit_from_file("tests/data/upsample.onnx")?;
session.set_workload_type(WorkloadType::Efficient)?;
```
Other features
- 28e00e3 Update to ONNX Runtime v1.20.0.
- 552727e Expose the `ortsys!` macro.
  - Note that this commit also made `ort::api()` return `&ort_sys::OrtApi` instead of `NonNull<ort_sys::OrtApi>`.
- 82dcf84 Add `AsPointer` trait.
  - Structs that previously had a `ptr()` method now have an `AsPointer` implementation instead.
- b51f60c Add config entries to `RunOptions`.
- 67fe38c Introduce the `ORT_CXX_STDLIB` environment variable (mirroring `CXXSTDLIB`) to allow changing the C++ standard library `ort` links to.
Fixes
- c1c736b Fix `ValueRef` & `ValueRefMut` leaking value memory.
- 2628378 Query `MemoryInfo`'s `DeviceType` instead of its allocation device to determine whether `Tensor`s can be extracted.
- e220795 Allow `ORT_PREFER_DYNAMIC_LINK` to work even when `cuda` or `tensorrt` are enabled.
- 1563c13 Add missing downcast implementations for `Sequence<T>`.
- Returned Ferris to the docs.rs page 🦀
If you have any questions about this release, we're here to help:
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support ort as well, consider contributing on Open Collective 💖
🩷💜🩷💜
v2.0.0-rc.7
Breaking: Infallible functions
The following functions have been updated to return T instead of ort::Result<T>:
- `MemoryInfo::memory_type`
- `MemoryInfo::allocator_type`
- `MemoryInfo::allocation_device`
- `MemoryInfo::device_id`
- `Value::<T>::memory_info`
- `Value::<T>::dtype`
Features
- `ValueType` now implements `Display`.
- 7f71e6c Implement `Sync` for `Value<T>`.
- abd527b Arbitrarily configurable execution providers allows you to add custom configuration options to the CANN, CUDA, oneDNN, QNN, TensorRT, VITIS, and XNNPACK execution providers.
  - This also fixes a bug when attempting to configure TensorRT's `ep_context_embed_mode`.
- e16fd5b Add more options to the CUDA execution provider, including user compute streams and SDPA kernel configuration.
- bd3c891 Implement `Send` for `Allocator`.
- 6de6aa5 Add `Session::overridable_initializers` to get a list of overridable initializers in the graph.
- c8b36f3 Allow loading a session with external initializers in memory.
- 2e1f014 Allow upgrading a `ValueRef` or `ValueRefMut` to a `Value` in certain cases.
- f915bca Adds `SessionBuilder::with_config_entry` for adding custom session config options.
- ae7b594 Adds an environment variable, `ORT_PREFER_DYNAMIC_LINK`, to override whether or not `ort` should prefer static or dynamic libs when `ORT_LIB_LOCATION` is specified.
- 1e2e7b0 Add functions for explicit data re-synchronization for `IoBinding`.
- b19cff4 Add `::ptr()` to every C-backed struct to expose `ort_sys` pointers.
- d0ee395 Implement `Clone` for `MemoryInfo`.
Fixes
- b58595c The oneDNN execution provider now registers using a more recent API internally. (Also, `with_arena_allocator` is now `with_use_arena`.)
- cf1be86 Remove the lifetime bound for `IoBinding` so it can be stored in a struct alongside a session.
- Multiple fixes to static linking for Linux, macOS, and Android.
- b1fb8c0 `Sequence::extract_sequence` now returns `Value<T>` instead of `ValueRef<T>`.
- 542f210 Make `Environment` and `ExecutionProvider` `Send + Sync`.
- fbe8cbf (Sorta) handle error messages for non-English localities on Windows.
Other changes
- API documentation is now back on docs.rs!
- Improved error messages in multiple areas
If you have any questions about this release, we're here to help:
Thank you to Brad Neuman, web3nomad, and Julien Cretin for contributing to this release!
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support ort as well, consider supporting us on Open Collective 💖
💜🩷💜🩷
v2.0.0-rc.6
ort::Error refactor
ort::Error is no longer an enum, but rather an opaque struct with a message and a new ErrorCode field.
ort::Error still implements std::error::Error, so this change shouldn't be too breaking; however, if you were previously matching on ort::Errors, you'll have to refactor your code to instead match on the error's code (acquired with the Error::code() function).
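Migrated error handling might look something like this sketch; the `ErrorCode::InvalidArgument` variant name is an assumption, so check `ErrorCode`'s docs for the real variants:

```rust
match session.run(inputs) {
	Ok(outputs) => { /* ... */ }
	// Assumed variant name; previously you'd match an `ort::Error` enum variant here.
	Err(e) if e.code() == ErrorCode::InvalidArgument => { /* e.g. bad input shape */ }
	Err(e) => return Err(e.into())
}
```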
AllocationDevice refactor
The AllocationDevice type has also been converted from an enum to a struct. Common devices like CUDA or DirectML are accessible via associated constants like AllocationDevice::CUDA & AllocationDevice::DIRECTML.
Features
- 60f6eca Update to ONNX Runtime v1.19.2.
- 9f4527c Added `ModelMetadata::custom_keys()` to get a `Vec` of all custom keys.
- bfa791d Add various `SessionBuilder` options affecting compute & graph optimizations.
- 5e6fc6b Expose the underlying `Allocator` API. You can now allocate & free buffers acquired from a session or operator kernel context.
- 52422ae Added `ValueType::Optional`.
- 2576812 Added the Vitis AI execution provider for new AMD Ryzen AI chips.
- 41ef65a Added the RKNPU execution provider for certain Rockchip NPUs.
- 6b3e7a0 Added `KernelContext::par_for`, allowing operator kernels to use ONNX Runtime's thread pool without needing an extra dependency on a crate like rayon.
Fixes
- edcb219 Make environment initialization thread-safe. This should eliminate intermittent segfaults when running tests concurrently, like seen in #278.
- 3072279 Linux dylibs no longer require version symlinks, fixing #269.
- bc70a0a Fixed unsatisfiable lifetime bounds when creating `Tensor`s from `&CowArray`s.
- 6592b17 Providing more inputs than the model expects no longer segfaults.
- b595048 Shave off dependencies by removing `tracing`'s `attributes` feature - a `--no-default-features` build of `ort` now only builds 9 crates!
- c7ddbdb Removed the `operator-libraries` feature - you can still use `SessionBuilder::with_operator_library`, it's just no longer gated behind the feature!
If you have any questions about this release, we're here to help:
Love ort? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.5
Possibly breaking
- Pre-built static libraries (i.e. not `cuda` or `tensorrt`) are now linked with `/MD` instead of `/MT` on Windows; i.e. the MSVC CRT is no longer statically linked. This should resolve linking issues in some cases (particularly crates using other FFI libraries), but may cause issues for others. I have personally tested this in 2 internal pyke projects that depend on `ort` & many FFI libraries and haven't encountered any issues, but your mileage may vary.
Definitely breaking
- 069ddfd `ort` now depends on `ndarray` 0.16.
- e2c4549 `wasm32-unknown-unknown` support has been removed.
  - Getting `wasm32-unknown-unknown` working in the first place was basically a miracle. Hacking ONNX Runtime to work outside of Emscripten took a lot of effort, but recent changes to Emscripten and ONNX Runtime have made this exponentially more difficult. Given that I am not adequately versed in ONNX Runtime's internals, the nigh-impossibility of debugging weird errors, and the vow I took to write as little C++ as possible ever since I learned Rust, it's no longer feasible for me to work on WASM support for `ort`.
  - If you were using `ort` in WASM, I suggest you use and/or support the development of alternative WASM-supporting ONNX inference crates like `tract` or WONNX.
Features
- ab293f8 Update to ONNX Runtime v1.19.0.
- ecf76f9 Use the URL hash for downloaded model filenames. Models previously downloaded & cached with `commit_from_url` will be redownloaded.
- 9d25514 Add missing configuration keys for some execution providers.
- 733b7fa New callbacks for the simple `Trainer` API, just like HF's `TrainerCallback`s! This allows you to write custom logging/LR scheduling callbacks. See the updated `train-clm-simple` example for usage details.
Fixes
If you have any questions about this release, we're here to help:
Love ort? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.4
This release addresses important linking issues with rc3, particularly regarding CUDA on Linux.
cuDNN 9 is no longer required for CUDA 12 builds (but is still the default); set the ORT_CUDNN_VERSION environment variable to 8 to use cuDNN 8 with CUDA 12.
If you have any questions about this release, we're here to help:
Love ort? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.3
Training
ort now supports a (currently limited subset of) ONNX Runtime's Training API. You can use the on-device Training API for fine-tuning, online learning, or even full pretraining, on any CPU or GPU.
The train-clm example pretrains a language model from scratch. There's also a 'simple' API and related example, which offers a basically one-line training solution akin to 🤗 Transformers' Trainer API:
```rust
trainer.train(
	TrainingArguments::new(dataloader)
		.with_lr(7e-5)
		.with_max_steps(5000)
		.with_ckpt_strategy(CheckpointStrategy::Steps(500))
)?;
```
You can learn more about training with ONNX Runtime here. Please try it out and let us know how we can improve the training experience!
ONNX Runtime v1.18
ort now ships with ONNX Runtime v1.18.
The CUDA 12 build requires cuDNN 9.x, so if you're using CUDA 12, you need to update cuDNN. The CUDA 11 build still requires cuDNN 8.x.
IoBinding
IoBinding's previously rather unsound API has been reworked and actually documented.
Output selection & pre-allocation
Sometimes, you don't need to calculate all of the outputs of a session. Other times, you need to pre-allocate a session's outputs to save on slow device copies or expensive re-allocations. Now, you can do both of these things without IoBinding through a new API: OutputSelector.
```rust
let options = RunOptions::new()?.with_outputs(
	OutputSelector::no_default()
		.with("output")
		.preallocate("output", Tensor::<f32>::new(&Allocator::default(), [1, 3, 224, 224])?)
);

let outputs = model.run_with_options(inputs!["input" => input.view()]?, &options)?;
```
In this example, each call to run_with_options that uses the same options struct will use the same allocation in memory, saving the cost of re-allocating the output; and any outputs other than `output` aren't even calculated.
Value ergonomics
String tensors are now Tensor<String> instead of DynTensor. They also no longer require an allocator to be provided to create or extract them. Additionally, Maps can also have string keys, and no longer require allocators.
Since value specialization, IntoTensorElementType was used to describe only primitive (i.e. f32, i64) elements. This has since been changed to PrimitiveTensorElementType, which is a subtrait of IntoTensorElementType. If you have type bounds that depended on IntoTensorElementType, you probably want to update them to use PrimitiveTensorElementType instead.
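For example, a generic helper over primitive tensors would update its bound like so (the `describe` helper itself is hypothetical):

```diff
-fn describe<T: IntoTensorElementType>(tensor: &Tensor<T>) {
+fn describe<T: PrimitiveTensorElementType>(tensor: &Tensor<T>) {
 	/* ... */
 }
```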
Custom operators
Operator kernels now support i64, string, Vec<f32>, Vec<i64>, and TensorRef attributes, among most other previously missing C API features.
Additionally, the API for adding an operator to a domain has been changed slightly; it is now .add::<Operator>() instead of .add(Operator).
Other changes
- 80be206 & 8ae23f2 Miscellaneous WASM build fixes.
- 1c0a5e4 Allow downcasting `ValueRef` & `ValueRefMut`.
- ce5aaba Add `EnvironmentBuilder::with_telemetry`.
  - pyke binaries were never compiled with telemetry support; only Microsoft-provided Windows builds of ONNX Runtime had telemetry enabled by default. If you are using Microsoft binaries, this will now allow you to disable telemetry.
- 23fce78 `ExecutionProviderDispatch::error_on_failure` will immediately error out session creation if the registration of an EP fails.
- d59ac43 `RunOptions` is now taken by reference instead of via an `Arc`.
- d59ac43 Add `Session::run_async_with_options`.
- a92dd30 Enable support for SOCKS proxies when downloading binaries.
- 19d66de Add AMD MIGraphX execution provider.
- 882f657 Bundle `libonnxruntime` in library builds where `crate-type = rlib`/`staticlib`.
- 860e449 Fix build for `i686-pc-windows-msvc`.
- 1d89f82 Support `pkg-config`.
If you have any questions about this release, we're here to help:
Thank you to Florian Kasischke, cagnolone, Ryo Yamashita, and Julien Cretin for contributing to this release!
Thank you to Johannes Laier, Noah, Yunho Cho, Okabintaro, and Matouš Kučera, whose support made this release possible. If you'd like to support ort as well, consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.2
Changes
- f30ba57 Update to ONNX Runtime v1.17.3
  - New: CUDA 12 binaries. `ort` will automatically detect CUDA 12/11 in your environment and install the correct binary.
  - New: Binaries for ROCm on Linux.
  - Note that WASM is still on v1.17.1.
- b12c43c Support for `wasm32-unknown-unknown` & `wasm32-wasi`
  - With some minor limitations; see https://ort.pyke.io/setup/webassembly.
  - Thank you to Yunho Cho, whose sponsorship made this possible! If you'd also like to support us, you may do so on Open Collective 💖
- cedeb55 Swap specialized value `upcast` and `downcast` function names to reflect their actual meaning (thanks @/messense for pointing this out!)
- de3bca4 Fix a segfault with custom operators.
- 681da43 Fix compatibility with older versions of `rustc`.
- 63a1818 Accept `ValueRefMut` as a session input.
- 8383879 Add a function to create tensors from a raw device pointer, allowing you to create tensors directly from a CUDA buffer.
- 4af33b1 Re-export `ort-sys` as `ort::sys`.
If you have any questions about this release, we're here to help:
Love ort? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.1
Value specialization
The Value struct has been refactored into multiple strongly-typed structs: Tensor<T>, Map<K, V>, and Sequence<T>, and their type-erased variants: DynTensor, DynMap, and DynSequence.
Values returned by session inference are now DynValues, which behave exactly the same as Value in previous versions.
Tensors created from Rust, like via the new Tensor::new function, can be directly and infallibly extracted into their underlying data via extract_tensor (no try_):
```rust
let allocator = Allocator::new(&session, MemoryInfo::new(AllocationDevice::CUDAPinned, 0, AllocatorType::Device, MemoryType::CPUInput)?)?;
let tensor = Tensor::<f32>::new(&allocator, [1, 128, 128, 3])?;

let array = tensor.extract_array();
// no need to specify type or handle errors - Tensor<f32> can only extract into an f32 ArrayView
```
You can still extract tensors, maps, or sequence values normally from a DynValue using try_extract_*:
```rust
let generated_tokens: ArrayViewD<f32> = outputs["output1"].try_extract_tensor()?;
```
DynValue can be upcast()ed to the more specialized types, like DynMap or Tensor<T>:
```rust
let tensor: Tensor<f32> = value.upcast()?;
let map: DynMap = value.upcast()?;
```
Similarly, a strongly-typed value like Tensor<T> can be downcast back into a DynValue or DynTensor.
```rust
let dyn_tensor: DynTensor = tensor.downcast();
let dyn_value: DynValue = tensor.into_dyn();
```
Tensor extraction directly returns an ArrayView
extract_tensor (and now try_extract_tensor) now return an ndarray::ArrayView directly, instead of putting it behind the old ort::Tensor<T> type (not to be confused with the new specialized value type). This means you don't have to call .view() on the result:
```diff
-let generated_tokens: Tensor<f32> = outputs["output1"].extract_tensor()?;
-let generated_tokens = generated_tokens.view();
+let generated_tokens: ArrayViewD<f32> = outputs["output1"].try_extract_tensor()?;
```
Full support for sequence & map values
You can now construct and extract Sequence/Map values.
Value views
You can now obtain a view of any Value via the new view() and view_mut() functions, which operate similar to ndarray's own view system. These views can also now be passed into session inputs.
Mutable tensor extraction
You can extract a mutable ArrayViewMut or &mut [T] from a mutable reference to a tensor.
```rust
let (raw_shape, raw_data) = tensor.extract_raw_tensor_mut();
```
Device-allocated tensors
You can now create a tensor on device memory with Tensor::new & an allocator:
```rust
let allocator = Allocator::new(&session, MemoryInfo::new(AllocationDevice::CUDAPinned, 0, AllocatorType::Device, MemoryType::CPUInput)?)?;
let tensor = Tensor::<f32>::new(&allocator, [1, 128, 128, 3])?;
```
The data will be allocated by the device specified by the allocator. You can then use the new mutable tensor extraction to modify the tensor's data.
What if custom operators were 🚀 blazingly 🔥 fast 🦀?
You can now write custom operator kernels in Rust. Check out the custom-ops example.
Custom operator library feature change
Since custom operators can now be written completely in Rust, the old custom-ops feature, which enabled loading custom operators from an external dynamic library, has been renamed to operator-libraries.
Additionally, Session::with_custom_ops_lib has been renamed to Session::with_operator_library, and the confusingly named Session::with_enable_custom_ops (which does not enable custom operators in general, but rather attempts to load onnxruntime-extensions) has been updated to Session::with_extensions to reflect its actual behavior.
Asynchronous inference
Session introduces a new run_async method which returns inference results via a future. It's also cancel-safe, so you can simply cancel inference with something like tokio::select! or tokio::time::timeout.
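Since run_async is cancel-safe, wrapping it in a timeout is straightforward - a sketch assuming a tokio runtime and an existing `session` and `input`:

```rust
// If the timeout elapses, the future is dropped and inference is abandoned.
let outputs = tokio::time::timeout(
	std::time::Duration::from_secs(5),
	session.run_async(ort::inputs!["input" => input]?)?
)
.await??;
```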
If you have any questions about this release, we're here to help:
Love ort? Consider supporting us on Open Collective 💖
❤️💚💙💛
