
Conversation

@zhen9910 commented Nov 18, 2025

As discussed in #8547, the existing wasi-nn ONNX backend only uses the default CPU execution provider. This PR adds an onnx-cuda based GPU execution target to the wasmtime-wasi-nn ONNX backend.

@zhen9910 requested review from a team as code owners on November 18, 2025 20:04
@zhen9910 requested review from fitzgen and removed the request for a team on November 18, 2025 20:04
@zhen9910 force-pushed the zkong/update-ort-and-onnx-gpu branch from 482de2b to 75caf05 on November 19, 2025 00:03
@devigned (Contributor)

Looks like the bump to ort caused cargo vet to be angry, which is to be expected.

+1 to the addition of GPU support for the ONNX backend \o/

@abrown, you might be interested in this one.

@fitzgen fitzgen requested review from abrown and removed request for fitzgen November 19, 2025 20:51
@zhen9910 (Author)

@abrown could you please review the PR and add the cargo vet entries?

@alexcrichton (Member)

@abrown is in the process of handing off wasi-nn maintenance/work to @jlb6740 and @rahulchaphalkar, so as a heads up, @zhen9910, it might take them a moment to allocate time to work on this.

@zhen9910 (Author)

> @abrown is in the process of handing off wasi-nn maintenance/work to @jlb6740 and @rahulchaphalkar, so as a heads up, @zhen9910, it might take them a moment to allocate time to work on this.

I see, thanks for the update.

@zhen9910 force-pushed the zkong/update-ort-and-onnx-gpu branch from 75caf05 to c919981 on January 12, 2026 20:16
@zhen9910 (Author)

@alexcrichton could you take a look at this PR? It has been rebased against main with the cargo vet and ort changes.

@alexcrichton (Member)

I unfortunately am not equipped myself to review/maintain wasi-nn. Historically that was Andrew, who's now handed this off to @jlb6740 and @rahulchaphalkar, but they (I suspect) have other priorities to balance too. @zhen9910, if you'd like to reach out to them directly, I'm sure they'd be happy to help work out a path forward.

@rahulchaphalkar (Contributor)

Thanks @zhen9910 for the contribution and for waiting for someone to take a look at it, and @alexcrichton for summarizing some of the changes happening here. I'm reviewing this.

@rahulchaphalkar (Contributor) left a comment

Looks good; a couple of comments/questions about the included example.

#[cfg(feature = "onnx-cuda")]
{
    // Use CUDA execution provider for GPU acceleration
    tracing::debug!("Configuring ONNX Nvidia CUDA execution provider for GPU target");
@rahulchaphalkar (Contributor)

NIT: the debug messages for CPU and GPU could be of a similar form, e.g. Using CPU/Nvidia GPU/CUDA execution provider, or something more verbose if you want.
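
A sketch of what that could look like (the wording here is illustrative, not the PR's actual strings):

    // Hypothetical symmetric debug messages, one per execution target:
    tracing::debug!("Using CPU execution provider");
    tracing::debug!("Using Nvidia GPU (CUDA) execution provider");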

@zhen9910 (Author)

updated

./target/debug/wasmtime run \
    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm \
@rahulchaphalkar (Contributor)

This should've read wasm32-wasip2 instead of wasm32-wasip1 in the original README, as this fails with the error below:

Error: failed to run main module `./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm`

Caused by:
    0: failed to instantiate "./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm"
    1: unknown import: `wasi:nn/[email protected]::[resource-drop]tensor` has not been defined

So this needs to be p2 for both CPU and GPU. (Build with cargo build --target wasm32-wasip2.)

@zhen9910 (Author)

The original README used cargo component build, which builds a Wasm component targeting wasm32-wasip1 by default. So the README will be updated to use cargo component build --target wasm32-wasip2 for wasm32-wasip2.

    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm \
    gpu
@rahulchaphalkar (Contributor)

What is the expected behavior of this example when run on a system without a GPU/CUDA? I ran it on my system, which has no Nvidia GPU, and it seemed to run without complaining: it neither explicitly fell back to CPU nor failed.

./target/debug/wasmtime run \
    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm \
    gpu
Read ONNX model, size in bytes: 4956208
Using GPU (CUDA) execution target from argument
Loaded graph into wasi-nn with ExecutionTarget::Gpu target
Created wasi-nn execution context.
Read ONNX Labels, # of labels: 1000
Executed graph inference
Retrieved output data with length: 4000
Index: n02099601 golden retriever - Probability: 0.9948673
Index: n02088094 Afghan hound, Afghan - Probability: 0.002528982
Index: n02102318 cocker spaniel, English cocker spaniel, cocker - Probability: 0.001098644

@zhen9910 (Author) Jan 15, 2026

From ort's side, if the GPU execution provider is requested but the device does not have a GPU or the necessary CUDA drivers are missing, ONNX Runtime will fall back to the CPU execution provider. The application will continue to run, but inference will happen on the CPU. When ort logging is enabled, we can see a warning like: No execution providers from session options registered successfully; may fall back to CPU.

But this fallback log is not propagated from ort to wasi_nn, which causes confusion. I've added an ort-tracing feature to enable ort logging for wasi_nn, which users can enable to verify the fallback behavior; please see if this is fine.

configure_execution_providers() is also updated to fall back to CPU if GPU/TPU is not enabled/supported, so the fallback behavior is consistent.
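
For reference, a minimal sketch of that fallback shape, assuming the ort 2.x builder API (import paths vary across ort release candidates, and ExecutionTarget here is a stand-in for wasi-nn's execution-target type, not the PR's actual code):

    #[cfg(feature = "onnx-cuda")]
    use ort::execution_providers::CUDAExecutionProvider;
    use ort::session::builder::SessionBuilder;

    // Stand-in for wasi-nn's execution-target type.
    enum ExecutionTarget {
        Cpu,
        Gpu,
    }

    // Sketch only: register the CUDA execution provider when the crate
    // feature is enabled and the caller asked for the GPU target; anything
    // else keeps the default CPU execution provider. ort itself also falls
    // back to CPU at runtime if CUDA registration fails (no GPU or missing
    // drivers), which is the runtime fallback described above.
    fn configure_execution_providers(
        builder: SessionBuilder,
        target: ExecutionTarget,
    ) -> ort::Result<SessionBuilder> {
        match target {
            #[cfg(feature = "onnx-cuda")]
            ExecutionTarget::Gpu => {
                tracing::debug!("Using Nvidia GPU (CUDA) execution provider");
                builder.with_execution_providers([CUDAExecutionProvider::default().build()])
            }
            _ => {
                tracing::debug!("Using CPU execution provider");
                Ok(builder)
            }
        }
    }

Pairing this with the ort-tracing feature mentioned above would let users see ort's own "may fall back to CPU" warning when CUDA registration fails.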

@zhen9910 (Author)

> Thanks @zhen9910 for the contribution and for waiting for someone to take a look at it, and @alexcrichton for summarizing some of the changes happening here. I'm reviewing this.

Thanks @rahulchaphalkar and @alexcrichton for the update! I will take a look and address the comments.

@zhen9910 force-pushed the zkong/update-ort-and-onnx-gpu branch from 163c6a8 to 4ec4eb2 on January 15, 2026 07:36