Support Nvidia-Cuda execution provider for wasi-nn onnx backend #12044
Conversation
Force-pushed 482de2b to 75caf05
+1 to the addition of GPU support for the ONNX backend \o/ @abrown, you might be interested in this one.
@abrown could you please review the PR and add the cargo vet entries?
@abrown is in the process of handing off wasi-nn maintenance/work to @jlb6740 and @rahulchaphalkar, so as a heads-up, @zhen9910, it may take them a moment to allocate time to work on this.

I see, thanks for the update.
Force-pushed 75caf05 to c919981
@alexcrichton could you take a look at this PR? It has been rebased against main with the cargo vet and ort changes.
I unfortunately am not equipped to review/maintain wasi-nn myself. Historically that was Andrew, who has now handed off to @jlb6740 and @rahulchaphalkar, but I suspect they have other priorities to balance too. @zhen9910, if you'd like to reach out to them directly, I'm sure they'd be happy to help work out a path forward.
Thanks @zhen9910 for the contribution and for waiting for someone to take a look at it, and @alexcrichton for summarizing some of the changes happening here. I'm reviewing this.
rahulchaphalkar left a comment:
Looks good, a couple of comments/questions about the included example.
crates/wasi-nn/src/backend/onnx.rs (Outdated)
```rust
#[cfg(feature = "onnx-cuda")]
{
    // Use CUDA execution provider for GPU acceleration
    tracing::debug!("Configuring ONNX Nvidia CUDA execution provider for GPU target");
```
NIT: the debug messages for CPU and GPU can be of a similar form, e.g. "Using CPU/Nvidia GPU/CUDA execution provider" or similar, or more verbose if you want.
Updated.
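For context, a minimal sketch of what such cfg-gated configuration can look like with the ort crate (2.0 API; exact module paths vary between release candidates, and this is not the PR's verbatim code):

```rust
use ort::execution_providers::CUDAExecutionProvider;
use ort::session::Session;

fn build_session(model: &[u8]) -> ort::Result<Session> {
    let mut builder = Session::builder()?;
    #[cfg(feature = "onnx-cuda")]
    {
        // Register the CUDA execution provider; ONNX Runtime falls back to
        // CPU at session creation if no usable CUDA device is found.
        tracing::debug!("Configuring ONNX Nvidia CUDA execution provider for GPU target");
        builder = builder
            .with_execution_providers([CUDAExecutionProvider::default().build()])?;
    }
    builder.commit_from_memory(model)
}
```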
```sh
./target/debug/wasmtime run \
  -Snn \
  --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
  ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm \
```
This should have read wasm32-wasip2 instead of wasm32-wasip1 in the original Readme, as it fails with the error below:

```text
Error: failed to run main module `./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm`

Caused by:
    0: failed to instantiate "./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm"
    1: unknown import: `wasi:nn/[email protected]::[resource-drop]tensor` has not been defined
```

So this needs to be p2 for both CPU and GPU. (Build with `cargo build --target wasm32-wasip2`.)
The original Readme used `cargo component build`, which builds a wasm component for wasm32-wasip1 by default. The Readme will be updated to use `cargo component build --target wasm32-wasip2`; the full corrected sequence is sketched below.
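For reference, here is the corrected build target combined with the run command already shown in this thread (the build step is assumed to run from the example's directory):

```sh
# Build the example component for wasm32-wasip2 (the fix discussed above).
cargo component build --target wasm32-wasip2

# Run it with wasi-nn enabled; the trailing "gpu" argument selects the GPU target.
./target/debug/wasmtime run \
  -Snn \
  --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
  ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm \
  gpu
```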
```sh
  -Snn \
  --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
  ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip1/debug/classification-component-onnx.wasm \
  gpu
```
What is the expected behavior of this example when run on a system without a GPU/CUDA? I ran it on my system, which has no Nvidia GPU, and it seemed to run without complaining, without explicitly falling back to CPU, and without failing.

```sh
./target/debug/wasmtime run \
  -Snn \
  --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
  ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm \
  gpu
```

```text
Read ONNX model, size in bytes: 4956208
Using GPU (CUDA) execution target from argument
Loaded graph into wasi-nn with ExecutionTarget::Gpu target
Created wasi-nn execution context.
Read ONNX Labels, # of labels: 1000
Executed graph inference
Retrieved output data with length: 4000
Index: n02099601 golden retriever - Probability: 0.9948673
Index: n02088094 Afghan hound, Afghan - Probability: 0.002528982
Index: n02102318 cocker spaniel, English cocker spaniel, cocker - Probability: 0.001098644
```
From ort: if the GPU execution provider is requested but the device does not have a GPU or the necessary CUDA drivers are missing, ONNX Runtime falls back to the CPU execution provider. The application continues to run, but inference happens on the CPU. When ort logging is enabled, we can see a warning like: `No execution providers from session options registered successfully; may fall back to CPU.`

But this fallback log is not propagated from ort to wasi_nn, which causes confusion. I have therefore added a feature, `ort-tracing`, that enables ort logging for wasi_nn; users can enable it to verify the fallback behavior. Please see if this is fine.

`configure_execution_providers()` is also updated to fall back to CPU if GPU/TPU is not enabled/supported, so that fallback behavior is consistent.
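A minimal sketch of the fallback shape described above (the function name comes from the discussion; the ort calls follow the 2.0 API and are assumptions, not the PR's exact code):

```rust
use ort::execution_providers::{
    CPUExecutionProvider, CUDAExecutionProvider, ExecutionProviderDispatch,
};
use ort::session::builder::SessionBuilder;

// Hypothetical signature: registers CUDA when the feature is enabled and a
// GPU target was requested, then always registers CPU as the fallback.
fn configure_execution_providers(
    builder: SessionBuilder,
    gpu_requested: bool,
) -> ort::Result<SessionBuilder> {
    let mut providers: Vec<ExecutionProviderDispatch> = Vec::new();
    #[cfg(feature = "onnx-cuda")]
    if gpu_requested {
        providers.push(CUDAExecutionProvider::default().build());
    }
    // Register the CPU provider last so inference still works when the GPU
    // provider is unavailable or the feature is disabled, giving the
    // consistent fallback behavior described above.
    providers.push(CPUExecutionProvider::default().build());
    builder.with_execution_providers(providers)
}
```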
Thanks @rahulchaphalkar @alexcrichton for the update! I will take a look and address the comments.
Force-pushed 163c6a8 to 4ec4eb2
As discussed in #8547, the existing wasi-nn ONNX backend only uses the default CPU execution provider. This PR adds an onnx-cuda based GPU execution target to the wasmtime-wasi-nn ONNX backend.
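For illustration, a minimal guest-side sketch of selecting the new GPU target through the wasi:nn interface. The binding paths follow wit-bindgen's usual mapping and the model path is a placeholder; see the classification-component-onnx example for the real code.

```rust
// Hypothetical wit-bindgen-style bindings for wasi:nn; module paths and the
// error type are assumptions, not copied from the example.
use crate::wasi::nn::errors::Error;
use crate::wasi::nn::graph::{load, ExecutionTarget, Graph, GraphEncoding};

fn load_gpu_graph() -> Result<Graph, Error> {
    // "fixture/model.onnx" is a placeholder path, not the example's actual file.
    let model = std::fs::read("fixture/model.onnx").expect("model file");
    // ExecutionTarget::Gpu requests the CUDA execution provider; with the
    // fallback discussed above, inference runs on CPU if no GPU is available.
    load(&[model], GraphEncoding::Onnx, ExecutionTarget::Gpu)
}
```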