Metal backend: compute init/execute times #16639
base: main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16639
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures
As of commit 3cfefbf with merge base d58c8ee:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR adds timing instrumentation to the Metal backend for measuring initialization and execution performance, and exposes these statistics in the Parakeet example application.
Changes:
- Added timing measurement for the Metal backend `init()` and `execute()` methods with per-method granularity
- Created a new stats API module with accessor and print functions for timing data
- Integrated Metal backend timing statistics into the Parakeet example to display performance metrics
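As a rough illustration of the per-method timing described in the list above, the following is a minimal sketch using `std::chrono`. It is not the PR's code: `MethodTiming`, `TimingRegistry`, and `ScopedTimer` are hypothetical names, and the sketch keeps its counters in a registry object rather than in the global variables the review summary mentions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical types for illustration; not the Metal backend's actual API.
struct MethodTiming {
  double total_ms = 0.0;  // accumulated wall-clock time in milliseconds
  int64_t calls = 0;      // number of recorded calls
};

// Accumulates timing per method name; guarded by a mutex so that
// concurrent calls cannot corrupt the counters.
class TimingRegistry {
 public:
  void record(const std::string& method, double ms) {
    std::lock_guard<std::mutex> lock(mu_);
    MethodTiming& t = per_method_[method];
    t.total_ms += ms;
    t.calls += 1;
  }

  // Returns a copy of the stats for one method (zeros if never recorded).
  MethodTiming get(const std::string& method) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = per_method_.find(method);
    return it == per_method_.end() ? MethodTiming{} : it->second;
  }

  void reset() {
    std::lock_guard<std::mutex> lock(mu_);
    per_method_.clear();
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, MethodTiming> per_method_;
};

// RAII helper: measures from construction to destruction and records the
// elapsed time under the given method name.
class ScopedTimer {
 public:
  ScopedTimer(TimingRegistry& registry, std::string method)
      : registry_(registry),
        method_(std::move(method)),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start_;
    registry_.record(method_, elapsed.count());
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimingRegistry& registry_;
  std::string method_;
  std::chrono::steady_clock::time_point start_;
};
```

Placing a `ScopedTimer` at the top of a function body would time that call on every exit path, including early returns. `steady_clock` is used here because it is monotonic, which makes it a safer choice for measuring intervals than `system_clock`.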
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| backends/apple/metal/runtime/metal_backend.cpp | Added timing instrumentation using std::chrono to measure and track init/execute times in global variables |
| backends/apple/metal/runtime/stats.h | Defined public API for accessing Metal backend timing statistics |
| backends/apple/metal/runtime/stats.cpp | Implemented function to print formatted timing statistics to stdout |
| backends/apple/metal/CMakeLists.txt | Added stats.cpp to build and defined ET_BUILD_METAL preprocessor macro |
| examples/models/parakeet/main.cpp | Added performance statistics output section that calls Metal backend stats when built with Metal support |
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
This pull request introduces detailed performance tracking and reporting for the Metal backend, focusing on timing statistics for both initialization and execution phases. It adds infrastructure to collect, reset, and print timing data, and integrates statistics output into the Parakeet model example. Additionally, a preprocessor macro is defined to signal Metal backend availability.
Metal Backend Performance Statistics:
- Tracks timing for `init()` and `execute()` calls in the Metal backend, including total time, call count, and per-method breakdowns. Accessor and reset functions are provided in `stats.h` and implemented in `metal_backend.cpp`.
- Adds a `stats.cpp` file with a function to print all collected Metal backend statistics, including per-method breakdowns for both initialization and execution.

Build and Integration Improvements:
- Added `runtime/stats.cpp` to the Metal backend build sources and defined the `ET_BUILD_METAL` preprocessor macro to indicate Metal backend support.

Example Model Enhancements:
- Updated the Parakeet example (`main.cpp`) to show Metal backend timing stats after model execution if Metal is enabled.

These changes provide better visibility into Metal backend performance and make it easier to profile and optimize model execution on Apple devices.
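For readers unfamiliar with the pattern, here is a hedged sketch of how an application might gate its stats output on the `ET_BUILD_METAL` macro. Only the macro name comes from this PR; `format_backend_stats` and its parameters are hypothetical and stand in for whatever the real stats API exposes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a one-line summary of backend timing.
// ET_BUILD_METAL is the macro this PR defines; when it is absent, the
// function reports that no Metal stats are available instead of failing.
inline std::string format_backend_stats(double init_ms, double execute_ms) {
#ifdef ET_BUILD_METAL
  return "Metal backend: init " + std::to_string(init_ms) +
         " ms, execute " + std::to_string(execute_ms) + " ms";
#else
  (void)init_ms;     // unused when Metal support is not compiled in
  (void)execute_ms;
  return "Metal backend not built; no timing stats available";
#endif
}
```

Keeping the `#ifdef` inside a single helper means the calling code in an example like `main.cpp` can stay free of preprocessor branches.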