Integrate Automated QDQ placement tool - part 3.3 #839
willg-nv wants to merge 10 commits into NVIDIA:main from
Conversation
📝 Walkthrough

These changes introduce a command-line interface and core workflow orchestration for ONNX Q/DQ autotuning. The CLI entry point parses configuration arguments, validates inputs, initializes TensorRT benchmarking, and invokes a region-pattern autotuning workflow that profiles models, applies quantization schemes, benchmarks performance, and exports optimized variants.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (run_autotune)
    participant Validator as Input Validator
    participant Benchmark as Benchmark Init
    participant Workflow as Autotuning Workflow
    participant Model as ONNX Model
    participant TensorRT as TensorRT Engine
    participant Output as Model Export
    User->>CLI: Invoke with arguments
    CLI->>Validator: Validate model & baseline paths
    Validator-->>CLI: Path valid / exit
    CLI->>Benchmark: Initialize benchmark instance
    Benchmark->>TensorRT: Configure with timing cache & plugins
    TensorRT-->>Benchmark: Instance ready
    Benchmark-->>CLI: Benchmark initialized
    CLI->>Workflow: Invoke region_pattern_autotuning_workflow
    Workflow->>Model: Load ONNX model
    Workflow->>Model: Load pattern cache & QDQ baseline
    Workflow->>Workflow: Profile regions & apply node filters
    loop For each region
        Workflow->>Workflow: Generate quantization schemes
        Workflow->>Model: Apply Q/DQ to region
        Workflow->>TensorRT: Benchmark model
        TensorRT-->>Workflow: Latency result
    end
    Workflow->>Output: Export optimized model
    Output-->>Workflow: Export complete
    Workflow->>Output: Save state checkpoint
    Output-->>Workflow: State saved
    Workflow-->>CLI: Return autotuner result
    CLI-->>User: Exit with status
```
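The per-region loop in the sequence diagram above can be summarized as a Python sketch. Every function here is a stub named after a diagram step (`schemes_for`, `apply_qdq`, `benchmark` are illustrative assumptions, not the actual module API): each region tries its candidate quantization schemes and keeps the one with the lowest measured latency.

```python
def autotune(model_path, regions, schemes_for, apply_qdq, benchmark):
    """Pick, per region, the quantization scheme with the lowest measured latency."""
    best = {}
    for region in regions:
        best_latency, best_scheme = float("inf"), None
        for scheme in schemes_for(region):
            candidate = apply_qdq(model_path, region, scheme)  # insert Q/DQ nodes
            latency = benchmark(candidate)                     # TensorRT timing
            if latency < best_latency:
                best_latency, best_scheme = latency, scheme
        best[region] = best_scheme
    return best

# Toy run with stubbed dependencies standing in for the real model/TensorRT calls:
result = autotune(
    "model.onnx",
    regions=["conv_block"],
    schemes_for=lambda r: ["int8", "fp16"],
    apply_qdq=lambda m, r, s: (m, r, s),
    benchmark=lambda c: {"int8": 1.0, "fp16": 2.0}[c[2]],
)
```

In the real workflow the winning schemes are then exported as an optimized model and checkpointed, as shown in the diagram's final steps.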
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around lines 107-116: `init_benchmark_instance` can return None on failure, but the current flow continues regardless. Update the caller (the block after `log_benchmark_config`) to check the return value of `init_benchmark_instance(use_trtexec=args.use_trtexec, plugin_libraries=args.plugin_libraries, timing_cache_file=args.timing_cache, warmup_runs=args.warmup_runs, timing_runs=args.timing_runs, trtexec_args=trtexec_args)`; if it returns None, log an error and exit early (e.g., `sys.exit(1)`) so the script fails fast instead of producing misleading infinite benchmark results.
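A minimal sketch of the fix the comment above suggests. The argument names are taken from the comment; `init_benchmark_instance` is stubbed here to simulate a failed setup, and the surrounding `main` is an assumption, not the actual file contents:

```python
import logging
import sys

logger = logging.getLogger(__name__)


def init_benchmark_instance(**kwargs):
    # Stand-in for the real initializer, which returns None when
    # TensorRT benchmark setup fails.
    return None


def main(args):
    benchmark = init_benchmark_instance(
        use_trtexec=args.use_trtexec,
        plugin_libraries=args.plugin_libraries,
        timing_cache_file=args.timing_cache,
        warmup_runs=args.warmup_runs,
        timing_runs=args.timing_runs,
        trtexec_args=args.trtexec_args,
    )
    if benchmark is None:
        # Fail fast instead of continuing with misleading "infinite" latencies.
        logger.error("Failed to initialize benchmark instance; aborting.")
        sys.exit(1)
```

With this guard in place, a broken TensorRT environment surfaces as an immediate non-zero exit rather than as nonsense benchmark numbers later in the run.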
In `@modelopt/onnx/quantization/autotune/workflows.py`:
- Around lines 239-246: The `Config` instantiation hardcodes `verbose=True`, which forces noisy logging. Change the `Config(...)` call in this file to accept a verbose parameter (e.g., `verbose=verbose`) and thread that boolean from the CLI invocation that creates the autotuner (update the CLI call site to pass `args.verbose` into the function that triggers this code). `logger.info` stays unchanged; only `Config` should use the provided flag instead of `True`.
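A sketch of threading the flag through, per the comment above. `Config` here is a stand-in dataclass with only the relevant field; the real class lives in `common.py`, and the workflow signature is an assumption:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Config:
    # Stand-in for the real Config in common.py; only the relevant field shown.
    verbose: bool = False


def region_pattern_autotuning_workflow(model_path: str, verbose: bool = False):
    # Previously: Config(verbose=True) -- hardcoded, forcing noisy logging.
    config = Config(verbose=verbose)  # thread the caller's flag instead
    logger.info("Autotuning %s (verbose=%s)", model_path, config.verbose)
    return config


# CLI call site: pass args.verbose down instead of relying on a hardcoded value, e.g.
# result = region_pattern_autotuning_workflow(args.model, verbose=args.verbose)
```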
Please add a test for workflows. Example: https://github.com/gcunhase/TensorRT-Model-Optimizer/blob/85228103a29662c721d862cb1cec38b0193699f5/tests/unit/onnx/quantization/autotune/test_workflows.py#L36
Added, please check
/ok to test 0414b81
@willg-nv, I'm seeing the following errors in the
@willg-nv the precommit fixes are working, thank you! One last thing is the
done
/ok to test 8a363da
@willg-nv CICD pipeline is failing, can you please have a look at it? Thanks!
"ModuleNotFoundError: No module named 'modelopt.onnx.quantization.autotune.autotuner'" |
Pull request overview
This PR implements the command-line interface (CLI) for the ONNX Q/DQ autotuning framework, completing part 3.3 of the automated QDQ placement tool integration. The PR builds upon the benchmark module (PR #837) and QDQAutotuner class (PR #838), providing a complete end-to-end workflow for automated quantization optimization of ONNX models using pattern-based region analysis and TensorRT performance measurement.
Changes:
- Added CLI (`__main__.py`) with comprehensive argument parsing for model paths, quantization parameters, TensorRT benchmarking configuration, and workflow control
- Implemented high-level workflow orchestration (`workflows.py`) managing pattern-based region optimization, state persistence, baseline comparison, and benchmarking
- Extended common data structures with `PatternSchemes`, `PatternCache`, and `Config` classes for managing quantization schemes, caching patterns, and configuration
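A rough sketch of what such data structures might look like. The field and method names below are illustrative guesses based on the descriptions in this review, not the actual `common.py` definitions:

```python
from dataclasses import dataclass, field


@dataclass
class PatternSchemes:
    # Quantization schemes recorded for one region pattern (illustrative fields).
    pattern: str
    schemes: list = field(default_factory=list)


@dataclass
class PatternCache:
    # Maps a pattern signature to its known schemes, so repeated regions
    # with the same structure can skip re-benchmarking.
    entries: dict = field(default_factory=dict)

    def lookup(self, pattern: str):
        return self.entries.get(pattern)

    def store(self, ps: PatternSchemes):
        self.entries[ps.pattern] = ps


@dataclass
class Config:
    # Workflow-level knobs (illustrative defaults, not the real ones).
    warmup_runs: int = 3
    timing_runs: int = 10
    verbose: bool = False
```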
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| `modelopt/onnx/quantization/autotune/__main__.py` | CLI implementation with argument parsing, input validation, and workflow invocation |
| `modelopt/onnx/quantization/autotune/workflows.py` | Workflow functions for benchmark initialization, pattern-based autotuning, and region filtering |
| `modelopt/onnx/quantization/autotune/common.py` | Extended with `PatternSchemes`, `PatternCache`, and `Config` dataclasses for scheme management and serialization |
| `tests/unit/onnx/quantization/autotune/test_config.py` | Unit tests for `Config` class default values, custom values, and parameter validation |
| `tests/gpu/onnx/quantization/autotune/test_workflow.py` | GPU test for quantized model export with Q/DQ insertion |
| `tests/_test_utils/onnx/quantization/autotune/models.py` | Test helper for creating simple ONNX models for autotuner testing |
## What does this PR do?

This PR integrates the benchmark module into the QDQ autotuner. The benchmark module is used to evaluate ONNX model performance. This PR is 1/3 of #703; once all small PRs are merged, #703 can be closed.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No, documentation will be added in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are merged.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components.
  * Introduced a unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations.
  * Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization.

Signed-off-by: Will Guo <willg@nvidia.com>
What does this PR do?
This PR implements the QDQ autotuner CLI. This is the initial version of the CLI; it will be integrated into modelopt.onnx.quantization.autotune.
Usage:
PR 3.1: #837
PR 3.2: #838
PR 3.3: #839
Overview: ?
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes