Integrate Automated QDQ placement tool - part 3.3 #839
willg-nv wants to merge 10 commits into NVIDIA:main from
Conversation
📝 Walkthrough

These changes introduce a command-line interface and core workflow orchestration for ONNX Q/DQ autotuning. The CLI entry point parses configuration arguments, validates inputs, initializes TensorRT benchmarking, and invokes a region-pattern autotuning workflow that profiles models, applies quantization schemes, benchmarks performance, and exports optimized variants.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (run_autotune)
    participant Validator as Input Validator
    participant Benchmark as Benchmark Init
    participant Workflow as Autotuning Workflow
    participant Model as ONNX Model
    participant TensorRT as TensorRT Engine
    participant Output as Model Export
    User->>CLI: Invoke with arguments
    CLI->>Validator: Validate model & baseline paths
    Validator-->>CLI: Path valid / exit
    CLI->>Benchmark: Initialize benchmark instance
    Benchmark->>TensorRT: Configure with timing cache & plugins
    TensorRT-->>Benchmark: Instance ready
    Benchmark-->>CLI: Benchmark initialized
    CLI->>Workflow: Invoke region_pattern_autotuning_workflow
    Workflow->>Model: Load ONNX model
    Workflow->>Model: Load pattern cache & QDQ baseline
    Workflow->>Workflow: Profile regions & apply node filters
    loop For each region
        Workflow->>Workflow: Generate quantization schemes
        Workflow->>Model: Apply Q/DQ to region
        Workflow->>TensorRT: Benchmark model
        TensorRT-->>Workflow: Latency result
    end
    Workflow->>Output: Export optimized model
    Output-->>Workflow: Export complete
    Workflow->>Output: Save state checkpoint
    Output-->>Workflow: State saved
    Workflow-->>CLI: Return autotuner result
    CLI-->>User: Exit with status
```
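The per-region loop in the sequence diagram above can be summarized as a Python sketch. Every function here is a stub named after a diagram step (`schemes_for`, `apply_qdq`, `benchmark` are illustrative assumptions, not the actual module API): each region tries its candidate quantization schemes and keeps the one with the lowest measured latency.

```python
def autotune(model_path, regions, schemes_for, apply_qdq, benchmark):
    """Pick, per region, the quantization scheme with the lowest measured latency."""
    best = {}
    for region in regions:
        best_latency, best_scheme = float("inf"), None
        for scheme in schemes_for(region):
            candidate = apply_qdq(model_path, region, scheme)  # insert Q/DQ nodes
            latency = benchmark(candidate)                     # TensorRT timing
            if latency < best_latency:
                best_latency, best_scheme = latency, scheme
        best[region] = best_scheme
    return best

# Toy run with stubbed dependencies standing in for the real model/TensorRT calls:
result = autotune(
    "model.onnx",
    regions=["conv_block"],
    schemes_for=lambda r: ["int8", "fp16"],
    apply_qdq=lambda m, r, s: (m, r, s),
    benchmark=lambda c: {"int8": 1.0, "fp16": 2.0}[c[2]],
)
```

In the real workflow the winning schemes are then exported as an optimized model and checkpointed, as shown in the diagram's final steps.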
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around lines 107-116: `init_benchmark_instance` can return None on failure, but the current flow continues regardless. Update the caller (the block after `log_benchmark_config`) to check the return value of `init_benchmark_instance(use_trtexec=args.use_trtexec, plugin_libraries=args.plugin_libraries, timing_cache_file=args.timing_cache, warmup_runs=args.warmup_runs, timing_runs=args.timing_runs, trtexec_args=trtexec_args)`; if it returns None, log an error and exit early (e.g., `sys.exit(1)`) so the script fails fast instead of producing misleading infinite benchmark results.
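A minimal sketch of the fix the comment above suggests. The argument names are taken from the comment; `init_benchmark_instance` is stubbed here to simulate a failed setup, and the surrounding `main` is an assumption, not the actual file contents:

```python
import logging
import sys

logger = logging.getLogger(__name__)


def init_benchmark_instance(**kwargs):
    # Stand-in for the real initializer, which returns None when
    # TensorRT benchmark setup fails.
    return None


def main(args):
    benchmark = init_benchmark_instance(
        use_trtexec=args.use_trtexec,
        plugin_libraries=args.plugin_libraries,
        timing_cache_file=args.timing_cache,
        warmup_runs=args.warmup_runs,
        timing_runs=args.timing_runs,
        trtexec_args=args.trtexec_args,
    )
    if benchmark is None:
        # Fail fast instead of continuing with misleading "infinite" latencies.
        logger.error("Failed to initialize benchmark instance; aborting.")
        sys.exit(1)
```

With this guard in place, a broken TensorRT environment surfaces as an immediate non-zero exit rather than as nonsense benchmark numbers later in the run.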
In `@modelopt/onnx/quantization/autotune/workflows.py`:
- Around lines 239-246: The `Config` instantiation hardcodes `verbose=True`, which forces noisy logging. Change the `Config(...)` call in this file to accept a verbose parameter (e.g., `verbose=verbose`) and thread that boolean from the CLI invocation that creates the autotuner (update the CLI call site to pass `args.verbose` into the function that triggers this code). `logger.info` stays unchanged; only `Config` should use the provided flag instead of `True`.
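A sketch of threading the flag through, per the comment above. `Config` here is a stand-in dataclass with only the relevant field; the real class lives in `common.py`, and the workflow signature is an assumption:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Config:
    # Stand-in for the real Config in common.py; only the relevant field shown.
    verbose: bool = False


def region_pattern_autotuning_workflow(model_path: str, verbose: bool = False):
    # Previously: Config(verbose=True) -- hardcoded, forcing noisy logging.
    config = Config(verbose=verbose)  # thread the caller's flag instead
    logger.info("Autotuning %s (verbose=%s)", model_path, config.verbose)
    return config


# CLI call site: pass args.verbose down instead of relying on a hardcoded value, e.g.
# result = region_pattern_autotuning_workflow(args.model, verbose=args.verbose)
```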
Please add a test for workflows. Example: https://github.com/gcunhase/TensorRT-Model-Optimizer/blob/85228103a29662c721d862cb1cec38b0193699f5/tests/unit/onnx/quantization/autotune/test_workflows.py#L36
Added, please check
/ok to test 0414b81
@willg-nv, I'm seeing the following errors in the
@willg-nv the precommit fixes are working, thank you! One last thing is the
done
/ok to test 8a363da
@willg-nv CICD pipeline is failing, can you please have a look at it? Thanks!
"ModuleNotFoundError: No module named 'modelopt.onnx.quantization.autotune.autotuner'" |
Pull request overview
This PR implements the command-line interface (CLI) for the ONNX Q/DQ autotuning framework, completing part 3.3 of the automated QDQ placement tool integration. The PR builds upon the benchmark module (PR #837) and QDQAutotuner class (PR #838), providing a complete end-to-end workflow for automated quantization optimization of ONNX models using pattern-based region analysis and TensorRT performance measurement.
Changes:
- Added CLI (`__main__.py`) with comprehensive argument parsing for model paths, quantization parameters, TensorRT benchmarking configuration, and workflow control
- Implemented high-level workflow orchestration (`workflows.py`) managing pattern-based region optimization, state persistence, baseline comparison, and benchmarking
- Extended common data structures with `PatternSchemes`, `PatternCache`, and `Config` classes for managing quantization schemes, caching patterns, and configuration
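A rough sketch of what such data structures might look like. The field and method names below are illustrative guesses based on the descriptions in this review, not the actual `common.py` definitions:

```python
from dataclasses import dataclass, field


@dataclass
class PatternSchemes:
    # Quantization schemes recorded for one region pattern (illustrative fields).
    pattern: str
    schemes: list = field(default_factory=list)


@dataclass
class PatternCache:
    # Maps a pattern signature to its known schemes, so repeated regions
    # with the same structure can skip re-benchmarking.
    entries: dict = field(default_factory=dict)

    def lookup(self, pattern: str):
        return self.entries.get(pattern)

    def store(self, ps: PatternSchemes):
        self.entries[ps.pattern] = ps


@dataclass
class Config:
    # Workflow-level knobs (illustrative defaults, not the real ones).
    warmup_runs: int = 3
    timing_runs: int = 10
    verbose: bool = False
```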
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| `modelopt/onnx/quantization/autotune/__main__.py` | CLI implementation with argument parsing, input validation, and workflow invocation |
| `modelopt/onnx/quantization/autotune/workflows.py` | Workflow functions for benchmark initialization, pattern-based autotuning, and region filtering |
| `modelopt/onnx/quantization/autotune/common.py` | Extended with `PatternSchemes`, `PatternCache`, and `Config` dataclasses for scheme management and serialization |
| `tests/unit/onnx/quantization/autotune/test_config.py` | Unit tests for `Config` class default values, custom values, and parameter validation |
| `tests/gpu/onnx/quantization/autotune/test_workflow.py` | GPU test for quantized model export with Q/DQ insertion |
| `tests/_test_utils/onnx/quantization/autotune/models.py` | Test helper for creating simple ONNX models for autotuner testing |
## What does this PR do?

This PR integrates the benchmark module into the QDQ autotuner. The benchmark module is used to evaluate ONNX model performance. This PR is 1/3 of #703; once all small PRs are merged, #703 can be closed.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No, documentation will be added in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are merged.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components.
  * Introduced a unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations.
  * Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization.

Signed-off-by: Will Guo <willg@nvidia.com>
What does this PR do?
This PR implements the QDQ autotuner CLI. This is the initial version of the CLI; it will be integrated into modelopt.onnx.quantization.autotune.
Usage:
PR 3.1: #837
PR 3.2: #838
PR 3.3: #839
Overview: ?
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes