Add Automated QDQ placement reference - part 4.2 #842

willg-nv wants to merge 2 commits into NVIDIA:main from
Conversation
📝 Walkthrough

A new comprehensive documentation file describing the Automatic Q/DQ Placement Optimizer architecture for ONNX models. The document covers framework components, optimization workflows, data models, the full lifecycle from initialization through evaluation, usage examples via CLI and Python API, design rationale, limitations, and related terminology.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
In `@docs/source/reference/2_qdq_placement.rst`:
- Around line 143-144: The summary line claiming "Supports forward-only,
backward-only, and bidirectional search" is inconsistent with later text that
says only forward (downstream) region expansion is implemented and backward
expansion is a potential future addition; update the brief feature list to
accurately reflect the current implementation by changing that line to state
that only forward (downstream) region expansion is currently implemented and
that backward and bidirectional search are planned/experimental, and ensure the
change references the same "region search" wording so readers see the consistent
status across the document.
- Around line 343-347: Add explicit documentation for configuring TensorRT
plugin library paths: state the config key name (tensorrt_plugin_paths) and the
CLI flag (--tensorrt-plugin-paths) and an environment variable alternative
(TENSORRT_PLUGIN_PATH), describe the accepted formats (comma-separated list or
JSON array in the config file, single string for a single path) and where to
place them in the config/CLI, and note that plugins are loaded during TensorRT
engine creation/initialization (before inference/warmup) so they must be
accessible at startup; include a short example for both CLI and config file
usage and clarify failure behavior if a plugin path is invalid.
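A hedged sketch of what such startup plugin loading might look like on the Python side. The key and flag names above come from the review suggestion, and both helpers below are hypothetical illustrations, not the tool's actual API:

```python
import ctypes
import os


def parse_plugin_paths(value):
    """Split a comma-separated plugin-path string (the format proposed for
    `tensorrt_plugin_paths` / `TENSORRT_PLUGIN_PATH`) into a clean list."""
    return [p.strip() for p in value.split(",") if p.strip()]


def load_tensorrt_plugins(value):
    """Load each plugin shared library before engine creation; fail fast on
    invalid paths so errors surface at startup rather than mid-optimization."""
    paths = parse_plugin_paths(value)
    for path in paths:
        if not os.path.exists(path):
            raise FileNotFoundError(f"TensorRT plugin library not found: {path}")
        ctypes.CDLL(path)  # dlopen registers the plugin creators with TensorRT
    return paths
```

Loading eagerly at startup (rather than lazily at first inference) matches the failure behavior the comment asks to document: a bad path aborts the run immediately with a clear error.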
- Line 299: Update the documentation for init_benchmark_instance(timing_cache,
warmup, timing) to include explicit parameter types and descriptions (e.g.,
whether timing_cache is a file path string or a binary blob, the expected
type/units for warmup and timing), indicate which parameters are optional and
their default values, specify the function's return type and what the return
value represents, and add a short usage example showing a typical call and
expected outcome; reference the exact symbol init_benchmark_instance and its
parameters timing_cache, warmup, timing so readers can locate the function
implementation if they need more detail.
- Around line 1088-1091: Fix the two broken/renamed links in
docs/source/reference/2_qdq_placement.rst by updating the "ONNX Quantization"
link (currently https://onnx.ai/onnx/technical/quantization.html) to the correct
ONNX quantization documentation path or remove that list item if no direct
replacement exists, and update the "NVIDIA ModelOpt" GitHub URL from
https://github.com/NVIDIA/TensorRT-Model-Optimizer to the renamed repository
https://github.com/NVIDIA/Model-Optimizer; target the list entries labeled "ONNX
Quantization" and "NVIDIA ModelOpt" in the diff when making the changes.
🧹 Nitpick comments (3)
docs/source/reference/2_qdq_placement.rst (3)
899-914: Clarify dual behavior of `export_onnx` and `benchmark.run`. The example shows two different usage patterns:

- Lines 899-900: `export_onnx("baseline.onnx", ...)` exports to file, then `benchmark.run("baseline.onnx")` reads from file
- Lines 912-913: `export_onnx(None, ...)` returns bytes, then `benchmark.run(model_bytes)` accepts bytes

This dual behavior (file path vs. in-memory bytes) should be explicitly documented in the method descriptions. Consider adding a note explaining when to use each approach (e.g., use bytes for intermediate schemes to avoid disk I/O).
390-407: Consider adding parameter tuning guidance. The configuration parameters are well-documented with clear defaults, but users would benefit from guidance on:

- When to increase `maximum_generation_attempts` (e.g., for models with many constraints)
- When to decrease `pattern_cache_minimum_distance` (e.g., for fine-grained exploration)
- Trade-offs between `schemes_per_region` and optimization time
- Performance implications of `pattern_cache_max_entries_per_pattern`

This would help users make informed decisions when tuning the autotuner for their specific use cases.
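Such guidance could accompany a tuning sketch like the one below; the values are placeholders for illustration, not the tool's actual defaults:

```python
# Hypothetical tuning profile; comments capture the trade-offs a user weighs.
autotuner_config = {
    # Raise for models with many constraints, where valid schemes are rare.
    "maximum_generation_attempts": 100,
    # Lower for finer-grained exploration (more similar schemes admitted).
    "pattern_cache_minimum_distance": 1,
    # More schemes per region improves coverage but lengthens optimization.
    "schemes_per_region": 8,
    # Caps memory and lookup cost of the warm-start pattern cache.
    "pattern_cache_max_entries_per_pattern": 16,
}
```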
920-1037: Consider adding troubleshooting and operational guidance sections. The documentation thoroughly covers architecture and usage but would benefit from additional sections on:

Error Handling: What happens when a TensorRT engine build fails? How are failures logged? Can users configure retry behavior?

Troubleshooting: Common issues and solutions:

- What if no speedup is achieved?
- What if TensorRT can't parse the model?
- What if optimization takes too long?

Performance Expectations: How long does typical optimization take? What factors affect optimization time? (This would help users set realistic expectations.)

Logging Configuration: How to enable verbose logging? What information is logged at different verbosity levels?

These additions would make the documentation more complete for production use cases.
Suggestion to rename this as
    * Input tensor index to that region
    * Enables quantization of data flowing into subregions

    **InsertionScheme**
The beginning of this section says "Defines 3 types of Q/DQ insertion locations".
How does InsertionScheme and Resolution Process fit into this section? Should a new section be created?
InsertionScheme is a collection of insertion points.
I have added one line to explain why the "Resolution Process" is needed for insertion points.
I think it should be reasonable to keep them in this section.
    6. Configuration (common.py)
    -----------------------------

    Config Class
    Autotuning Workflow
    ===================

    Complete Optimization Process
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
bf5a4c2 to 3778027 (Compare)
Pull request overview
Adds a new reference document describing the architecture and usage of the automatic ONNX Q/DQ placement autotuning tool (modelopt.onnx.quantization.autotune) within the ModelOpt documentation set.
Changes:
- Introduces a comprehensive reference page covering architecture, core components, workflows, CLI usage, and Python API usage for automatic Q/DQ placement.
- Documents state management / crash recovery and pattern-cache warm-start concepts.
- Adds a glossary and external references relevant to ONNX quantization and TensorRT benchmarking.
    Measure of how different two insertion schemes are, typically computed as Hamming
    distance between their insertion point sets.

Glossary entry says scheme diversity is "typically computed as Hamming distance", but the implementation uses an edit distance based on symmetric difference of insertion-point sets (InsertionScheme.distance). Please adjust the glossary to match the actual metric used.

Suggested change:

    Measure of how different two insertion schemes are, computed as an edit distance
    based on the symmetric difference between their insertion point sets.
@@ -0,0 +1,1117 @@

    ====================================================

The document title uses an overline+underline style. Other pages under docs/source/reference/ (e.g., 2_security.rst, 1_modelopt_api.rst) use an underline-only top-level heading; consider matching that convention for consistent rendering/styling.

Suggested change (remove the overline):

    ====================================================
    **Region Discovery Algorithm:**

    1. **Bottom-Up Search**: Start from individual operations
    2. **Local Expansion**: Expand forward/backward from seed nodes within step limits

This algorithm description says regions expand "forward/backward" from seed nodes, but the current implementation in region_search.py is forward-only (it builds forward reachability and traverses from inputs → outputs). This also conflicts with the later "Forward-Only Region Search" section; please update this step to reflect forward-only behavior (or document any backward traversal if it exists).

Suggested change:

    2. **Local Expansion**: Expand forward from seed nodes (inputs → outputs) within step limits
    * Individual operations or small operation sequences
    * Conv, MatMul, Gemm, Add, etc.
    * Forward/backward expansion around seed nodes

This bullet mentions "Forward/backward expansion" for leaf region building, but the current RegionPartitioner logic in region_search.py traverses forward along tensor users (dataflow direction). Consider rewording to "forward expansion" for accuracy and to avoid contradicting the later forward-only section.

Suggested change:

    * Forward expansion around seed nodes
    * Schemes are filtered by minimum Hamming distance
    * Ensures cache contains diverse candidates
    * Prevents redundant similar schemes

The docs describe diversity filtering as using "Hamming distance", but the implementation uses InsertionScheme.distance(), which is an edit distance computed as the size of the symmetric difference of insertion-point sets. To avoid confusion (and match the code terminology), consider calling this "edit distance" / "symmetric-difference distance" consistently.

Suggested change:

    * Schemes are filtered by a minimum edit distance (symmetric difference over insertion-point sets)
    * Ensures cache contains diverse candidates
    * Prevents redundant, highly similar schemes
    Diversity Filtering
    ~~~~~~~~~~~~~~~~~~~

    * Compute Hamming distance between schemes

This section again refers to "Hamming distance" between schemes, but the code computes an edit distance via symmetric difference of insertion-point sets (InsertionScheme.distance). Please align the terminology here with the implementation (or clarify the equivalence if you intend a Hamming-over-bitvector interpretation).

Suggested change:

    * Compute distance between schemes via symmetric difference of insertion-point sets
    Low-Level API
    ~~~~~~~~~~~~~

    Initialize the global benchmark with ``init_benchmark_instance``, then use ``benchmark_onnx_model`` for measurements. The workflow uses this same global; when calling the workflow from Python you do not need to call ``init_benchmark_instance`` yourself (the CLI does it).

This sentence is internally inconsistent: it says you don't need to call init_benchmark_instance when calling the workflow from Python because the CLI does it. In practice, the CLI initializes the global benchmark only for CLI runs; for Python usage you still need to call init_benchmark_instance (or the workflow needs to do it) before benchmark_onnx_model can return real latencies.

Suggested change:

    Initialize the global benchmark with ``init_benchmark_instance``, then use ``benchmark_onnx_model`` for measurements. The workflow and low-level helpers share this same global instance; when calling them from Python you must call ``init_benchmark_instance`` yourself before any benchmarking. The CLI initializes the benchmark automatically only for CLI runs.
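The corrected contract can be illustrated with a minimal stub of the global-instance pattern; the names mirror the document, but the bodies are illustrative, not the real implementation:

```python
_BENCHMARK = None  # module-level global shared by the workflow and helpers


def init_benchmark_instance(timing_cache=None, warmup=10, timing=100):
    """Python callers must invoke this themselves; the CLI does it only
    for CLI runs."""
    global _BENCHMARK
    _BENCHMARK = {"timing_cache": timing_cache, "warmup": warmup, "timing": timing}


def benchmark_onnx_model(model):
    """Fail fast with a clear message if the global was never initialized."""
    if _BENCHMARK is None:
        raise RuntimeError("call init_benchmark_instance() before benchmarking")
    return 0.0  # stand-in for a measured latency in milliseconds
```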
According to the discussion, ModelOpt API documents are generated automatically, so a separate reference doc is not needed. I have moved some sections to the Auto QDQ guide doc. I will close this PR.
What does this PR do?
This PR uploads a reference document for the Automated QDQ placement tool, which automatically searches for QDQ insertion points that yield better performance.
Usage
# Add a code snippet demonstrating how to use this

Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit