Integrate Automated QDQ placement tool - part 4.3 #843
willg-nv wants to merge 1 commit into NVIDIA:main
Conversation
📝 Walkthrough

New documentation file added detailing automated Q/DQ placement optimization for ONNX models, including quick start guides, configuration options, advanced usage patterns, CLI usage, troubleshooting, and API references.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 3 passed
Signed-off-by: Will Guo <willg@nvidia.com>
Force-pushed from 76c395b to 916687c
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print(" Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)
457-457: Clarify scheme selection guidance. The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation: specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.
💡 Suggested clarification

-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
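To make the suggested guidance concrete, a small helper along these lines could encode the heuristic; the thresholds and function name here are illustrative assumptions, not part of the autotuner:

```python
def suggested_scheme_count(num_regions: int) -> int:
    """Illustrative heuristic: the more regions a model has, the fewer
    schemes to try per region, so the total number of profiled
    combinations stays bounded. Thresholds are hypothetical."""
    if num_regions >= 100:  # many small regions
        return 25
    if num_regions < 20:    # few large regions
        return 75
    return 50               # balanced default
```

A model with 150 detected patterns would then start at 25 schemes per region, while a model with a handful of large regions would explore 75.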
if autotuner.current_profile_pattern_schemes is None:
    print(" Already profiled, skipping")
    continue
Potentially confusing logic in code example.
The code checks if autotuner.current_profile_pattern_schemes is None: and then prints "Already profiled, skipping". This logic appears counter-intuitive—typically, checking if something is None suggests it doesn't exist or needs initialization, not that it's already complete. Without API documentation context, this example may confuse users about when regions are skipped.
Consider adding a clarifying comment explaining the API behavior, or verify this logic is correct:
# Check if already profiled (API returns None when region is complete)
if autotuner.current_profile_pattern_schemes is None:
    print(" Already profiled, skipping")
    continue
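As a runnable illustration of the control flow this comment is asking to clarify, here is a minimal sketch; `StubAutotuner` is a stand-in for the real API, built on the stated assumption that `current_profile_pattern_schemes` returns `None` once the selected region is already profiled:

```python
class StubAutotuner:
    """Stand-in for the real autotuner API (class and attribute names
    mirror the docs; the behavior modeled here is an assumption:
    current_profile_pattern_schemes is None once a region is complete)."""

    def __init__(self, profiled):
        self._profiled = set(profiled)
        self._current = None

    def select_region(self, name):
        self._current = name

    @property
    def current_profile_pattern_schemes(self):
        if self._current in self._profiled:
            return None  # region already profiled -> nothing left to do
        return ["scheme_a", "scheme_b"]  # placeholder schemes to try


autotuner = StubAutotuner(profiled={"region_1"})
to_profile = []
for region in ["region_0", "region_1", "region_2"]:
    autotuner.select_region(region)
    if autotuner.current_profile_pattern_schemes is None:
        print(" Already profiled, skipping")
        continue
    to_profile.append(region)
```

With `region_1` pre-profiled, only `region_0` and `region_2` remain in `to_profile`, which is why `None` means "skip" rather than "uninitialized".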
python -m modelopt.onnx.quantization.autotune \
    --model bert_large.onnx \
    --output ./bert_large_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml
Inconsistent filename reference.
Line 516 references ./bert_base_run/pattern_cache.yaml, but based on the output structure shown in lines 60 and 150, the pattern cache filename should be autotuner_state_pattern_cache.yaml.
📝 Proposed fix

- --pattern-cache ./bert_base_run/pattern_cache.yaml
+ --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
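Beyond fixing the single occurrence in the docs, a small pre-flight helper can catch this kind of filename drift before a long run starts. The function below is a hypothetical sketch (not part of the tool); it prefers the documented `autotuner_state_pattern_cache.yaml` name and falls back to the inconsistent `pattern_cache.yaml`:

```python
from pathlib import Path


def resolve_pattern_cache(run_dir: str) -> Path:
    """Hypothetical pre-flight check: locate the pattern cache emitted by a
    previous run before passing it to --pattern-cache, so a stale filename
    fails fast instead of partway through an optimization run."""
    candidates = [
        Path(run_dir) / "autotuner_state_pattern_cache.yaml",  # documented name
        Path(run_dir) / "pattern_cache.yaml",                  # legacy/incorrect name
    ]
    for path in candidates:
        if path.exists():
            return path
    raise FileNotFoundError(f"no pattern cache found under {run_dir}")
```

Callers would then pass `str(resolve_pattern_cache("./bert_base_run"))` on the command line instead of hard-coding either filename.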
Suggestion to rename this as
Pull request overview
This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.
Changes:
- Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
- Documents both CLI and Python API usage patterns
- Includes troubleshooting, best practices, FAQs, and multiple examples
    --pattern-cache ./bert_base_run/pattern_cache.yaml

python -m modelopt.onnx.quantization.autotune \
    --model roberta_base.onnx \
    --output ./roberta_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml
Inconsistent pattern cache filename. Earlier in the documentation (line 60, 150, 156), the file is named autotuner_state_pattern_cache.yaml, but here it's referenced as pattern_cache.yaml. Update to use the consistent filename autotuner_state_pattern_cache.yaml throughout the document.
- --pattern-cache ./bert_base_run/pattern_cache.yaml
+ --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
    --model gpt2_medium.onnx \
    --output ./gpt2_medium_run \
    --quant-type fp8 \
    --pattern-cache ./gpt2_small_run/pattern_cache.yaml
Inconsistent pattern cache filename. Earlier in the documentation (line 60, 150, 156), the file is named autotuner_state_pattern_cache.yaml, but here it's referenced as pattern_cache.yaml. Update to use the consistent filename autotuner_state_pattern_cache.yaml throughout the document.
- --pattern-cache ./gpt2_small_run/pattern_cache.yaml
+ --pattern-cache ./gpt2_small_run/autotuner_state_pattern_cache.yaml
* **50-100 schemes**: Balanced (recommended for most cases)
* **100-200+ schemes**: Thorough exploration, use with pattern cache


Remove the extra blank line here for consistency with the RST formatting throughout the document.
API Reference
=============

For detailed API documentation, see :doc:`../reference/2_qdq_placement`.
The referenced documentation file ../reference/2_qdq_placement does not exist in the repository. Either create this API reference documentation file or update the reference to point to an existing file. This broken link will cause Sphinx build warnings or errors.
- For detailed API documentation, see :doc:`../reference/2_qdq_placement`.
+ For detailed API documentation, see the module-level documentation for ``modelopt.onnx.quantization.autotune``.
* Build a pattern cache library for your model family
* Integrate optimized models into your deployment pipeline

For architectural details and API reference, see :doc:`../reference/2_qdq_placement`.
The referenced documentation file ../reference/2_qdq_placement does not exist in the repository. Either create this API reference documentation file or update the reference to point to an existing file. This broken link will cause Sphinx build warnings or errors.
- For architectural details and API reference, see :doc:`../reference/2_qdq_placement`.
+ For architectural details and API reference, refer to the ModelOpt ONNX quantization documentation.
What does this PR do?

This PR uploads the user guide for the Automated QDQ placement tool. The tool automatically searches for QDQ insertion points that yield better performance.

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit