
Integrate Automated QDQ placement tool - part 4.3 #843

Open
willg-nv wants to merge 1 commit into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part4.3

Conversation

willg-nv (Contributor) commented Feb 3, 2026

What does this PR do?

This PR uploads the user guide for the Automated QDQ placement tool. The tool automatically searches for QDQ insertion points that yield better inference performance.

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guide for Automated Q/DQ Placement Optimization, including CLI and Python API examples, configuration options, deployment strategies, and troubleshooting guidance.

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot (Contributor) commented Feb 3, 2026

📝 Walkthrough

New documentation file added detailing automated Q/DQ placement optimization for ONNX models, including quick start guides, configuration options, advanced usage patterns, CLI usage, troubleshooting, and API references.

Changes

Cohort / File(s): Documentation – Q/DQ Placement Guide (docs/source/guides/9_qdq_placement.rst)
Summary: New comprehensive guide covering automated Q/DQ node placement optimization for ONNX models. Includes quick start via CLI and Python API, detailed operation workflow, configuration options, pattern cache management, deployment guidance, troubleshooting, and frequently asked questions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Integrate Automated QDQ placement tool - part 4.3' accurately reflects the main change: adding comprehensive documentation for the Automated Q/DQ Placement Optimization tool.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 76c395b to 916687c on February 3, 2026 at 02:53
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print("  Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)

457-457: Clarify scheme selection guidance.

The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation—specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.

💡 Suggested clarification
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
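The trade-off behind this suggested clarification can be sketched with back-of-the-envelope arithmetic (Python; the numbers are illustrative, not measured from the tool):

```python
def total_trials(num_regions, schemes_per_region):
    # Each region is benchmarked against every candidate scheme,
    # so total optimization work scales with the product of the two.
    return num_regions * schemes_per_region

# Many small regions: fewer schemes per region keeps total work bounded.
many_small = total_trials(100, 25)

# Few large regions: a higher scheme count per region is still affordable.
few_large = total_trials(15, 80)
```

With these hypothetical counts, 100 regions at 25 schemes each already means 2,500 trials, more than 15 regions at 80 schemes each (1,200), which is why the scheme budget should shrink as the region count grows.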

Comment on lines +279 to +281
if autotuner.current_profile_pattern_schemes is None:
    print("  Already profiled, skipping")
    continue
⚠️ Potential issue | 🟡 Minor

Potentially confusing logic in code example.

The code checks if autotuner.current_profile_pattern_schemes is None: and then prints "Already profiled, skipping". This logic appears counter-intuitive—typically, checking if something is None suggests it doesn't exist or needs initialization, not that it's already complete. Without API documentation context, this example may confuse users about when regions are skipped.

Consider adding a clarifying comment explaining the API behavior, or verify this logic is correct:

# Check if already profiled (API returns None when region is complete)
if autotuner.current_profile_pattern_schemes is None:
    print("  Already profiled, skipping")
    continue
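Assuming the API behavior the comment describes (None once a region has been profiled), the skip logic can be exercised with a stand-in object. `FakeAutotuner` below is hypothetical and only mimics the attribute in question; it is not the real class from modelopt.onnx.quantization.autotune:

```python
class FakeAutotuner:
    """Hypothetical stand-in: current_profile_pattern_schemes returns
    None when the current region has already been profiled."""

    def __init__(self, schemes_by_region):
        self._schemes = schemes_by_region
        self.current_region = None

    @property
    def current_profile_pattern_schemes(self):
        return self._schemes.get(self.current_region)


# region_b is already profiled, so its scheme list is None.
tuner = FakeAutotuner({"region_a": ["scheme_1", "scheme_2"], "region_b": None})
skipped = []
for region in ("region_a", "region_b"):
    tuner.current_region = region
    # None means the region is already profiled, so skip it.
    if tuner.current_profile_pattern_schemes is None:
        skipped.append(region)
        continue
```

Under these assumed semantics only `region_b` is skipped, which matches the "Already profiled" wording in the example.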

python -m modelopt.onnx.quantization.autotune \
    --model bert_large.onnx \
    --output ./bert_large_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml

⚠️ Potential issue | 🟡 Minor

Inconsistent filename reference.

Line 516 references ./bert_base_run/pattern_cache.yaml, but based on the output structure shown in lines 60 and 150, the pattern cache filename should be autotuner_state_pattern_cache.yaml.

📝 Proposed fix
-       --pattern-cache ./bert_base_run/pattern_cache.yaml
+       --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
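One way to avoid this class of filename mismatch is a small pre-flight helper that resolves the cache path before launching a run. This is a sketch assuming the output layout described in the doc; `resolve_pattern_cache` is a hypothetical helper, not part of the tool:

```python
from pathlib import Path


def resolve_pattern_cache(run_dir):
    """Locate the pattern cache inside a previous run's output directory.

    Prefers the documented filename and falls back to any
    *pattern_cache.yaml so stale examples still resolve.
    """
    run = Path(run_dir)
    preferred = run / "autotuner_state_pattern_cache.yaml"
    if preferred.is_file():
        return preferred
    matches = sorted(run.glob("*pattern_cache.yaml"))
    if matches:
        return matches[0]
    raise FileNotFoundError(f"no pattern cache found in {run}")
```

A caller could then pass `str(resolve_pattern_cache("./bert_base_run"))` to `--pattern-cache` instead of hard-coding the filename.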

gcunhase (Contributor) commented:
Suggestion to rename this as *_onnx_autotuner or *_onnx_autoqdq as per #841 (comment).

Copilot AI left a comment

Pull request overview

This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.

Changes:

  • Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
  • Documents both CLI and Python API usage patterns
  • Includes troubleshooting, best practices, FAQs, and multiple examples


Comment on lines +516 to +521
    --pattern-cache ./bert_base_run/pattern_cache.yaml

python -m modelopt.onnx.quantization.autotune \
    --model roberta_base.onnx \
    --output ./roberta_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml
Copilot AI commented Feb 27, 2026

Inconsistent pattern cache filename. Earlier in the documentation (line 60, 150, 156), the file is named autotuner_state_pattern_cache.yaml, but here it's referenced as pattern_cache.yaml. Update to use the consistent filename autotuner_state_pattern_cache.yaml throughout the document.

Suggested change
-       --pattern-cache ./bert_base_run/pattern_cache.yaml
+       --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
        python -m modelopt.onnx.quantization.autotune \
            --model roberta_base.onnx \
            --output ./roberta_run \
-           --pattern-cache ./bert_base_run/pattern_cache.yaml
+           --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml

    --model gpt2_medium.onnx \
    --output ./gpt2_medium_run \
    --quant-type fp8 \
    --pattern-cache ./gpt2_small_run/pattern_cache.yaml
Copilot AI commented Feb 27, 2026

Inconsistent pattern cache filename. Earlier in the documentation (line 60, 150, 156), the file is named autotuner_state_pattern_cache.yaml, but here it's referenced as pattern_cache.yaml. Update to use the consistent filename autotuner_state_pattern_cache.yaml throughout the document.

Suggested change
-       --pattern-cache ./gpt2_small_run/pattern_cache.yaml
+       --pattern-cache ./gpt2_small_run/autotuner_state_pattern_cache.yaml

* **50-100 schemes**: Balanced (recommended for most cases)
* **100-200+ schemes**: Thorough exploration, use with pattern cache


Copilot AI commented Feb 27, 2026

Remove the extra blank line here for consistency with the RST formatting throughout the document.

API Reference
=============

For detailed API documentation, see :doc:`../reference/2_qdq_placement`.
Copilot AI commented Feb 27, 2026

The referenced documentation file ../reference/2_qdq_placement does not exist in the repository. Either create this API reference documentation file or update the reference to point to an existing file. This broken link will cause Sphinx build warnings or errors.

Suggested change
-For detailed API documentation, see :doc:`../reference/2_qdq_placement`.
+For detailed API documentation, see the module-level documentation for ``modelopt.onnx.quantization.autotune``.

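Broken `:doc:` targets like this can be caught before a Sphinx build with a quick scan of the source tree. The sketch below is hypothetical tooling, not part of the repo; it assumes standard Sphinx resolution (relative targets against the referencing file, absolute targets against the source root) and ignores titled references such as :doc:`Title <target>`:

```python
import re
from pathlib import Path

# Matches plain :doc:`target` references; titled forms with <...> are skipped.
DOC_REF = re.compile(r":doc:`([^`<>]+)`")


def find_broken_doc_refs(src_root):
    """Report :doc: targets in .rst files under src_root that do not
    resolve to an existing .rst file."""
    src_root = Path(src_root)
    broken = []
    for rst in src_root.rglob("*.rst"):
        for target in DOC_REF.findall(rst.read_text()):
            if target.startswith("/"):
                # Absolute targets resolve against the source root.
                candidate = src_root / (target.lstrip("/") + ".rst")
            else:
                # Relative targets resolve against the referencing file.
                candidate = (rst.parent / (target + ".rst")).resolve()
            if not candidate.exists():
                broken.append((rst.name, target))
    return broken
```

Running this over docs/source before merging would flag `../reference/2_qdq_placement` until the reference page exists.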
* Build a pattern cache library for your model family
* Integrate optimized models into your deployment pipeline

For architectural details and API reference, see :doc:`../reference/2_qdq_placement`.
Copilot AI commented Feb 27, 2026

The referenced documentation file ../reference/2_qdq_placement does not exist in the repository. Either create this API reference documentation file or update the reference to point to an existing file. This broken link will cause Sphinx build warnings or errors.

Suggested change
-For architectural details and API reference, see :doc:`../reference/2_qdq_placement`.
+For architectural details and API reference, refer to the ModelOpt ONNX quantization documentation.


3 participants