Integrate Automated QDQ placement tool - part 4.3 #843
willg-nv wants to merge 1 commit into NVIDIA:main
Conversation
📝 Walkthrough

New documentation file added detailing automated Q/DQ placement optimization for ONNX models, including quick start guides, configuration options, advanced usage patterns, CLI usage, troubleshooting, and API references.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 3 passed
Signed-off-by: Will Guo <willg@nvidia.com>
Force-pushed from 76c395b to 916687c
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print(" Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)
457-457: Clarify scheme selection guidance. The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation: specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.
💡 Suggested clarification

-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
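To make the suggested guidance concrete, a small helper along these lines could encode the heuristic; the thresholds and function name here are illustrative assumptions, not part of the autotuner:

```python
def suggested_scheme_count(num_regions: int) -> int:
    """Illustrative heuristic: the more regions a model has, the fewer
    schemes to try per region, so the total number of profiled
    combinations stays bounded. Thresholds are hypothetical."""
    if num_regions >= 100:  # many small regions
        return 25
    if num_regions < 20:    # few large regions
        return 75
    return 50               # balanced default
```

A model with 150 detected patterns would then start at 25 schemes per region, while a model with a handful of large regions would explore 75.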
if autotuner.current_profile_pattern_schemes is None:
    print(" Already profiled, skipping")
    continue
Potentially confusing logic in code example.
The code checks if autotuner.current_profile_pattern_schemes is None: and then prints "Already profiled, skipping". This logic appears counter-intuitive—typically, checking if something is None suggests it doesn't exist or needs initialization, not that it's already complete. Without API documentation context, this example may confuse users about when regions are skipped.
Consider adding a clarifying comment explaining the API behavior, or verify this logic is correct:
# Check if already profiled (API returns None when region is complete)
if autotuner.current_profile_pattern_schemes is None:
    print(" Already profiled, skipping")
    continue
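As a runnable illustration of the control flow this comment is asking to clarify, here is a minimal sketch; `StubAutotuner` is a stand-in for the real API, built on the stated assumption that `current_profile_pattern_schemes` returns `None` once the selected region is already profiled:

```python
class StubAutotuner:
    """Stand-in for the real autotuner API (class and attribute names
    mirror the docs; the behavior modeled here is an assumption:
    current_profile_pattern_schemes is None once a region is complete)."""

    def __init__(self, profiled):
        self._profiled = set(profiled)
        self._current = None

    def select_region(self, name):
        self._current = name

    @property
    def current_profile_pattern_schemes(self):
        if self._current in self._profiled:
            return None  # region already profiled -> nothing left to do
        return ["scheme_a", "scheme_b"]  # placeholder schemes to try


autotuner = StubAutotuner(profiled={"region_1"})
to_profile = []
for region in ["region_0", "region_1", "region_2"]:
    autotuner.select_region(region)
    if autotuner.current_profile_pattern_schemes is None:
        print(" Already profiled, skipping")
        continue
    to_profile.append(region)
```

With `region_1` pre-profiled, only `region_0` and `region_2` remain in `to_profile`, which is why `None` means "skip" rather than "uninitialized".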
python -m modelopt.onnx.quantization.autotune \
    --model bert_large.onnx \
    --output ./bert_large_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml
Inconsistent filename reference.
Line 516 references ./bert_base_run/pattern_cache.yaml, but based on the output structure shown in lines 60 and 150, the pattern cache filename should be autotuner_state_pattern_cache.yaml.
📝 Proposed fix

- --pattern-cache ./bert_base_run/pattern_cache.yaml
+ --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
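Beyond fixing the single occurrence in the docs, a small pre-flight helper can catch this kind of filename drift before a long run starts. The function below is a hypothetical sketch (not part of the tool); it prefers the documented `autotuner_state_pattern_cache.yaml` name and falls back to the inconsistent `pattern_cache.yaml`:

```python
from pathlib import Path


def resolve_pattern_cache(run_dir: str) -> Path:
    """Hypothetical pre-flight check: locate the pattern cache emitted by a
    previous run before passing it to --pattern-cache, so a stale filename
    fails fast instead of partway through an optimization run."""
    candidates = [
        Path(run_dir) / "autotuner_state_pattern_cache.yaml",  # documented name
        Path(run_dir) / "pattern_cache.yaml",                  # legacy/incorrect name
    ]
    for path in candidates:
        if path.exists():
            return path
    raise FileNotFoundError(f"no pattern cache found under {run_dir}")
```

Callers would then pass `str(resolve_pattern_cache("./bert_base_run"))` on the command line instead of hard-coding either filename.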
Suggestion to rename this as
Pull request overview
This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.
Changes:
- Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
- Documents both CLI and Python API usage patterns
- Includes troubleshooting, best practices, FAQs, and multiple examples
    --pattern-cache ./bert_base_run/pattern_cache.yaml

python -m modelopt.onnx.quantization.autotune \
    --model roberta_base.onnx \
    --output ./roberta_run \
    --pattern-cache ./bert_base_run/pattern_cache.yaml
Inconsistent pattern cache filename. Earlier in the documentation (line 60, 150, 156), the file is named autotuner_state_pattern_cache.yaml, but here it's referenced as pattern_cache.yaml. Update to use the consistent filename autotuner_state_pattern_cache.yaml throughout the document.
- --pattern-cache ./bert_base_run/pattern_cache.yaml
+ --pattern-cache ./bert_base_run/autotuner_state_pattern_cache.yaml
    --model gpt2_medium.onnx \
    --output ./gpt2_medium_run \
    --quant-type fp8 \
    --pattern-cache ./gpt2_small_run/pattern_cache.yaml
Inconsistent pattern cache filename. Earlier in the documentation (line 60, 150, 156), the file is named autotuner_state_pattern_cache.yaml, but here it's referenced as pattern_cache.yaml. Update to use the consistent filename autotuner_state_pattern_cache.yaml throughout the document.
- --pattern-cache ./gpt2_small_run/pattern_cache.yaml
+ --pattern-cache ./gpt2_small_run/autotuner_state_pattern_cache.yaml
* **50-100 schemes**: Balanced (recommended for most cases)
* **100-200+ schemes**: Thorough exploration, use with pattern cache


Remove the extra blank line here for consistency with the RST formatting throughout the document.
API Reference
=============

For detailed API documentation, see :doc:`../reference/2_qdq_placement`.
The referenced documentation file ../reference/2_qdq_placement does not exist in the repository. Either create this API reference documentation file or update the reference to point to an existing file. This broken link will cause Sphinx build warnings or errors.
- For detailed API documentation, see :doc:`../reference/2_qdq_placement`.
+ For detailed API documentation, see the module-level documentation for ``modelopt.onnx.quantization.autotune``.
* Build a pattern cache library for your model family
* Integrate optimized models into your deployment pipeline

For architectural details and API reference, see :doc:`../reference/2_qdq_placement`.
The referenced documentation file ../reference/2_qdq_placement does not exist in the repository. Either create this API reference documentation file or update the reference to point to an existing file. This broken link will cause Sphinx build warnings or errors.
- For architectural details and API reference, see :doc:`../reference/2_qdq_placement`.
+ For architectural details and API reference, refer to the ModelOpt ONNX quantization documentation.
What does this PR do?

This PR uploads the user guide for the Automated QDQ placement tool. The tool automatically searches for QDQ insertion points that yield better performance.

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit