
Integrate Automated QDQ autotuner - part 3.2#838

Open
willg-nv wants to merge 13 commits into NVIDIA:main from
willg-nv:dev-willg-integrate-auto-qdq-placement-part3.2

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 2, 2026

What does this PR do?

This PR implements the QDQAutotuner class, which drives the main autotuner workflow.

The workflow is:

  1. use RegionSearch to build regions
  2. generate QDQ ONNX models and evaluate performance
  3. save the best model
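Put together, the driver loop looks roughly like the sketch below. The method names (generate, submit, export_onnx) follow the sequence diagram later in this thread; the ToyAutotuner class is a stand-in to illustrate the generate/measure/keep-best cycle, not the real implementation.

```python
# Toy stand-in for QDQAutotuner: generate candidate insertion schemes,
# submit measured latencies, and export the best one. Illustrative only.

class ToyAutotuner:
    def __init__(self, candidates):
        self._candidates = list(candidates)  # schemes to try
        self._i = 0
        self.best = None                     # (scheme, latency_ms)

    def generate(self):
        """Return the next candidate insertion scheme, or None when done."""
        if self._i >= len(self._candidates):
            return None
        scheme = self._candidates[self._i]
        self._i += 1
        return scheme

    def submit(self, scheme, latency_ms):
        """Record measured latency; keep the fastest scheme seen so far."""
        if self.best is None or latency_ms < self.best[1]:
            self.best = (scheme, latency_ms)

    def export_onnx(self, best=True):
        """Stand-in for exporting the model with the best scheme applied."""
        return self.best[0] if best else None


# Drive the loop: profile each candidate, keep the fastest.
tuner = ToyAutotuner(["scheme_a", "scheme_b", "scheme_c"])
fake_latency_ms = {"scheme_a": 3.2, "scheme_b": 1.9, "scheme_c": 2.4}
while (scheme := tuner.generate()) is not None:
    tuner.submit(scheme, fake_latency_ms[scheme])

print(tuner.export_onnx(best=True))  # scheme_b (lowest latency)
```

In the real workflow the latency would come from benchmarking each generated ONNX model rather than a lookup table.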

This PR is part 2/4 of #703.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

Overview: ?

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Not in this part.
  • Did you add or update any necessary documentation?: No, documentation will be updated in part 4.
  • Did you update Changelog?: No, change log will be updated when all changes are ready.

Additional Information

Summary by CodeRabbit

  • New Features
    • Introduced ONNX Q/DQ autotuning framework with automatic region discovery and pattern-based optimization.
    • Added model profiling and quantization scheme generation capabilities.
    • Enabled state persistence and quantization model export functionality.
    • Introduced configuration management for quantization parameters and profiling workflows.


@willg-nv willg-nv requested a review from a team as a code owner February 2, 2026 02:58
@willg-nv willg-nv requested a review from ajrasane February 2, 2026 02:58
@copy-pr-bot

copy-pr-bot bot commented Feb 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 2, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Introduces a new ONNX quantization autotuning module that enables automatic Q/DQ (Quantize/Dequantize) node insertion and optimization using pattern-based region analysis. Provides a comprehensive framework for discovering optimal insertion points, profiling schemes, and exporting quantized models.

Changes

Cohort / File(s) Summary
Module Initialization
modelopt/onnx/quantization/autotune/__init__.py
Exposes public API surface: QDQAutotuner class, configuration/exception types (Config, InsertionScheme, PatternSchemes, RegionType), insertion point abstractions, and utility classes (PatternCache, Region, RegionPattern, CombinedRegionSearch).
Core Autotuner Implementation
modelopt/onnx/quantization/autotune/autotuner.py
Implements QDQAutotunerBase and QDQAutotuner with region discovery, pattern-based Q/DQ insertion logic, profiling workflow, state management, graph mutation, insertion point resolution, and ONNX export capabilities. Supports scheme generation, convergence tracking, FP8 conversion, and pattern cache integration.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Autotuner as QDQAutotuner
    participant RegionSearch as CombinedRegionSearch
    participant Profiler as Profiling System
    participant Inserter as Q/DQ Insertion
    participant Exporter as ONNX Exporter

    User->>Autotuner: initialize(config, pattern_cache)
    Autotuner->>Autotuner: Load model & init state

    User->>RegionSearch: discover regions
    RegionSearch-->>Autotuner: return regions

    loop For each region
        User->>Autotuner: set_profile_region(region)
        Autotuner->>Autotuner: Commit profiling outcomes
        Autotuner->>Profiler: Prepare region-pattern pairs
        
        loop Generate candidates
            User->>Autotuner: generate()
            Autotuner->>Inserter: Build insertion scheme
            Inserter->>Inserter: Insert Q/DQ nodes
            User->>Autotuner: submit(latency_ms)
            Autotuner->>Autotuner: Track performance metrics
        end
    end

    User->>Autotuner: export_onnx(best=True)
    Autotuner->>Inserter: Apply best scheme
    Inserter->>Exporter: Finalize Q/DQ graph
    Exporter-->>User: return quantized ONNX bytes

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Integrate Automated QDQ autotuner - part 3.2' accurately describes the PR's main objective: integrating the QDQAutotuner class implementation into the codebase as part 3.2 of a larger feature series.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 1024-1029: The try/except around graph.cleanup().toposort()
swallows all exceptions (except Exception as e) and merely logs a warning, which
can hide serious graph corruption; update the handler in autotuner.py to either
catch only expected exception types (e.g., specific cleanup/toposort exceptions)
or log the error and re-raise it so execution stops on unexpected failures —
locate the graph.cleanup().toposort() call and replace the broad except with
either a narrowed except for known recoverable exceptions or add a raise after
logger.warning/failure log to propagate the error.
- Line 622: Remove the redundant local import "from datetime import datetime"
(the one added at line with the single import statement) in autotuner.py; the
module already imports datetime at the top of the file, so delete this local
import to avoid duplication and potential shadowing (look for the statement
"from datetime import datetime" inside the function or block and remove it).
- Around line 912-918: The zero-point arrays q_zp_values (and the corresponding
dq_zp_values) are created with a hardcoded dtype np.int8 which can mismatch the
QuantizeLinear/DequantizeLinear output type when quant_type is "uint8" or other
types; update their construction to use the same dtype as the computed
quant_dtype instead of np.int8 so q_zp_values and dq_zp_values match the
quantized output element type used when building q_inputs and dq_inputs (refer
to q_scale_values, q_zp_values, q_inputs and the corresponding dq_* variables to
locate where to change the dtype).
- Around line 1013-1021: The import of get_tensor_consumer_node_indices is wrong
and causes an import error; replace that import with get_tensor_consumer_nodes
and update any usage names accordingly (the code that uses tensor_users_map
already expects a defaultdict(list) so no KeyError handling is needed).
Specifically, change the symbol imported from
modelopt.onnx.quantization.graph_utils from get_tensor_consumer_node_indices to
get_tensor_consumer_nodes and ensure tensor_users_map is assigned from
get_tensor_consumer_nodes(...) where used in the autotuner (references:
get_tensor_consumer_node_indices, get_tensor_consumer_nodes, tensor_users_map).
🧹 Nitpick comments (4)
modelopt/onnx/quantization/autotune/autotuner.py (4)

229-229: Consider defining config attributes explicitly.

Using getattr(self.config, "maximum_generation_attempts", 100) with defaults (also seen at lines 718-719 and 744) suggests these attributes may not be formally defined on the Config class. This pattern makes it harder to discover available configuration options.

💡 Suggestion

Consider adding these attributes to the Config class with documented defaults rather than relying on getattr fallbacks:

# In Config class
maximum_generation_attempts: int = 100
top_percent_to_mutate: float = 0.1
minimum_schemes_to_mutate: int = 1
maximum_mutations: int = 3

333-335: Replace assertions with explicit checks for runtime validation.

Assertions on lines 333-335 (and similarly at line 314) are used for validating runtime conditions. Since assertions can be disabled with python -O, these should be explicit checks for production code.

🛡️ Proposed fix
-                full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph)
-                assert full_insertion_scheme is not None
-                all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme)
-                assert isinstance(all_region_ips, set)
+                full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph)
+                if full_insertion_scheme is None:
+                    logger.warning(f"Failed to get full insertion scheme for region {region.id}")
+                    continue
+                all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme)
+                if not isinstance(all_region_ips, set):
+                    raise TypeError(f"Expected set from pattern.matches, got {type(all_region_ips)}")

972-985: Assertions used for critical runtime validation.

These assertions validate critical invariants (node index bounds, input index bounds, tensor name matching) but can be disabled with python -O. Consider using explicit checks with ValueError/IndexError for production safety.

🛡️ Proposed fix
             if node_index is not None:
-                assert node_index < len(graph.nodes), "Node index out of range"
+                if node_index >= len(graph.nodes):
+                    raise IndexError(f"Node index {node_index} out of range (max: {len(graph.nodes) - 1})")
                 target_node = graph.nodes[node_index]
-                assert input_index is not None, "Input index must be set when node index is set"
-                assert input_index < len(target_node.inputs), (
-                    f"Input index out of range for node {target_node.name}"
-                )
+                if input_index is None:
+                    raise ValueError("Input index must be set when node index is set")
+                if input_index >= len(target_node.inputs):
+                    raise IndexError(f"Input index {input_index} out of range for node {target_node.name}")
                 original_tensor = target_node.inputs[input_index]
-                assert tensor_name == original_tensor.name, (
-                    f"Tensor name mismatch for node {target_node.name} input {input_index}"
-                )
+                if tensor_name != original_tensor.name:
+                    raise ValueError(f"Tensor name mismatch: expected '{tensor_name}', got '{original_tensor.name}'")
             else:
-                assert tensor_name in tensor_map, f"Tensor {tensor_name} not found in tensor map"
-                assert input_index is None, "Input index must be None when node index is None"
+                if tensor_name not in tensor_map:
+                    raise KeyError(f"Tensor {tensor_name} not found in tensor map")
+                if input_index is not None:
+                    raise ValueError("Input index must be None when node index is None")

1042-1049: Consider iterative approach for deep region hierarchies.

_visit_region_recursively uses recursion which could hit Python's stack limit for very deep region hierarchies. While this is unlikely for typical ONNX models, an iterative approach would be more robust.

♻️ Iterative alternative
def _visit_region_recursively(self, region: Region) -> list[Region]:
    """Iteratively traverse region hierarchy and collect all regions."""
    regions = []
    stack = [region]
    while stack:
        current = stack.pop()
        regions.append(current)
        stack.extend(current.get_children())
    return regions

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.2 branch 3 times, most recently from 1ffcf7f to bd18dfa on February 9, 2026 08:36
@ajrasane
Contributor

/ok to test b02fef1

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.2 branch from c5340bb to 68ffa5a on February 16, 2026 23:51
@gcunhase
Contributor

/ok to test 68ffa5a

@gcunhase
Contributor

gcunhase commented Feb 19, 2026

Tests are failing because common.py is in #839. We may need to move it here to pass the CICD tests. @willg-nv

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.2 branch from 68ffa5a to ebba6e6 on February 23, 2026 02:20
@willg-nv
Contributor Author

Tests are failing because common.py is in #839. We may need to move it here to pass the CICD tests. @willg-nv

I have fixed the test failures, please help rerun the job, thanks!

@gcunhase
Contributor

/ok to test 6124ac9

Contributor

Copilot AI left a comment


Pull request overview

Implements the core QDQAutotuner workflow for ONNX quantization autotuning, including region discovery, scheme generation/submission, Q/DQ insertion and export, plus state/pattern-cache persistence and unit tests. This builds the central driver class that later CLI/integration pieces can orchestrate.

Changes:

  • Added QDQAutotunerBase/QDQAutotuner implementation (region search, scheme generation, submit/export, save/load state, pattern-cache seeding/import).
  • Extended autotune “common” structures with PatternSchemes, PatternCache, and expanded Config.
  • Added unit tests and shared test model utilities; updated insertion point handling to treat linear ops uniformly.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
modelopt/onnx/quantization/autotune/autotuner.py Adds the main autotuner implementation, including export/insertion, profiling workflow, and persistence.
modelopt/onnx/quantization/autotune/common.py Introduces/extends core data structures: PatternSchemes, PatternCache, and Config.
modelopt/onnx/quantization/autotune/insertion_points.py Updates insertion-point resolution/skipping to use is_linear_op (Conv/ConvTranspose/Gemm/MatMul).
modelopt/onnx/quantization/autotune/__init__.py Exposes public API for autotune package.
tests/unit/onnx/quantization/autotune/test_autotuner.py Adds unit coverage for initialization, region discovery, export, scheme workflow, and state I/O.
tests/_test_utils/onnx/quantization/autotune/models.py Adds a minimal shared ONNX model builder for autotuner tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cjluo-nv pushed a commit that referenced this pull request Feb 25, 2026
## What does this PR do?

This PR integrates the benchmark module into the QDQ autotuner. This benchmark
module is used to evaluate ONNX model performance.
This PR is 1/3 of #703.
Once all small PRs are merged #703 could be closed.

PR 3.1: #837
PR 3.2 #838
PR 3.3: #839

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No, document
will be added in part 4.
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
No, change log will be updated when all changes are merged.
## Additional Information



## Summary by CodeRabbit

* **New Features**
* Added ONNX quantization autotuning capabilities with a consolidated
module providing streamlined import paths for core components.
* Introduced unified benchmarking framework supporting TensorRT-based
model evaluation with both command-line and Python API implementations.
* Added support for timing cache persistence, custom plugin libraries,
shape validation, and dynamic input shape configuration for flexible
model testing and optimization.



---------

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.2 branch from 4ad0b32 to 9a94dc4 on February 25, 2026 09:18
@modelopt-bot

modelopt-bot commented Feb 25, 2026

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core QDQAutotuner class. The code is well-structured with good documentation and tests, but several critical and high-priority issues need fixing.


🚨 Critical Issues

1. Import Error: get_tensor_consumer_node_indices

File: modelopt/onnx/quantization/autotune/autotuner.py (line 44)


⚠️ High Priority

2. Duplicate datetime import

File: modelopt/onnx/quantization/autotune/autotuner.py (line 622)

Remove from datetime import datetime - already imported at module level.

3. Broad exception handling

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 1024-1029)

Narrow to specific exceptions or add raise after logging.

4. Zero-point dtype hardcoded to int8

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 912-918)


🔧 Medium Priority

5. Config attributes not explicitly defined

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 229, 718, 744)

Using getattr(self.config, "attr", default) - define these in the Config class:

  • maximum_generation_attempts
  • top_percent_to_mutate
  • minimum_schemes_to_mutate
  • maximum_mutations

6. Assertions for runtime validation

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 314-316, 333-335, 972-985)

Replace assertions with explicit exceptions (asserts are disabled with python -O).


💡 Simplification Suggestions

File size: autotuner.py is 1,320 lines - consider splitting:

    • autotuner_base.py - Base class
    • autotuner.py - Main implementation
    • export_utils.py - Q/DQ insertion logic

Recursive traversal: _visit_region_recursively (line 1042) - consider iterative approach for deep hierarchies.


✅ Recommendations

Before merge:

  1. Fix import error (critical)
  2. Remove duplicate datetime import
  3. Fix zero-point dtype
  4. Narrow exception handling

Nice to have:
5. Define Config attributes explicitly
6. Convert assertions to exceptions
7. Consider file refactoring

cc: @willg-nv @gcunhase

@gcunhase
Contributor

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core class that drives the QDQ autotuning workflow. The code is generally well-structured with comprehensive documentation and tests, but several issues need addressing before merge.

✅ What's Good

  1. Well-documented - Extensive docstrings explaining the workflow
  2. Comprehensive tests - Good coverage of initialization, region discovery, export
  3. Clean architecture - Good separation between QDQAutotunerBase and QDQAutotuner
  4. State persistence - YAML save/load for reproducibility is well-implemented

🚨 Critical Issues (Must Fix)

1. Import Error: get_tensor_consumer_node_indices

Location: modelopt/onnx/quantization/autotune/autotuner.py, line 44

This will cause an ImportError at runtime. Based on CodeRabbit's review, the function name should be get_tensor_consumer_nodes.

⚠️ High Priority Issues

2. Duplicate datetime import

Location: line 622

There's a local from datetime import datetime inside a method, but datetime is already imported at module level. Remove the duplicate.

3. Broad exception handling in graph cleanup

Location: lines 1024-1029

This swallows all exceptions. Either catch only expected exception types (e.g., specific cleanup/toposort exceptions) or log and re-raise.

4. Zero-point dtype mismatch

Location: lines 912-918

This doesn't respect the configured quantization type. Should use the quant_dtype variable instead.

🔧 Medium Priority Issues

5. getattr anti-pattern for config attributes

Location: Lines 229, 718, 744

Using getattr(self.config, "attr", default) suggests these may not be defined in Config. Define them explicitly with proper defaults:

6. Assertions used for runtime validation

Location: Lines 314-316, 333-335, 972-985

Multiple assert statements for runtime validation. These are disabled with python -O. Convert to explicit exceptions:

💡 Simplification Suggestions

Component Verdict Notes
QDQAutotunerBase ✅ Keep Good abstraction
QDQAutotuner ✅ Keep Main implementation
PatternSchemes ✅ Keep Necessary for scheme management
PatternCache ✅ Keep Good for warm-starting
Config expansion ⚠️ Clean up Some fields added via getattr

File size concern: autotuner.py is 1,320 lines. Consider splitting:

    • autotuner_base.py - Base class and utilities
    • autotuner.py - Main QDQAutotuner class
    • export_utils.py - Q/DQ insertion and ONNX export logic

Recursive traversal concern: _visit_region_recursively (line 1042) uses recursion which could hit stack limits for deep hierarchies. Consider an iterative approach:

✅ Recommendations

Before merge:

  1. ✅ Fix import: get_tensor_consumer_node_indices → get_tensor_consumer_nodes
  2. ✅ Remove duplicate datetime import
  3. ✅ Fix zero-point dtype to respect quant_dtype
  4. ✅ Narrow exception handling in graph cleanup

Nice to have:
5. Define all Config attributes explicitly (remove getattr fallbacks)
6. Convert assertions to explicit exceptions
7. Consider breaking autotuner.py into smaller files

Overall: The PR is well-implemented and necessary for the QDQ autotuning workflow. Fix the critical import error before merge, then address high-priority items.

cc: @willg-nv @gcunhase

@willg-nv please address the critical and high priority issues raised above before merging. Thanks.

@modelopt-bot

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core QDQAutotuner class. The code is well-structured with good documentation and tests, but several critical and high-priority issues need fixing.


🚨 Critical Issues

1. Import Error: get_tensor_consumer_node_indices

File: modelopt/onnx/quantization/autotune/autotuner.py (line 44)

# Current (will cause ImportError):
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices

# Should be:
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_nodes

⚠️ High Priority

2. Duplicate datetime import

File: modelopt/onnx/quantization/autotune/autotuner.py (line 622)

Remove from datetime import datetime - already imported at module level.

3. Broad exception handling

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 1024-1029)

except Exception as e:  # Too broad - catches KeyboardInterrupt, SystemExit, etc.

Narrow to specific exceptions or add raise after logging.
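A minimal sketch of the suggested fix: log the failure, then re-raise so graph corruption is not silently swallowed. cleanup_graph is a hypothetical wrapper standing in for the graph.cleanup().toposort() call in autotuner.py.

```python
# Sketch: log-and-reraise instead of swallowing all exceptions.
import logging

logger = logging.getLogger(__name__)

def cleanup_graph(graph):
    """Clean up and topologically sort the graph; propagate failures."""
    try:
        graph.cleanup().toposort()
    except Exception as e:
        # Log for diagnostics, but re-raise so the caller sees the error
        # instead of continuing with a possibly corrupt graph.
        logger.warning("Graph cleanup failed: %s", e)
        raise
```

The alternative the review mentions is to narrow the except clause to the specific exception types that cleanup/toposort are known to raise.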

4. Zero-point dtype hardcoded to int8

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 912-918)

q_zp_values = np.array([q_zero_point], dtype=np.int8)  # Hardcoded
# Should use: dtype=quant_dtype
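A small sketch of the fix: derive the zero-point array's dtype from the quantization type rather than hardcoding np.int8. The mapping table and the helper name make_zero_point are illustrative assumptions; only the variable names (quant_dtype, q_zp_values) come from the review.

```python
import numpy as np

# Hypothetical mapping from quant_type string to numpy dtype.
QUANT_DTYPES = {"int8": np.int8, "uint8": np.uint8}

def make_zero_point(q_zero_point, quant_type):
    """Build a zero-point array whose dtype matches the quantized output."""
    quant_dtype = QUANT_DTYPES[quant_type]
    # The zero-point element type must match the QuantizeLinear /
    # DequantizeLinear output element type, per the ONNX spec.
    return np.array([q_zero_point], dtype=quant_dtype)
```

With this, a "uint8" scheme gets a uint8 zero-point (e.g. 128) instead of an int8 one that the spec would reject.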

🔧 Medium Priority

5. Config attributes not explicitly defined

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 229, 718, 744)

Using getattr(self.config, "attr", default) - define these in Config class:

  • maximum_generation_attempts
  • top_percent_to_mutate
  • minimum_schemes_to_mutate
  • maximum_mutations
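The explicit-definition fix might look like the sketch below. The field names are the four listed above; the defaults are the getattr fallback values quoted earlier in this thread (100, 0.1, 1, 3) and the real Config class may define more fields.

```python
from dataclasses import dataclass

@dataclass
class Config:
    """Sketch of explicitly declared tuning parameters (defaults taken
    from the getattr fallbacks quoted in this review)."""
    maximum_generation_attempts: int = 100
    top_percent_to_mutate: float = 0.1
    minimum_schemes_to_mutate: int = 1
    maximum_mutations: int = 3

cfg = Config()
```

Declaring the fields makes the available options discoverable and lets type checkers catch typos that getattr fallbacks would silently absorb.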

6. Assertions for runtime validation

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 314-316, 333-335, 972-985)

Replace assertions with explicit exceptions (asserts are disabled with python -O).


💡 Simplification Suggestions

File size: autotuner.py is 1,320 lines - consider splitting:

  • autotuner_base.py - Base class
  • autotuner.py - Main implementation
  • export_utils.py - Q/DQ insertion logic

Recursive traversal: _visit_region_recursively (line 1042) - consider iterative approach for deep hierarchies.


✅ Recommendations

Before merge:

  1. Fix import error (critical)
  2. Remove duplicate datetime import
  3. Fix zero-point dtype
  4. Narrow exception handling

Nice to have:
5. Define Config attributes explicitly
6. Convert assertions to exceptions
7. Consider file refactoring

cc: @willg-nv @gcunhase

@cjluo-nv
Collaborator

  • autotuner_base.py - Base class
  • autotuner.py - Main implementation
  • export_utils.py - Q/DQ insertion logic

@willg-nv could you fix 1,2,4 and split the autotuner.py as suggested?

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv
Contributor Author

Review: PR #838 - Integrate Automated QDQ autotuner - part 3.2

Overall Assessment

This PR implements the core QDQAutotuner class. The code is well-structured with good documentation and tests, but several critical and high-priority issues need fixing.

🚨 Critical Issues

1. Import Error: get_tensor_consumer_node_indices

File: modelopt/onnx/quantization/autotune/autotuner.py (line 44)

# Current (will cause ImportError):
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices

# Should be:
from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_nodes

⚠️ High Priority

2. Duplicate datetime import

File: modelopt/onnx/quantization/autotune/autotuner.py (line 622)

Remove from datetime import datetime - already imported at module level.

3. Broad exception handling

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 1024-1029)

except Exception as e:  # Too broad - catches KeyboardInterrupt, SystemExit, etc.

Narrow to specific exceptions or add raise after logging.

4. Zero-point dtype hardcoded to int8

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 912-918)

q_zp_values = np.array([q_zero_point], dtype=np.int8)  # Hardcoded
# Should use: dtype=quant_dtype

🔧 Medium Priority

5. Config attributes not explicitly defined

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 229, 718, 744)

Using getattr(self.config, "attr", default) - define these in Config class:

  • maximum_generation_attempts
  • top_percent_to_mutate
  • minimum_schemes_to_mutate
  • maximum_mutations

6. Assertions for runtime validation

File: modelopt/onnx/quantization/autotune/autotuner.py (lines 314-316, 333-335, 972-985)

Replace assertions with explicit exceptions (asserts are disabled with python -O).

💡 Simplification Suggestions

File size: autotuner.py is 1,320 lines - consider splitting:

  • autotuner_base.py - Base class
  • autotuner.py - Main implementation
  • export_utils.py - Q/DQ insertion logic

Recursive traversal: _visit_region_recursively (line 1042) - consider iterative approach for deep hierarchies.

✅ Recommendations

Before merge:

  1. Fix import error (critical)
  2. Remove duplicate datetime import
  3. Fix zero-point dtype
  4. Narrow exception handling

Nice to have: 5. Define Config attributes explicitly 6. Convert assertions to exceptions 7. Consider file refactoring

cc: @willg-nv @gcunhase

1. I think this is a false alarm: get_tensor_consumer_node_indices is in the current repo, so there should be no ImportError.
2. This issue does not exist in the latest code.
3. Fixed.
4. This is a false alarm.
5. Fixed.
6. Simplification suggestions applied.

@willg-nv
Copy link
Contributor Author

willg-nv commented Feb 26, 2026

  • autotuner_base.py - Base class
  • autotuner.py - Main implementation
  • export_utils.py - Q/DQ insertion logic

@willg-nv could you fix 1,2,4 and split the autotuner.py as suggested?

please check comment #838 (comment)

@cjluo-nv cjluo-nv enabled auto-merge (squash) February 26, 2026 07:01
@cjluo-nv
Collaborator

/ok to test c99edab

Signed-off-by: Will Guo <willg@nvidia.com>
auto-merge was automatically disabled February 26, 2026 07:14

Head branch was pushed to by a user without write access

@modelopt-bot

Security Review: PRs #838 and #839

Summary

Both PRs have been reviewed against the security checklist. One security issue was identified in PR #839 with a # nosec comment bypassing Bandit security checks.


PR #838 - Integrate Automated QDQ autotuner - part 3.2

Security Check Results

Check Status Details
torch.load(..., weights_only=False) ✅ PASS Not used in this PR
numpy.load(..., allow_pickle=True) ✅ PASS Not used in this PR
trust_remote_code=True ✅ PASS Not used in this PR
eval() / exec() on external input ✅ PASS Not found
# nosec bypass ✅ PASS Not found

Code Review

The PR uses yaml.safe_load() which is the correct secure method for loading YAML files. No pickle usage, torch loading with unsafe settings, or code execution vulnerabilities were found.
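For illustration of the point above: yaml.safe_load() only constructs plain Python types (dicts, lists, scalars), whereas yaml.load() with an unsafe loader can instantiate arbitrary objects from tagged input. The state fragment below is an invented example, not the PR's actual state format; requires PyYAML.

```python
import yaml

# safe_load yields only builtin types, never arbitrary Python objects.
state = yaml.safe_load("best_latency_ms: 1.9\nscheme: [a, b]")
print(state)  # {'best_latency_ms': 1.9, 'scheme': ['a', 'b']}
```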

Status: No security issues identified.


PR #839 - Integrate Automated QDQ placement tool - part 3.3

🔴 CRITICAL Security Issue Found

Issue: Unauthorized use of # nosec comment

File: modelopt/onnx/quantization/autotune/__main__.py
Line: ~line 32 (in DEFAULT_TIMING_CACHE definition)
Code:

DEFAULT_TIMING_CACHE = "/tmp/trtexec_timing.cache"  # nosec B108

Security Check Results

Check Status Details
torch.load(..., weights_only=False) ✅ PASS Not used in this PR
numpy.load(..., allow_pickle=True) ✅ PASS Not used in this PR
trust_remote_code=True ✅ PASS Not used in this PR
eval() / exec() on external input ✅ PASS Not found
# nosec bypass 🔴 FAIL Found unauthorized # nosec B108 comment

Explanation

The # nosec B108 comment is used to bypass Bandit's B108 security check (probable insecure temporary file creation). According to the security policy:

"If a security-sensitive pattern is genuinely necessary, the PR must be reviewed and approved by @NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description."

This PR does not include:

  1. Approval from @NVIDIA/modelopt-setup-codeowners
  2. An explicit justification in the PR description for why this bypass is safe

Recommended Fix

# Option 1: Remove the nosec comment and use a safer path
DEFAULT_TIMING_CACHE = os.path.join(tempfile.gettempdir(), "trtexec_timing.cache")

# Option 2: If B108 bypass is genuinely required, add justification:
# nosec B108: This path is used for TensorRT timing cache which is 
# created by trtexec and does not contain sensitive data.
# Reviewed and approved by: @NVIDIA/modelopt-setup-codeowners
DEFAULT_TIMING_CACHE = "/tmp/trtexec_timing.cache"

Additional Security Notes

The PR uses yaml.safe_load() in PatternCache.load() and state loading, which is the secure method. No other security issues were identified.


Overall Recommendation

PR Status Required Action
#838 ✅ Approved No security issues; can merge
#839 🔴 Request Changes Remove or justify # nosec B108 comment with approval from @NVIDIA/modelopt-setup-codeowners

cc: @willg-nv @gcunhase @NVIDIA/modelopt-setup-codeowners

@willg-nv
Contributor Author

willg-nv commented Feb 26, 2026

Security Review: PRs #838 and #839

Summary

Both PRs have been reviewed against the security checklist. One security issue was identified in PR #839 with a # nosec comment bypassing Bandit security checks.

PR #838 - Integrate Automated QDQ autotuner - part 3.2

Security Check Results

Check Status Details
torch.load(..., weights_only=False) ✅ PASS Not used in this PR
numpy.load(..., allow_pickle=True) ✅ PASS Not used in this PR
trust_remote_code=True ✅ PASS Not used in this PR
eval() / exec() on external input ✅ PASS Not found

# nosec bypass ✅ PASS Not found

Code Review

The PR uses yaml.safe_load() which is the correct secure method for loading YAML files. No pickle usage, torch loading with unsafe settings, or code execution vulnerabilities were found.

Status: No security issues identified.

PR #839 - Integrate Automated QDQ placement tool - part 3.3

🔴 CRITICAL Security Issue Found

Issue: Unauthorized use of # nosec comment

File: modelopt/onnx/quantization/autotune/__main__.py Line: ~line 32 (in DEFAULT_TIMING_CACHE definition) Code:

DEFAULT_TIMING_CACHE = "/tmp/trtexec_timing.cache"  # nosec B108

modelopt/onnx/quantization/autotune/__main__.py is not in this PR
