
Integrate Automated QDQ benchmark - part 3.1#837

Merged
cjluo-nv merged 10 commits into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part3.1
Feb 25, 2026

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 2, 2026

What does this PR do?

This PR integrates a benchmark module into the QDQ autotuner. The benchmark module is used to evaluate ONNX model performance.
This PR is one of three PRs split out from #703. Once all of them are merged, #703 can be closed.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No, document will be added in part 4.
  • Did you update Changelog?: No, change log will be updated when all changes are merged.

Additional Information

Summary by CodeRabbit

  • New Features
    • Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components.
    • Introduced unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations.
    • Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization.


@willg-nv willg-nv requested a review from a team as a code owner February 2, 2026 02:51
@willg-nv willg-nv requested a review from gcunhase February 2, 2026 02:51
@copy-pr-bot

copy-pr-bot bot commented Feb 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 2, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces a benchmarking framework for ONNX model evaluation within the quantization autotuning subsystem. It provides an abstract Benchmark interface with two concrete implementations: TrtExecBenchmark for CLI-based benchmarking via trtexec, and TensorRTPyBenchmark for Python API-based benchmarking via TensorRT. A package initializer consolidates public exports from the autotune submodules.

Changes

Cohort / File(s) Summary
Public API Exports
modelopt/onnx/quantization/autotune/__init__.py
Centralizes and re-exports 17 public entities from the autotune subsystem, including benchmark classes, region abstractions, configuration, error types, and search orchestration driver.
Benchmarking Framework
modelopt/onnx/quantization/autotune/benchmark.py
Introduces abstract Benchmark base class and two concrete implementations: TrtExecBenchmark (CLI-based timing via trtexec) and TensorRTPyBenchmark (Python API-based timing via TensorRT with CUDA integration, plugin loading, dynamic shape support, and statistics collection).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TensorRTPyBenchmark
    participant TensorRT as TensorRT Engine
    participant CUDA as CUDA Runtime
    participant FileSystem

    User->>TensorRTPyBenchmark: run(onnx_model_bytes)
    TensorRTPyBenchmark->>FileSystem: Load/create timing cache
    TensorRTPyBenchmark->>TensorRT: Load plugins if configured
    TensorRTPyBenchmark->>TensorRT: Build engine from ONNX
    TensorRTPyBenchmark->>TensorRT: Create context
    
    TensorRTPyBenchmark->>CUDA: Allocate device memory
    TensorRTPyBenchmark->>TensorRT: Warmup runs (N iterations)
    CUDA-->>TensorRTPyBenchmark: Kernel execution
    
    TensorRTPyBenchmark->>TensorRT: Timing runs (M iterations)
    loop Collect Latencies
        TensorRT->>CUDA: Execute inference
        CUDA-->>TensorRTPyBenchmark: Timing data
    end
    
    TensorRTPyBenchmark->>TensorRTPyBenchmark: Calculate statistics (min/max/mean/median)
    TensorRTPyBenchmark->>FileSystem: Save timing cache & results
    TensorRTPyBenchmark-->>User: Return median latency (float)
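The warmup/timing/statistics flow in the diagram can be sketched in plain Python. This is a hedged illustration: a generic `infer` callable stands in for TensorRT engine execution, and `run_timed` is an illustrative name, not the PR's actual API.

```python
import statistics
import time


def run_timed(infer, warmup_runs: int = 5, timing_runs: int = 20) -> float:
    """Run warmup iterations, then timed iterations; return median latency in ms."""
    for _ in range(warmup_runs):  # warmup: keep cache/startup effects out of timing
        infer()
    latencies = []
    for _ in range(timing_runs):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)  # ms
    # The benchmark collects min/max/mean/median; the median is what run() returns.
    return statistics.median(latencies)
```

The real implementation times GPU kernels via CUDA events/streams rather than host-side `perf_counter`, but the control flow is the same.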

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Integrate Automated QDQ benchmark - part 3.1' directly reflects the main change: introducing benchmark module (TrtExecBenchmark, TensorRTPyBenchmark) into the QDQ autotuner package, as confirmed by the added files and PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 207-212: The parameter flush_timing_cache on TrtExecBenchmark.run
is declared but unused; either remove it from the signature or implement the
same behavior as TensorRTPyBenchmark.run by flushing the timing cache before
running. To implement, add a small helper (e.g., _flush_timing_cache) or call
the appropriate trtexec/engine cache clear routine at the start of
TrtExecBenchmark.run when flush_timing_cache is True, then proceed with the
existing benchmarking flow; if you prefer removal, delete the flush_timing_cache
parameter from TrtExecBenchmark.run and any callers to keep APIs consistent with
actual behavior.
- Around line 262-266: The code raises NameError when log_file is None because
it references the undefined variable output; in the block that handles a failed
regex match (where you have match = re.search(self.latency_pattern,
result.stdout, ...)), replace the use of output with the already-available
result.stdout (or otherwise initialize output before the conditional) so the
logger.debug call uses a defined variable; update the code around the match
check that references log_file/result.stdout and self.logger to log
result.stdout instead of output when log_file is None.
- Around line 676-691: The _save_timing_cache method has inverted and wrong
checks, a typo, unsafe finally cleanup, and incorrect serialization logic:
ensure a builder config is created before use (assign config =
self.builder.create_builder_config() before the try or inside try but guard
finally), only attempt to combine when self._timing_cache is not None (change
the condition from if self._timing_cache is None to if self._timing_cache is not
None), call output_cache.combine(self._timing_cache, ignore_errors=True) (fix
typo combine), then serialize the resulting cache (or if the intent is to
persist the existing cache directly, serialize self._timing_cache) and write it
to self.timing_cache_file; finally, in the finally block only delete or close
config if it was successfully created (e.g., check if 'config' in locals() or
use a variable initialized to None) to avoid UnboundLocalError.
- Line 145: The function parameter trtexec_args currently uses a mutable default
list (trtexec_args: list = []) which can be shared across calls; change the
signature to use None (e.g., trtexec_args: Optional[list] = None) and inside the
function (in the body of the same function that declares trtexec_args)
initialize it to an empty list if None (trtexec_args = trtexec_args or [])
before use; update the type hint to Optional[list] (or List[str]) and add the
necessary typing import if absent, ensuring all references to trtexec_args in
the function use the newly initialized local list.
🧹 Nitpick comments (4)
modelopt/onnx/quantization/autotune/benchmark.py (4)

97-98: Unnecessary global statement.

The global logger declaration is not needed here since logger is already available at module scope from the import on line 57. The global keyword is only required when you intend to reassign a module-level variable inside a function/method.

Suggested fix
-        global logger
         self.timing_cache_file = timing_cache_file or "/tmp/trtexec_timing.cache"  # nosec B108

543-544: Potential dtype mismatch for integer input tensors.

np.random.randn(size) generates float64 values, which are then cast via .astype(dtype). For integer dtypes, this truncates floats to integers (all becoming 0 for values in (-1, 1)). Consider using np.random.randint for integer types or ensuring the random data is appropriate for the tensor's dtype.

Suggested approach
                 if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
-                    np.copyto(host_mem, np.random.randn(size).astype(dtype))
+                    if np.issubdtype(dtype, np.integer):
+                        np.copyto(host_mem, np.random.randint(0, 100, size=size, dtype=dtype))
+                    else:
+                        np.copyto(host_mem, np.random.randn(size).astype(dtype))
                     inputs.append({"host": host_mem, "device": device_mem, "name": tensor_name})
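The truncation behavior is easy to reproduce with NumPy (illustrative values; `np.int8` is just an example dtype):

```python
import numpy as np

# randn draws float64 ~ N(0, 1); casting to an integer dtype truncates toward zero,
# so most samples (those in (-1, 1)) collapse to 0 — a poor benchmark input.
floats = np.random.randn(1000)
as_int = floats.astype(np.int8)
zero_fraction = np.mean(as_int == 0)  # roughly the N(0,1) mass in (-1, 1), ~0.68

# the suggested fix: draw integers directly when the tensor dtype is integral
dtype = np.dtype(np.int8)
if np.issubdtype(dtype, np.integer):
    data = np.random.randint(0, 100, size=1000, dtype=dtype)
else:
    data = np.random.randn(1000).astype(dtype)
```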

104-115: Base class signature mismatch with subclasses.

The abstract run() method has signature run(path_or_bytes, log_file), but both TrtExecBenchmark.run() and TensorRTPyBenchmark.run() add a flush_timing_cache parameter. This breaks Liskov substitution - code using Benchmark type hints won't know about the extra parameter. Consider adding the parameter to the base class signature.

Suggested fix for base class
     `@abstractmethod`
-    def run(self, path_or_bytes: str | bytes, log_file: str | None = None) -> float:
+    def run(self, path_or_bytes: str | bytes, log_file: str | None = None, flush_timing_cache: bool = False) -> float:
         """Run benchmark on the given ONNX model.

         Args:
             path_or_bytes: Path to the ONNX model (str) or raw model data (bytes)
             log_file: Optional path to save benchmark logs
+            flush_timing_cache: Whether to flush timing cache to disk after build

         Returns:
             Measured latency in milliseconds, or float("inf") on failure
         """

185-191: Inconsistent plugin validation behavior between benchmark implementations.

TrtExecBenchmark logs a warning and continues when a plugin library is not found (line 188-189), while TensorRTPyBenchmark raises FileNotFoundError (line 359). Consider aligning the behavior - failing fast is generally safer for configuration errors.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch 2 times, most recently from 3780d4f to f405dbb Compare February 9, 2026 08:39
@gcunhase
Contributor

gcunhase commented Feb 9, 2026

How will Auto Q/DQ dependencies be ensured in ModelOpt?

See previous discussion on pycuda requirement: https://github.com/NVIDIA/Model-Optimizer/pull/703/changes#r2733859240

How will TensorRT's dependency be ensured?

@gcunhase
Contributor

gcunhase commented Feb 9, 2026

Should we add a test file here? This could potentially be merged with test_workflows.py in #839 (comment)

@willg-nv willg-nv requested a review from a team as a code owner February 10, 2026 09:33
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch from a548f43 to 6a7f38a Compare February 10, 2026 09:44
@willg-nv
Contributor Author

Should we add a test file here? This could potentially be merged with test_workflows.py in #839 (comment)

I think this will be covered by test_workflow.py. I have a question, does ModelOpt CI support running trtexec and benchmark tests?

@willg-nv
Contributor Author

How will Auto Q/DQ dependencies be ensured in ModelOpt?

See previous discussion on pycuda requirement: https://github.com/NVIDIA/Model-Optimizer/pull/703/changes#r2733859240

How will TensorRT's dependency be ensured?

As discussed, the TensorRT dependency should be installed by users, and the pycuda dependency should be added to the example requirements.

@ajrasane
Contributor

/ok to test b4855ad

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch from b4855ad to fdff6c7 Compare February 12, 2026 23:49
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch from d264be4 to bcbf3d6 Compare February 12, 2026 23:58
@gcunhase
Contributor

Should we add a test file here? This could potentially be merged with test_workflows.py in #839 (comment)

I think this will be covered by test_workflow.py. I have a question, does ModelOpt CI support running trtexec and benchmark tests?

It doesn't seem like unittests run with TRT support, but examples testing do. See

##### ONNX/TensorRT Example Tests #####

@kevalmorabia97 / @cjluo-nv to confirm this. Also, do GPU tests have TRT support? We have some tests that benefit from that (see test_plugin.py, for example).

@kevalmorabia97
Collaborator

Should we add a test file here? This could potentially be merged with test_workflows.py in #839 (comment)

I think this will be covered by test_workflow.py. I have a question, does ModelOpt CI support running trtexec and benchmark tests?

It doesn't seem like unittests run with TRT support, but examples testing do. See

##### ONNX/TensorRT Example Tests #####

@kevalmorabia97 / @cjluo-nv to confirm this. Also, do GPU tests have TRT support? We have some tests that benefit from that (see test_plugin.py, for example).

@gcunhase ONNX/TensorRT Example tests (tensorrt docker) and GPU tests (pytorch docker) both run on GPU runner and both docker images have TRT installed

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch 2 times, most recently from fcd81a3 to c941aca Compare February 23, 2026 07:18

@modelopt-bot modelopt-bot left a comment


Review: PR #837 - Integrate Automated QDQ Benchmark

Overall Assessment

This PR adds benchmark infrastructure for QDQ autotuning with two benchmark implementations (TrtExecBenchmark using CLI and TensorRTPyBenchmark using Python API). The code is functional but has significant readability concerns in the benchmark.py file.

Key Issues Identified

1. Massive run() Method in TensorRTPyBenchmark

The run() method spans ~300 lines with deep nesting and multiple concerns:

  • Building TensorRT engine
  • Handling dynamic shapes
  • Memory allocation (host + device)
  • Warmup and timing loops
  • Cleanup

Recommendation: Split into helper methods:

  • _build_engine() - returns (serialized_engine, build_time)
  • _allocate_buffers() - returns (inputs, outputs, stream_handle)
  • _setup_execution_context() - returns context
  • _run_warmup()
  • _run_timing() - returns latencies array

Location: benchmark.py lines ~360-640

2. Hardcoded Regex Pattern

self.latency_pattern = r"\[I\]\s+Latency:.*?median\s*=\s*([\d.]+)\s*ms"

This regex is brittle and may break with trtexec output format changes. Consider making it configurable or using structured output (XML/JSON) if trtexec supports it.

Location: benchmark.py line ~272
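The pattern can be exercised against a captured log line; the sample below mimics trtexec's latency summary format and is illustrative, not output from a real run:

```python
import re

# The regex from the PR, extracting the median latency from trtexec's log output.
LATENCY_PATTERN = r"\[I\]\s+Latency:.*?median\s*=\s*([\d.]+)\s*ms"

sample = "[I] Latency: min = 1.1 ms, max = 2.3 ms, mean = 1.5 ms, median = 1.42 ms"
match = re.search(LATENCY_PATTERN, sample)
# Fall back to float("inf") on a failed match, matching the benchmark's failure value.
median_ms = float(match.group(1)) if match else float("inf")
```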

3. Typo in Docstring

The package docstring has "benchamrk" instead of "benchmark":

This benchamrk module is used to evaluate ONNX model perf.

Location: PR description line 7, also appears in PR title logic

4. Duplicate Docstring Parameters

TrtExecBenchmark.__init__ repeats the same parameter descriptions that are already in the parent class. Consider using :param timing_cache_file: references instead of full descriptions to reduce duplication.

Location: benchmark.py lines ~203-222

5. Complex Conditional in Shape Validation

if not (min_dim <= opt_dim <= max_dim):
    raise ValueError(
        f"Invalid shape range at dimension {i}: "
        f"min={min_dim}, opt={opt_dim}, max={max_dim}. "
        f"Must satisfy min <= opt <= max"
    )

This is readable but could be simplified by extracting into a helper or using assert style validation.

Location: benchmark.py lines ~410-416

6. Inconsistent Error Patterns

Some methods return float("inf") on failure while others raise exceptions. This inconsistency makes error handling confusing for callers. Standardize on one approach (exceptions preferred for unexpected errors).

Location: Various places in benchmark.py

Minor Suggestions

  1. Nit: The _alloc_pinned_host() nested function inside run() could be a static method for better testability.

  2. Nit: The logger.debug(f"Running...") messages could be consolidated or use a progress indicator for long-running benchmarks.

  3. Nit: Consider validating warmup_runs and timing_runs are positive integers in Benchmark.__init__.

Context from Related PRs (#844-846)

Having reviewed PRs 844-846 which implement RegionPattern, RegionSearch, and RegionInspect, this PR fits well as part 3.1 (benchmark infrastructure) in the overall QDQ autotune pipeline. The region-based partitioning from those PRs will feed into these benchmarks.

Summary

The functionality is solid but the run() method needs refactoring before this merges. The 300+ line method with 5+ levels of nesting is difficult to maintain and review. Splitting it into focused helpers would significantly improve readability and testability.

Priority: Medium-High - recommend addressing the run() method split before merge.


@modelopt-bot modelopt-bot left a comment


See inline comments for specific feedback.

Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch from c941aca to a77a663 Compare February 24, 2026 03:08
@willg-nv
Contributor Author

Review: PR #837 - Integrate Automated QDQ Benchmark (quoted in full above)

resolved 1, 3, 4, 5

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch from a77a663 to 93e2c38 Compare February 24, 2026 03:27
Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part3.1 branch from 93e2c38 to d74bb08 Compare February 24, 2026 06:58
@modelopt-bot

Follow-up Review: PR #837

Thank you for the revisions! The code readability has improved significantly.

✅ Issues Addressed

  1. Massive run() method refactored - The TensorRTPyBenchmark.run() method has been split into focused helpers:

    • _validate_shape_range() (module-level function)
    • _build_engine()
    • _allocate_buffers()
    • _setup_execution_context()
    • _run_warmup()
    • _run_timing()

    This is excellent - the run() method is now readable and maintainable.

  2. Duplicate docstrings fixed - TrtExecBenchmark.__init__ now uses :meth:Benchmark.__init__ references instead of duplicating parameters.

  3. Shape validation extracted - The complex conditional in set_shapes now calls _validate_shape_range().

⚠️ Remaining Concerns (Minor)

  1. Hardcoded regex still present - The self.latency_pattern in TrtExecBenchmark remains:

    self.latency_pattern = r"\[I\]\s+Latency:.*?median\s*=\s*([\d.]+)\s*ms"

    Consider making this configurable or adding fallback patterns for different trtexec versions.

  2. Nested function - _alloc_pinned_host is still nested inside _allocate_buffers. Consider making it a @staticmethod for better testability:

    @staticmethod
    def _alloc_pinned_host(size: int, dtype: np.dtype) -> tuple[Any, np.ndarray]:
        ...
  3. Inconsistent error handling - Some methods return float("inf") on failure (e.g., _setup_execution_context, run), while others raise exceptions (e.g., _allocate_buffers raises RuntimeError). Consider standardizing - either always raise for unexpected errors, or document when float("inf") is expected.

Overall

The major readability issues have been resolved. The code is now much cleaner and easier to follow. LGTM pending the minor items above (which could be addressed in a follow-up if preferred).

@codecov

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 14.64789% with 303 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.17%. Comparing base (52e662d) to head (57e03ab).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/autotune/benchmark.py 13.18% 303 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #837      +/-   ##
==========================================
- Coverage   73.10%   72.17%   -0.93%     
==========================================
  Files         205      207       +2     
  Lines       22294    22656     +362     
==========================================
+ Hits        16297    16352      +55     
- Misses       5997     6304     +307     

☔ View full report in Codecov by Sentry.


@cjluo-nv
Collaborator

@willg-nv could you fix the 2 and 3 above?

Follow-up Review: PR #837 (quoted in full above)

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv
Contributor Author

@willg-nv could you fix the 2 and 3 above?

fixed

@cjluo-nv
Collaborator

/ok to test 57e03ab

@cjluo-nv cjluo-nv enabled auto-merge (squash) February 25, 2026 06:53
@cjluo-nv cjluo-nv merged commit e589ac8 into NVIDIA:main Feb 25, 2026
38 checks passed
