
[NVBug 5702186] Fix awq model export for Gemma3 #793

Merged
meenchen merged 1 commit into main from weimingc/fix_int4awq_gemma on Jan 18, 2026

Conversation

meenchen (Contributor) commented Jan 16, 2026

What does this PR do?

Type of change: Bug fix

Overview: For norm layers in Gemma that use (1 + weight) in the forward pass, we fold pre_quant_scale into the effective weight. That is, we find the folded weight w' such that `1 + w' = (1 + w) * s`, which gives `w' = (1 + w) * s - 1`.
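
A quick numeric check of the folding identity (a sketch with hypothetical tensors, not the actual export code):

```python
import torch

# Hypothetical per-channel tensors: `w` stands in for the stored norm
# weight and `s` for the AWQ pre_quant_scale.
w = torch.randn(8)
s = torch.rand(8) + 0.5

# Gemma-style norms apply (1 + w), so folding s into the norm means
# finding w' with 1 + w' = (1 + w) * s:
w_folded = (1 + w) * s - 1

# The folded norm (1 + w') reproduces the original norm scaled by s.
assert torch.allclose(1 + w_folded, (1 + w) * s)
```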

Usage

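A hedged sketch of the intended flow, assuming the standard ModelOpt entry points (`mtq.quantize` with `INT4_AWQ_CFG`, then `export_hf_checkpoint`, which is where this PR's folding takes effect); the one-sample calibration loop is for illustration only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

model_id = "google/gemma-3-1b-it"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(model):
    # Calibration pass; a real run would iterate over a calibration set.
    inputs = tokenizer("Hello, world!", return_tensors="pt")
    model(**inputs)

# INT4 AWQ quantization; the pre_quant_scale folding fixed by this PR
# happens at export time.
model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)
export_hf_checkpoint(model, export_dir="gemma3-1b-it-int4-awq")
```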

Testing

./scripts/huggingface_example.sh --model google/gemma-3-1b-it --quant int4_awq

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Improvements
    • Enhanced quantization utilities to better handle various LayerNorm variants and normalization patterns, including support for weight-offset variants and zero-centered gamma configurations.
    • Optimized pre-quantization layer normalization fusion to apply conditional weight scaling strategies based on normalization type.


Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
@meenchen meenchen requested a review from a team as a code owner January 16, 2026 22:51
@meenchen meenchen requested a review from Edwardf0t1 January 16, 2026 22:51
coderabbitai bot (Contributor) commented Jan 16, 2026

📝 Walkthrough

A single file in the quantization export module is refactored to introduce conditional weight-folding logic for LayerNorm fusion. A new helper function detects LayerNorm variants using weight-plus-one patterns, and the pre-quantization scale folding is now conditionally applied either as direct weight multiplication or via the detected weight-plus-one mechanism.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| LayerNorm quantization fusion logic: `modelopt/torch/export/quant_utils.py` | Added `_layernorm_uses_weight_plus_one()` helper to detect LayerNorm variants (e.g., LayerNorm1P, Gemma RMSNorm) with zero-centered gamma. Refactored `fuse_prequant_layernorm()` to conditionally fold `pre_quant_scale`: the (weight + 1) pattern for detected variants, simple multiplication otherwise. Bias folding updated to use `pre_quant_scale` directly. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The title clearly identifies the specific bug being fixed (awq model export for Gemma3) and references the NVBug ticket, directly matching the core change in the PR.


@meenchen meenchen requested a review from cjluo-nv January 16, 2026 22:51
codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.22%. Comparing base (db76b1e) to head (707140c).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   74.23%   74.22%   -0.01%     
==========================================
  Files         192      192              
  Lines       19033    19035       +2     
==========================================
  Hits        14129    14129              
- Misses       4904     4906       +2     


Edwardf0t1 (Contributor) left a comment


LGTM

def _layernorm_uses_weight_plus_one(module: torch.nn.Module) -> bool:
    if any(
        name in type(module).__name__
        for name in ["LayerNorm1P", "GemmaRMSNorm", "Gemma2RMSNorm", "Gemma3RMSNorm"]

Does LayerNorm1P appear in Gemma3 only?

@meenchen meenchen self-assigned this Jan 16, 2026
@meenchen meenchen merged commit 38fb120 into main Jan 18, 2026
45 of 49 checks passed
@meenchen meenchen deleted the weimingc/fix_int4awq_gemma branch January 18, 2026 04:48
kevalmorabia97 pushed a commit that referenced this pull request Jan 19, 2026
danielkorzekwa pushed a commit that referenced this pull request Feb 17, 2026