[NVBug 5702186] Fix AWQ model export for Gemma3 (#793)
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
📝 Walkthrough: A single file in the quantization export module is refactored to introduce conditional weight-folding logic for LayerNorm fusion. A new helper function detects LayerNorm variants that use a weight-plus-one pattern, and the pre-quantization scale folding is now applied either as direct weight multiplication or via the detected weight-plus-one mechanism.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   74.23%   74.22%   -0.01%
==========================================
  Files         192      192
  Lines       19033    19035       +2
==========================================
  Hits        14129    14129
- Misses       4904     4906       +2
```
```python
def _layernorm_uses_weight_plus_one(module: torch.nn.Module) -> bool:
    if any(
        name in type(module).__name__
        for name in ["LayerNorm1P", "GemmaRMSNorm", "Gemma2RMSNorm", "Gemma3RMSNorm"]
```
Does `LayerNorm1P` appear only in Gemma3?
No, but Nemotron uses it as well, so it is included here to future-proof the check: https://github.com/huggingface/transformers/blob/24807bfcf4a21286fa2a7e728f381ddaaca7bbc7/src/transformers/models/nemotron/modeling_nemotron.py#L88
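To illustrate how a name-based detector like the one above can gate the two folding strategies, here is a minimal, self-contained sketch. The class and function names are illustrative stand-ins for the actual modelopt code, and plain Python lists stand in for torch tensors:

```python
# Hypothetical sketch: a weight-plus-one detector gating how a per-channel
# pre_quant_scale is folded into a norm layer's weight. Not the actual
# modelopt implementation; plain Python lists stand in for tensors.

class LayerNorm:
    """Toy plain norm: forward gain is `weight` directly."""
    def __init__(self, weight):
        self.weight = list(weight)

class Gemma3RMSNorm:
    """Toy Gemma-style norm: forward gain is `1 + weight`."""
    def __init__(self, weight):
        self.weight = list(weight)

def _layernorm_uses_weight_plus_one(module) -> bool:
    # Same name-matching idea as the quoted diff above.
    return any(
        name in type(module).__name__
        for name in ["LayerNorm1P", "GemmaRMSNorm", "Gemma2RMSNorm", "Gemma3RMSNorm"]
    )

def fold_pre_quant_scale(norm, scale):
    if _layernorm_uses_weight_plus_one(norm):
        # Forward uses (1 + w): solve 1 + w' = (1 + w) * s for w'.
        norm.weight = [(1.0 + w) * s - 1.0 for w, s in zip(norm.weight, scale)]
    else:
        # Plain norm: forward uses w directly, so just multiply.
        norm.weight = [w * s for w, s in zip(norm.weight, scale)]

gemma = Gemma3RMSNorm([0.0, 0.0])   # stored weight 0 -> effective gain 1
fold_pre_quant_scale(gemma, [2.0, 3.0])
# Effective gain (1 + w') is now [2.0, 3.0] per channel.
```

Without the detector, a Gemma norm initialized to zero weight would have its effective gain of 1 silently zeroed out by a plain `w * s` fold, which is exactly the export bug this PR fixes.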
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Norm layers in Gemma use `(1 + weight)` in the forward pass, so we fold `pre_quant_scale` into the effective weight. That is, we find the folded `w'` such that `1 + w' = (1 + w) * s`, i.e. `w' = (1 + w) * s - 1`.

## Testing

`./scripts/huggingface_example.sh --model google/gemma-3-1b-it --quant int4_awq`

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

## Summary by CodeRabbit

* **Improvements**
  * Enhanced quantization utilities to better handle various LayerNorm variants, including weight-offset (`1 + weight`) and zero-centered-gamma configurations.
  * Pre-quantization LayerNorm fusion now applies the weight-scaling strategy conditionally, based on the normalization type.

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>