Conversation
📝 Walkthrough

Introduces support for NVFP4StaticQuantizer to handle pre-computed weight scales alongside existing dynamic quantizers. The changes distinguish between static and dynamic NVFP4 variants across export utilities, configuration definitions, and tensor scaling operations. Static quantizers bypass dynamic calibration; dynamic quantizers proceed normally.
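The static/dynamic split the walkthrough describes can be sketched in a few lines. This is a toy illustration with local stand-in classes, not the ModelOpt API: a static quantizer reuses its pre-computed `global_amax`, while a dynamic one derives the amax from the weights at export time.

```python
# Toy stand-ins for the quantizer classes; names mirror the PR, but the
# real ModelOpt classes have different signatures (assumption-labeled sketch).
class DynamicQuantizer:
    pass

class StaticQuantizer:
    def __init__(self, global_amax):
        self.global_amax = global_amax  # pre-computed during MSE calibration

def weight_scale(quantizer, weight):
    """Static quantizers reuse their calibrated amax; dynamic ones derive it."""
    if isinstance(quantizer, StaticQuantizer):
        return quantizer.global_amax / 6.0       # bypass dynamic calibration
    return max(abs(w) for w in weight) / 6.0     # dynamic: amax from weights

w = [1.0, -3.0, 2.0]
assert weight_scale(DynamicQuantizer(), w) == 0.5    # 3.0 / 6.0
assert weight_scale(StaticQuantizer(6.0), w) == 1.0  # 6.0 / 6.0
```

The `/ 6.0` divisor reflects the FP4 E2M1 max magnitude used throughout the discussion below.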
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes. Pre-merge checks: ✅ 3 passed.
Force-pushed from 9f69993 to df4e6a9.
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
…FP4QTensor Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Force-pushed from c1ea842 to e0606cb.
Codecov Report

❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #858 +/- ##
==========================================
- Coverage 73.09% 72.16% -0.93%
==========================================
Files 205 207 +2
Lines 22301 22680 +379
==========================================
+ Hits 16300 16367 +67
- Misses 6001 6313 +312

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 509-520: The static NVFP4 path uses a scale computed from the
untransposed weight which will mismatch when is_bmm_expert_weight is True;
update the branch that checks is_nvfp4_static to either (1) recompute/reshape
weight_scale using the transposed weight by calling
NVFP4QTensor.get_weights_scaling_factor (or
get_weights_scaling_factor_from_quantizer) on the transposed weight before
calling to_quantized_weight, or (2) add an explicit guard that raises a clear
error when is_bmm_expert_weight and isinstance(weight_quantizer,
NVFP4StaticQuantizer) to prevent misuse; modify the code around the
is_nvfp4_static check (referencing is_bmm_expert_weight, weight_quantizer,
NVFP4StaticQuantizer, NVFP4QTensor.get_weights_scaling_factor[_from_quantizer],
and to_quantized_weight) accordingly.
In `@modelopt/torch/quantization/config.py`:
- Around line 391-411: The new NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG config is
defined but not added to the exported choices set; update the choices collection
(the variable named choices) to include "NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG"
alongside the other NVFP4 entries so it becomes discoverable by algorithms.py
(which expects all supported quantization format names in choices). Locate the
choices definition and append the new config name in the same style/ordering as
the other NVFP4_* entries.
🧹 Nitpick comments (2)
modelopt/torch/quantization/qtensor/nvfp4_tensor.py (2)
55-58: Duck-typing check vs `isinstance` — inconsistency with other call sites.

`_is_static_quantizer` uses duck typing (hasattr + is not None), but `quant_utils.py` and `unified_export_hf.py` use `isinstance(weight_quantizer, NVFP4StaticQuantizer)`. If any non-`NVFP4StaticQuantizer` object happens to carry a `global_amax` attribute, this check could produce false positives. Consider aligning on `isinstance` for consistency, or documenting why duck typing is preferred here.

Proposed fix
```diff
+ from modelopt.torch.quantization.nn import NVFP4StaticQuantizer
+
  @classmethod
  def _is_static_quantizer(cls, weight_quantizer) -> bool:
      """Check if the weight quantizer is a static NVFP4 quantizer with pre-computed amax."""
-     return hasattr(weight_quantizer, "global_amax") and weight_quantizer.global_amax is not None
+     return isinstance(weight_quantizer, NVFP4StaticQuantizer) and weight_quantizer.global_amax is not None
```
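The false positive the reviewer worries about is easy to demonstrate with stand-in classes (these are local toys, not the real ModelOpt quantizers): any object that merely carries a `global_amax` attribute passes the hasattr check, while the `isinstance` variant correctly rejects it.

```python
# Toy illustration of the duck-typing false positive mentioned above;
# classes here are local stand-ins, not the real ModelOpt quantizers.
class NVFP4StaticQuantizerStub:
    def __init__(self):
        self.global_amax = 1.0

class UnrelatedQuantizer:
    def __init__(self):
        self.global_amax = 2.0  # coincidentally carries the same attribute

def is_static_duck(q):
    # hasattr-based check, as in _is_static_quantizer
    return hasattr(q, "global_amax") and q.global_amax is not None

def is_static_strict(q):
    # isinstance-based check, as in the other call sites
    return isinstance(q, NVFP4StaticQuantizerStub) and q.global_amax is not None

other = UnrelatedQuantizer()
assert is_static_duck(other)        # false positive under duck typing
assert not is_static_strict(other)  # isinstance rules it out
```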
110-130: Zero-scale handling differs between static and dynamic paths.

In the static path, `per_block_scale[per_block_scale == 0] = 1.0` is applied before the `448.0 / per_block_scale_max` normalization (Line 118 vs 127). In the dynamic path (`get_weights_scaling_factor`, Line 167), the same sentinel is applied after the normalization by `weights_scaling_factor_2`. For all-zero blocks this is harmless (quantized values are 0, so the scale is irrelevant during dequant), but the resulting FP8 scale values will differ between the two paths for those blocks. This could complicate debugging or round-trip comparisons.
To align, move the zero guard after normalization:
Align zero-handling with dynamic path
```diff
  per_block_scale = per_block_amax / 6.0
- per_block_scale[per_block_scale == 0] = 1.0
  # Reshape per_block_scale to match weight's block structure
  num_blocks_per_row = weight.shape[-1] // block_size
  expected_shape = (*weight.shape[:-1], num_blocks_per_row)
  per_block_scale = per_block_scale.view(expected_shape)
  # Quantize scales to FP8
  if not keep_high_precision:
      per_block_scale = (per_block_scale * 448.0 / per_block_scale_max).to(
          torch.float8_e4m3fn
      )
+ per_block_scale_float = per_block_scale.float()
+ per_block_scale_float[per_block_scale_float == 0] = 1.0
+ per_block_scale = per_block_scale_float.to(per_block_scale.dtype)
```
```python
# Check if this is a static NVFP4 quantizer (has pre-computed scales from MSE calibration)
# For static NVFP4, weight_scale is already computed from static _amax values in get_weight_scaling_factor
is_nvfp4_static = isinstance(weight_quantizer, NVFP4StaticQuantizer)

if not is_nvfp4_static:
    # For dynamic NVFP4, compute scales from weights
    weight_scale = NVFP4QTensor.get_weights_scaling_factor(
        weight,
        block_size=block_size,
        weights_scaling_factor_2=weight_scale_2,
    )[0]
```
Potential shape mismatch for BMM-style expert weights with static NVFP4.
When is_bmm_expert_weight is True, the weight is transposed at Line 506–508 (e.g., from (E, in_dim, out_dim) → (E, out_dim, in_dim)). The dynamic path (Lines 516–520) correctly recomputes weight_scale from the transposed weight. However, the static path skips recomputation and uses the scale that was computed by get_weight_scaling_factor (Line 461) from the untransposed weight.
Since the static path in NVFP4QTensor.get_weights_scaling_factor_from_quantizer (nvfp4_tensor.py Line 121–123) reshapes the per-block scale using the weight's original shape, the scale would have shape (*untransposed_shape[:-1], num_blocks) which won't match the transposed weight layout expected by to_quantized_weight.
This would fail with a shape error if static NVFP4 quantizers are ever used with Llama4TextExperts or GptOssExperts. If that combination is currently not expected, a guard would prevent a confusing error later:
Proposed guard
# Check if this is a static NVFP4 quantizer (has pre-computed scales from MSE calibration)
# For static NVFP4, weight_scale is already computed from static _amax values in get_weight_scaling_factor
is_nvfp4_static = isinstance(weight_quantizer, NVFP4StaticQuantizer)
+ if is_nvfp4_static and is_bmm_expert_weight:
+ raise NotImplementedError(
+ "Static NVFP4 quantization is not yet supported for BMM-style expert weights "
+ "(Llama4TextExperts, GptOssExperts). Use dynamic NVFP4 quantization instead."
+ )
+
  if not is_nvfp4_static:

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Check if this is a static NVFP4 quantizer (has pre-computed scales from MSE calibration)
# For static NVFP4, weight_scale is already computed from static _amax values in get_weight_scaling_factor
is_nvfp4_static = isinstance(weight_quantizer, NVFP4StaticQuantizer)
if is_nvfp4_static and is_bmm_expert_weight:
    raise NotImplementedError(
        "Static NVFP4 quantization is not yet supported for BMM-style expert weights "
        "(Llama4TextExperts, GptOssExperts). Use dynamic NVFP4 quantization instead."
    )
if not is_nvfp4_static:
    # For dynamic NVFP4, compute scales from weights
    weight_scale = NVFP4QTensor.get_weights_scaling_factor(
        weight,
        block_size=block_size,
        weights_scaling_factor_2=weight_scale_2,
    )[0]
```
We are not supporting static NVFP4 BMM yet; it won't be triggered by the calibration algorithm. Let me add a guard here.
Edwardf0t1 left a comment:
@Fridah-nv Could we add a sample command in the PR description and the MMLU results you have?
Also, is the checkpoint format the same comparing static NVFP4 vs our default NVFP4?
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Added, thanks for the reminder.
Yes, I checked that.
modelopt/torch/export/quant_utils.py (outdated)
```python
# Handle NVFP4 variants (static or dynamic)
is_nvfp4_static = isinstance(weight_quantizer, NVFP4StaticQuantizer)
if is_nvfp4_static or quantization_format in [
```
What's the `quantization_format` for `NVFP4StaticQuantizer`? Do we need `is_nvfp4_static` here?
I added this branch because I don't think we need `_ensure_weight_quantizer_calibrated` for the NVFP4 static case; every `NVFP4StaticQuantizer` should already be calibrated by export time.
I'm not sure in which case `_ensure_weight_quantizer_calibrated` would be effective. I understand activation quantizers might not be calibrated, but why would weight quantizers not be reached before export? If this is needed, I can make `_ensure_weight_quantizer_calibrated` work with `NVFP4StaticQuantizer`.

I now understand the case: the weight quantizers of some experts might not be calibrated. Updated `_ensure_weight_quantizer_calibrated` to handle `NVFP4StaticQuantizer`.
modelopt/torch/export/quant_utils.py (outdated)

```diff
  # Calibrate weight quantizer if amax is not set for all NVFP4 variants
- if quantization_format in [
+ is_nvfp4_static = isinstance(weight_quantizer, NVFP4StaticQuantizer)
+ if is_nvfp4_static or quantization_format in [
```
```python
# For static NVFP4, weight_scale is already computed from static _amax values in get_weight_scaling_factor
is_nvfp4_static = isinstance(weight_quantizer, NVFP4StaticQuantizer)

if not is_nvfp4_static:
```
do we need to handle the else condition?
The static NVFP4 case is already handled by L459-L462
```python
else:
    sub_module.register_buffer(
        quantizer_attrs.weight_scale, get_weight_scaling_factor(sub_module, weight_name)
    )
```

which calls `NVFP4QTensor.get_weights_scaling_factor_from_quantizer` and handles both the static and dynamic NVFP4 cases.
Lines L511-L514 handling dynamic NVFP4 could potentially be removed; I just need to double-check the dynamic BMM case.
Update: removed this if branch and added a warning before the BMM quantizer handling.
```python
for module in modules:
    if hasattr(module.weight_quantizer, "_global_amax"):
        module.weight_quantizer._global_amax = unified_global_amax
```
TRT-LLM requires fusible linear layers to have the same weight_scale_2, so here I set global_amax to the max value across all modules being fused. This is also what the dynamic path does.

After discussing with Cursor, I think this won't have much impact on accuracy. The quantization math:

```
weights_scaling_factor_2 (wsf2) = _global_amax / (6.0 * 448.0)
weights_scaling_factor (wsf)    = _amax / (6.0 * wsf2) = _amax * 448.0 / _global_amax
effective scale                 = wsf * wsf2 = _amax / 6.0
```

Since the effective scale is wsf * wsf2 = _amax / 6.0, the _amax values are determined purely by the weight values themselves through MSE calibration, not by _global_amax. The _global_amax is just a separate parameter that determines how to split the representation between the two scale factors (wsf and wsf2); since they multiply together, _global_amax cancels out and has no effect on the actual quantization.

I think it would still have some effect since wsf and wsf2 are kept in different precisions, but the gap should not be large. @realAsma please let me know if this makes sense to you.
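The cancellation argued here can be checked numerically. This is a plain-Python sketch with made-up scalar inputs; it ignores the FP8 rounding of the per-block scale, which is exactly the residual effect mentioned above.

```python
# Illustrative check, with hypothetical scalar inputs, that _global_amax
# cancels out of the effective NVFP4 scale (ignoring FP8 rounding of wsf).
def effective_scale(amax: float, global_amax: float) -> float:
    wsf2 = global_amax / (6.0 * 448.0)  # weights_scaling_factor_2
    wsf = amax / (6.0 * wsf2)           # = amax * 448.0 / global_amax
    return wsf * wsf2                   # algebraically amax / 6.0

s_before = effective_scale(amax=2.0, global_amax=10.0)
s_after = effective_scale(amax=2.0, global_amax=40.0)  # unified to the fused max
assert abs(s_before - s_after) < 1e-9
assert abs(s_before - 2.0 / 6.0) < 1e-9
```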
@Fridah-nv Are you saying that, e.g. for QKV, you keep wsf1 unchanged but just pick the wsf2 of QKV to be the max of the individual wsf2 values?
Only the global_amax (wsf1) needs to be unified; wsf2 won't be updated. The concern above is that the MSE algorithm ties wsf1 to wsf2, and I was worried that updating wsf1 alone might affect accuracy. But it looks fine both in the math and in our experiment results.
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
…NVFP4StaticQuantizer, update changelog Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
@@ -52,12 +52,87 @@ def get_e2m1_bounds(cls, device):
    cls.e2m1_bounds_on_device[device] = e2m1_bounds.to(device)
Review Summary

Overall this PR looks good. The author has made significant improvements in the latest commits to consolidate NVFP4 logic into one place rather than scattering it across export utilities. The unified approach follows the normal NVFP4 path convention well.

Major Points

1. Convention alignment - GOOD ✅
The latest refactor centralizes NVFP4-specific behavior in one place, which is the right approach.

2. Static vs dynamic path handling ✅
The `isinstance` check followed by branching to either the static (pre-computed amax) or dynamic (compute from weights) path is clean and follows existing patterns.
Responses to Existing Comments
On realAsma's config naming comment (line 391):
The rename to NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG is good - it clarifies this is W4A4 (weights 4-bit, activations 4-bit).
On realAsma's scale computation suggestion (line ~302):
The current implementation uses:
This is mathematically equivalent to the suggestion and the final form is cleaner. The simplification works out to , then multiplied by 448 for FP8 quantization. This looks correct.
On cjluo-nv's isinstance question:
The author addressed this: `isinstance` is now sufficient, and the attribute check was removed in later commits.
On global_amax vs _amax:
- `_amax`: per-block amax values (pre-computed during MSE calibration)
- `_global_amax`: per-tensor max for the global scale

The naming is clear.
Minor Suggestions

- Docstring consistency: the docstrings in the new methods are good and explain the dual path (static vs dynamic).
- BMM expert handling: CodeRabbit's concern about BMM-style expert weights with static NVFP4 is valid; the warning is appropriate as a temporary measure.
- Test coverage: the added tests properly validate that static and dynamic paths produce identical outputs when given the same calibration data.
Conclusion
LGTM. The PR successfully:
- Adds static NVFP4 export support following existing conventions
- Consolidates NVFP4 logic
- Maintains backward compatibility with dynamic quantizers
- Includes appropriate tests
- Addresses reviewer feedback from previous rounds
```python
    for expert_type in ["Llama4TextExperts", "GptOssExperts"]
)
if is_bmm_expert_weight and isinstance(weight_quantizer, NVFP4StaticQuantizer):
    warnings.warn(
```
How about just throwing an exception here? Prefer failing fast over an unexpected accuracy bug.
- Add LTX-2 and Wan2.2 (T2V) support in the diffusers quantization workflow.
- Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
- Add support for image-text data calibration in PTQ for Nemotron VL models.
- Add support for advanced weight scale search for NVFP4 quantization and its export path.
This is not a 0.42 feature. Please move it to 0.43.
Pull request overview
Adds support for exporting static NVFP4 (pre-calibrated) quantization into the unified Hugging Face checkpoint flow, enabling a deployment path for PTQ methods like MSE-based scale search.
Changes:
- Add NVFP4 static-quantizer-aware scale extraction and lazy calibration during export (`quant_utils.py`, `nvfp4_tensor.py`).
- Introduce an NVFP4 W4A4 MSE (+ optional FP8 scale sweep) quantization config and expose it via `hf_ptq.py` as `--qformat nvfp4_mse`.
- Expand unit/GPU tests to cover NVFP4 static quantizer export paths and unified HF export for the new qformat.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| modelopt/torch/quantization/qtensor/nvfp4_tensor.py | Adds APIs to derive NVFP4 scaling factors from quantizers (static vs dynamic). |
| modelopt/torch/export/quant_utils.py | Adds lazy calibration for NVFP4StaticQuantizer, updates NVFP4 scaling-factor extraction, and threads nvfp4_static through config processing. |
| modelopt/torch/export/unified_export_hf.py | Imports NVFP4StaticQuantizer and warns about unsupported BMM expert weights for static NVFP4. |
| modelopt/torch/quantization/config.py | Adds NVFP4_W4A4_WEIGHT_MSE_FP8_SWEEP_CFG. |
| examples/llm_ptq/hf_ptq.py | Adds nvfp4_mse qformat option mapped to the new config. |
| tests/_test_utils/torch/export/utils.py | Extends partial_nvfp4_config to quantize an additional toy layer. |
| tests/gpu/torch/export/test_export.py | Updates expected NVFP4 exclude list to match the updated toy config. |
| tests/unit/torch/export/test_get_quantization.py | Adds a unit test covering static NVFP4 quantizer detection/config export. |
| tests/gpu/torch/export/test_export_weight_gpu.py | Adds a GPU test comparing dynamic vs static NVFP4 export results. |
| tests/gpu/torch/export/test_unified_hf_export_and_check_safetensors.py | Adds unified HF export coverage for nvfp4_mse. |
| CHANGELOG.rst | Documents the new NVFP4 advanced scale search + export support. |
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
What does this PR do?

Type of change: new feature

Overview: Supports exporting `NVFP4StaticQuantizer` in the unified Hugging Face checkpoint, as a deployment path for PTQ algorithms such as MSE.

Usage

Testing

Tested the generated Qwen3 8B checkpoint with trtllm serve and the nv_eval example in `Model-Optimizer-Internal/examples/nv_eval`.

NV eval results:
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Performance Improvements