Support mixed-precision per layer quant config in config.json #929

Edwardf0t1 merged 3 commits into main.
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Important: Review skipped. Auto incremental reviews are disabled on this repository.
📝 Walkthrough

Introduces support for the MIXED_PRECISION quantization algorithm in Hugging Face config conversion by adding a helper function that maps quantization algorithms to group configurations, and by enhancing the main converter to aggregate layers by their quantization configs into multiple config groups.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🧹 Nitpick comments (1)
modelopt/torch/export/convert_hf_config.py (1)
162-181: DRY: Existing FP8/NVFP4 branches duplicate the new helper.

The inline config dicts for FP8 (lines 163-166) and NVFP4 (lines 171-178) duplicate the logic in `_quant_algo_to_group_config`. Reusing the helper keeps both paths consistent and avoids future drift (e.g., if the FP8 config shape changes, you'd need to update two places).

Suggested refactor
```diff
     if quant_algo_value == "FP8":
-        config_group_details = {
-            "input_activations": {"dynamic": False, "num_bits": 8, "type": "float"},
-            "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
-            "targets": ["Linear"],
-        }
+        config_group_details = _quant_algo_to_group_config("FP8")
+        config_group_details["targets"] = ["Linear"]
         new_config["config_groups"] = {"group_0": config_group_details}
     elif quant_algo_value == "NVFP4":
         group_size = original_quantization_details.get("group_size", 16)
-        config_group_details = {
-            "input_activations": {
-                "dynamic": False,
-                "num_bits": 4,
-                "type": "float",
-                "group_size": group_size,
-            },
-            "weights": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": group_size},
-            "targets": ["Linear"],
-        }
+        config_group_details = _quant_algo_to_group_config("NVFP4", group_size)
+        config_group_details["targets"] = ["Linear"]
         new_config["config_groups"] = {"group_0": config_group_details}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/convert_hf_config.py` around lines 162 - 181, The FP8 and NVFP4 branches duplicate the config construction; replace the inline dicts with calls to the existing helper _quant_algo_to_group_config to avoid duplication: in the block that checks quant_algo_value == "FP8" and the block for "NVFP4" use _quant_algo_to_group_config(quant_algo_value, original_quantization_details) (or pass any required args used by the helper) to produce config_group_details and then set new_config["config_groups"] = {"group_0": config_group_details}; ensure you preserve group_size handling by relying on the helper’s logic rather than rebuilding the dict inline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/export/convert_hf_config.py`:
- Around line 54-71: The branch handling quant_algo in ("NVFP4_AWQ",
"W4A16_AWQ", "W4A8_AWQ") computes act_bits incorrectly by checking only "A8" in
quant_algo; update the act_bits logic to detect A16 explicitly (or parse the
numeric suffix after 'A') so "W4A16_AWQ" yields act_bits=16, "W4A8_AWQ" yields 8
and fall back to 4 otherwise; change the calculation near the quant_algo
variable in convert_hf_config.py and keep the returned dict structure unchanged.
- Around line 103-105: The fallback branch in convert_hf_config (the final else
that currently returns {"quant_algo": quant_algo}) produces a config shape
inconsistent with other branches (which return {"input_activations": {...},
"weights": {...}}) and can break downstream parsing; replace that silent
fallback with an explicit error: in the else branch (where quant_algo is
unsupported, e.g., during MIXED_PRECISION), raise a ValueError (including
quant_algo in the message) to refuse unsupported algorithms rather than
returning a structurally incompatible dict so callers must handle or convert to
a valid config.
- Around line 188-191: The grouping key creation using
tuple(sorted(layer_cfg.items())) will fail if layer_cfg contains nested
dicts/lists; replace this fragile approach by serializing layer_cfg to a
canonical JSON string (e.g., using json.dumps with sort_keys=True and compact
separators) to produce a stable, hashable key for algo_to_layers; update
convert_hf_config.py to import json and compute key = json.dumps(layer_cfg,
sort_keys=True, separators=(',', ':'), ensure_ascii=False) (or fall back to
repr(layer_cfg) if json serialization fails) when iterating over
quantized_layers to preserve deterministic grouping.
- Around line 198-201: The MIXED_PRECISION branch is incorrectly assigning layer
names (layer_names) to group_config["targets"] (used for module class matching)
which breaks compressed-tensors; change the assignment so
group_config["targets"] contains module class names (e.g.,
module.__class__.__name__) derived from the layer configs instead of the layer
keys, and if you still need the original layer name list preserve it under a
separate field (e.g., keep your existing quantized_layers or add
quantized_layer_names) so group_config (from _quant_algo_to_group_config)
retains class-name targets while layer_name lists live in a distinct key.
```python
for layer_name, layer_cfg in quantized_layers.items():
    # Create a hashable key from the layer config
    key = tuple(sorted(layer_cfg.items()))
    algo_to_layers[key].append(layer_name)
```
`tuple(sorted(layer_cfg.items()))` will crash if any layer config value is a dict or list.

If a `quantized_layers` entry ever contains a nested structure (e.g., `{"quant_algo": "X", "extra": {"key": "val"}}`), `tuple(sorted(...))` will raise `TypeError` because dicts/lists aren't hashable. Currently the expected payloads are flat, but this is fragile.

A safer approach is to serialize the config to a canonical JSON string for the grouping key:
Suggested defensive fix

```diff
+import json
 ...
     for layer_name, layer_cfg in quantized_layers.items():
-        # Create a hashable key from the layer config
-        key = tuple(sorted(layer_cfg.items()))
+        # Create a hashable key from the layer config (handles nested values)
+        key = json.dumps(layer_cfg, sort_keys=True)
         algo_to_layers[key].append(layer_name)
 ...
     for idx, (config_key, layer_names) in enumerate(algo_to_layers.items()):
-        layer_cfg = dict(config_key)
+        layer_cfg = json.loads(config_key)
```

🤖 Prompt for AI Agents
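As a self-contained illustration of the serialized-key approach (layer names and configs below are hypothetical, not taken from the PR), grouping stays deterministic even when a layer config carries nested values:

```python
import json
from collections import defaultdict

# Hypothetical per-layer configs; "extra" shows a nested value that would
# make tuple(sorted(cfg.items())) raise TypeError (dicts aren't hashable).
quantized_layers = {
    "model.layers.0.q_proj": {"quant_algo": "FP8"},
    "model.layers.1.q_proj": {"quant_algo": "FP8"},
    "model.layers.2.q_proj": {
        "quant_algo": "NVFP4",
        "group_size": 16,
        "extra": {"key": "val"},
    },
}

algo_to_layers: dict[str, list[str]] = defaultdict(list)
for layer_name, layer_cfg in quantized_layers.items():
    # A canonical JSON string is hashable and handles nested dicts/lists.
    key = json.dumps(layer_cfg, sort_keys=True, separators=(",", ":"))
    algo_to_layers[key].append(layer_name)

# Recover each group's config with json.loads when building config_groups.
groups = {json.loads(k)["quant_algo"]: v for k, v in algo_to_layers.items()}
# Two groups: FP8 (two layers) and NVFP4 (one layer).
```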
```python
group_config = _quant_algo_to_group_config(algo, layer_group_size)
group_config["targets"] = sorted(layer_names)
config_groups[f"group_{idx}"] = group_config
```
🧩 Analysis chain

🏁 Scripts executed (Repository: NVIDIA/Model-Optimizer):

```shell
# First, let's examine the file to understand the full context
wc -l modelopt/torch/export/convert_hf_config.py

# Read the relevant section around lines 198-201 plus surrounding context
sed -n '170,230p' modelopt/torch/export/convert_hf_config.py

# Check how _quant_algo_to_group_config is defined and what it returns for different algos
rg -n "_quant_algo_to_group_config" -A 20 -B 5 modelopt/torch/export/convert_hf_config.py

# Search for "targets" usage in the file to understand the semantics
rg -n "targets" -B 3 -A 3 modelopt/torch/export/convert_hf_config.py

# Check if compressed-tensors is available and how it uses "targets"
fd -e py -e md | xargs rg "targets" | grep -i "quant\|config" | head -30

# Check how quantized_layers is structured - trace back where it comes from
rg -n "quantized_layers" -B 5 -A 5 modelopt/torch/export/convert_hf_config.py | head -80

# Search for where quantized_layers gets populated in the original data
rg -rn "quantized_layers" --type py modelopt/torch/export/ | grep -v "Binary" | head -20

# Check the structure of quantized_layers in the input to understand what layer_name looks like
rg -rn "quantized_layers" --type py modelopt/torch/ -B 3 -A 3 | grep -A 10 "layer_name" | head -40

# Search for test files or examples that show the structure of quantized_layers
fd -e py -path "*/test*" | xargs rg "quantized_layers" -B 2 -A 2 | head -50

# Look at more of the test to see the full context of quantized_layers structure
sed -n '1,100p' tests/gpu/torch/export/test_export.py | grep -A 20 "quantized_layers"

# Check if there's any downstream code that validates or processes "targets" field
rg -rn "targets" --type py modelopt/ | grep -E "(config|quant)" | head -20

# Check the exact current state of lines around 200 to confirm if it's "targets" or "n"
sed -n '195,210p' modelopt/torch/export/convert_hf_config.py

# Also check lines 163-180 to see what the FP8/NVFP4 branches use
sed -n '160,182p' modelopt/torch/export/convert_hf_config.py

# Check the docstring that mentions the field name
sed -n '110,125p' modelopt/torch/export/convert_hf_config.py
```
`targets` in the MIXED_PRECISION branch contains layer names instead of module types, breaking quantization matching.

In the FP8 and NVFP4 branches, `targets` correctly holds module class names (e.g., `["Linear"]`), which compressed-tensors matches against `module.__class__.__name__`. However, the MIXED_PRECISION branch assigns layer names (e.g., `"layer1"`, `"layer3"`) to `targets` on line 200, which violates the documented contract in lines 115–117 and produces non-functional configs.

compressed-tensors will fail to match these layer names against module types. Either extract module class names from the layer configs, or move the layer name list to a separate field, as is already done for `quantized_layers` (line 205).
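A minimal sketch of the fix being suggested. The helper name, the `"Linear"` target, and the `quantized_layer_names` key are illustrative assumptions, not the PR's actual code: `targets` stays a module-class list while layer names move to a distinct key.

```python
def build_mixed_precision_group(layer_names: list[str], group_config: dict) -> dict:
    """Hypothetical grouping step: keep `targets` as module class names so
    compressed-tensors can match them against module.__class__.__name__."""
    # "Linear" is assumed here, mirroring the FP8/NVFP4 branches.
    group_config["targets"] = ["Linear"]
    # Preserve the per-layer name list under a separate, distinct key.
    group_config["quantized_layer_names"] = sorted(layer_names)
    return group_config

cfg = build_mixed_precision_group(
    ["model.layers.1.q_proj", "model.layers.0.q_proj"],
    {"weights": {"dynamic": False, "num_bits": 8, "type": "float"}},
)
```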
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #929      +/-   ##
==========================================
+ Coverage   72.07%   72.09%   +0.01%
==========================================
  Files         207      207
  Lines       22691    22691
==========================================
+ Hits        16355    16358      +3
+ Misses       6336     6333      -3
```

☔ View full report in Codecov by Sentry.
Pull request overview

Adds support for exporting mixed-precision, per-layer quantization settings into the Hugging Face config.json quantization config (via `convert_hf_quant_config_format`), aligning it with the compressed-tensors/llm-compressor-style `config_groups` layout while preserving per-layer detail.

Changes:
- Introduces `_quant_algo_to_group_config` to map per-layer `quant_algo` settings to `config_groups` entries.
- Adds a new `MIXED_PRECISION` conversion path that groups layers by identical per-layer quantization configs and emits multiple `config_groups`.
- Preserves the original per-layer mapping under `quantized_layers` for consumers needing full detail.
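For orientation, a config produced by a MIXED_PRECISION conversion along these lines might be shaped as follows. The layer names, bit-widths, and group assignments are invented for illustration, not taken from the PR:

```python
import json

# Illustrative output shape only: two distinct per-layer configs become two
# config_groups, and the raw per-layer mapping is preserved separately.
new_config = {
    "config_groups": {
        "group_0": {
            "input_activations": {"dynamic": False, "num_bits": 8, "type": "float"},
            "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
            "targets": ["model.layers.0.self_attn.q_proj"],
        },
        "group_1": {
            "input_activations": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": 16},
            "weights": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": 16},
            "targets": ["model.layers.1.mlp.down_proj"],
        },
    },
    "quantized_layers": {
        "model.layers.0.self_attn.q_proj": {"quant_algo": "FP8"},
        "model.layers.1.mlp.down_proj": {"quant_algo": "NVFP4", "group_size": 16},
    },
}
print(json.dumps(new_config, indent=2))
```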
```python
elif quant_algo_value == "MIXED_PRECISION":
    quantized_layers = original_quantization_details.get("quantized_layers", {})

    # Group layers by their unique quantization config so each distinct
    # (quant_algo, group_size, ...) combination becomes one config_group.
    algo_to_layers: dict[tuple, list[str]] = defaultdict(list)
    for layer_name, layer_cfg in quantized_layers.items():
        # Create a hashable key from the layer config
        key = tuple(sorted(layer_cfg.items()))
        algo_to_layers[key].append(layer_name)

    config_groups: dict[str, Any] = {}
    for idx, (config_key, layer_names) in enumerate(algo_to_layers.items()):
        layer_cfg = dict(config_key)
        algo = layer_cfg.get("quant_algo", "")
        layer_group_size = layer_cfg.get("group_size")

        group_config = _quant_algo_to_group_config(algo, layer_group_size)
        group_config["targets"] = sorted(layer_names)
        config_groups[f"group_{idx}"] = group_config

    new_config["config_groups"] = config_groups
    # Preserve the full per-layer detail for consumers that need it.
    new_config["quantized_layers"] = quantized_layers
```
The new MIXED_PRECISION conversion branch introduces non-trivial behavior (layer grouping, `config_groups` synthesis, preserving `quantized_layers`), but there isn't any automated test coverage for it. Since `convert_hf_quant_config_format` is already used in tests for existing FP8 export behavior, it would be good to add a unit test that asserts: (1) grouping creates the expected number of groups for a mixed config, and (2) per-layer fields like `group_size` are reflected correctly in the produced group configs.
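A test along these lines could exercise both assertions. Since the exact signature of `convert_hf_quant_config_format` isn't shown here, the sketch runs against a stand-in reimplementation of the grouping step; a real test would call the exported function instead:

```python
import json
from collections import defaultdict


def group_mixed_precision(quantized_layers: dict) -> dict:
    """Stand-in for the converter's MIXED_PRECISION grouping step."""
    algo_to_layers = defaultdict(list)
    for layer_name, layer_cfg in quantized_layers.items():
        key = json.dumps(layer_cfg, sort_keys=True)
        algo_to_layers[key].append(layer_name)
    config_groups = {}
    for idx, (config_key, layer_names) in enumerate(algo_to_layers.items()):
        layer_cfg = json.loads(config_key)
        config_groups[f"group_{idx}"] = {
            "quant_algo": layer_cfg.get("quant_algo", ""),
            "group_size": layer_cfg.get("group_size"),
            "targets": sorted(layer_names),
        }
    return config_groups


def test_mixed_precision_grouping():
    layers = {
        "l0": {"quant_algo": "FP8"},
        "l1": {"quant_algo": "NVFP4", "group_size": 16},
        "l2": {"quant_algo": "FP8"},
    }
    groups = group_mixed_precision(layers)
    # (1) identical configs collapse into a single group
    assert len(groups) == 2
    # (2) per-layer fields like group_size survive into the group config
    nvfp4 = next(g for g in groups.values() if g["quant_algo"] == "NVFP4")
    assert nvfp4["group_size"] == 16
    assert nvfp4["targets"] == ["l1"]


test_mixed_precision_grouping()
```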
```python
# Group layers by their unique quantization config so each distinct
# (quant_algo, group_size, ...) combination becomes one config_group.
algo_to_layers: dict[tuple, list[str]] = defaultdict(list)
```
Type annotation `algo_to_layers: dict[tuple, list[str]]` uses `tuple` without type parameters. With the repo's strict mypy settings, this is typically flagged as "Missing type parameters for generic type 'tuple'". Consider tightening this to the actual key type being used here (e.g., `tuple[tuple[str, Any], ...]`) so type checking stays clean.

```diff
-    algo_to_layers: dict[tuple, list[str]] = defaultdict(list)
+    algo_to_layers: dict[tuple[tuple[str, Any], ...], list[str]] = defaultdict(list)
```
```python
elif quant_algo in ("NVFP4_AWQ", "W4A16_AWQ", "W4A8_AWQ"):
    gs = group_size or 128
    weight_bits = 4
    act_bits = 8 if "A8" in quant_algo else 4
    return {
        "input_activations": {
            "dynamic": False,
            "num_bits": act_bits,
```
In `_quant_algo_to_group_config`, `W4A16_AWQ` is treated the same as `NVFP4_AWQ`/`W4A8_AWQ` and assigns `input_activations` bits (`act_bits` becomes 4 here). But `W4A16_*` implies weight-only quantization with unquantized (16-bit) activations (see `INT4_AWQ_CFG`, where `*input_quantizer` is disabled). This mapping will incorrectly advertise activation quantization in the exported `config_groups`. Consider handling `W4A16_AWQ` as a separate case (no `input_activations`, and an appropriate weight config) instead of grouping it with the A8/NVFP4 AWQ variants.
```python
algo = layer_cfg.get("quant_algo", "")
layer_group_size = layer_cfg.get("group_size")

group_config = _quant_algo_to_group_config(algo, layer_group_size)
```
The function docstring notes that `targets` are PyTorch module types (e.g., "Linear" via `module.__class__.__name__`), but the new MIXED_PRECISION path sets `targets` to fully-qualified layer names (e.g., `model.layers.0.self_attn.q_proj`). Either update the docstring/commentary to reflect the MIXED_PRECISION behavior, or switch MIXED_PRECISION `targets` to the same targeting semantics used by the FP8/NVFP4 branches to avoid confusing downstream consumers.

```diff
     group_config = _quant_algo_to_group_config(algo, layer_group_size)
+    # Note: in the MIXED_PRECISION path, `targets` is a list of fully-qualified
+    # layer *names* (e.g., "model.layers.0.self_attn.q_proj"), not module
+    # type names (e.g., "Linear" from `module.__class__.__name__`). This
+    # differs from other branches (such as FP8/NVFP4), and consumers that
+    # need per-layer details should use `quantized_layers` instead.
```
```python
elif quant_algo == "NVFP4":
    gs = group_size or 16
    return {
        "input_activations": {
            "dynamic": False,
            "num_bits": 4,
            "type": "float",
            "group_size": gs,
        },
        "weights": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": gs},
    }
elif quant_algo in ("NVFP4_AWQ", "W4A16_AWQ", "W4A8_AWQ"):
    gs = group_size or 128
    weight_bits = 4
```
`_quant_algo_to_group_config` uses `gs = group_size or <default>`, which treats 0 as "missing" and silently replaces it with the default. Elsewhere in this file (e.g., the NVFP4 non-mixed path), `group_size` is read with `.get(..., default)` and would preserve an explicit 0. For consistency, and to avoid hiding invalid inputs, consider checking `group_size is None` (or explicitly validating `group_size > 0`) instead of relying on truthiness.
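A small sketch of the None-check this comment suggests (the helper name is illustrative, not from the PR): only `None` falls back to the default, while an explicit 0 is rejected rather than silently replaced.

```python
def resolve_group_size(group_size, default: int) -> int:
    # Treat only None as "missing"; `group_size or default` would also
    # swallow an explicit 0, hiding an invalid input.
    if group_size is None:
        return default
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")
    return group_size
```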
```python
elif quant_algo in ("NVFP4_AWQ", "W4A16_AWQ", "W4A8_AWQ"):
    gs = group_size or 128
    weight_bits = 4
    act_bits = 8 if "A8" in quant_algo else 4
```
Is this a bug?

`"A8"` is not in `"W4A16_AWQ"`, so `act_bits` falls back to 4, but for W4A16 the activation bits should be 16.
It's already addressed in the latest commit.
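One robust way to derive the activation width, as the earlier AI prompt suggests, is to parse the numeric suffix after "A" instead of using substring checks. This is a sketch of that idea, not the PR's actual fix:

```python
import re

def act_bits_from_algo(quant_algo: str, default: int = 4) -> int:
    """Extract activation bit-width from names like W4A8_AWQ or W4A16_AWQ."""
    # Match "A" followed by digits; names without such a suffix (e.g.,
    # "NVFP4_AWQ", where "A" in "_AWQ" is not digit-followed) use the default.
    m = re.search(r"A(\d+)", quant_algo)
    return int(m.group(1)) if m else default

# "W4A16_AWQ" → 16, "W4A8_AWQ" → 8, "NVFP4_AWQ" → 4 (fallback)
```

Note that per the W4A16 review comment above, a width of 16 should then be treated as weight-only quantization (no `input_activations` entry) rather than emitted as a quantized-activation config.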
Force-pushed from 77b54c5 to 8852b12 (compare)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Force-pushed from 8852b12 to 92a3fa7 (compare)
What does this PR do?
Type of change: ?
Overview: Support mixed-precision per-layer quant config in config.json, since it's the first-class source of truth in deployment frameworks.
Usage
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes