
Support mixed-precision per layer quant config in config.json#929

Merged

Edwardf0t1 merged 3 commits into main from zhiyu/mixed-precision-config-json on Feb 26, 2026

Conversation

@Edwardf0t1 (Contributor) commented Feb 24, 2026

What does this PR do?

Type of change: ?

Overview: Support mixed-precision per-layer quant config in config.json, since config.json is the first-class source of truth for deployment frameworks.

Usage

python3 -c "
from modelopt.torch.export.convert_hf_config import convert_hf_quant_config_format
import json

# Test 1: Existing FP8 case still works
fp8_config = {
    'producer': {'name': 'modelopt', 'version': '0.29.0'},
    'quantization': {
        'quant_algo': 'FP8',
        'kv_cache_quant_algo': 'FP8',
        'exclude_modules': ['lm_head'],
    },
}
result = convert_hf_quant_config_format(fp8_config)
print('=== FP8 (existing) ===')
print(json.dumps(result, indent=2))
assert result['quant_algo'] == 'FP8'
assert 'group_0' in result['config_groups']
assert result['ignore'] == ['lm_head']

# Test 2: Mixed precision
mixed_config = {
    'producer': {'name': 'modelopt', 'version': '0.29.0'},
    'quantization': {
        'quant_algo': 'MIXED_PRECISION',
        'kv_cache_quant_algo': 'FP8',
        'quantized_layers': {
            'model.layers.0.self_attn.q_proj': {'quant_algo': 'FP8'},
            'model.layers.0.self_attn.k_proj': {'quant_algo': 'FP8'},
            'model.layers.0.mlp.gate_proj': {'quant_algo': 'NVFP4', 'group_size': 16},
            'model.layers.0.mlp.up_proj': {'quant_algo': 'NVFP4', 'group_size': 16},
            'model.layers.1.self_attn.q_proj': {'quant_algo': 'FP8'},
        },
    },
}
result = convert_hf_quant_config_format(mixed_config)
print()
print('=== MIXED_PRECISION ===')
print(json.dumps(result, indent=2))

assert result['quant_algo'] == 'MIXED_PRECISION'
assert 'config_groups' in result
assert 'quantized_layers' in result
# Should have 2 groups: one for FP8 layers, one for NVFP4 layers
assert len(result['config_groups']) == 2

# Verify per-layer detail is preserved
assert 'model.layers.0.self_attn.q_proj' in result['quantized_layers']
assert result['quantized_layers']['model.layers.0.mlp.gate_proj']['quant_algo'] == 'NVFP4'

# Check that FP8 group has correct targets
for gname, gcfg in result['config_groups'].items():
    if gcfg.get('weights', {}).get('num_bits') == 8:
        assert len(gcfg['targets']) == 3  # 3 FP8 layers
    elif gcfg.get('weights', {}).get('num_bits') == 4:
        assert len(gcfg['targets']) == 2  # 2 NVFP4 layers

print()
print('All tests passed!')
"

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for MIXED_PRECISION quantization configurations in model export, enabling automatic aggregation of layers by their individual quantization settings.
    • Enhanced quantization config handling to dynamically manage multiple configuration groups for complex quantization scenarios.
    • Maintained backward compatibility with existing quantization algorithms.

copy-pr-bot bot commented Feb 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai bot (Contributor) commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Introduces support for MIXED_PRECISION quantization algorithm in Hugging Face config conversion by adding a helper function that maps quantization algorithms to group configurations, and enhancing the main converter to aggregate layers by their quantization configs into multiple config groups.

Changes

  • Quantization Config Grouping (modelopt/torch/export/convert_hf_config.py): Added a _quant_algo_to_group_config() helper function to map quantization algorithms and group sizes to configuration dictionaries. Enhanced convert_hf_quant_config_format() to support MIXED_PRECISION by aggregating layers with matching quantization configs into separate config groups (group_0, group_1, etc.). Imported defaultdict from collections.
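The aggregation described above can be sketched roughly as follows. This is a minimal illustration only: the helper and field names follow the walkthrough summary, but the real implementation in convert_hf_config.py covers more algorithms and edge cases.

```python
from collections import defaultdict
import json

def _quant_algo_to_group_config(quant_algo, group_size=None):
    # Hypothetical mapping; the real helper covers more algorithms
    # (AWQ variants, etc.) and additional validation.
    if quant_algo == "FP8":
        return {
            "input_activations": {"dynamic": False, "num_bits": 8, "type": "float"},
            "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
        }
    if quant_algo == "NVFP4":
        gs = 16 if group_size is None else group_size
        return {
            "input_activations": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": gs},
            "weights": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": gs},
        }
    raise ValueError(f"Unsupported quant_algo: {quant_algo!r}")

def group_layers(quantized_layers):
    # Bucket layers whose per-layer configs are identical; a canonical JSON
    # string is used as the bucket key so nested values stay hashable.
    buckets = defaultdict(list)
    for layer_name, layer_cfg in quantized_layers.items():
        buckets[json.dumps(layer_cfg, sort_keys=True)].append(layer_name)

    config_groups = {}
    for idx, (key, layer_names) in enumerate(buckets.items()):
        layer_cfg = json.loads(key)
        group = _quant_algo_to_group_config(
            layer_cfg.get("quant_algo", ""), layer_cfg.get("group_size")
        )
        # The PR stores the grouped layer names under "targets"; the review
        # comments note that compressed-tensors expects module types here.
        group["targets"] = sorted(layer_names)
        config_groups[f"group_{idx}"] = group
    return config_groups
```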

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 of 3 passed
  • Title check: ✅ Passed. The title 'Support mixed-precision per layer quant config in config.json' directly and clearly describes the main change: adding support for mixed-precision, per-layer quantization configuration in config.json files.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review February 24, 2026 20:08
@Edwardf0t1 Edwardf0t1 requested a review from a team as a code owner February 24, 2026 20:08
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
modelopt/torch/export/convert_hf_config.py (1)

162-181: DRY: Existing FP8/NVFP4 branches duplicate the new helper.

The inline config dicts for FP8 (lines 163-166) and NVFP4 (lines 171-178) duplicate the logic in _quant_algo_to_group_config. Reusing the helper keeps both paths consistent and avoids future drift (e.g., if FP8 config shape changes, you'd need to update two places).

Suggested refactor
     if quant_algo_value == "FP8":
-        config_group_details = {
-            "input_activations": {"dynamic": False, "num_bits": 8, "type": "float"},
-            "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
-            "targets": ["Linear"],
-        }
+        config_group_details = _quant_algo_to_group_config("FP8")
+        config_group_details["targets"] = ["Linear"]
         new_config["config_groups"] = {"group_0": config_group_details}
     elif quant_algo_value == "NVFP4":
         group_size = original_quantization_details.get("group_size", 16)
-        config_group_details = {
-            "input_activations": {
-                "dynamic": False,
-                "num_bits": 4,
-                "type": "float",
-                "group_size": group_size,
-            },
-            "weights": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": group_size},
-            "targets": ["Linear"],
-        }
+        config_group_details = _quant_algo_to_group_config("NVFP4", group_size)
+        config_group_details["targets"] = ["Linear"]
         new_config["config_groups"] = {"group_0": config_group_details}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/convert_hf_config.py` around lines 162 - 181, The FP8
and NVFP4 branches duplicate the config construction; replace the inline dicts
with calls to the existing helper _quant_algo_to_group_config to avoid
duplication: in the block that checks quant_algo_value == "FP8" and the block
for "NVFP4" use _quant_algo_to_group_config(quant_algo_value,
original_quantization_details) (or pass any required args used by the helper) to
produce config_group_details and then set new_config["config_groups"] =
{"group_0": config_group_details}; ensure you preserve group_size handling by
relying on the helper’s logic rather than rebuilding the dict inline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/export/convert_hf_config.py`:
- Around line 54-71: The branch handling quant_algo in ("NVFP4_AWQ",
"W4A16_AWQ", "W4A8_AWQ") computes act_bits incorrectly by checking only "A8" in
quant_algo; update the act_bits logic to detect A16 explicitly (or parse the
numeric suffix after 'A') so "W4A16_AWQ" yields act_bits=16, "W4A8_AWQ" yields 8
and fall back to 4 otherwise; change the calculation near the quant_algo
variable in convert_hf_config.py and keep the returned dict structure unchanged.
- Around line 103-105: The fallback branch in convert_hf_config (the final else
that currently returns {"quant_algo": quant_algo}) produces a config shape
inconsistent with other branches (which return {"input_activations": {...},
"weights": {...}}) and can break downstream parsing; replace that silent
fallback with an explicit error: in the else branch (where quant_algo is
unsupported, e.g., during MIXED_PRECISION), raise a ValueError (including
quant_algo in the message) to refuse unsupported algorithms rather than
returning a structurally incompatible dict so callers must handle or convert to
a valid config.
- Around line 188-191: The grouping key creation using
tuple(sorted(layer_cfg.items())) will fail if layer_cfg contains nested
dicts/lists; replace this fragile approach by serializing layer_cfg to a
canonical JSON string (e.g., using json.dumps with sort_keys=True and compact
separators) to produce a stable, hashable key for algo_to_layers; update
convert_hf_config.py to import json and compute key = json.dumps(layer_cfg,
sort_keys=True, separators=(',', ':'), ensure_ascii=False) (or fall back to
repr(layer_cfg) if json serialization fails) when iterating over
quantized_layers to preserve deterministic grouping.
- Around line 198-201: The MIXED_PRECISION branch is incorrectly assigning layer
names (layer_names) to group_config["targets"] (used for module class matching)
which breaks compressed-tensors; change the assignment so
group_config["targets"] contains module class names (e.g.,
module.__class__.__name__) derived from the layer configs instead of the layer
keys, and if you still need the original layer name list preserve it under a
separate field (e.g., keep your existing quantized_layers or add
quantized_layer_names) so group_config (from _quant_algo_to_group_config)
retains class-name targets while layer_name lists live in a distinct key.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75b5da9 and 7980ab7.

📒 Files selected for processing (1)
  • modelopt/torch/export/convert_hf_config.py

Comment on lines +188 to +191
        for layer_name, layer_cfg in quantized_layers.items():
            # Create a hashable key from the layer config
            key = tuple(sorted(layer_cfg.items()))
            algo_to_layers[key].append(layer_name)

⚠️ Potential issue | 🟡 Minor

tuple(sorted(layer_cfg.items())) will crash if any layer config value is a dict or list.

If a quantized_layers entry ever contains a nested structure (e.g., {"quant_algo": "X", "extra": {"key": "val"}}), tuple(sorted(...)) will raise TypeError because dicts/lists aren't hashable. Currently the expected payloads are flat, but this is fragile.

A safer approach is to serialize the config to a canonical JSON string for the grouping key:

Suggested defensive fix
+import json
 ...
         for layer_name, layer_cfg in quantized_layers.items():
-            # Create a hashable key from the layer config
-            key = tuple(sorted(layer_cfg.items()))
+            # Create a hashable key from the layer config (handles nested values)
+            key = json.dumps(layer_cfg, sort_keys=True)
             algo_to_layers[key].append(layer_name)
 ...
         for idx, (config_key, layer_names) in enumerate(algo_to_layers.items()):
-            layer_cfg = dict(config_key)
+            layer_cfg = json.loads(config_key)
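The failure mode and the proposed fix are easy to demonstrate in isolation (a standalone sketch, not project code):

```python
import json

flat = {"quant_algo": "FP8"}
nested = {"quant_algo": "FP8", "extra": {"key": "val"}}

# A tuple of items works as a dict key only while every value is hashable.
_ = hash(tuple(sorted(flat.items())))

# With a nested dict value, the resulting tuple is unhashable.
try:
    hash(tuple(sorted(nested.items())))
    raise AssertionError("expected TypeError")
except TypeError:
    pass  # dicts are not hashable

# Canonical JSON yields a stable, hashable key that round-trips losslessly.
key = json.dumps(nested, sort_keys=True, separators=(",", ":"))
assert json.loads(key) == nested
```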

Comment on lines +198 to +201

            group_config = _quant_algo_to_group_config(algo, layer_group_size)
            group_config["targets"] = sorted(layer_names)
            config_groups[f"group_{idx}"] = group_config

⚠️ Potential issue | 🔴 Critical



targets in MIXED_PRECISION branch contains layer names instead of module types, breaking quantization matching.

In FP8 and NVFP4 branches, targets correctly holds module class names (e.g., ["Linear"]), which compressed-tensors matches against module.__class__.__name__. However, the MIXED_PRECISION branch assigns layer names (e.g., "layer1", "layer3") to targets on line 200, which violates the documented contract in lines 115–117 and produces non-functional configs.

compressed-tensors will fail to match these layer names against module types. Either extract module class names from the layer configs or move the layer name list to a separate field like you already do for quantized_layers (line 205).

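Concretely, the separation the reviewer describes might look like this (field values are illustrative, not the PR's actual output):

```python
# Hypothetical config shape: module-type `targets` per group, with the
# per-layer names carried in a separate field, as the review suggests.
quantized_layers = {
    "model.layers.0.self_attn.q_proj": {"quant_algo": "FP8"},
    "model.layers.0.mlp.gate_proj": {"quant_algo": "NVFP4", "group_size": 16},
}

config = {
    "config_groups": {
        "group_0": {
            "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
            # compressed-tensors matches `targets` against
            # module.__class__.__name__, so keep module types here.
            "targets": ["Linear"],
        },
    },
    # Full per-layer detail lives in its own field, not in `targets`.
    "quantized_layers": quantized_layers,
}

assert config["config_groups"]["group_0"]["targets"] == ["Linear"]
assert "model.layers.0.mlp.gate_proj" in config["quantized_layers"]
```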

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.09%. Comparing base (ef5a2df) to head (92a3fa7).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #929      +/-   ##
==========================================
+ Coverage   72.07%   72.09%   +0.01%     
==========================================
  Files         207      207              
  Lines       22691    22691              
==========================================
+ Hits        16355    16358       +3     
+ Misses       6336     6333       -3     


Copilot AI (Contributor) left a comment

Pull request overview

Adds support for exporting mixed-precision, per-layer quantization settings into the Hugging Face config.json quantization config (via convert_hf_quant_config_format), aligning it with the compressed-tensors/llm-compressor-style config_groups layout while preserving per-layer detail.

Changes:

  • Introduces _quant_algo_to_group_config to map per-layer quant_algo settings to config_groups entries.
  • Adds a new MIXED_PRECISION conversion path that groups layers by identical per-layer quantization configs and emits multiple config_groups.
  • Preserves the original per-layer mapping under quantized_layers for consumers needing full detail.


Comment on lines +182 to +205
    elif quant_algo_value == "MIXED_PRECISION":
        quantized_layers = original_quantization_details.get("quantized_layers", {})

        # Group layers by their unique quantization config so each distinct
        # (quant_algo, group_size, ...) combination becomes one config_group.
        algo_to_layers: dict[tuple, list[str]] = defaultdict(list)
        for layer_name, layer_cfg in quantized_layers.items():
            # Create a hashable key from the layer config
            key = tuple(sorted(layer_cfg.items()))
            algo_to_layers[key].append(layer_name)

        config_groups: dict[str, Any] = {}
        for idx, (config_key, layer_names) in enumerate(algo_to_layers.items()):
            layer_cfg = dict(config_key)
            algo = layer_cfg.get("quant_algo", "")
            layer_group_size = layer_cfg.get("group_size")

            group_config = _quant_algo_to_group_config(algo, layer_group_size)
            group_config["targets"] = sorted(layer_names)
            config_groups[f"group_{idx}"] = group_config

        new_config["config_groups"] = config_groups
        # Preserve the full per-layer detail for consumers that need it.
        new_config["quantized_layers"] = quantized_layers
Copilot AI commented Feb 25, 2026

The new MIXED_PRECISION conversion branch introduces non-trivial behavior (layer grouping, config_groups synthesis, preserving quantized_layers) but there isn’t any automated test coverage for it. Since convert_hf_quant_config_format is already used in tests for existing FP8 export behavior, it would be good to add a unit test that asserts: (1) grouping creates the expected number of groups for a mixed config, and (2) per-layer fields like group_size are reflected correctly in the produced group configs.


        # Group layers by their unique quantization config so each distinct
        # (quant_algo, group_size, ...) combination becomes one config_group.
        algo_to_layers: dict[tuple, list[str]] = defaultdict(list)
Copilot AI commented Feb 25, 2026

Type annotation algo_to_layers: dict[tuple, list[str]] uses tuple without type parameters. With the repo’s strict mypy settings, this is typically flagged as “Missing type parameters for generic type 'tuple'”. Consider tightening this to the actual key type being used here (e.g., tuple[tuple[str, Any], ...]) so type checking stays clean.

Suggested change
algo_to_layers: dict[tuple, list[str]] = defaultdict(list)
algo_to_layers: dict[tuple[tuple[str, Any], ...], list[str]] = defaultdict(list)

Comment on lines 54 to 61
    elif quant_algo in ("NVFP4_AWQ", "W4A16_AWQ", "W4A8_AWQ"):
        gs = group_size or 128
        weight_bits = 4
        act_bits = 8 if "A8" in quant_algo else 4
        return {
            "input_activations": {
                "dynamic": False,
                "num_bits": act_bits,
Copilot AI commented Feb 25, 2026

In _quant_algo_to_group_config, W4A16_AWQ is treated the same as NVFP4_AWQ/W4A8_AWQ and assigns input_activations bits (act_bits becomes 4 here). But W4A16_* implies weight-only quantization with unquantized (16-bit) activations (see INT4_AWQ_CFG where *input_quantizer is disabled). This mapping will incorrectly advertise activation quantization in the exported config_groups. Consider handling W4A16_AWQ as a separate case (no input_activations, and appropriate weight config) instead of grouping it with the A8/NVFP4 AWQ variants.
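One way to address the bit-width part of this is to parse the activation width out of the algorithm name instead of substring-matching "A8" (a hypothetical helper, not the PR's code; per the comment above, W4A16 weight-only export should arguably omit input_activations entirely, which this alone does not do):

```python
import re

def act_bits_from_algo(quant_algo: str, default: int = 4) -> int:
    # Parse the numeric suffix after 'A' (e.g. W4A16_AWQ -> 16, W4A8_AWQ -> 8)
    # rather than checking `"A8" in quant_algo`, which misclassifies W4A16.
    m = re.search(r"A(\d+)", quant_algo)
    return int(m.group(1)) if m else default

assert act_bits_from_algo("W4A8_AWQ") == 8
assert act_bits_from_algo("W4A16_AWQ") == 16
assert act_bits_from_algo("NVFP4_AWQ") == 4  # no A<digits> suffix, falls back
```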

            algo = layer_cfg.get("quant_algo", "")
            layer_group_size = layer_cfg.get("group_size")

            group_config = _quant_algo_to_group_config(algo, layer_group_size)
Copilot AI commented Feb 25, 2026
The function docstring notes that targets are PyTorch module types (e.g., "Linear" via module.__class__.__name__), but the new MIXED_PRECISION path sets targets to fully-qualified layer names (e.g., model.layers.0.self_attn.q_proj). Either update the docstring/commentary to reflect the MIXED_PRECISION behavior, or switch MIXED_PRECISION targets to the same targeting semantics used by the FP8/NVFP4 branches to avoid confusing downstream consumers.

Suggested change
group_config = _quant_algo_to_group_config(algo, layer_group_size)
group_config = _quant_algo_to_group_config(algo, layer_group_size)
# Note: in the MIXED_PRECISION path, `targets` is a list of fully-qualified
# layer *names* (e.g., "model.layers.0.self_attn.q_proj"), not module
# type names (e.g., "Linear" from `module.__class__.__name__`). This
# differs from other branches (such as FP8/NVFP4), and consumers that
# need per-layer details should use `quantized_layers` instead.

Comment on lines 43 to 56
    elif quant_algo == "NVFP4":
        gs = group_size or 16
        return {
            "input_activations": {
                "dynamic": False,
                "num_bits": 4,
                "type": "float",
                "group_size": gs,
            },
            "weights": {"dynamic": False, "num_bits": 4, "type": "float", "group_size": gs},
        }
    elif quant_algo in ("NVFP4_AWQ", "W4A16_AWQ", "W4A8_AWQ"):
        gs = group_size or 128
        weight_bits = 4
Copilot AI commented Feb 25, 2026

_quant_algo_to_group_config uses gs = group_size or <default>, which treats 0 as “missing” and silently replaces it with the default. Elsewhere in this file (e.g., the NVFP4 non-mixed path) group_size is read with .get(..., default) and would preserve an explicit 0. For consistency and to avoid hiding invalid inputs, consider checking group_size is None (or explicitly validating group_size > 0) instead of relying on truthiness.
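The suggested check is simple to write as a standalone sketch (a hypothetical helper illustrating the None-check plus validation):

```python
def resolve_group_size(group_size, default=16):
    # `group_size or default` would silently turn an explicit 0 into the
    # default; compare against None and validate the value instead.
    if group_size is None:
        return default
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")
    return group_size

assert resolve_group_size(None) == 16
assert resolve_group_size(32) == 32
# An explicit 0 is rejected rather than silently replaced:
try:
    resolve_group_size(0)
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```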

@Edwardf0t1 Edwardf0t1 requested a review from realAsma February 25, 2026 21:43
    elif quant_algo in ("NVFP4_AWQ", "W4A16_AWQ", "W4A8_AWQ"):
        gs = group_size or 128
        weight_bits = 4
        act_bits = 8 if "A8" in quant_algo else 4
Contributor

Is this a bug?
"A8" is not a substring of "W4A16_AWQ", so act_bits falls back to 4, but it should be 16.

@Edwardf0t1 (Contributor Author) replied:

It's already addressed in the latest commit.

@Edwardf0t1 Edwardf0t1 force-pushed the zhiyu/mixed-precision-config-json branch from 77b54c5 to 8852b12 Compare February 26, 2026 08:36
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@Edwardf0t1 Edwardf0t1 force-pushed the zhiyu/mixed-precision-config-json branch from 8852b12 to 92a3fa7 Compare February 26, 2026 16:46
@Edwardf0t1 Edwardf0t1 enabled auto-merge (squash) February 26, 2026 19:36
@Edwardf0t1 Edwardf0t1 merged commit a6cbcba into main Feb 26, 2026
60 of 64 checks passed
@Edwardf0t1 Edwardf0t1 deleted the zhiyu/mixed-precision-config-json branch February 26, 2026 21:31
