Add missing sliding window param into ModelArgs #16687

DannyYuyang-quic · 2026-01-19T06:48:35Z

Summary

Fix follow-up to: #16684

Add the sliding_window and local_rope_theta parameters to ModelArgs so Gemma2 configs load correctly in the static llama flow.
Integrate all local-attention-related config under ModelArgs for consistency.

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py TestExampleLLMScript.test_static_llm_model -m SM8750 -s ${SERIAL_NUM} -b build-android -a . --executorch_root . --model_name gemma2-2b

pytorch-bot · 2026-01-19T06:48:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16687

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6018b6b with merge base fed6ff1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-01-19T06:49:18Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

DannyYuyang-quic · 2026-01-19T07:06:40Z

Follow‑up to: #16684 — thanks to @mergennachin for adding the missing files for Gemma2.

@cccclai @mergennachin Hi, while validating the Gemma2 config, I noticed that the recently introduced sliding_window parameter in 2b_config.json was not yet part of ModelArgs. This broke the Gemma2 config loading path in the static llama flow, which calls ModelArgs(**json.load(f)). This PR adds the missing field to ModelArgs.
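
To illustrate the failure mode, here is a minimal sketch that assumes ModelArgs behaves like a plain dataclass. `ModelArgsBefore`, `ModelArgsAfter`, and the config values are hypothetical stand-ins for illustration only; the real ModelArgs has many more fields.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical, trimmed-down stand-in for ModelArgs before this PR.
@dataclass
class ModelArgsBefore:
    dim: int = 2048
    n_layers: int = 26


# Hypothetical stand-in for ModelArgs after this PR: the local-attention
# fields exist and default to None for models that do not use them.
@dataclass
class ModelArgsAfter(ModelArgsBefore):
    sliding_window: Optional[int] = None
    local_rope_theta: Optional[float] = None


# Illustrative config text, not the contents of the real 2b_config.json.
config_text = '{"dim": 2304, "n_layers": 26, "sliding_window": 4096}'
kwargs = json.loads(config_text)

# A dataclass constructor rejects any key it has no field for.
try:
    ModelArgsBefore(**kwargs)
    error = None
except TypeError as exc:
    error = str(exc)

# With the field declared, the same config loads cleanly.
args = ModelArgsAfter(**kwargs)
print(error)
print(args)
```

Declaring the fields with a `None` default keeps configs for other models loading unchanged, since they simply omit those keys.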

In addition, the upgrade to Transformers 5.0 changed how configs are loaded, which makes maintenance more challenging. I think this is a good moment to move all local-attention-related parameters into ModelArgs.

cc: @haowhsu-quic @jethroqti

DannyYuyang-quic · 2026-01-19T07:13:39Z

examples/models/gemma2/config/2b_config.json

-  "layer_types": ["local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global", "local", "global"],
-  "rope_local_base_freq": 10000.0
+  "layer_types": ["sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention", "sliding_attention", "full_attention"]


@cccclai @mergennachin
Regarding the layer_types added in #16684, I noticed that the current config uses:

"layer_types": ["local", "global", "local", "global", ...]

In this PR, I switched the names to:

"layer_types": ["sliding_attention", "full_attention", ...]

This change was only to preserve HuggingFace’s original naming; there is no other intention behind it.
We are completely open to reverting to "local" / "global" if that is the preferred naming style on your side.
Please let me know which naming you prefer, and I'm happy to keep it accordingly.
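
If both spellings ever need to coexist, one option is a small normalization step. The sketch below is hypothetical and not part of this PR: `LAYER_TYPE_ALIASES`, `normalize_layer_types`, and `uses_sliding_window` are illustrative names, and the mapping assumes "local" corresponds to "sliding_attention" and "global" to "full_attention", as in the diff above.

```python
from typing import Dict, List

# Hypothetical alias table: short names -> HuggingFace-style names.
LAYER_TYPE_ALIASES: Dict[str, str] = {
    "local": "sliding_attention",
    "global": "full_attention",
}


def normalize_layer_types(layer_types: List[str]) -> List[str]:
    """Map every entry to its HuggingFace-style name, rejecting unknown ones."""
    valid = set(LAYER_TYPE_ALIASES.values())
    normalized = []
    for name in layer_types:
        canonical = LAYER_TYPE_ALIASES.get(name, name)
        if canonical not in valid:
            raise ValueError(f"unknown layer type: {name!r}")
        normalized.append(canonical)
    return normalized


def uses_sliding_window(layer_types: List[str]) -> List[bool]:
    """Per-layer flag: True where the layer uses sliding-window attention."""
    return [t == "sliding_attention" for t in normalize_layer_types(layer_types)]


print(normalize_layer_types(["local", "global", "sliding_attention"]))
print(uses_sliding_window(["local", "full_attention"]))
```

Raising on unknown names surfaces a typo in a config at load time instead of silently treating the layer as full attention.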

mergennachin

See inline

examples/qualcomm/oss_scripts/llama/model/static_llama.py

mergennachin · 2026-01-23T14:59:33Z

@DannyYuyang-quic

This is past the 1.1 branch cut, does it need to be cherry-picked? If so, please look at cherry-pick instructions at #16365

DannyYuyang-quic · 2026-01-24T06:33:19Z

@pytorchbot cherry-pick --onto release/1.1 -c critical

DannyYuyang-quic · 2026-01-24T06:40:31Z

> This is past the 1.1 branch cut, does it need to be cherry-picked? If so, please look at cherry-pick instructions at #16365

Thanks for the cherry-pick instructions!
This needs to be cherry-picked, and I’ve already followed the instructions and applied the cherry-pick onto the 1.1 release.

mergennachin · 2026-01-25T16:59:02Z

You need to land this first and then cherry-pick

cccclai · 2026-01-26T20:18:58Z

Can you rebase? Then we can land.

mergennachin · 2026-01-27T16:47:38Z

FYI, we won't cherry-pick this to 1.1 since it is too late at this point.

DannyYuyang-quic · 2026-01-28T02:55:36Z

@cccclai, @mergennachin
No problem. If users run into this issue, I’ll point them to this PR and ask them to reference this patch before the next release.
Thanks for the update!

DannyYuyang-quic requested review from cccclai and lucylq as code owners January 19, 2026 06:48

meta-cla bot added the CLA Signed label Jan 19, 2026

mergennachin reviewed Jan 20, 2026

DannyYuyang-quic force-pushed the dev1/danny/integrate_local_attn_related_params branch from 17c8298 to e2b7c0f Compare January 22, 2026 14:13

mergennachin mentioned this pull request Jan 22, 2026

Sliding window issues in Gemma models #15593

Closed

mergennachin approved these changes Jan 22, 2026

IgorSwat mentioned this pull request Jan 23, 2026

Gemma 3 270M support, please software-mansion/react-native-executorch#642

Open

fix(gemma2): add missing sliding_window param and unify local-attention related parameters into ModelArgs

6018b6b

DannyYuyang-quic force-pushed the dev1/danny/integrate_local_attn_related_params branch from e2b7c0f to 6018b6b Compare January 27, 2026 02:23

mergennachin merged commit 883af3f into pytorch:main Jan 28, 2026
144 checks passed
