
Conversation

@jethroqti
Contributor

Qualcomm AI Engine Direct - Enable GA Static Gemma2-2B model

Summary:

  • e2e script for GA Static Gemma2-2B; perf: 16a4w block quant, token rate in kv mode ~= 34.86 tokens/sec (SM8650); accuracy: PPL on the wikitext dataset fp 9.608 -> htp 11.446
  • add Gemma2 2B instruct model params config
  • remove qk-norm related parts
  • add soft capping in two places: attention module and after model output
  • update README with Gemma2 End-to-End example
  • add unit test for Gemma2-2B
  • add parameters to the MultiScopeAwareLlamaModel class to support the static llama architecture required by Gemma2
  • used params: conv2d: 16a4w_block, atten_WV: 16a8w, block_size: 32, num_sharding=4

Test plan:

```shell
python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SN} -H ${HOST} -m ${CHIPID} --temperature 0 --model_mode kv --max_seq_len 1024 --prefill_ar_len 128 --decoder_model gemma2-2b --prompt "I would like to learn python, could you teach me with a simple example?" --tasks wikitext --limit 1
```
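
For reference, the `16a4w_block` scheme in the params above quantizes weights in fixed-size blocks (block_size 32 here), with one scale per block. Below is a simplified pure-Python sketch of symmetric per-block 4-bit weight quantization; it is illustrative only, and the actual scheme in the Qualcomm backend may differ in rounding and range handling:

```python
def quantize_block(block, n_bits=4):
    # Symmetric per-block quantization: a single scale covers one block
    # (e.g. 32 consecutive weights), so an outlier only inflates the scale
    # of its own block instead of the whole tensor.
    qmax = 2 ** (n_bits - 1) - 1                  # 7 for signed 4-bit
    scale = max(abs(w) for w in block) / qmax
    scale = scale if scale > 0 else 1.0           # guard for an all-zero block
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

weights = [0.02 * i - 0.3 for i in range(32)]     # toy block of 32 weights
q, s = quantize_block(weights)
recon = dequantize_block(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
print(max_err <= s / 2 + 1e-12)                   # rounding error bounded by scale/2
```

Smaller blocks give tighter scales (better accuracy) at the cost of storing more scale values, which is the trade-off block_size: 32 is balancing.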

@pytorch-bot

pytorch-bot bot commented Jan 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16624

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Cancelled Jobs, 1 Unrelated Failure

As of commit 34b58dd with merge base f680623:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 15, 2026
@jethroqti
Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Jan 15, 2026
@jethroqti
Contributor Author

This PR enables the GA static Gemma2-2B instruct model. Please take a look. Thanks!

@cccclai @haowhsu-quic @DannyYuyang-quic

@meta-codesync
Contributor

meta-codesync bot commented Jan 16, 2026

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D90815003.

@cccclai cccclai merged commit e725fe4 into pytorch:main Jan 16, 2026
150 of 156 checks passed
@mergennachin
Contributor

mergennachin commented Jan 17, 2026

@jethroqti see #16684

mergennachin added a commit that referenced this pull request Jan 17, 2026
Follow-up to #16624, which didn't `git add` the required folder `examples/models/gemma2`. As a result, the config and conversion script were missing when the PR landed.

Our CI is timing out due to

```
  # test_qnn_delegate.py:6430-6433
  p = subprocess.Popen(cmds, stdout=subprocess.DEVNULL)
  with Listener((self.ip, self.port)) as listener:
      conn = listener.accept()  # blocks forever waiting for a connection from a
                                # subprocess that already died with a missing-module error
      p.communicate()
```


https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=test-static-llama-qnn-linux&mergeEphemeralLF=true
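
One way to avoid this class of hang is to poll the child before blocking on `accept()`, so a worker that dies on import is reported instead of leaving the listener waiting forever. A hypothetical sketch, not the actual harness code (`wait_ready` and the deadline value are illustrative):

```python
import subprocess
import sys
import time

def wait_ready(proc: subprocess.Popen, deadline_s: float) -> bool:
    # Poll the child until the deadline; return False as soon as it exits,
    # so the caller can skip listener.accept() instead of blocking forever.
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if proc.poll() is not None:   # child already exited (e.g. ImportError)
            return False
        time.sleep(0.1)
    return True

p = subprocess.Popen(
    [sys.executable, "-c", "import nonexistent_module_xyz"],
    stderr=subprocess.DEVNULL,
)
if not wait_ready(p, deadline_s=5.0):
    print("child exited early; skipping accept()")
```

An alternative with the same effect is to put a timeout on the listening socket (e.g. `socket.setdefaulttimeout` before constructing the `Listener`), so `accept()` raises instead of hanging until the CI job is killed.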
@cccclai
Contributor

cccclai commented Jan 17, 2026

Oh sorry, I should have paid closer attention to the CI. I didn't catch the timeout issue. Thank you for the forward fix!

@jethroqti
Contributor Author

@jethroqti see #16684

Got it. I am handling it now. Sorry about that.
