Add lora to static attention #16611
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16611
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Unrelated Failure. As of commit e3c6b34 with merge base 33974d5:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Force-pushed from d1e19e2 to 6c3ab0c.
Pull request overview
This PR adds LoRA (Low-Rank Adaptation) support to static attention for CoreML model export. The changes enable exporting Llama models with LoRA adapters by integrating adapter configurations, converting checkpoint formats, and updating linear layers to support LoRA.
Changes:
- Added LoRA configuration parameters (target_modules, lora_rank, lora_alpha) to StaticAttention class
- Updated export script to load and merge LoRA adapter weights with base model checkpoints
- Added filtering logic for quantization to handle both regular nn.Linear and LoRALinear layers (a hedged sketch of such a filter follows this list)
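To make the quantization item concrete, here is a minimal sketch of what such a filter could look like, assuming a torchao-style `quantize_(model, config, filter_fn=...)` entry point; the function name and the name-based check are illustrative, not the PR's actual code.

```python
import torch.nn as nn


def quantizable_linear_filter(module: nn.Module, fqn: str) -> bool:
    """Return True for modules whose weights should be quantized.

    The name check stands in for an isinstance() test against whatever
    LoRALinear class the example actually defines (an assumption here).
    """
    return isinstance(module, nn.Linear) or type(module).__name__ == "LoRALinear"


# Usage (hedged): quantize_(model, some_config, filter_fn=quantizable_linear_filter)
```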
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/models/llama/static_attention.py | Adds LoRA configuration and a helper to create nn.Linear or LoRALinear layers based on the target modules (a hedged sketch follows this table) |
| examples/models/llama/model_args.py | Updates comment to mention unsloth format support in addition to torchtune |
| examples/apple/coreml/llama/export_static_llm_coreml.py | Adds adapter checkpoint/config loading, weight conversion, renaming for static attention, and LoRALinear support in quantization |
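For context on the static_attention.py row, below is a self-contained sketch of how a "plain linear or LoRA linear" helper might work. The `LoRALinear` class, its parameter names, and the target-module naming (e.g. "q_proj") are assumptions for illustration and may not match the PR's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen base weight plus a low-rank update: y = x W^T + (alpha/rank) * B(A(x))."""

    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float):
        super().__init__()
        # Base weight is loaded from the checkpoint and kept frozen.
        self.weight = nn.Parameter(torch.zeros(out_dim, in_dim), requires_grad=False)
        # Adapter weights; state-dict keys become "lora_a.weight" / "lora_b.weight".
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight) + self.scale * self.lora_b(self.lora_a(x))


def make_linear(name: str, in_dim: int, out_dim: int,
                target_modules, lora_rank: int, lora_alpha: float) -> nn.Module:
    """Return a LoRALinear when `name` is a LoRA target, otherwise a plain nn.Linear."""
    if target_modules and name in target_modules and lora_rank > 0:
        return LoRALinear(in_dim, out_dim, lora_rank, lora_alpha)
    return nn.Linear(in_dim, out_dim, bias=False)
```

With a helper along these lines, StaticAttention can build its wqs/wks/wvs lists the same way whether or not LoRA is enabled, and the adapter tensors simply show up as extra lora_a/lora_b entries in the state dict.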
Force-pushed from 6c3ab0c to 55f3957.
Force-pushed from 55f3957 to 081c8b5.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
```python
# LoRA weights (lora_a and lora_b)
for lora_suffix in ["lora_a.weight", "lora_b.weight"]:
    if f"layers.{i}.attention.wq.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wqs.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wq.{lora_suffix}"
        )
    if f"layers.{i}.attention.wk.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wks.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wk.{lora_suffix}"
        )
    if f"layers.{i}.attention.wv.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wvs.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wv.{lora_suffix}"
        )
```
Copilot AI commented on Jan 15, 2026:
The LoRA weight renaming logic handles wq, wk, and wv but is missing handling for wo (output projection). According to convert_weights.py, o_proj LoRA weights are converted to layers.{}.attention.wo.lora_a.weight and layers.{}.attention.wo.lora_b.weight. These weights need to be preserved as-is (not renamed) since wo is not converted to a ModuleList in static attention, unlike wqs/wks/wvs. However, if there are LoRA weights for wo, they should be explicitly handled to ensure they're loaded correctly.
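A hedged sketch of the explicit handling the comment suggests, assuming the checkpoint key layout shown in the diff above; the helper name is hypothetical. Since wo stays a single module rather than a ModuleList, its LoRA keys are verified rather than renamed.

```python
def check_wo_lora_keys(checkpoint: dict, num_layers: int) -> None:
    """Document/verify that wo LoRA weights are left under their original keys."""
    for i in range(num_layers):
        for lora_suffix in ("lora_a.weight", "lora_b.weight"):
            key = f"layers.{i}.attention.wo.{lora_suffix}"
            if key in checkpoint:
                # No rename: wo is not wrapped in a ModuleList, so this key
                # already matches the static-attention module tree and will
                # be consumed by load_state_dict as-is.
                continue
```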
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Force-pushed from c8b61da to e3c6b34.
@lucylq does this export the base model and the lora adapter as separate models? How does that work with weight sharing? It seems that the lora model would be as big as the base model in the flow in this PR?
@metascroy This PR just adds support for lora in static attention and the export/runner scripts. There are changes to the export script that we need to make to construct a multi-method model, before applying weight-sharing.
Summary
Add lora modules to static attention
Test plan
Export & run llama1b lora model. See test plan in #16606
File sizes of regular llama1b and lora llama1b.