Add lora to static attention #16611
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16611
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Unrelated Failure. As of commit e3c6b34 with merge base 33974d5:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Force-pushed from d1e19e2 to 6c3ab0c.
Pull request overview
This PR adds LoRA (Low-Rank Adaptation) support to static attention for CoreML model export. The changes enable exporting Llama models with LoRA adapters by integrating adapter configurations, converting checkpoint formats, and updating linear layers to support LoRA.
Changes:
- Added LoRA configuration parameters (target_modules, lora_rank, lora_alpha) to StaticAttention class
- Updated export script to load and merge LoRA adapter weights with base model checkpoints
- Added filtering logic for quantization to handle both regular nn.Linear and LoRALinear layers (a hedged sketch of such a filter follows this list)
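To make the quantization item concrete, here is a minimal sketch of what such a filter could look like, assuming a torchao-style `quantize_(model, config, filter_fn=...)` entry point; the function name and the name-based check are illustrative, not the PR's actual code.

```python
import torch.nn as nn


def quantizable_linear_filter(module: nn.Module, fqn: str) -> bool:
    """Return True for modules whose weights should be quantized.

    The name check stands in for an isinstance() test against whatever
    LoRALinear class the example actually defines (an assumption here).
    """
    return isinstance(module, nn.Linear) or type(module).__name__ == "LoRALinear"


# Usage (hedged): quantize_(model, some_config, filter_fn=quantizable_linear_filter)
```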
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/models/llama/static_attention.py | Adds LoRA configuration and a helper to create nn.Linear or LoRALinear layers based on the target modules (a hedged sketch follows this table) |
| examples/models/llama/model_args.py | Updates comment to mention unsloth format support in addition to torchtune |
| examples/apple/coreml/llama/export_static_llm_coreml.py | Adds adapter checkpoint/config loading, weight conversion, renaming for static attention, and LoRALinear support in quantization |
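For context on the static_attention.py row, below is a self-contained sketch of how a "plain linear or LoRA linear" helper might work. The `LoRALinear` class, its parameter names, and the target-module naming (e.g. "q_proj") are assumptions for illustration and may not match the PR's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen base weight plus a low-rank update: y = x W^T + (alpha/rank) * B(A(x))."""

    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float):
        super().__init__()
        # Base weight is loaded from the checkpoint and kept frozen.
        self.weight = nn.Parameter(torch.zeros(out_dim, in_dim), requires_grad=False)
        # Adapter weights; state-dict keys become "lora_a.weight" / "lora_b.weight".
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight) + self.scale * self.lora_b(self.lora_a(x))


def make_linear(name: str, in_dim: int, out_dim: int,
                target_modules, lora_rank: int, lora_alpha: float) -> nn.Module:
    """Return a LoRALinear when `name` is a LoRA target, otherwise a plain nn.Linear."""
    if target_modules and name in target_modules and lora_rank > 0:
        return LoRALinear(in_dim, out_dim, lora_rank, lora_alpha)
    return nn.Linear(in_dim, out_dim, bias=False)
```

With a helper along these lines, StaticAttention can build its wqs/wks/wvs lists the same way whether or not LoRA is enabled, and the adapter tensors simply show up as extra lora_a/lora_b entries in the state dict.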
Force-pushed from 6c3ab0c to 55f3957.
Force-pushed from 55f3957 to 081c8b5.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
```python
# LoRA weights (lora_a and lora_b)
for lora_suffix in ["lora_a.weight", "lora_b.weight"]:
    if f"layers.{i}.attention.wq.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wqs.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wq.{lora_suffix}"
        )
    if f"layers.{i}.attention.wk.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wks.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wk.{lora_suffix}"
        )
    if f"layers.{i}.attention.wv.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wvs.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wv.{lora_suffix}"
        )
```
Copilot AI commented on Jan 15, 2026:
The LoRA weight renaming logic handles wq, wk, and wv but is missing handling for wo (output projection). According to convert_weights.py, o_proj LoRA weights are converted to layers.{}.attention.wo.lora_a.weight and layers.{}.attention.wo.lora_b.weight. These weights need to be preserved as-is (not renamed) since wo is not converted to a ModuleList in static attention, unlike wqs/wks/wvs. However, if there are LoRA weights for wo, they should be explicitly handled to ensure they're loaded correctly.
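A hedged sketch of the explicit handling the comment suggests, assuming the checkpoint key layout shown in the diff above; the helper name is hypothetical. Since wo stays a single module rather than a ModuleList, its LoRA keys are verified rather than renamed.

```python
def check_wo_lora_keys(checkpoint: dict, num_layers: int) -> None:
    """Document/verify that wo LoRA weights are left under their original keys."""
    for i in range(num_layers):
        for lora_suffix in ("lora_a.weight", "lora_b.weight"):
            key = f"layers.{i}.attention.wo.{lora_suffix}"
            if key in checkpoint:
                # No rename: wo is not wrapped in a ModuleList, so this key
                # already matches the static-attention module tree and will
                # be consumed by load_state_dict as-is.
                continue
```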
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Force-pushed from c8b61da to e3c6b34.
@lucylq does this export the base model and the lora adapter as separate models? How does that work with weight sharing? It seems that the lora model would be as big as the base model in the flow in this PR?
@metascroy This PR just adds support for lora in static attention and the export/runner scripts. There are changes to the export script that we need to make to construct a multi-method model, before applying weight-sharing.
Summary
Add lora modules to static attention
Test plan
Export & run llama1b lora model. See test plan in #16606
File sizes of regular llama1b and lora llama1b.