
Conversation

@lucylq (Contributor) commented Jan 15, 2026

Summary

Add LoRA modules to static attention:

  1. Pass in the adapter checkpoint & config
  2. Convert the checkpoint to meta format
  3. Filter for LoRALinears in quantization
  4. Update linears to LoRALinear if an adapter checkpoint/config is passed in (see the sketch after this list)
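
For context, here is a minimal sketch of what step 4 amounts to: layer construction that swaps in a LoRA-wrapped linear for modules named in the adapter config. The `LoRALinear` class and `make_linear` helper below are illustrative stand-ins, not the exact classes added in this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Illustrative LoRA linear: frozen base weight plus low-rank lora_a/lora_b adapters.

    Parameter names (weight, lora_a.weight, lora_b.weight) mirror the checkpoint keys
    this PR renames, but the class itself is a sketch, not the PR's implementation.
    """

    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float):
        super().__init__()
        # Base weight is loaded from the model checkpoint and kept frozen.
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim), requires_grad=False)
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # standard LoRA init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight) + self.scale * self.lora_b(self.lora_a(x))


def make_linear(name: str, in_dim: int, out_dim: int, target_modules=None, rank: int = 8, alpha: float = 16.0) -> nn.Module:
    """Return a LoRALinear when `name` is listed in target_modules, else a plain nn.Linear."""
    if target_modules and name in target_modules:
        return LoRALinear(in_dim, out_dim, rank=rank, alpha=alpha)
    return nn.Linear(in_dim, out_dim, bias=False)
```

With a helper like this, the q/k/v projections can be built as, e.g., `make_linear("q_proj", dim, n_heads * head_dim, target_modules, lora_rank, lora_alpha)`; the `"q_proj"` name follows PEFT's target_modules convention and is illustrative.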

Test plan

Export & run the llama1b LoRA model. See the test plan in #16606.

File sizes of the regular llama1b and the LoRA llama1b:

-rw-r--r--@ 1 lfq  staff   884356621 Jan 14 15:50 coreml_llama1b.pte
-rw-r--r--  1 lfq  staff   907086740 Jan 15 10:50 coreml-llama1b-lora.pte

pytorch-bot bot commented Jan 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16611

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Unrelated Failure

As of commit e3c6b34 with merge base 33974d5:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Jan 15, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

lucylq mentioned this pull request Jan 15, 2026
lucylq force-pushed the lfq.add-lora-to-static-attention branch from d1e19e2 to 6c3ab0c on January 15, 2026 18:46
lucylq marked this pull request as ready for review January 15, 2026 18:52
Copilot AI review requested due to automatic review settings January 15, 2026 18:52
Copilot AI left a comment

Pull request overview

This PR adds LoRA (Low-Rank Adaptation) support to static attention for CoreML model export. The changes enable exporting Llama models with LoRA adapters by integrating adapter configurations, converting checkpoint formats, and updating linear layers to support LoRA.

Changes:

  • Added LoRA configuration parameters (target_modules, lora_rank, lora_alpha) to StaticAttention class
  • Updated export script to load and merge LoRA adapter weights with base model checkpoints
  • Added filtering logic for quantization to handle both regular nn.Linear and LoRALinear layers (sketched below)
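
The quantization change in the last bullet amounts to widening the filter used when selecting layers to quantize. A rough sketch, assuming a torchao-style `quantize_` API that accepts a `(module, fqn)` filter; the exact predicate in the PR may differ.

```python
import torch.nn as nn


def quantizable_linear_filter(module: nn.Module, fqn: str) -> bool:
    """Match plain nn.Linear layers as well as LoRA-wrapped linears.

    Matching LoRALinear by class name keeps this sketch independent of where the
    class is actually defined; the PR may use an isinstance check instead.
    """
    return isinstance(module, nn.Linear) or type(module).__name__ == "LoRALinear"


# e.g. with torchao: quantize_(model, weight_quant_config, filter_fn=quantizable_linear_filter)
```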

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| examples/models/llama/static_attention.py | Adds LoRA configuration and a helper function to create linear/LoRALinear layers based on target modules |
| examples/models/llama/model_args.py | Updates a comment to mention unsloth format support in addition to torchtune |
| examples/apple/coreml/llama/export_static_llm_coreml.py | Adds adapter checkpoint/config loading, weight conversion, renaming for static attention, and LoRALinear support in quantization (loading/merging sketched below) |
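
For the adapter loading and merging part of that export-script change, a hedged sketch; the file names and config keys (`adapter_config.json`-style `r`, `lora_alpha`, `target_modules`) follow PEFT conventions and are assumptions, not necessarily the PR's exact CLI surface.

```python
import json

import torch


def load_and_merge_adapter(base_checkpoint: dict, adapter_ckpt_path: str, adapter_cfg_path: str):
    """Hypothetical helper: merge LoRA adapter weights into the base state dict.

    Returns the merged checkpoint plus the rank, alpha, and target modules read from the config.
    """
    with open(adapter_cfg_path) as f:
        cfg = json.load(f)

    adapter_sd = torch.load(adapter_ckpt_path, map_location="cpu")
    # Adapter keys (e.g. "...lora_a.weight" / "...lora_b.weight") sit alongside the base
    # weights, so a plain dict update suffices once both use the same (meta-format) naming.
    base_checkpoint.update(adapter_sd)

    return base_checkpoint, cfg.get("r"), cfg.get("lora_alpha"), cfg.get("target_modules", [])
```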


lucylq force-pushed the lfq.add-lora-to-static-attention branch from 6c3ab0c to 55f3957 on January 15, 2026 22:10
Copilot AI review requested due to automatic review settings January 15, 2026 22:12
lucylq force-pushed the lfq.add-lora-to-static-attention branch from 55f3957 to 081c8b5 on January 15, 2026 22:12
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.



Comment on lines +191 to +203
# LoRA weights (lora_a and lora_b)
for lora_suffix in ["lora_a.weight", "lora_b.weight"]:
    if f"layers.{i}.attention.wq.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wqs.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wq.{lora_suffix}"
        )
    if f"layers.{i}.attention.wk.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wks.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wk.{lora_suffix}"
        )
    if f"layers.{i}.attention.wv.{lora_suffix}" in checkpoint:
        checkpoint[f"layers.{i}.attention.wvs.0.{lora_suffix}"] = checkpoint.pop(
            f"layers.{i}.attention.wv.{lora_suffix}"
        )

Copilot AI Jan 15, 2026


The LoRA weight renaming logic handles wq, wk, and wv but is missing handling for wo (output projection). According to convert_weights.py, o_proj LoRA weights are converted to layers.{}.attention.wo.lora_a.weight and layers.{}.attention.wo.lora_b.weight. These weights need to be preserved as-is (not renamed) since wo is not converted to a ModuleList in static attention, unlike wqs/wks/wvs. However, if there are LoRA weights for wo, they should be explicitly handled to ensure they're loaded correctly.
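
A minimal illustration of the explicit handling the comment suggests; `check_wo_lora_keys` is a hypothetical helper, and the point is only that wo LoRA weights are confirmed present and intentionally left unrenamed.

```python
def check_wo_lora_keys(checkpoint: dict, i: int) -> list:
    """Hypothetical helper: confirm wo LoRA weights exist and are deliberately not renamed.

    wo stays a single module in static attention (unlike wqs/wks/wvs), so its
    lora_a/lora_b weights keep their original names and load as-is.
    """
    found = []
    for lora_suffix in ["lora_a.weight", "lora_b.weight"]:
        key = f"layers.{i}.attention.wo.{lora_suffix}"
        if key in checkpoint:
            found.append(key)
    return found
```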

Copilot AI review requested due to automatic review settings January 15, 2026 22:21
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.



lucylq force-pushed the lfq.add-lora-to-static-attention branch from c8b61da to e3c6b34 on January 15, 2026 22:46
@metascroy (Contributor) commented

@lucylq does this export the base model and the lora adapter as separate models?

How does that work with weight sharing? It seems that the lora model would be as big as the base model in the flow in this PR?

@lucylq (Contributor, Author) commented Jan 16, 2026

> @lucylq does this export the base model and the lora adapter as separate models?
>
> How does that work with weight sharing? It seems that the lora model would be as big as the base model in the flow in this PR?

@metascroy This PR just adds support for LoRA in static attention and the export/runner scripts.

There are further changes to the export script we need to make to construct a multi-method model before applying weight sharing.
