Add abstract base class for attention mechanisms with unified interface #8039


Merged

1 commit merged into main on Feb 1, 2025

Conversation

@iseeyuan (Contributor) commented Jan 29, 2025

Summary

Add an abstract base class for attention mechanisms with a unified interface.
It creates an interface for multiple attention definitions, such as NPU-friendly attentions or other attention types like Multi-Head Latent Attention (MLA) used in DeepSeek.
A simple registry is provided to easily add and register a new attention class.

Moved the current attention implementation to attention.py and renamed it to AttentionMHA.
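
For illustration, here is a minimal sketch of the shape such an interface and registry could take. This is not a copy of the PR's code: the registry names, the decorator, and the exact forward signature (beyond the keyword arguments quoted later in this thread) are assumptions.

```python
# Minimal sketch of the idea described above, not the PR's actual code.
# ATTENTION_REGISTRY, register_attention, and the rope-frequency args are assumptions.
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple, Type

import torch
import torch.nn as nn


class Attention(nn.Module, ABC):
    """Unified interface that every concrete attention implementation follows."""

    @abstractmethod
    def forward(
        self,
        x: torch.Tensor,
        freqs_cos: torch.Tensor,
        freqs_sin: torch.Tensor,
        mask: Optional[torch.Tensor] = None,
        input_pos: Optional[torch.Tensor] = None,
        in_cache_state: Optional[Any] = None,
        out_cache_state: Optional[Any] = None,
    ) -> Tuple[torch.Tensor, Optional[Any]]:
        """Return the attention output and an optional cache-state update."""


# Simple string-keyed registry so new attention variants (e.g. an NPU-friendly
# attention or MLA) can be registered and selected by name.
ATTENTION_REGISTRY: Dict[str, Type[Attention]] = {}


def register_attention(name: str):
    def decorator(cls: Type[Attention]) -> Type[Attention]:
        ATTENTION_REGISTRY[name.lower()] = cls
        return cls

    return decorator


@register_attention("mha")
class AttentionMHA(Attention):
    """Stand-in for the existing multi-head attention moved into attention.py."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.wo = nn.Linear(dim, dim)  # placeholder for the real q/k/v/o projections

    def forward(self, x, freqs_cos, freqs_sin, mask=None, input_pos=None,
                in_cache_state=None, out_cache_state=None):
        return self.wo(x), None
```

Under this shape, a new attention type only needs to subclass Attention and register itself under a new name.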

Test plan

CI

pytorch-bot bot commented Jan 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8039

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 2 Unrelated Failures

As of commit 9ccf542 with merge base c0676fe:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 29, 2025
mask: Optional[torch.Tensor] = None,
input_pos: Optional[torch.Tensor] = None,
in_cache_state: Optional[Any] = None,
out_cache_state: Optional[Any] = None,
@iseeyuan (Contributor Author): Replace them with kwargs.

@iseeyuan (Contributor Author): Directly using kwargs may break type safety. Keep them as is and consider using TypedDict and Unpack for kwargs type checking later.
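
For reference, a hedged sketch of what the TypedDict + Unpack approach mentioned here could look like (PEP 692). `ForwardOptions` and `SomeAttention` are illustrative names, not the PR's definitions.

```python
# Sketch of kwargs type checking with TypedDict + Unpack (PEP 692).
# `ForwardOptions` and `SomeAttention` are invented names for illustration.
from typing import Any, Optional, TypedDict

import torch
from typing_extensions import Unpack  # typing.Unpack on Python >= 3.12


class ForwardOptions(TypedDict, total=False):
    mask: Optional[torch.Tensor]
    input_pos: Optional[torch.Tensor]
    in_cache_state: Optional[Any]
    out_cache_state: Optional[Any]


class SomeAttention(torch.nn.Module):
    def forward(self, x: torch.Tensor, **kwargs: Unpack[ForwardOptions]) -> torch.Tensor:
        # Type checkers validate keyword names and value types at call sites,
        # e.g. attn(x, mask=mask, input_pos=pos), while the runtime sees plain kwargs.
        mask = kwargs.get("mask")
        return x if mask is None else x + mask  # placeholder computation
```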

@CypherpunkSamurai

What is the progress on this PR?

I'm currently trying to convert a distilled DeepSeek R1 to PTE using the example scripts.

Let me know if I can help test this out.

@iseeyuan (Contributor Author)

@CypherpunkSamurai I'm trying to complete the refactor and land it this week. Let me create an issue for adding MLA to this interface.

@iseeyuan iseeyuan force-pushed the attention branch 3 times, most recently from ce1b50c to 00ec564 Compare January 30, 2025 16:16
@iseeyuan iseeyuan changed the title [WIP] Add abstract base class for attention mechanisms with unified interface Add abstract base class for attention mechanisms with unified interface Jan 30, 2025
@iseeyuan (Contributor Author)

@pytorchbot label "topic: not user facing"

@jackzhxng (Contributor) left a comment

Is this ready for review? Also, it would probably be good to tag this with an actual release-note label, since I assume this would be good to highlight in the 0.6 release notes.

from executorch.examples.models.llama.rope import Rope


class Attention(nn.Module, ABC):
@sxu (Contributor) commented

So far a specialized implementation is only used during lowering and on device, and it needs to be able to accept a checkpoint from whatever definition was used during training. What do you see as the usage pattern going forward? Is the AttentionMHA below the standard definition that specializations of this class need to support?

@iseeyuan (Contributor Author) commented Jan 31, 2025

@sxu

> Is the AttentionMHA below the standard definition that specializations of this class need to support?

Not necessarily. The attention type is added to the model args. Usually the model args and checkpoint are saved in the same place. We use the model args to build the model and load the checkpoint as a state_dict. If the checkpoint does not match the model architecture, there will be an error. We don't break the standard PyTorch process.
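
As a hedged illustration of this flow (reusing the registry from the sketch in the summary; `ModelArgs.attention_type`, `TinyTransformer`, and `build_model` are made-up names, not the PR's code):

```python
# Illustrative sketch only: pick the attention class from the model args and
# load the checkpoint as a plain state_dict, following standard PyTorch behavior.
from dataclasses import dataclass
from typing import Type

import torch
import torch.nn as nn


@dataclass
class ModelArgs:
    dim: int = 64
    n_heads: int = 4
    attention_type: str = "mha"  # e.g. "mha", an NPU-friendly variant, or "mla"


class TinyTransformer(nn.Module):
    """Toy stand-in for the real transformer definition."""

    def __init__(self, args: ModelArgs, attention_cls: Type[nn.Module]):
        super().__init__()
        self.attn = attention_cls(args.dim, args.n_heads)


def build_model(args: ModelArgs, checkpoint_path: str) -> nn.Module:
    # ATTENTION_REGISTRY is the registry from the earlier sketch.
    attn_cls = ATTENTION_REGISTRY[args.attention_type]
    model = TinyTransformer(args, attention_cls=attn_cls)
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # A checkpoint saved for a different attention definition fails here with
    # missing/unexpected keys, which is the error mentioned above.
    model.load_state_dict(state_dict)
    return model
```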

@sxu (Contributor) commented

I see, we don't usually expect the training to be done on a specialized NPU implementation, but I guess we can tweak the state-dict loading on a case-by-case basis.
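
For example, one hypothetical way to tweak state-dict loading is a small key-remapping step before load_state_dict; every key name below is invented for illustration.

```python
# Hypothetical key remapping for loading an MHA-style checkpoint into a
# specialized attention implementation; all key names here are invented.
from typing import Dict

import torch

KEY_MAP: Dict[str, str] = {
    "attention.wq.weight": "attn.q_proj.weight",
    "attention.wk.weight": "attn.k_proj.weight",
    "attention.wv.weight": "attn.v_proj.weight",
    "attention.wo.weight": "attn.o_proj.weight",
}


def remap_checkpoint(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Rename known keys; leave everything else untouched.
    return {KEY_MAP.get(key, key): value for key, value in state_dict.items()}


# Usage: model.load_state_dict(remap_checkpoint(torch.load(path, map_location="cpu")))
```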

@iseeyuan (Contributor Author)

> Is this ready for review? Also, it would probably be good to tag this with an actual release-note label, since I assume this would be good to highlight in the 0.6 release notes.

@dvorjackz It's currently in the example model. I'm fine with promoting this to extension/llm, renaming it to llm_transformer or simply transformer, and marking it user-facing.

@iseeyuan iseeyuan force-pushed the attention branch 2 times, most recently from e29e337 to 433fabb Compare January 31, 2025 14:17
@facebook-github-bot (Contributor)

@iseeyuan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jan 31, 2025

Add abstract base class for attention mechanisms with unified interface (#8039)

Summary:
Add an abstract base class for attention mechanisms with a unified interface.
It creates an interface for multiple attention definitions, such as NPU-friendly attentions or other attention types like Multi-Head Latent Attention (MLA) used in DeepSeek.
A simple registry is provided to easily add and register a new attention class.

Moved the current attention implementation to attention.py and renamed it to AttentionMHA.

Test Plan: CI

Differential Revision: D68956201

Pulled By: iseeyuan
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68956201

@sxu (Contributor) left a comment

Looks reasonable to me. The cache, mask, and rope frequencies still need to be passed down from the top-level transformer, and the updates from each layer need to be returned, in a follow-up PR.


@iseeyuan (Contributor Author)

> Looks reasonable to me. The cache, mask, and rope frequencies still need to be passed down from the top-level transformer, and the updates from each layer need to be returned, in a follow-up PR.

Thanks @sxu! I'll land this one when all CI checks pass, and please feel free to add further PRs to improve it.
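
A hedged sketch of what that follow-up might look like, with the top-level transformer threading the shared inputs through each layer and collecting the per-layer cache updates; all names here are illustrative, not the PR's code.

```python
# Illustrative only: thread mask/rope/cache state through each layer and
# collect the cache update each layer returns.
from typing import Any, List, Optional, Tuple

import torch
import torch.nn as nn


class LayeredTransformer(nn.Module):
    def __init__(self, layers: List[nn.Module]):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(
        self,
        x: torch.Tensor,
        freqs_cos: torch.Tensor,
        freqs_sin: torch.Tensor,
        mask: Optional[torch.Tensor] = None,
        input_pos: Optional[torch.Tensor] = None,
        in_cache_state: Optional[Any] = None,
    ) -> Tuple[torch.Tensor, List[Optional[Any]]]:
        cache_updates: List[Optional[Any]] = []
        for layer in self.layers:
            # Each layer gets the shared mask, rope frequencies, positions, and
            # incoming cache state, and returns its output plus a cache update.
            x, update = layer(
                x,
                freqs_cos,
                freqs_sin,
                mask=mask,
                input_pos=input_pos,
                in_cache_state=in_cache_state,
            )
            cache_updates.append(update)
        return x, cache_updates
```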

facebook-github-bot pushed a commit that referenced this pull request Jan 31, 2025

Add abstract base class for attention mechanisms with unified interface (#8039)

Test Plan: CI

Reviewed By: tarun292

Differential Revision: D68956201

Pulled By: iseeyuan


jackzhxng added the "release notes: examples" label (changes to any of our example LLM integrations, such as Llama3 and Llava) and removed the "topic: not user facing" label on Feb 1, 2025
facebook-github-bot merged commit a972e73 into main on Feb 1, 2025
41 of 47 checks passed
facebook-github-bot deleted the attention branch on February 1, 2025 01:55
Labels: CLA Signed, fb-exported, release notes: examples

6 participants