Add et version of TorchTune MHA for swapping with custom op #5912


Closed
jackzhxng wants to merge 9 commits

Conversation

Contributor

@jackzhxng jackzhxng commented Oct 5, 2024

DRAFT

Summary

Add a version of the TorchTune MHA which factors out the transposes, repeat_interleaves, KV-cache updates, and SDPA torch ops so that they can be replaced by the custom sdpa_with_kv_cache op.
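For context, this follows the usual PyTorch source-transformation pattern: once the target ops live in their own small nn.Module, the exporter can find and replace them by type. A minimal sketch with illustrative names (not the actual ExecuTorch or TorchTune classes):

```
import torch.nn as nn
import torch.nn.functional as F

class SDPA(nn.Module):
    # Stand-in for the factored-out SDPA sub-module (illustrative name only).
    def forward(self, q, k, v, mask=None):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

def replace_sdpa(module: nn.Module, make_replacement) -> nn.Module:
    # Recursively swap every SDPA child for a module backed by the custom op.
    for name, child in module.named_children():
        if isinstance(child, SDPA):
            setattr(module, name, make_replacement(child))
        else:
            replace_sdpa(child, make_replacement)
    return module
```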

Command to export:

```
python -m examples.models.llama2.export_llama --model llama3_2_vision --checkpoint examples/models/llama3_2_vision/consolidated.pth --params examples/models/llama3_2_vision/params/demo_config.json -kv -X -d bf16 --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --use_kv_cache --quantize_kv_cache --use_sdpa_with_kv_cache
```

Test plan

Tested eager and ExecuTorch execution with the test plan described in #6610.

PR chain:

- [Add kwarg example inputs to eager model base](#5765)
- [Llama2 model cleanup](#5859)
- [Accept model type parameter in export_llama](#5910)
- [Export TorchTune llama3_2_vision in ET](#5911)
- [Runner changes for TorchTune Llama3.2 vision text decoder](#6610)
- **YOU ARE HERE ~>** [Add et version of TorchTune MHA for swapping with custom op](#5912)


pytorch-bot bot commented Oct 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5912

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 3145bde with merge base 8f9fb7e:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 5, 2024
@jackzhxng jackzhxng force-pushed the jz/tt-llama-3 branch 2 times, most recently from 1cbed8a to 80aa6d1 Compare October 8, 2024 20:02
facebook-github-bot pushed a commit that referenced this pull request Oct 9, 2024
Summary:
For situations where the forward has non-positional (keyword) arguments, such as https://github.com/pytorch/torchtune/blob/3c450ef5f1fbe8237f899e942fd5222491a47ca7/torchtune/modules/transformer.py#L519

PR chain:
- **YOU ARE HERE ~>** [Add kwarg example inputs to eager model base](#5765)
- [Llama2 model cleanup](#5859)
- [Accept model type parameter in export_llama](#5910)
- [Export TorchTune llama3_2_vision in ET](#5911)
- [Add et version of TorchTune MHA for swapping with custom op](#5912)

Pull Request resolved: #5765

Test Plan:
Exported Stories110M model.
```
wget "https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.pt"
echo '{"dim": 768, "multiple_of": 32, "n_heads": 12, "n_layers": 12, "norm_eps": 1e-05, "vocab_size": 32000}' > params.json
python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
```

Reviewed By: tarun292

Differential Revision: D64027696

Pulled By: dvorjackz

fbshipit-source-id: 15ecfb458c6194159140d4c601e5443a2e524fdc
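The kwarg example inputs this commit adds fit the standard torch.export interface, which takes keyword arguments as a separate example dict. A toy sketch of the pattern, with a hypothetical module standing in for the TorchTune forward linked above:

```
import torch

class Toy(torch.nn.Module):
    # Hypothetical forward with a keyword-only argument, like the
    # TorchTune transformer forward linked in the summary.
    def forward(self, tokens, *, input_pos=None):
        return tokens if input_pos is None else tokens + input_pos

ep = torch.export.export(
    Toy(),
    args=(torch.zeros(1, 4, dtype=torch.long),),
    kwargs={"input_pos": torch.tensor([0])},  # the kwarg example inputs
)
```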
facebook-github-bot pushed a commit that referenced this pull request Oct 15, 2024
Summary:
- Removes redundant steps in the Llama2 export
- Factors out checkpointing to be shared with future Llama models (namely 3.2 multimodal)
- Comments and orders code more clearly

PR chain:
- [Add kwarg example inputs to eager model base](#5765)
- **YOU ARE HERE ~>** [Llama2 model cleanup](#5859)
- [Accept model type parameter in export_llama](#5910)
- [Export TorchTune llama3_2_vision in ET](#5911)
- [Add et version of TorchTune MHA for swapping with custom op](#5912)

Pull Request resolved: #5859

Test Plan:
Ensure export + eval is similar before and after for Stories 110M:
```
python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.model/bin> -d fp32 --max_seq_len 2048 --limit 1000
```

Before:
```
wikitext: {'word_perplexity,none': 14464.645927166595, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 5.99788806086652, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.5844545973083983, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```

After:
```
wikitext: {'word_perplexity,none': 14464.299192404438, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 5.997861173678705, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.584448130015399, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```

Reviewed By: malfet, dbort

Differential Revision: D64145852

Pulled By: dvorjackz

fbshipit-source-id: daeee834955e154e7c8262ce776bd3039991027d
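As a sanity check on the metrics above: bits_per_byte is log2 of byte_perplexity, and both the before and after rows are internally consistent:

```
import math

# bits_per_byte should equal log2(byte_perplexity):
print(math.log2(5.99788806086652))   # ~2.58445 (before, matches 2.5844545...)
print(math.log2(5.997861173678705))  # ~2.58445 (after,  matches 2.5844481...)
```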
@jackzhxng jackzhxng changed the base branch from jz/tt-llama-2 to jz/native-runner-tt November 1, 2024 18:52
facebook-github-bot pushed a commit that referenced this pull request Nov 13, 2024
Summary:
Specify model to export in the CLI.


Test Plan:
Exported the stories 110M model.
```
python -m examples.models.llama.export_llama -c stories110M/stories110M.pt -p stories110M/params.json -X -kv
```

PR chain:
- [Add kwarg example inputs to eager model base](#5765)
- [Llama2 model cleanup](#5859)
- **YOU ARE HERE ~>** [Accept model type parameter in export_llama](#5910)
- [Export TorchTune llama3_2_vision in ET](#5911)
- [Runner changes for TorchTune Llama3.2 vision text decoder](#6610)
- [Add et version of TorchTune MHA for swapping with custom op](#5912)

Reviewed By: helunwencser

Differential Revision: D65612837

Pulled By: dvorjackz
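The --model flag implies a lookup from model name to example-model package. A minimal sketch of that dispatch, using a hypothetical registry (the real export_llama wiring may be structured differently):

```
import argparse

# Hypothetical name -> package registry; illustrative values only.
MODEL_REGISTRY = {
    "llama2": "examples.models.llama",
    "llama3_2_vision": "examples.models.llama3_2_vision",
}

parser = argparse.ArgumentParser()
parser.add_argument("--model", choices=sorted(MODEL_REGISTRY), default="llama2")
args = parser.parse_args(["--model", "llama3_2_vision"])
print(MODEL_REGISTRY[args.model])  # package providing the model builder
```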
@jackzhxng jackzhxng closed this Nov 13, 2024