@@ -73,30 +73,31 @@ LitGPT has 🤯 **custom, from-scratch implementations** of [20+ LLMs](tutorials

| Model | Model size | Author | Reference |
| ----| ----| ----| ----|
| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
| Danube2 | 1.8B | H2O.ai | [H2O.ai](https://h2o.ai/platform/danube-1-8b/) |
| Dolly | 3B, 7B, 12B | Databricks | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
| Falcon | 7B, 40B, 180B | TII UAE | [TII 2023](https://falconllm.tii.ae) |
| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) |
| Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) |
| Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) |
| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
+| MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama) |
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
| Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) |
| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
| Phi | 1.3B, 2.7B | Microsoft Research | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
| Platypus | 7B, 13B, 70B | Lee et al. | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) |
| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
| RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| StableLM | 3B, 7B | Stability AI | [Stability AI 2023](https://github.com/Stability-AI/StableLM) |
| StableLM Zephyr | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| TinyLlama | 1.1B | Zhang et al. | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) |
-| Vicuna | 7B, 13B, 33B | LMSYS | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)
+| Vicuna | 7B, 13B, 33B | LMSYS | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/) |

**Tip**: You can list all available models by running the `litgpt download list` command.