Conversation

@CharlieFRuan CharlieFRuan commented Aug 23, 2024

This PR adds the newly released Phi-3.5-mini, adding the following `model_id`s to our prebuilt model list:

- `Phi-3.5-mini-instruct-q4f16_1-MLC` (4k KVCache)
- `Phi-3.5-mini-instruct-q4f32_1-MLC` (4k KVCache)
- `Phi-3.5-mini-instruct-q4f16_1-MLC-1k` (1k KVCache)
- `Phi-3.5-mini-instruct-q4f32_1-MLC-1k` (1k KVCache)

See mlc-ai/binary-mlc-llm-libs#136 for the TVM and MLC-LLM commits this is compiled with.

Note that Phi-3.5-mini supports up to 128K context (unlike Phi-3-mini, which only has 4k) thanks to RoPE scaling, which MLC-LLM supports. You can take advantage of this in WebLLM by increasing `ModelRecord.overrides.context_window_size` or specifying `context_window_size` in `ChatOptions` when loading a model, as long as there is enough VRAM; see the sketch below.
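For illustration, here is a minimal sketch of loading one of the new `model_id`s with a larger context window via `ChatOptions`. It assumes the `CreateMLCEngine` entry point and its `initProgressCallback` config; how `ChatOptions` is passed at load time may vary by WebLLM version (it can also be given to `engine.reload`), and 8192 is just an illustrative value:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Sketch: load Phi-3.5-mini with a context window larger than the default 4k.
// Assumes enough VRAM is available for the larger KVCache; context_window_size
// is the ChatOptions field mentioned above, and 8192 is only an example value.
const engine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  { initProgressCallback: (report) => console.log(report.text) }, // engine config
  { context_window_size: 8192 },                                  // ChatOptions override
);

// Use the engine through the OpenAI-style chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Summarize this long document: ..." }],
});
console.log(reply.choices[0].message.content);
```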

@CharlieFRuan CharlieFRuan merged commit 2639a80 into mlc-ai:main Aug 23, 2024
CharlieFRuan added a commit that referenced this pull request Aug 23, 2024
### Change
- #555

### TVMjs
- Updated to current head:
apache/tvm@1518008
  - Main change is apache/tvm#17251
- This is needed for WASMs compiled after apache/tvm#17257 is merged (e.g. Phi-3.5). TVM global functions that return bool (e.g. `AcceptToken()` in BNFGrammar) need this PR to run correctly at runtime.
- However, these are backward compatible with WASMs compiled prior to this PR. Tested with Phi-3 (old WASM) running grammar.