This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Develop #442

Closed
philpax wants to merge 63 commits into main from develop

Conversation

philpax (Collaborator) commented on Nov 12, 2023

The pending PRs were interrelated, but I didn't want to leave main in a half-working state, so I've merged them all into a new develop branch. The plan is to work on this branch and leave main in maintenance mode until this is ready.

Closes #365, closes #403, closes #439, closes #77.

This integrates:

  • a GGML version upgrade
  • GGUF support (see the header-reading sketch after this list)
  • BERT support
  • APIs for context-shuffling
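
For context on the GGUF item: GGUF is a container format that starts with a small fixed header (magic, version, tensor count, metadata key/value count), followed by the metadata and tensor info. A minimal sketch of reading that header, assuming a little-endian GGUF v2/v3 file; the file path is a placeholder, and this is not the loader code from this PR:

```rust
use std::fs::File;
use std::io::{self, Read};

/// The GGUF magic: the ASCII bytes "GGUF".
const GGUF_MAGIC: [u8; 4] = *b"GGUF";

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn main() -> io::Result<()> {
    // Placeholder path; any GGUF v2/v3 model file would do.
    let mut file = File::open("model.gguf")?;

    // Check the 4-byte magic that identifies a GGUF file.
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;
    assert_eq!(magic, GGUF_MAGIC, "not a GGUF file");

    // Version, then counts of tensors and metadata key/value pairs
    // (u64 in v2+; v1 used u32 here).
    let version = read_u32(&mut file)?;
    let tensor_count = read_u64(&mut file)?;
    let metadata_kv_count = read_u64(&mut file)?;

    println!("GGUF v{version}: {tensor_count} tensors, {metadata_kv_count} metadata entries");
    Ok(())
}
```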

This is the to-do list:

  • Update to the latest GGML
  • Fix CUDA inference
  • Fix OpenCL inference
  • Fix Metal inference
  • Fix the embedded tokenizer
  • Re-add quantisation
  • Modularize the model definitions (i.e. move block inference to the block struct; see the sketch after this list)
  • Fix models (ensure they're uncommented in llm):
    • Fix BLOOM
    • Fix GPT-NeoX
    • Fix Falcon
    • Fix GPT-2
    • Fix GPT-J
    • Fix MPT
    • Fix BERT
  • Remove the expect() calls
  • Fix the TODOs
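
On the modularization item: the idea, as stated, is to move per-block inference out of each model's monolithic eval function and into the block struct itself, so a model's forward pass reduces to folding the input through its blocks. A hypothetical sketch of that shape; Tensor, ComputeContext, and the Block trait are illustrative stand-ins, not the crate's actual types:

```rust
// Hypothetical stand-ins for the crate's GGML tensor/context wrappers.
struct Tensor;
struct ComputeContext;

/// Each block owns its weights and its own forward pass, so model code
/// no longer needs to inline every layer's graph construction.
trait Block {
    fn forward(&self, ctx: &mut ComputeContext, input: Tensor) -> Tensor;
}

struct TransformerBlock; // attention + feed-forward weights would live here

impl Block for TransformerBlock {
    fn forward(&self, _ctx: &mut ComputeContext, input: Tensor) -> Tensor {
        // attention -> residual add -> feed-forward -> residual add
        input
    }
}

struct Model {
    blocks: Vec<Box<dyn Block>>,
}

impl Model {
    /// The per-model loop: fold the input through the blocks.
    fn forward(&self, ctx: &mut ComputeContext, mut x: Tensor) -> Tensor {
        for block in &self.blocks {
            x = block.forward(ctx, x);
        }
        x
    }
}

fn main() {
    let model = Model { blocks: vec![Box::new(TransformerBlock)] };
    let mut ctx = ComputeContext;
    let _out = model.forward(&mut ctx, Tensor);
}
```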

oppiliappan and others added 30 commits, starting August 7, 2023 14:55. Among those visible in the conversation:

  • a commit co-authored by Lukas Kreussel <[email protected]> and Philpax <[email protected]>, noted as "with some heavy caveats, see the PR"
  • Build against newer GGML version
  • Add "context swap" functions to session and add "decoded_tokens" to snapshot read/write
philpax closed this on Jun 24, 2024
Development

Successfully merging this pull request may close these issues.

  • Why is the feed_prompt process so slow?
  • Metal Prompt Feeding Support
  • GGUF
  • Swap strategy for infinite output
4 participants