Closed
Description
Currently, we store separate tensors for each expert.
This leads to a large number of possible "source" tensors for the _id ops, which significantly increases the size of struct ggml_tensor on the stack.
Additionally, the Metal implementation is currently hacked to support at most 8 experts, and extending it beyond that is not straightforward.
We should improve this. One possible approach is to store the data for all experts in a single tensor and address each expert with appropriate offsets.