Open
Labels: performance, refactoring, roadmap
Description
This is a big one
The only reason we use BLAS is that we don't have an efficient implementation of matrix x matrix multiplication. Naively doing parallel dot products is not optimal.
We need to implement some of the fundamental GEMM optimizations, such as block tiling, and we need to do it in a compact way that reuses the existing dot product code and supports all quantization types.
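To make the block tiling idea concrete, here is a rough sketch of what it could look like when built on top of a dot product routine. Everything in it is an assumption for illustration: `dot_f32`, `gemm_naive_f32`, `gemm_tiled_f32` and the tile sizes `bm`, `bn`, `bk` are hypothetical names, not existing functions in the codebase, and it only handles plain f32 with the second matrix stored transposed so that every output element is a dot product of two contiguous rows. A version covering quantized types would presumably also need `bk` aligned to the quantization block size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an existing dot product kernel (f32 only). */
static float dot_f32(size_t n, const float *x, const float *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Naive baseline: one full-length dot product per output element.
 * A is m x k, Bt is n x k (B stored transposed), C is m x n, all row-major. */
static void gemm_naive_f32(size_t m, size_t n, size_t k,
                           const float *A, const float *Bt, float *C) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            C[i * n + j] = dot_f32(k, A + i * k, Bt + j * k);
        }
    }
}

/* Block-tiled variant: same result, but the work is split into
 * bm x bn output tiles and bk-long slices of the shared dimension,
 * so a small set of row slices is reused while it is still in cache.
 * The inner kernel is still the same dot product routine. */
static void gemm_tiled_f32(size_t m, size_t n, size_t k,
                           const float *A, const float *Bt, float *C,
                           size_t bm, size_t bn, size_t bk) {
    assert(bm > 0 && bn > 0 && bk > 0);

    /* Partial dot products are accumulated into C, so start from zero. */
    for (size_t i = 0; i < m * n; i++) {
        C[i] = 0.0f;
    }

    for (size_t i0 = 0; i0 < m; i0 += bm) {
        const size_t i1 = min_sz(i0 + bm, m);
        for (size_t j0 = 0; j0 < n; j0 += bn) {
            const size_t j1 = min_sz(j0 + bn, n);
            for (size_t k0 = 0; k0 < k; k0 += bk) {
                const size_t kb = min_sz(bk, k - k0); /* length of this slice */
                for (size_t i = i0; i < i1; i++) {
                    for (size_t j = j0; j < j1; j++) {
                        C[i * n + j] += dot_f32(kb, A + i * k + k0,
                                                    Bt + j * k + k0);
                    }
                }
            }
        }
    }
}
```

Calling `gemm_tiled_f32(m, n, k, A, Bt, C, 32, 32, 256)` would be one possible configuration; those tile sizes are placeholders and would have to be tuned per CPU cache. Note that splitting the shared dimension changes the floating point summation order, so results can differ from the naive version in the last bits.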
More comments on this: