Description
Here is how it would work:

- Add a module `A`, used as `from A import special_operation`, that operates on NumPy arrays and implements the operation using NumPy arrays.
- Add custom type annotations, e.g. `from A import memory_device_l1`, and use them to annotate NumPy arrays.
- Add a plugin that implements an ASR->ASR pass that transforms all these annotations and special operations into low-level C / API calls for the specific hardware API.
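The first two steps can be sketched in plain Python as follows. This is a minimal sketch, not a proposed API. The issue names only `special_operation` and `memory_device_l1`. Everything else is hypothetical: the `MemoryDevice` marker class, the placeholder body of `special_operation` (element-wise `x * y + x`), and the `kernel` and `device_annotations` helpers. It also assumes the annotations are expressed with Python's `typing.Annotated`, which may not be how LPython would spell them. Without a hardware plugin, the NumPy reference code simply runs.

```python
from typing import Annotated, get_type_hints

import numpy as np

# Contents of the hypothetical external module A (names from the issue,
# bodies are illustrative assumptions).


class MemoryDevice:
    """Marker stored in Annotated[...] metadata; it does nothing at runtime."""

    def __init__(self, level: str) -> None:
        self.level = level

    def __repr__(self) -> str:
        return f"MemoryDevice({self.level!r})"


# Assumed meaning: "place this array in the accelerator's L1 memory".
memory_device_l1 = MemoryDevice("l1")


def special_operation(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Reference semantics in plain NumPy (placeholder: element-wise x * y + x).

    Without a hardware plugin this NumPy code runs as-is; a plugin's
    ASR->ASR pass would replace the call with hardware API calls.
    """
    return x * y + x


# User code: annotate the arrays and call the special operation.


def kernel(
    a: Annotated[np.ndarray, memory_device_l1],
    b: Annotated[np.ndarray, memory_device_l1],
) -> np.ndarray:
    return special_operation(a, b)


def device_annotations(func) -> dict:
    """Collect the MemoryDevice markers attached to each annotated name."""
    hints = get_type_hints(func, include_extras=True)
    found = {}
    for name, hint in hints.items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, MemoryDevice):
                found[name] = meta
    return found
```

For example, `kernel(np.array([1.0, 2.0]), np.array([3.0, 4.0]))` evaluates to `array([ 4., 10.])` through the NumPy fallback. `device_annotations(kernel)` reports that both `a` and `b` carry `memory_device_l1`, which is the kind of information a compiler pass would read.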
The module `A` and the plugin (as a shared `.so` library) will be shipped externally, not as part of LPython.
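The third step, the ASR->ASR pass, can be modeled with a toy tree rewrite. This is only an illustration of the idea. `Var`, `Call` and `Assign` form a tiny made-up IR, not LPython's ASR, which is implemented in C++. `hw_special_operation` and `hw_alloc_l1` are hypothetical vendor API names. The pass leaves the node types unchanged and only makes their content lower-level: special operations become vendor calls, and L1-annotated targets get an explicit allocation call.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:
    name: str
    device: str = ""  # "" = ordinary memory; "l1" = annotated with memory_device_l1


@dataclass
class Call:
    func: str
    args: List["Expr"]


Expr = Union[Var, Call]


@dataclass
class Assign:
    target: Var
    value: Expr


# Hypothetical mapping from high-level operations to vendor C API functions.
LOWERING = {"special_operation": "hw_special_operation"}


def lower_expr(e: Expr) -> Expr:
    """Rewrite special operations bottom-up; leave all other nodes unchanged."""
    if isinstance(e, Call):
        args = [lower_expr(a) for a in e.args]
        return Call(LOWERING.get(e.func, e.func), args)
    return e


def lower_program(body: List[Assign]) -> List[Union[Assign, Call]]:
    """Toy 'ASR->ASR' pass: same node types in and out, lower-level content."""
    out: List[Union[Assign, Call]] = []
    for stmt in body:
        # An L1-annotated target gets an explicit (hypothetical) allocation call.
        if stmt.target.device == "l1":
            out.append(Call("hw_alloc_l1", [Var(stmt.target.name)]))
        out.append(Assign(stmt.target, lower_expr(stmt.value)))
    return out
```

Because the output uses the same node types as the input, further passes and the existing backends can still consume it. The hardware-specific knowledge lives entirely in the externally shipped mapping and rewrite logic.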
This will allow anybody to extend LPython to work for their custom hardware.
Activity
Smit-create commented on Aug 9, 2023
I'd be interested in adding some basic support using MSL (Metal Shading Language, for Apple M1), and then we can see what design requirements come up. I need to find some good resources for learning MSL.
certik commented on Aug 14, 2023
@Smit-create here is how llama.cpp uses Metal to run on the GPU (I think) on M1: ggml-org/llama.cpp#2615. Let's figure out how to run their kernels, and then how to generate them using LPython.
Here is another repository showing how to run Metal from C++: https://github.com/larsgeb/m1-gpu-cpp
certik commented on Aug 30, 2023
The custom hardware backend can also be just a CPU with SIMD instructions. We annotate arrays so that vectorized code can be written using NumPy arrays, and the CPU/SIMD ASR backend takes that code and ensures that correct LLVM code is generated, so that the final binary uses the CPU's vector instructions and the code runs at maximum speed.
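A minimal sketch of the CPU/SIMD case in plain Python/NumPy, under stated assumptions. `SimdWidth`, `simd_f64x4` and the `axpy_*` functions are hypothetical names; the issue does not define them. The marker assumes 4 float64 lanes, which is what one 256-bit vector register (e.g. AVX2) holds. `axpy_vectorized` is the whole-array form a user would write. `axpy_strip_mined` only models, in NumPy, the loop shape that SIMD code generation typically produces: a main loop over full vectors plus a scalar tail. It does not emit any LLVM code.

```python
from typing import Annotated

import numpy as np


class SimdWidth:
    """Hypothetical marker: process the array in vectors of `lanes` elements."""

    def __init__(self, lanes: int) -> None:
        self.lanes = lanes


# Assumed marker: 4 x float64 lanes (one 256-bit vector register).
simd_f64x4 = SimdWidth(4)

Vec = Annotated[np.ndarray, simd_f64x4]


def axpy_vectorized(alpha: float, x: Vec, y: Vec) -> Vec:
    # Whole-array expression: a backend that understands the annotation
    # could map this directly onto vector instructions.
    return alpha * x + y


def axpy_strip_mined(alpha: float, x: np.ndarray, y: np.ndarray,
                     lanes: int = simd_f64x4.lanes) -> np.ndarray:
    """Model of generated SIMD loop structure: full vectors, then a scalar tail."""
    n = x.shape[0]
    out = np.empty_like(y)
    main = n - n % lanes  # elements covered by full vectors
    for i in range(0, main, lanes):
        out[i:i + lanes] = alpha * x[i:i + lanes] + y[i:i + lanes]  # one "vector op"
    for i in range(main, n):
        out[i] = alpha * x[i] + y[i]  # remaining elements, one at a time
    return out
```

Both functions compute the same result. The point of the annotation is that the compiler, not the user, is responsible for turning the first form into the second at the machine-instruction level.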