Description
Currently NNlib.jl provides both the abstract interface and a CPU implementation of its functions, which is becoming a problem now that NNlib.jl depends on LoopVectorization.jl. I think the package may need to be split into an abstract interface and a CPU implementation, e.g. like AbstractFFTs.jl/FFTW.jl. As the FFTW example shows, such a split need not hurt usability.
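To make that concrete, here is a rough sketch of what such a split could look like; the module and function names (NNlibInterface, NNlibCPU) are placeholders for illustration, not a proposed design. The idea is just that backends extend a lightweight interface package, the way FFTW.jl extends AbstractFFTs.jl:

```julia
# Hypothetical sketch, not actual NNlib code.

module NNlibInterface
# Lightweight interface package: generic function stubs only, no
# LoopVectorization/NNPACK-style dependencies.
function conv end
function maxpool end
export conv, maxpool
end

module NNlibCPU
# CPU backend: depends on the interface package plus whatever SIMD
# machinery it wants, without burdening GPU users.
using ..NNlibInterface
NNlibInterface.conv(x::Array, w::Array; kw...) =
    error("CPU implementation goes here")
end

# A GPU package would then only depend on NNlibInterface and add
# methods for its own array type:
#   NNlibInterface.conv(x::CuArray, w::CuArray; kw...) = ...
```

End users would presumably still just load the full NNlib (or Flux) and get both pieces, much like `using FFTW` transparently provides the AbstractFFTs API.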
Case in point: installing CUDA.jl (which implements the NNlib.jl interface for use with CuArrays) pulls in the following additional dependencies when integrating with NNlib: CpuId, SIMDPirates, DocStringExtensions, OffsetArrays, SLEEFPirates, LoopVectorization, VectorizationBase, NNPACK_jll, UnPack. The JLL is annoying, but okay. Because there are so many packages, however, the time to precompile CUDA.jl increases by a whopping 50%, from 20s to 30s, as measured with hyperfine:
hyperfine 'julia -e "Base.compilecache(Base.PkgId(Base.UUID(\"052768ef-5323-5732-b1bb-66c8b64840ba\"), \"CUDA\"))"'
I'm not familiar enough with the ML stack / NNlib.jl to figure out exactly what such a split would look like, but I do think we can improve things here. I'd rather not @require the NNlib integration in CUDA.jl and lose semver tracking, etc.
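For completeness, the Requires.jl route being argued against would look roughly like the hypothetical glue below (not anything that exists in CUDA.jl today). Because the @require block is only evaluated lazily at runtime, the NNlib version it targets is not covered by [compat] bounds, which is the semver-tracking loss mentioned above:

```julia
# Hypothetical Requires.jl-based glue, shown only to illustrate the downside.
module CUDANNlibGlue

using Requires

function __init__()
    # Runs at runtime if/when NNlib is loaded, so the package manager's
    # [compat] entries never constrain which NNlib version this extends.
    @require NNlib="872c559c-99b0-510c-b3b7-b6c96a88d5cd" begin
        # GPU methods for the NNlib interface would be defined here, e.g.
        # NNlib.conv(x::CuArray, w::CuArray; kw...) = ...
    end
end

end # module
```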