
Split off the CPU implementation #224

Closed
@maleadt

Description

Currently NNlib.jl provides both the abstract interface and a CPU implementation of its functions, which has become a problem now that NNlib.jl depends on LoopVectorization.jl. I think the package may need to be split into an abstract interface and a CPU implementation, e.g., like AbstractFFTs.jl/FFTW.jl. Such a split need not hurt usability, as the FFTW example shows.
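For illustration, here is a minimal sketch of what such an interface/implementation split could look like, modeled on the AbstractFFTs.jl/FFTW.jl pattern. The package names (`AbstractNNlib`, `NNlibCPU`) and the trivial `conv` stand-in are hypothetical, not the actual NNlib API:

```julia
# Hypothetical sketch: the interface package defines generic functions
# with no methods, and backend packages extend them with their own methods.

# --- AbstractNNlib: lightweight interface package, no heavy dependencies ---
module AbstractNNlib
function conv end   # generic function stub; backends add methods
export conv
end

# --- NNlibCPU: CPU implementation, free to depend on LoopVectorization etc. ---
module NNlibCPU
import ..AbstractNNlib: conv
# Trivial stand-in for a real CPU kernel, just to show method extension.
conv(x::Array, w::Array) = sum(x) * sum(w)
end

# --- A GPU package would likewise only depend on the interface package: ---
# import AbstractNNlib: conv
# conv(x::CuArray, w::CuArray) = ...

# Callers dispatch through the interface; which backend runs is decided
# by the array types involved.
AbstractNNlib.conv(ones(2), ones(2))  # uses the CPU method above
```

With this layout, CUDA.jl would only need to depend on the small interface package, keeping LoopVectorization.jl and friends out of its dependency graph.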

Case in point: installing CUDA.jl (which implements the NNlib.jl interface for use with CuArrays) pulls in the following additional dependencies when integrating with NNlib: CpuId, SIMDPirates, DocStringExtensions, OffsetArrays, SLEEFPirates, LoopVectorization, VectorizationBase, NNPACK_jll, UnPack. The JLL is annoying, but okay. The sheer number of packages, however, increases the time to precompile CUDA.jl by a whopping 50%, from 20s to 30s, as measured with hyperfine:

hyperfine 'julia -e "Base.compilecache(Base.PkgId(Base.UUID(\"052768ef-5323-5732-b1bb-66c8b64840ba\"), \"CUDA\"))"'

I'm not familiar enough with the ML stack / NNlib.jl to say exactly what such a split would look like, but I do think we can improve things here. I'd rather not `@require` the NNlib integration in CUDA.jl and lose semver tracking etc.
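For context, the rejected `@require` alternative would look roughly like the sketch below, using Requires.jl. The glue-code body is hypothetical; the drawback is exactly what's mentioned above: NNlib would move out of CUDA.jl's `Project.toml` `[deps]`/`[compat]` sections, so the package resolver no longer enforces semver-compatible versions of the integration:

```julia
# Hedged sketch of conditional loading via Requires.jl (not the proposed design).
using Requires

function __init__()
    # The UUID is NNlib.jl's registered package UUID; the block only runs
    # when the user also loads NNlib, and its version is not resolver-checked.
    @require NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd" begin
        # Hypothetical glue code extending NNlib functions for CuArray inputs
        # would go here.
    end
end
```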
