Pytorch ROCm Init is slow (3+ minutes) #1410

Closed
@Delaunay

Description

After investigating ROCm/pytorch#1232, we found the root cause of the issue to be https://github.com/pytorch/builder/blob/main/common/install_rocm_drm.sh#L100

  • The code looks for "amdgpu.ids" under the root of the executable's install prefix, i.e. the executable path with /bin/exec removed (e.g. /path/to/bin/python => /path/to)
  1. check_for_location_of_amdgpuids always returns 0, so the search is never cancelled once the first file is found.
  2. Because the search is never cancelled, the entire Python installation is walked.
  3. The file is not even installed near the Python installation, so looking there is pointless anyway.
  4. amdgpu.ids is just a mapping of product IDs to marketing names; the feature is far from critical, yet it takes a significant amount of time for no good reason.
  5. Other parts of the code load /opt/amdgpu/share/libdrm/amdgpu.ids directly, so why bother searching for it when we already know where it is?
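
To illustrate points 1, 2 and 5, here is a minimal Python sketch of the behaviour one would expect instead: check the known location first, and if a directory walk is still needed, stop at the first match. This is not the code from install_rocm_drm.sh or libdrm; the function names `find_first` and `locate_amdgpu_ids` are hypothetical, and only the default path is taken from this issue.

```python
import os
from typing import Optional

# Location the issue says other parts of the code load directly.
DEFAULT_AMDGPU_IDS = "/opt/amdgpu/share/libdrm/amdgpu.ids"


def find_first(root: str, filename: str = "amdgpu.ids") -> Optional[str]:
    """Walk `root` and return the first file named `filename`, or None.

    Hypothetical helper: returning from inside the loop is what cancels
    the walk, which is the step the issue says never happens.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None


def locate_amdgpu_ids(
    search_root: Optional[str] = None,
    known_path: str = DEFAULT_AMDGPU_IDS,
) -> Optional[str]:
    """Prefer the known path; fall back to a bounded search only if asked."""
    if os.path.isfile(known_path):
        return known_path  # fast path: no directory walk at all
    if search_root is not None:
        return find_first(search_root)
    return None
```

With the fast path in place, a machine that has the file at the default location never touches the Python installation tree, and the fallback walk ends as soon as one match is found.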
