Improve performance of reduce sum for 3D shapes #1785

Open: wants to merge 126 commits into base: release/2.4
126 commits
16db906
Updates to build on Jammy
pruthvistony Jul 28, 2023
ca7fcb9
Set ROCM_PATH ENV in Centos docker container
pruthvistony Aug 13, 2023
1e13b02
[UB22.04] Updates to support latest scipy
pruthvistony Aug 18, 2023
7132452
Updated condition for libstc++ for Jammy
pruthvistony Sep 5, 2023
e942f9c
Fix ROCm installation failure in Ubuntu22.04 (#1285)
WBobby Sep 18, 2023
f548074
Build required version of libpng for CentOS7
pruthvistony Aug 18, 2023
fbfb835
Changes to support docker v23
pruthvistony Sep 25, 2023
a73847b
Skipped certain distributed tests (#1383)
BLOrange-AMD Apr 17, 2024
7727031
temporarily ignore certificate check for Miniconda
yanyao-wang Apr 12, 2024
6e9aa77
[release/2.1] Skip certificate check for CentOS7 since certificate ex…
jithunnair-amd Apr 24, 2024
518a0a7
release/2.2 triton commit pin for rocm6.1 conditionalisation (#1369)
jataylo Mar 16, 2024
b126f41
Reintroduce CIRCLE_TAG to be able to set PYTORCH_BUILD_VERSION withou…
jithunnair-amd May 2, 2024
c2c73d7
[release/2.3] Include ROCm patch version unconditionally in triton ve…
jithunnair-amd May 9, 2024
9759cfb
Removing logic added by upstream to set PYTORCH_EXTRA_INSTALL_REQUIRE…
jithunnair-amd Jun 3, 2024
5366b62
increase tensor size to force out of memory exception on MI300X (#1450)
dnikolaev-amd Jul 8, 2024
f66244b
Fix lxml requirement (#1454)
jithunnair-amd Jul 9, 2024
c25bf0c
[release/2.3] [SWDEV-469514] hipGraphExecDestroy requires an explicit…
pragupta Jul 12, 2024
2827617
[release/2.2] cudagraph explicit sync only after capture_begin() (#14…
dnikolaev-amd Aug 1, 2024
f57dc6d
SWDEV-472761: limit sympy version to 1.12.1 or lower (#1482)
dnikolaev-amd Jul 19, 2024
4f5b0b3
[release2.3] fix test_vmapvjpvjp and skip test_profiler_experimental_…
dnikolaev-amd Jul 26, 2024
bc99aff
Update requirements.txt (#1498)
jithunnair-amd Jul 28, 2024
5144034
Use Triton commit with TRITON_LLVM_TARBALL_PATH changes
dnikolaev-amd Aug 8, 2024
f81a9ac
related_commits for release/2.4
dnikolaev-amd Aug 8, 2024
07ce684
Merge pull request #1525 from ROCm/IFU_rel_2.4
pruthvistony Aug 9, 2024
dbbe001
Update lib paths for Almalinux/Manylinux2_28 and remove ROCm<6.0 support
jithunnair-amd Aug 10, 2024
af2171b
Undo inadvertent introduction of scripts/amd/setup_rocm_libs.sh call …
jithunnair-amd Aug 11, 2024
b35eecf
Use triton with TRITON_BUILD_PROTON fix
jithunnair-amd Aug 15, 2024
90f6c49
[MPS][TYPE_PROMOTION] Fix Clamp (#133260)
pytorchbot Aug 13, 2024
4d895a2
[Doc] update guide install mkl-static from conda to pip (#133328)
pytorchbot Aug 14, 2024
f19d678
fix for launching kernel invalid config error when calling embedding …
pytorchbot Aug 14, 2024
bf5d809
Fix recent build error on ppc64le (#133416)
pytorchbot Aug 14, 2024
c4d3553
[release/2.4] Use rocm6.2 aotriton shared library tarball and reduce …
jithunnair-amd Aug 16, 2024
a37bb27
Triton LLVM server moved to public oaitriton server
pruthvistony Aug 16, 2024
d299124
Aotriton commit update to handle triton llvm error
pruthvistony Aug 16, 2024
655b699
AOTriton commit update
pruthvistony Aug 17, 2024
02c8fdc
Updated AOTriton commit to branch internal-ci/0.6
pruthvistony Aug 19, 2024
29fc9ff
nccl dump wait delay (#1540)
ramcherukuri Aug 20, 2024
62a6fd5
[release/2.4] update torchvision in related_commits (#1546)
dnikolaev-amd Aug 20, 2024
62e166f
[release/2.4] Use shared library tarball for aotriton for CI base doc…
jithunnair-amd Aug 21, 2024
d5608f3
[release/2.4] pin sympy==1.12.1 and skip pytorch-nightly installation…
dnikolaev-amd Aug 22, 2024
3b54c45
[release/2.4] skip test_typing if numpy less then 1.21 (#1563)
dnikolaev-amd Aug 28, 2024
a5eb9d7
[release/2.4] Pinned versions for required packages (#1578)
iupaikov-amd Sep 9, 2024
7e5ac3e
[ROCm] Prevent accidental enablement of efficient attention. (#134531…
xinyazhang Sep 9, 2024
1488da9
Revert "[release/2.4] Pinned versions for required packages (#1578)"
pruthvistony Sep 9, 2024
f4c8ad5
[release/2.4] [ROCM] Properly disable Flash Attention/Efficient Atten…
xinyazhang Sep 11, 2024
4360582
[ROCm] slow torch.sum optimization by increasing max_values_per_threa…
jerrymannil Sep 12, 2024
c1b6f60
[ROCm] torch.sum optimization by increasing min_values_per_thread (#1…
jerrymannil Sep 13, 2024
83504b7
Backport AOTriton 0.7b to ROCm/PyTorch's release/2.4 branch (#1587)
xinyazhang Sep 13, 2024
d2d58fd
Enable AOTriton kernel compression
pruthvistony Sep 16, 2024
e0f6b99
[release/2.4] add tlparse into requirements-ci.txt (#1608) (#1614)
dnikolaev-amd Oct 4, 2024
f6753da
SWDEV-487470 : Move philox arguments to GPU and properly enable ME at…
xinyazhang Oct 4, 2024
f89f62b
Changes to support clang-19
pruthvistony Jul 22, 2024
7507e5c
Change the pinned version to AOTriton 0.7.1b (#1621)
xinyazhang Oct 7, 2024
2c67900
llvm update for backward-breaking APIs in 18 and 19
jeffdaily Oct 7, 2024
837ece3
[release/2.4] Fixed error string assertion in test_invalid_devices (#…
iupaikov-amd Oct 8, 2024
1024f36
[ROCm] Use IPT=8 for block radix sort (#1619) (#1636)
jerrymannil Oct 18, 2024
d407a5b
HIPCC compilations should also respect GLIBCXX_USE_CXX11_ABI flag (#1…
jithunnair-amd Oct 19, 2024
6bfff3d
Performance tuning sum reduce for 1D and 2D tensors (#1646)
doru1004 Oct 22, 2024
8fc55a8
[ROCm] Enabling vector length 8 for vectorized elementwise kernel (#1…
jerrymannil Oct 23, 2024
e671542
Changes to support UB 24.04 build (#1633)
pruthvistony Oct 23, 2024
d710d40
[ROCm]: Fix torch cuda device to consider CUDA_VISIBLE_DEVICES (#1650)
jagadish-amd Oct 24, 2024
0436e08
[ROCM] Enable *_load_dwordx4 ISA for BFloat16 and Half. (#1638)
carlobertolli Oct 28, 2024
de3e990
[ROCm] Fix fp32 atomicAdd for non-MI100 GPUs (#128750)
jerrymannil Jun 19, 2024
43fe1b6
[release/2.4][ROCm] Fix largeIndexBlockSize (#1656)
pragupta Oct 30, 2024
7e8d2d3
[release/2.4] ModuleTracker: Add explicit garbage collection (#1660)
pragupta Oct 30, 2024
1b080dd
Remove vector length of 8 for half precision types for vectorized ele…
jerrymannil Oct 30, 2024
8818a4e
[ROCm] Set thread work size to 8 for elementwise kernels (#1670)
jerrymannil Nov 4, 2024
289d76c
[release/2.4] Enabled forced padding for multiple tests in test_pad_m…
iupaikov-amd Nov 5, 2024
ea09b0e
[release/2.4] Skipped test_slice_mm_bandwidth_computation on big gpus…
iupaikov-amd Nov 5, 2024
c7c258b
[release/2.4] Skipped test_slice_mm_bandwidth_computation and test_pa…
iupaikov-amd Nov 5, 2024
2d506f9
[ROCm] Enable vector size for 8 for half precision types in elementwi…
jerrymannil Nov 5, 2024
785f693
[release/2.4] TunableOp use dense size calculations as minimum sizes …
jeffdaily Nov 5, 2024
d610bbd
Fix 491578 issue - socket_power metric AMDSMI and clockrate function …
amd-sriram Nov 5, 2024
d8b5817
[ROCm] Add int4 support (#129710) (#1676)
jerrymannil Nov 5, 2024
619b266
[ROCm] fastSpecializedAtomicAdd for MI300 (#135770) (#1677)
jerrymannil Nov 5, 2024
a849579
updated the correct assertion checks for the counters in test_pointwi…
amd-sriram Nov 7, 2024
4eb4f88
[release/2.4][SWDEV-496633] Fix bad merge (#1682)
pragupta Nov 7, 2024
95a4311
[release/2.4] [NO CP] Warp size fix for navi arch (#1684)
iupaikov-amd Nov 8, 2024
1eb7ae2
[release/2.4] [NO CP] Added missing library paths for aot_mode (#1685)
iupaikov-amd Nov 8, 2024
4994fa9
[release/2.4][SWDEV-493252] Fix for test_c10d_nccl.py:test_comm_split…
pragupta Nov 8, 2024
ffca70b
[release/2.4] AOTriton Build Refactor (#1691)
ethanwee1 Nov 12, 2024
07a6874
Revert "[ROCm] Use IPT=8 for block radix sort (#1619) (#1636)"
pruthvistony Nov 13, 2024
634b544
Added skipIfRocmArch function (#1693)
BLOrange-AMD Nov 13, 2024
31e58f8
[release/2.4] Skipped some inductor tests for no hipcc rocm environme…
iupaikov-amd Nov 13, 2024
48e059d
[Release/2.4] Fix 493250 - Fix distributed unit tests that fail on 4 …
amd-sriram Nov 14, 2024
840589a
Revert "[ROCm] Enable vector size for 8 for half precision types in e…
pruthvistony Nov 14, 2024
d802906
Set thread work size to 4 for elementwise kernels similar to cuda
pruthvistony Nov 14, 2024
f29b589
[UPCP][release/2.4] TunableOp fix for batched MM with views. (#1723)
naromero77amd Nov 15, 2024
3db4474
[release/2.4] Skip failed unit tests in inductor/test_kernel_benchmar…
BLOrange-AMD Nov 15, 2024
f3ec3d1
[NOCP][release/2.4] Skip failed unit tests in inductor/test_binary_fo…
BLOrange-AMD Nov 15, 2024
c060a2f
[release/2.4] Skip failed unit tests in functorch/test_ops.py (#1720)
BLOrange-AMD Nov 15, 2024
f705d33
[NOCP][release/2.4] Skip failed unit tests in test_expanded_weights.p…
BLOrange-AMD Nov 15, 2024
f2355b1
[release/2.4] Skip failed unit tests in test_cuda.py (#1718)
BLOrange-AMD Nov 15, 2024
02114fa
[NOCP][release/2.4] skip failed tests in test_torch.py (#1717)
dnikolaev-amd Nov 15, 2024
9a974de
[Release/2..4] Fix 493250 tests - make barrier() block cpu and pass t…
amd-sriram Nov 15, 2024
b8dd07d
[release/2.4] set hipblas workspace (#138791) (#1716)
jeffdaily Nov 15, 2024
e93fe6c
[NOCP][release/2.4] Skip stress related distributed tests (#1713)
BLOrange-AMD Nov 15, 2024
301ce3e
[NOCP][release/2.4] Skip failed unit tests in nn/test_convolution and…
BLOrange-AMD Nov 15, 2024
d2ebf79
[release/2.4] skip failed tests in test_transformers.py (#1708)
dnikolaev-amd Nov 15, 2024
859e498
[release/2.4] skip failed tests in test_ops.py (#1705)
dnikolaev-amd Nov 15, 2024
d702690
[release/2.4] skip failed tests in test_nn.py (#1704)
dnikolaev-amd Nov 15, 2024
933a057
[release/2.4] skip failed tests in test_modules.py (#1703)
dnikolaev-amd Nov 15, 2024
2613ad0
[release/2.4] skip failed tests in test_matmul_cuda.py (#1702)
dnikolaev-amd Nov 15, 2024
711751e
[release/2.4] skip failed tests in test_jit_fuser_te.py (#1700)
dnikolaev-amd Nov 15, 2024
6e06754
[release/2.4] skip failed tests in test_jit.py (jit/test_freezing.py)…
dnikolaev-amd Nov 15, 2024
5672206
[release/2.4] Skipped equivalent_template_code in test_benchmark_fusi…
iupaikov-amd Nov 15, 2024
794a821
[ROCm] Set thread work size to 8 for elementwise kernels (#1734)
jerrymannil Nov 19, 2024
0929149
[Release/2.4] Unit test - test_multihead_self_attn_two_masks_fast_pat…
amd-sriram Nov 21, 2024
8f9b9d3
[ROCm] Enable vector size for 8 for half precision types in elementwi…
jerrymannil Nov 21, 2024
efd0e5a
[release/2.4] Fixed two tests failing in test_pad_mm (#1736)
iupaikov-amd Nov 22, 2024
86006a9
Update related_commits to pick up apex version update
jithunnair-amd Nov 23, 2024
3a062e8
[release/2.4] Skip failed unit tests in test_ops.py (#1744)
BLOrange-AMD Nov 26, 2024
b1bd936
[NOCP][release/2.4] Skip failed unit tests in distributed/pipelining/…
BLOrange-AMD Nov 26, 2024
72512a7
[NOCP][release/2.4] Skip failed unit tests in distributed/fsdp/test_f…
BLOrange-AMD Nov 26, 2024
5e7c476
[NOCP][release/2.4] Skip failed unit tests in distributed/_composable…
BLOrange-AMD Nov 26, 2024
3d9f31c
[NOCP][release/2.4] Skip failed unit tests in distributed/test_c10d_n…
BLOrange-AMD Nov 26, 2024
6e1c879
[NOCP][release/2.4] Skip failed unit tests in distributed/fsdp/test_f…
BLOrange-AMD Nov 26, 2024
b3cd8a5
[NOCP][release/2.4] Skip failed unit tests in test_fully_shard_traini…
BLOrange-AMD Nov 26, 2024
092b9b6
[release/2.4] Added rocm specific logic to is_big_gpu check for induc…
iupaikov-amd Nov 27, 2024
7e0c404
[release/2.4] [NO CP] Removed test skips for navi caused by is_big_gp…
iupaikov-amd Dec 3, 2024
ebb61f3
[release/2.4] Excluded cpp/test_tensorexpr for whl builds (#1757)
iupaikov-amd Dec 4, 2024
95a4c16
[release/2.4] fix test_pointwise_op_fusion_post_grad (#1763)
dnikolaev-amd Dec 4, 2024
b4b81bd
[Release/2.4] Remove amax_ptr from scaled_gemm for UT test_scaled_mm_…
amd-sriram Dec 5, 2024
579c159
[release/2.4] AMDSMI/layernorm cherry picks (#1765)
jataylo Dec 6, 2024
f0a620f
[release/2.4] [ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead…
mhalk Dec 9, 2024
1556dce
Improve performance of reduce sum for 3D shapes
doru1004 Dec 12, 2024
5 changes: 0 additions & 5 deletions .ci/docker/aotriton_version.txt

This file was deleted.

9 changes: 8 additions & 1 deletion .ci/docker/build.sh
@@ -50,6 +50,8 @@ if [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *-jammy* ]]; then
UBUNTU_VERSION=22.04
elif [[ "$image" == *-noble* ]]; then
UBUNTU_VERSION=24.04
elif [[ "$image" == *ubuntu* ]]; then
extract_version_from_image_name ubuntu UBUNTU_VERSION
elif [[ "$image" == *centos* ]]; then
@@ -452,10 +454,15 @@ if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
fi
fi

DOCKER_PROGRESS="--progress=plain"
if [[ "${DOCKER_BUILDKIT}" == 0 ]]; then
DOCKER_PROGRESS=""
fi

# Build image
docker build \
--no-cache \
--progress=plain \
${DOCKER_PROGRESS} \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
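The DOCKER_PROGRESS hunk above passes `--progress=plain` unless `DOCKER_BUILDKIT` is set to 0. A minimal sketch of the same idea using a bash array, so that an empty value can never expand into a stray empty argument. The helper name `progress_args` and the `DOCKER_PROGRESS_ARGS` array are hypothetical and not part of build.sh:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in build.sh): print the docker progress flag,
# one argument per line, or nothing when BuildKit is explicitly disabled.
progress_args() {
  local buildkit="${1:-}"          # value of DOCKER_BUILDKIT, may be empty
  local args=("--progress=plain")
  if [[ "${buildkit}" == 0 ]]; then
    args=()                        # BuildKit disabled: drop the flag
  fi
  # Print only when there is something to print.
  if (( ${#args[@]} > 0 )); then
    printf '%s\n' "${args[@]}"
  fi
}

# Usage sketch: collect the flags into an array before calling docker.
mapfile -t DOCKER_PROGRESS_ARGS < <(progress_args "${DOCKER_BUILDKIT:-}")
echo "docker build --no-cache ${DOCKER_PROGRESS_ARGS[*]} ..."
```

The final line only echoes the command it would run; substituting a real `docker build "${DOCKER_PROGRESS_ARGS[@]}" ...` call is left to the reader.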
9 changes: 2 additions & 7 deletions .ci/docker/centos-rocm/Dockerfile
@@ -80,6 +80,8 @@ RUN rm install_rocm_magma.sh
COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh
RUN rm install_amdsmi.sh

ENV ROCM_PATH /opt/rocm
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
@@ -113,13 +115,6 @@ COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt

# Install AOTriton (Early fail)
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton

# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton-rocm.txt
@@ -1 +1 @@
21eae954efa5bf584da70324b640288c3ee7aede
75cc27c26a88b4edbcd11671a8aa524b65478d46
14 changes: 14 additions & 0 deletions .ci/docker/common/cache_vision_models.sh
@@ -2,6 +2,20 @@

set -ex

# Skip pytorch-nightly installation in docker images
# Installing pytorch-nightly is needed to prefetch the mobilenet_v2 and v3 models for some tests.
# Came from https://github.com/ROCm/pytorch/commit/85bd6bc0105162293fa0bbfb7b661f85ec67f85a
# Models are downloaded on first use to the folder /root/.cache/torch/hub
# But the pytorch-nightly installation also overrides the .ci/docker/requirements-ci.txt settings
# and upgrades some Python packages (sympy from 1.12.0 to 1.13.0),
# which causes several 'dynamic_shapes' tests to fail.
# Skipping the model prefetch affects these tests but does not cause any errors:
# python test/mobile/model_test/gen_test_model.py mobilenet_v2
# python test/quantization/eager/test_numeric_suite_eager.py -k test_mobilenet_v3
# Issue https://github.com/ROCm/frameworks-internal/issues/8772
echo "Skip torch-nightly installation"
exit 0

source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"

# Cache the test models at ~/.cache/torch/hub/
4 changes: 4 additions & 0 deletions .ci/docker/common/common_utils.sh
@@ -23,6 +23,10 @@ conda_install() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $*
}

conda_install_through_forge() {
as_jenkins conda install -c conda-forge -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $*
}

conda_run() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION --no-capture-output $*
}
23 changes: 0 additions & 23 deletions .ci/docker/common/install_aotriton.sh

This file was deleted.

39 changes: 39 additions & 0 deletions .ci/docker/common/install_base.sh
@@ -15,6 +15,9 @@ install_ubuntu() {
elif [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
cmake3="cmake=3.22*"
maybe_libiomp_dev=""
elif [[ "$UBUNTU_VERSION" == "24.04"* ]]; then
cmake3="cmake=3.28*"
maybe_libiomp_dev=""
else
cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev"
@@ -82,11 +85,42 @@ install_ubuntu() {
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30

# Required to install the fortran after gcc update
if [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
apt autoremove -y gfortran
apt-get update -y
apt-get install -y gfortran libopenblas-dev
fi

# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}

build_libpng() {
# Install the packages libpng needs to build
yum install -y zlib zlib-devel

LIBPNG_VERSION=1.6.37

mkdir -p libpng
pushd libpng

wget http://download.sourceforge.net/libpng/libpng-$LIBPNG_VERSION.tar.gz
tar -xvzf libpng-$LIBPNG_VERSION.tar.gz

pushd libpng-$LIBPNG_VERSION

./configure
make
make install

popd

popd
rm -rf libpng
}

install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
@@ -123,6 +157,11 @@ install_centos() {
unzip \
gdb

# CentOS7 doesn't provide a recent enough version of libpng,
# so it is built from source.
# libpng is required for the torchvision build.
build_libpng

# Cleanup
yum clean all
rm -rf /var/cache/yum
14 changes: 13 additions & 1 deletion .ci/docker/common/install_conda.sh
@@ -38,7 +38,10 @@ fi
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"

pushd /tmp
wget -q "${BASE_URL}/${CONDA_FILE}"
if [ -n "$CENTOS_VERSION" ] && [[ $CENTOS_VERSION == 7.* ]]; then
NO_CHECK_CERTIFICATE_FLAG="--no-check-certificate"
fi
wget -q "${BASE_URL}/${CONDA_FILE}" ${NO_CHECK_CERTIFICATE_FLAG}
# NB: Manually invoke bash per https://github.com/conda/conda/issues/10431
as_jenkins bash "${CONDA_FILE}" -b -f -p "/opt/conda"
popd
@@ -110,6 +113,15 @@ fi
conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
fi

# Install required libstdc++.so.6 version
if [ "$ANACONDA_PYTHON_VERSION" = "3.10" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.9" ] ; then
conda_install_through_forge libstdcxx-ng=12
fi

if [ "$ANACONDA_PYTHON_VERSION" = "3.12" ] || [[ "$UBUNTU_VERSION" == "24.04"* ]] ; then
conda_install_through_forge libstdcxx-ng=14
fi

# Install some other packages, including those needed for Python test reporting
pip_install -r /opt/conda/requirements-ci.txt

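The version checks in install_conda.sh rely on two bash test forms whose behavior differs. As a sketch of that difference (all helper names here are hypothetical): inside `[[ ]]` an unquoted right-hand side of `==` is a glob pattern, inside `[ ]` it is compared as a literal string, and `-n` only works as intended when its operand is quoted.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the test forms; not part of the PR.

# Prefix match with [[ ]]: the unquoted * is a glob, so 24.04.1 matches.
is_noble() {
  [[ "${1:-}" == "24.04"* ]]
}

# Same comparison with [ ]: the right-hand side is a literal string
# (quoted here to rule out filename expansion), so only the exact
# text 24.04* would compare equal.
is_noble_single_bracket() {
  [ "${1:-}" == "24.04*" ]
}

# Quoted -n test: false for an empty value. Unquoted, [ -n $v ] collapses
# to [ -n ] when v is empty, which is always true.
is_centos7() {
  local v="${1:-}"
  [ -n "$v" ] && [[ "$v" == 7.* ]]
}

is_noble "24.04.1" && echo "noble detected"
is_noble_single_bracket "24.04.1" || echo "single-bracket form did not match"
is_centos7 "" || echo "empty CENTOS_VERSION is not CentOS 7"
```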
5 changes: 5 additions & 0 deletions .ci/docker/common/install_rocm.sh
@@ -16,6 +16,11 @@ install_ubuntu() {
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
fi
if [[ $UBUNTU_VERSION == 22.04 ]] || [[ $UBUNTU_VERSION == 24.04 ]]; then
apt-get install -y --no-install-recommends gpg-agent
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
fi
apt-get install -y kmod
apt-get install -y wget

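The pin added in install_rocm.sh gives packages from repo.radeon.com a priority of 600, which is above the default of 500 that apt assigns to ordinary repositories, so the ROCm packages should win over distribution packages of the same name. A minimal sketch that writes the same three-line preferences file into a scratch directory; the helper name `write_rocm_pin` and the directory argument are hypothetical, and the real script writes to /etc/apt/preferences.d:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the ROCm apt pin into a given directory
# (the real script targets /etc/apt/preferences.d).
write_rocm_pin() {
  local prefs_dir="$1"
  mkdir -p "${prefs_dir}"
  printf '%s\n' \
    'Package: *' \
    'Pin: release o=repo.radeon.com' \
    'Pin-Priority: 600' \
    > "${prefs_dir}/rocm-pin-600"
}

# Usage sketch against a scratch directory, so nothing on the system changes.
scratch="$(mktemp -d)"
write_rocm_pin "${scratch}"
cat "${scratch}/rocm-pin-600"
```

Using `printf '%s\n'` instead of `echo -e` avoids depending on how a given shell's `echo` treats backslash escapes.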
7 changes: 7 additions & 0 deletions .ci/docker/common/install_user.sh
@@ -2,6 +2,13 @@

set -ex

# Since version 24 the system ships with user 'ubuntu' that has id 1000
# We need a work-around to enable id 1000 usage for this script
if [[ $UBUNTU_VERSION == 24.04 ]]; then
# touch is used to disable harmless error message
touch /var/mail/ubuntu && chown ubuntu /var/mail/ubuntu && userdel -r ubuntu
fi

# Mirror jenkins user in container
# jenkins user as ec2-user should have the same user-id
echo "jenkins:x:1000:1000::/var/lib/jenkins:" >> /etc/passwd
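The install_user.sh workaround removes the stock `ubuntu` account so that id 1000 is free before the `jenkins` entry is appended to /etc/passwd. A small sketch of how one might check which account currently holds a uid; the helper `uid_owner` is hypothetical, and it takes the passwd file as a parameter so it can be tried on a copy rather than the live file:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the login name that owns a uid in a
# passwd-format file (name:password:uid:gid:...), or nothing if unused.
uid_owner() {
  local uid="$1" passwd_file="${2:-/etc/passwd}"
  awk -F: -v uid="${uid}" '$3 == uid { print $1; exit }' "${passwd_file}"
}

# Usage sketch on a scratch file that mimics an Ubuntu 24.04 image.
sample="$(mktemp)"
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash' > "${sample}"
uid_owner 1000 "${sample}"
```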
16 changes: 15 additions & 1 deletion .ci/docker/requirements-ci.txt
@@ -15,7 +15,14 @@ click
#Pinned versions:
#test that import:

sympy==1.12.1
#Description: Python library for symbolic mathematics
# installed before coremltools to avoid installation of greater sympy version
#Pinned versions: 1.12.1
#test that import:

coremltools==5.0b5 ; python_version < "3.12"
coremltools==7.2 ; python_version == "3.12"
#Description: Apple framework for ML integration
#Pinned versions: 5.0b5
#test that import:
@@ -58,6 +65,7 @@ lark==0.12.0
#test that import:

librosa>=0.6.2 ; python_version < "3.11"
librosa==0.10.2 ; python_version == "3.12"
#Description: A python package for music and audio analysis
#Pinned versions: >=0.6.2
#test that import: test_spectral_ops.py
@@ -106,6 +114,7 @@ networkx==2.8.8
numba==0.49.0 ; python_version < "3.9"
numba==0.54.1 ; python_version == "3.9"
numba==0.55.2 ; python_version == "3.10"
numba==0.60.0 ; python_version == "3.12"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
#test that import: test_numba_integration.py
@@ -247,6 +256,11 @@ tb-nightly==2.13.0a20230426
#Pinned versions:
#test that import:

tlparse==0.3.7
#Description: parse logs produced by torch.compile
#Pinned versions:
#test that import: dynamo/test_structured_trace.py

# needed by torchgen utils
typing-extensions
#Description: type hints for python
@@ -306,7 +320,7 @@ pywavelets==1.5.0 ; python_version >= "3.12"
#Pinned versions: 1.4.1
#test that import:

lxml==5.0.0.
lxml==5.0.0
#Description: This is a requirement of unittest-xml-reporting

# Python-3.9 binaries
7 changes: 0 additions & 7 deletions .ci/docker/ubuntu-rocm/Dockerfile
@@ -105,13 +105,6 @@ COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt

# Install AOTriton
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton

# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
32 changes: 3 additions & 29 deletions .circleci/scripts/binary_populate_env.sh
@@ -5,7 +5,9 @@ export TZ=UTC
tagged_version() {
GIT_DIR="${workdir}/pytorch/.git"
GIT_DESCRIBE="git --git-dir ${GIT_DIR} describe --tags --match v[0-9]*.[0-9]*.[0-9]*"
if [[ ! -d "${GIT_DIR}" ]]; then
if [[ -n "${CIRCLE_TAG:-}" ]]; then
echo "${CIRCLE_TAG}"
elif [[ ! -d "${GIT_DIR}" ]]; then
echo "Abort, abort! Git dir ${GIT_DIR} does not exists!"
kill $$
elif ${GIT_DESCRIBE} --exact >/dev/null; then
@@ -71,34 +73,6 @@ fi

export PYTORCH_BUILD_NUMBER=1

# Set triton version as part of PYTORCH_EXTRA_INSTALL_REQUIREMENTS
TRITON_VERSION=$(cat $PYTORCH_ROOT/.ci/docker/triton_version.txt)

# Here PYTORCH_EXTRA_INSTALL_REQUIREMENTS is already set for the all the wheel builds hence append TRITON_CONSTRAINT
TRITON_CONSTRAINT="platform_system == 'Linux' and platform_machine == 'x86_64' and python_version < '3.13'"
if [[ "$PACKAGE_TYPE" =~ .*wheel.* && -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]]; then
TRITON_REQUIREMENT="triton==${TRITON_VERSION}; ${TRITON_CONSTRAINT}"
if [[ -n "$PYTORCH_BUILD_VERSION" && "$PYTORCH_BUILD_VERSION" =~ .*dev.* ]]; then
TRITON_SHORTHASH=$(cut -c1-10 $PYTORCH_ROOT/.ci/docker/ci_commit_pins/triton.txt)
TRITON_REQUIREMENT="pytorch-triton==${TRITON_VERSION}+${TRITON_SHORTHASH}; ${TRITON_CONSTRAINT}"
fi
export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${PYTORCH_EXTRA_INSTALL_REQUIREMENTS} | ${TRITON_REQUIREMENT}"
fi

# Set triton via PYTORCH_EXTRA_INSTALL_REQUIREMENTS for triton rocm package
if [[ "$PACKAGE_TYPE" =~ .*wheel.* && -n "$PYTORCH_BUILD_VERSION" && "$PYTORCH_BUILD_VERSION" =~ .*rocm.* && $(uname) == "Linux" ]]; then
TRITON_REQUIREMENT="pytorch-triton-rocm==${TRITON_VERSION}; ${TRITON_CONSTRAINT}"
if [[ -n "$PYTORCH_BUILD_VERSION" && "$PYTORCH_BUILD_VERSION" =~ .*dev.* ]]; then
TRITON_SHORTHASH=$(cut -c1-10 $PYTORCH_ROOT/.ci/docker/ci_commit_pins/triton-rocm.txt)
TRITON_REQUIREMENT="pytorch-triton-rocm==${TRITON_VERSION}+${TRITON_SHORTHASH}; ${TRITON_CONSTRAINT}"
fi
if [[ -z "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]]; then
export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${TRITON_REQUIREMENT}"
else
export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${PYTORCH_EXTRA_INSTALL_REQUIREMENTS} | ${TRITON_REQUIREMENT}"
fi
fi

JAVA_HOME=
BUILD_JNI=OFF
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
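The first hunk of binary_populate_env.sh lets an explicit `CIRCLE_TAG` take precedence over anything derived from git in `tagged_version()`. A simplified sketch of that precedence order; `resolve_tag` is a hypothetical stand-in, its argument represents whatever `git describe` would have reported, and the version strings are made-up examples:

```bash
#!/usr/bin/env bash
# Hypothetical, simplified stand-in for tagged_version(): an explicit
# CIRCLE_TAG wins; otherwise fall back to a caller-supplied value that
# represents what `git describe` would have reported.
resolve_tag() {
  local git_tag="${1:-}"
  if [[ -n "${CIRCLE_TAG:-}" ]]; then
    echo "${CIRCLE_TAG}"
  elif [[ -n "${git_tag}" ]]; then
    echo "${git_tag}"
  else
    return 1   # no tag available from either source
  fi
}

# Usage sketch: the environment value overrides the git-derived one.
CIRCLE_TAG="v2.4.0" resolve_tag "v2.3.1"
```

Checking the environment variable first means a tag can be supplied even when no `.git` directory is present, which appears to be the case the reintroduced check is aimed at.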
16 changes: 7 additions & 9 deletions .github/scripts/amd/package_triton_wheel.sh
@@ -30,7 +30,12 @@ fi
# Remove packaged libs and headers
rm -rf $TRITON_ROCM_DIR/include/*

LIBTINFO_PATH="/usr/lib64/libtinfo.so.5"
OS_NAME=`awk -F= '/^NAME/{print $2}' /etc/os-release`
if [[ "$OS_NAME" == *"CentOS Linux"* ]]; then
LIBTINFO_PATH="/usr/lib64/libtinfo.so.5"
else
LIBTINFO_PATH="/usr/lib64/libtinfo.so.6"
fi
LIBNUMA_PATH="/usr/lib64/libnuma.so.1"
LIBELF_PATH="/usr/lib64/libelf.so.1"

@@ -45,16 +50,9 @@ do
cp $lib $TRITON_ROCM_DIR/lib/
done

# Required ROCm libraries
if [[ "${MAJOR_VERSION}" == "6" ]]; then
libamdhip="libamdhip64.so.6"
else
libamdhip="libamdhip64.so.5"
fi

# Required ROCm libraries - ROCm 6.0
ROCM_SO=(
"${libamdhip}"
"libamdhip64.so.6"
"libhsa-runtime64.so.1"
"libamd_comgr.so.2"
"libdrm.so.2"
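The package_triton_wheel.sh change picks the libtinfo soname from the `NAME` field of /etc/os-release. A minimal sketch of that selection with the file path as a parameter, so the logic can be exercised without touching the real file. The helper name `libtinfo_path_for` is hypothetical, and the awk pattern is anchored as `^NAME=` here, slightly tighter than the `^NAME` used in the script:

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the libtinfo path from an os-release style
# file, mirroring the NAME-based check in package_triton_wheel.sh.
libtinfo_path_for() {
  local os_release="${1:-/etc/os-release}"
  local os_name
  os_name="$(awk -F= '/^NAME=/{print $2}' "${os_release}")"
  if [[ "${os_name}" == *"CentOS Linux"* ]]; then
    echo "/usr/lib64/libtinfo.so.5"
  else
    echo "/usr/lib64/libtinfo.so.6"
  fi
}

# Usage sketch with a scratch file instead of the real /etc/os-release.
fake="$(mktemp)"
printf 'NAME="CentOS Linux"\nVERSION="7 (Core)"\n' > "${fake}"
libtinfo_path_for "${fake}"
```

The value keeps its surrounding quotes after the awk split, which is why the comparison uses a substring glob rather than an exact match.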