Merged
Changes from all commits
83 commits
ff40ece
Refactor CMakeLists.txt to set default build type and manage dependen…
LeiWang1999 Jun 13, 2025
9c2d9a3
Update default value of `check_well_formed` parameter in `prim_func` …
LeiWang1999 Jun 13, 2025
266ebf7
Add StorageRewrite function to transform module
LeiWang1999 Jun 13, 2025
1dd1279
Refactor null option handling in IR and layout inference
LeiWang1999 Jun 13, 2025
54da710
Update TVM subproject and refactor cluster planning and tile operatio…
LeiWang1999 Jun 13, 2025
b7a2ceb
Update annotation type in warp specialized test for consistency
LeiWang1999 Jun 13, 2025
2f93d63
Refactor test execution in warp specialized test
LeiWang1999 Jun 13, 2025
738a2d4
refactor
Hzfengsy Jun 24, 2025
6b571d1
[Enhancement] Add strict layout map for improved buffer layout infere…
LeiWang1999 Jun 24, 2025
4145e1c
[Example] Update examples to use @tilelang.jit (#597)
Cunxiao2002 Jun 25, 2025
6798ab0
[Enhancement] Refine error messaging in LowerBulkCopy for global and …
LeiWang1999 Jun 26, 2025
2968567
[Enhancement] Introduce PassConfig `TL_ENABLE_AGGRESSIVE_SHARED_MEMOR…
LeiWang1999 Jun 26, 2025
2a7755b
[Refactor] Update accumulation handling in gemm_sm90.h (#603)
LeiWang1999 Jun 27, 2025
09795e1
[Enhancement] Add tma bulk copy. (#600)
cherichy Jun 27, 2025
4c67b92
[Bugfix] Fixed mha_bwd shape inconsistency error (#604)
Nathancgy Jun 30, 2025
126fb09
lint fix
LeiWang1999 Jun 30, 2025
03d83d6
Update requirements-lint.txt to maintain clang-format version consist…
LeiWang1999 Jun 30, 2025
38372c9
[Bugfix] Avoid duplicate data access when cross thread buffer meet re…
LeiWang1999 Jun 30, 2025
cde4cb8
[Enhancement] Support tf32 gemm_rs (#607)
LeiWang1999 Jul 1, 2025
e3cc3d5
[Enhancement] Introduce option `TL_DISABLE_FAST_MATH` and `TL_ENABLE_…
LeiWang1999 Jul 2, 2025
67e1d63
fix build
Hzfengsy Jul 3, 2025
d7d4ac1
[Experimental][Language] add `T.GEMM_SP` for sm90 sparse tensor core …
botbw Jul 3, 2025
b5b768d
[Doc] Phaseout Legacy documentations (#610)
LeiWang1999 Jul 4, 2025
6aabdc4
[Refactor] Phaseout Pass ParallelLoopTransformer (#611)
LeiWang1999 Jul 4, 2025
2dce16c
fix build
Hzfengsy Jul 5, 2025
726fa8d
[Enhancement] Update ReduceOp initialization values for integer types…
LeiWang1999 Jul 8, 2025
6f4ef6f
Bump transformers from 4.50.0 to 4.51.0 in /examples/bitnet-1.58b (#615)
dependabot[bot] Jul 8, 2025
9ace736
[Refactor] refactor autotune examples (#617)
LeiWang1999 Jul 8, 2025
f74e3e8
lint fix
LeiWang1999 Jul 9, 2025
1aeb7c1
Bump transformers from 4.51.0 to 4.52.1 in /examples/bitnet-1.58b (#619)
dependabot[bot] Jul 9, 2025
39f003a
Fix PTXAS options flag in LibraryGenerator for consistency (#620)
LeiWang1999 Jul 9, 2025
bd8308e
Refactor FP8 type handling across multiple files to standardize usage…
LeiWang1999 Jul 9, 2025
471846a
[Refactor] Add parallel loop transform pass for condition extraction …
xxw-keju Jul 9, 2025
648e4b1
[Dev] Update linear attention examples to enhance performance on Hopp…
Rachmanino Jul 9, 2025
5ee9710
[Enhancement] Add ahead of time cython compilation in setup.py (#622)
LeiWang1999 Jul 10, 2025
608993f
[Enhancement] Support more flexible layout host pythonic expr (#623)
LeiWang1999 Jul 10, 2025
cd8f6bd
[Enhancement] support composable expression for shape with symbolic v…
LeiWang1999 Jul 10, 2025
4e7d760
🐍Fix the file name "test_exmaple_tilelang_nsa" (#629)
kadirnar Jul 12, 2025
bf5c1f6
[Enhancement] Add CPU utilization and count settings for Auto-Tuning …
LeiWang1999 Jul 12, 2025
03bea88
[AutoTune] Support `with set_autotune_inputs` to set auto tuning inpu…
LeiWang1999 Jul 13, 2025
fb5eeb8
[Pass] Introduce flag to disable cp async lowering (#633)
LeiWang1999 Jul 14, 2025
e02486e
fix typo (#635)
xiayuqing0622 Jul 15, 2025
0ffba21
[Pass][Simplify] Introduce symbolic level simplify for condition expr…
LeiWang1999 Jul 15, 2025
da91ee3
Enhance test coverage by adding LLVM requirement decorator to multipl…
LeiWang1999 Jul 15, 2025
86cda24
lint fix
LeiWang1999 Jul 15, 2025
42f4a16
Fix software pipeline stage annotation and update optional config han…
LeiWang1999 Jul 15, 2025
1f6f029
Add Python executable detection in CMake configuration and update TVM…
LeiWang1999 Jul 16, 2025
904d098
Update TVM submodule reference and refactor FFI registration to use s…
LeiWang1999 Jul 16, 2025
97992c6
Refactor attribute handling in layout and IR nodes to use reflection …
LeiWang1999 Jul 17, 2025
61fab43
finish rebase
LeiWang1999 Jul 24, 2025
7142d07
tvm update
LeiWang1999 Jul 24, 2025
034c771
Refactor FFI registration across tilelang modules to use the updated …
LeiWang1999 Jul 24, 2025
7f37f4d
lint fix
LeiWang1999 Jul 24, 2025
633c543
Update TVM submodule reference and modify CUDA runtime argument handl…
LeiWang1999 Jul 24, 2025
6f10955
lint fix
LeiWang1999 Jul 24, 2025
b5c77ac
Refactor tensor data type references from "e4m3_float8" and "e5m2_flo…
LeiWang1999 Jul 24, 2025
1116daf
lint fix
LeiWang1999 Jul 24, 2025
ad680f6
Refactor forward_index initialization in Fragment class to default to…
LeiWang1999 Jul 24, 2025
cf550d4
test fix
LeiWang1999 Jul 25, 2025
85b57b4
lint fix
LeiWang1999 Jul 25, 2025
53335df
bugfix
LeiWang1999 Jul 27, 2025
43efb69
lint fix
LeiWang1999 Jul 27, 2025
faac5a2
reduce fix
LeiWang1999 Jul 28, 2025
16801bd
lint fix
LeiWang1999 Jul 28, 2025
12960a9
carver fix
LeiWang1999 Jul 28, 2025
99f7cbb
cast fix
LeiWang1999 Jul 28, 2025
e7d2661
Update submodule and enhance kernel launch functionality with optiona…
LeiWang1999 Jul 28, 2025
8d796cb
lint fix
LeiWang1999 Jul 28, 2025
273e1ce
bugfix
LeiWang1999 Jul 28, 2025
8247d2a
Refactor test execution in test_tilelang_cpu_gemm.py and enhance devi…
LeiWang1999 Jul 28, 2025
1410d93
lint fix
LeiWang1999 Jul 28, 2025
c146c0f
Update runtime.cc
LeiWang1999 Jul 29, 2025
f711e9f
phase out license
LeiWang1999 Jul 29, 2025
028e28c
Merge branch 'main' of https://github.com/tile-ai/tilelang into refactor
LeiWang1999 Jul 29, 2025
eec3ef8
Update subproject commit for TVM to 555cc71
LeiWang1999 Jul 29, 2025
efe8942
Merge branch 'refactor' of https://github.com/Hzfengsy/tilelang into …
LeiWang1999 Jul 29, 2025
4657eff
Update subproject commit for TVM to d39953fa
LeiWang1999 Jul 29, 2025
af3ed48
Update subproject commit for TVM to 9574805f
LeiWang1999 Jul 29, 2025
003a9f7
Update subproject commit for TVM to a08b7c3
LeiWang1999 Jul 29, 2025
15d7e4e
fix ci
xwhzz Jul 29, 2025
e0de9a3
Merge branch 'main' of https://github.com/tile-ai/tilelang into refactor
LeiWang1999 Jul 30, 2025
fedc73b
Merge branch 'main' of https://github.com/tile-ai/tilelang into refactor
LeiWang1999 Jul 30, 2025
0234f3e
ci fix
LeiWang1999 Jul 30, 2025
32 changes: 24 additions & 8 deletions .github/workflows/ci.yml
@@ -7,7 +7,7 @@ env:

jobs:
format-check:
runs-on: ubuntu-latest
runs-on: self-hosted

permissions:
contents: write
@@ -26,21 +26,37 @@ jobs:
with:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install dependencies
- name: Ensure venv (local & persistent)
run: |
python -m pip install --upgrade pip
pip install yapf==0.40.2 toml==0.10.2 tomli==2.0.1 ruff==0.6.5 codespell==2.3.0 clang-format==15.0.7
set -e
REQS_HASH=$(cat requirements-test.txt 2>/dev/null || true)
MARKER="${{ runner.tool_cache }}/.venv_marker_${{ env.PYTHON_VERSION }}_${REQS_HASH:0:8}"

if [[ -f "$MARKER" ]] && [[ -f "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate" ]]; then
echo "venv exists and hash matches – reuse it"
else
echo "venv stale or missing – recreating"
rm -rf "${{ runner.tool_cache }}/${{ env.VENV_DIR }}" "$MARKER"
python -m venv "${{ runner.tool_cache }}/${{ env.VENV_DIR }}"
# shellcheck source=/dev/null
source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
python -m pip install --upgrade pip --no-user
[[ -f requirements-test.txt ]] && \
PIP_NO_BUILD_ISOLATION=1 pip install -r requirements-test.txt --no-user
pip install . --no-user
touch "$MARKER"
fi

- name: Run format check
run: |
git clone https://github.com/tile-ai/tilelang.git main_repo
cp main_repo/format.sh .
rm -rf main_repo
source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
if ! output=$(./format.sh 2>&1); then
echo "------------------------------------"
echo "message:"
echo "$output"
echo "------------------------------------"
printf '%s\n' "$output" | grep "Please review and stage the changes."
echo "------------------------------------"
exit 1
fi

- name: Commit and Push Changes
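The step above reuses a persistent virtualenv keyed on the contents of requirements-test.txt (a marker file named after the first eight characters of the file). As a rough illustration of the same idea outside a workflow file, here is a hedged Python sketch; the cache directory and file names are hypothetical, and the real CI uses the shell logic in the diff:

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical stand-ins for runner.tool_cache and env.VENV_DIR in the workflow.
TOOL_CACHE = os.path.expanduser("~/.cache/ci-venvs")
VENV_DIR = os.path.join(TOOL_CACHE, "tilelang-venv")


def requirements_key(path: str = "requirements-test.txt") -> str:
    """Short digest of the requirements file; any edit to it invalidates the venv."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()[:8]
    except FileNotFoundError:
        return "noreqs"


def ensure_venv() -> None:
    marker = os.path.join(TOOL_CACHE, f".venv_marker_{requirements_key()}")
    activate = os.path.join(VENV_DIR, "bin", "activate")
    if os.path.isfile(marker) and os.path.isfile(activate):
        print("venv exists and key matches - reusing it")
        return
    print("venv stale or missing - recreating")
    os.makedirs(TOOL_CACHE, exist_ok=True)
    subprocess.run([sys.executable, "-m", "venv", "--clear", VENV_DIR], check=True)
    pip = os.path.join(VENV_DIR, "bin", "pip")
    subprocess.run([pip, "install", "--upgrade", "pip"], check=True)
    if os.path.isfile("requirements-test.txt"):
        subprocess.run([pip, "install", "-r", "requirements-test.txt"], check=True)
    open(marker, "w").close()


if __name__ == "__main__":
    ensure_venv()
```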
2 changes: 1 addition & 1 deletion 3rdparty/tvm
Submodule tvm updated from 979c8e to a08b7c
17 changes: 16 additions & 1 deletion CMakeLists.txt
@@ -11,6 +11,14 @@ endif()

# Enable compile command export
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
if(NOT Python_EXECUTABLE)
execute_process(
COMMAND which python
OUTPUT_VARIABLE Python_EXECUTABLE
OUTPUT_STRIP_TRAILING_WHITESPACE
)
set(Python_EXECUTABLE "${Python_EXECUTABLE}" CACHE FILEPATH "Path to the Python executable")
endif()

# Define a custom macro for globbing files with conditional CONFIGURE_DEPENDS
if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.12.0")
@@ -39,7 +47,8 @@ else()

# Set default build type to RelWithDebInfo if not provided
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "Build type" FORCE)
# Set default build type to Release if not provided
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
message(STATUS "Setting default build type to ${CMAKE_BUILD_TYPE}")
endif()
endif()
@@ -145,6 +154,7 @@ message(STATUS "TVM_SOURCE_DIR: ${TVM_SOURCE_DIR}")
# Include directories for TileLang
set(TILE_LANG_INCLUDES
${TVM_SOURCE_DIR}/include
${TVM_SOURCE_DIR}/ffi/include
${TVM_SOURCE_DIR}/src
${TVM_SOURCE_DIR}/3rdparty/dlpack/include
${TVM_SOURCE_DIR}/3rdparty/dmlc-core/include
@@ -212,6 +222,11 @@ if(CMAKE_BUILD_TYPE STREQUAL "Debug")
target_compile_definitions(tilelang_static PRIVATE "TVM_LOG_DEBUG")
endif()

# Building tvm_cython modules
if(NOT DEFINED TVM_PREBUILD_PATH)
add_dependencies(tilelang tvm_cython)
endif()

# Module shared library
add_library(tilelang_module SHARED $<TARGET_OBJECTS:tilelang_objs>)
target_link_libraries(tilelang_module PUBLIC tvm)
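CMake's FindPython module reads the `Python_EXECUTABLE` cache variable (capital P, lowercase rest), which is presumably why the setup.py change later in this diff switches from `-DPYTHON_EXECUTABLE` to `-DPython_EXECUTABLE`. A hedged sketch of configuring the build with the interpreter pinned explicitly, so the `which python` fallback above never has to guess (paths and options are illustrative, not part of this PR):

```python
# Illustrative only: configure a CMake build tree with the current interpreter
# pinned, mirroring what setup.py now passes via -DPython_EXECUTABLE.
import subprocess
import sys

subprocess.run(
    [
        "cmake", "-S", ".", "-B", "build",
        f"-DPython_EXECUTABLE={sys.executable}",  # avoids the `which python` fallback in CMakeLists.txt
        "-DCMAKE_BUILD_TYPE=Release",             # matches the new default build type
    ],
    check=True,
)
subprocess.run(["cmake", "--build", "build", "-j"], check=True)
```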
8 changes: 3 additions & 5 deletions benchmark/matmul_fp8/benchmark_matmul.py
@@ -54,10 +54,8 @@ def get_configs(args, kwargs):
from tilelang.carver.roller.rasterization import NoRasterization
import torch

if torch.version.hip is not None:
arch=CDNA("hip")
else:
arch = CUDA("cuda")
arch = CDNA("hip") if torch.version.hip is not None else CUDA("cuda")

topk = 10

carve_template = MatmulTemplate(
@@ -158,7 +156,7 @@ def matmul(

# Use half-precision for input data to reduce memory bandwidth,
# accumulate in float for better numerical accuracy
dtype = "e4m3_float8"
dtype = "float8_e4m3"
accum_dtype = "float"

@T.prim_func
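This PR renames the FP8 dtype strings from "e4m3_float8"/"e5m2_float8" to the torch-style "float8_e4m3"/"float8_e5m2" across benchmarks, examples, and tests. A hedged sketch of the correspondence; the mapping to torch dtypes is an assumption added here for illustration (torch exposes the e4m3 format as float8_e4m3fn) and is not part of this diff:

```python
import torch

# Old tilelang spelling -> new spelling adopted in this PR.
RENAMED_FP8_DTYPES = {
    "e4m3_float8": "float8_e4m3",
    "e5m2_float8": "float8_e5m2",
}

# Assumed mapping from the new dtype strings to torch dtypes, for reference when
# preparing inputs (verify against your torch build; requires torch >= 2.1).
TORCH_FP8_DTYPES = {
    "float8_e4m3": torch.float8_e4m3fn,
    "float8_e5m2": torch.float8_e5m2,
}


def to_torch_dtype(tl_dtype: str) -> torch.dtype:
    """Translate a tilelang FP8 dtype string (old or new spelling) into a torch dtype."""
    return TORCH_FP8_DTYPES[RENAMED_FP8_DTYPES.get(tl_dtype, tl_dtype)]


if __name__ == "__main__":
    x = torch.randn(128, 128).to(to_torch_dtype("float8_e4m3"))
    print(x.dtype)  # torch.float8_e4m3fn
```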
2 changes: 1 addition & 1 deletion examples/bitnet-1.58b/requirements.txt
@@ -1,3 +1,3 @@
lm_eval==0.3.0
flash_attn
transformers==4.52.1
transformers==4.52.1
4 changes: 2 additions & 2 deletions examples/cast/example_group_per_split_token_cast_to_fp8.py
@@ -17,7 +17,7 @@ def group_per_split_token_cast_to_fp8(M, M_max, N, BG, blk_m):

@T.prim_func
def group_per_split_token_cast(X: T.Tensor((M, N), dtype), batch_sizes: T.Tensor(
(BG,), "int32"), X_fp8: T.Tensor((BG, M_max, N), "e4m3_float8"), X_amax: T.Tensor(
(BG,), "int32"), X_fp8: T.Tensor((BG, M_max, N), "float8_e4m3"), X_amax: T.Tensor(
(BG, M_max, T.ceildiv(N, group_size)), accum_dtype)):
with T.Kernel(
T.ceildiv(M_max, blk_m), T.ceildiv(N, group_size), BG, threads=128) as (bx, by, bz):
@@ -28,7 +28,7 @@ def group_per_split_token_cast(X: T.Tensor((M, N), dtype), batch_sizes: T.Tensor
y_amax_local = T.alloc_fragment((blk_m,), accum_dtype)
y_s_local = T.alloc_fragment((blk_m,), accum_dtype)
y_q_local = T.alloc_fragment((blk_m, group_size), accum_dtype)
y_q_local_fp8 = T.alloc_fragment((blk_m, group_size), "e4m3_float8")
y_q_local_fp8 = T.alloc_fragment((blk_m, group_size), "float8_e4m3")
row_offset = T.alloc_local((1,), "int32")

T.annotate_layout({
4 changes: 2 additions & 2 deletions examples/cast/example_per_token_cast_to_fp8.py
@@ -15,7 +15,7 @@ def per_token_cast_to_fp8(M, N, blk_m):
fp8_max = 448.0

@T.prim_func
def per_token_cast(X: T.Tensor((M, N), dtype), X_fp8: T.Tensor((M, N), "e4m3_float8"),
def per_token_cast(X: T.Tensor((M, N), dtype), X_fp8: T.Tensor((M, N), "float8_e4m3"),
X_amax: T.Tensor((M, T.ceildiv(N, group_size)), dtype)):
with T.Kernel(T.ceildiv(M, blk_m), T.ceildiv(N, group_size), threads=128) as (bx, by):
row = bx
@@ -24,7 +24,7 @@ def per_token_cast(X: T.Tensor((M, N), dtype), X_fp8: T.Tensor((M, N), "e4m3_flo
y_amax_local = T.alloc_fragment((blk_m,), dtype)
y_s_local = T.alloc_fragment((blk_m,), dtype)
y_q_local = T.alloc_fragment((blk_m, group_size), dtype)
y_q_local_fp8 = T.alloc_fragment((blk_m, group_size), "e4m3_float8")
y_q_local_fp8 = T.alloc_fragment((blk_m, group_size), "float8_e4m3")

T.annotate_layout({
y_local:
8 changes: 4 additions & 4 deletions examples/deepseek_deepgemm/example_deepgemm_fp8_2xAcc.py
@@ -20,8 +20,8 @@ def tl_gemm(
accum_dtype,
):
assert in_dtype in [
"e4m3_float8",
], "Currently only e4m3_float8 is supported"
"float8_e4m3",
], "Currently only float8_e4m3 is supported"
assert out_dtype in [
"bfloat16",
"float32",
@@ -179,11 +179,11 @@ def assert_tl_gemm_correctness(M, N, K, block_N, in_dtype, out_dtype, accum_dtyp


def main():
assert_tl_gemm_correctness(1024, 1024, 8192, 128, "e4m3_float8", "bfloat16", "float32")
assert_tl_gemm_correctness(1024, 1024, 8192, 128, "float8_e4m3", "bfloat16", "float32")


if __name__ == "__main__":
for dtype in ["e4m3_float8"]:
for dtype in ["float8_e4m3"]:
for out_dtype in ["bfloat16", "float32"]:
for block_N in [16, 32, 64, 128]:
assert_tl_gemm_correctness(1024, 1024, 8192, block_N, dtype, out_dtype, "float32")
@@ -11,7 +11,7 @@
def flashattn(batch, heads, kv_head_num, seqlen_kv, dim, pe_dim, block_N, block_H):
scale = (1.0 / (dim + pe_dim))**0.5 * 1.44269504 # log2(e)
dtype = "float16"
q_dtype = "e4m3_float8"
q_dtype = "float8_e4m3"
accum_dtype = "float"
kv_group_num = heads // kv_head_num
VALID_BLOCK_H = min(block_H, kv_group_num)
4 changes: 2 additions & 2 deletions examples/gemm_fp8/example_tilelang_gemm_fp8.py
@@ -57,8 +57,8 @@ def test_gemm_fp8(M, N, K, dtype):


def main():
test_gemm_fp8(1024, 1024, 1024, 'e4m3_float8')
test_gemm_fp8(1024, 1024, 1024, 'e5m2_float8')
test_gemm_fp8(1024, 1024, 1024, 'float8_e4m3')
test_gemm_fp8(1024, 1024, 1024, 'float8_e5m2')


if __name__ == "__main__":
4 changes: 2 additions & 2 deletions examples/gemm_fp8/example_tilelang_gemm_fp8_2xAcc.py
@@ -74,8 +74,8 @@ def test_gemm_fp8(M, N, K, dtype):


def main():
test_gemm_fp8(1024, 1024, 8192, 'e4m3_float8')
test_gemm_fp8(1024, 1024, 8192, 'e5m2_float8')
test_gemm_fp8(1024, 1024, 8192, 'float8_e4m3')
test_gemm_fp8(1024, 1024, 8192, 'float8_e5m2')


if __name__ == "__main__":
10 changes: 5 additions & 5 deletions examples/gemm_fp8/example_tilelang_gemm_fp8_intrinsic.py
@@ -40,8 +40,8 @@ def tl_matmul(
):
assert in_dtype in [
"float16",
"e4m3_float8",
"e5m2_float8",
"float8_e4m3",
"float8_e5m2",
"int8",
], "Currently only float16 and int8 are supported"
assert out_dtype in [
@@ -52,7 +52,7 @@

micro_size_x = micro_size_y = micro_size_k = 16

is_float8 = in_dtype in ["e4m3_float8", "e5m2_float8"]
is_float8 = in_dtype in ["float8_e4m3", "float8_e5m2"]
if out_dtype == "int32" or is_float8:
micro_size_k = 32

@@ -216,8 +216,8 @@ def assert_tl_matmul_correctness(M, N, K, in_dtype, out_dtype, accum_dtype):


def main():
assert_tl_matmul_correctness(128, 128, 128, "e4m3_float8", "float32", "float32")
assert_tl_matmul_correctness(128, 128, 128, "e5m2_float8", "float32", "float32")
assert_tl_matmul_correctness(128, 128, 128, "float8_e4m3", "float32", "float32")
assert_tl_matmul_correctness(128, 128, 128, "float8_e5m2", "float32", "float32")


if __name__ == "__main__":
5 changes: 2 additions & 3 deletions examples/warp_specialize/example_warp_specialize_flashmla.py
@@ -9,6 +9,7 @@
tilelang.disable_cache()


@tilelang.jit(out_idx=[6])
def flashattn(batch, heads, kv_head_num, seqlen_kv, dim, pe_dim, block_N, block_H, num_split):
scale = (1.0 / (dim + pe_dim))**0.5 * 1.44269504 # log2(e)
dtype = "float16"
@@ -79,7 +80,6 @@ def flash_attn(
p0_1_1_ready_barrier = T.alloc_barrier(arrive_count=128)
lse_0_ready_barrier = T.alloc_barrier(arrive_count=128)
lse_1_ready_barrier = T.alloc_barrier(arrive_count=128)
s_shared_ready_barrier = T.alloc_barrier(arrive_count=128)
q_shared_ready_barrier = T.alloc_barrier(arrive_count=256)
k_pe_shared_1_free_barrier = T.alloc_barrier(arrive_count=128)
k_pe_shared_0_free_barrier = T.alloc_barrier(arrive_count=128)
@@ -401,8 +401,7 @@ def main(batch=1, heads=128, kv_heads=1, kv_ctx=8192, dim=512, pe_dim=64):
BLOCK_H = 64
num_split = 1

program = flashattn(batch, heads, kv_heads, kv_ctx, dim, pe_dim, BLOCK_N, BLOCK_H, num_split)
kernel = tilelang.compile(program, out_idx=[6])
kernel = flashattn(batch, heads, kv_heads, kv_ctx, dim, pe_dim, BLOCK_N, BLOCK_H, num_split)
profiler = kernel.get_profiler(tensor_supply_type=tilelang.TensorSupplyType.Randn)
profiler.assert_allclose(ref_program, rtol=0.01, atol=0.01)
latency = profiler.do_bench(warmup=500)
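The example above replaces the explicit `program = flashattn(...)` / `tilelang.compile(program, out_idx=[6])` pair with the `@tilelang.jit(out_idx=[6])` decorator, so calling the decorated function returns a compiled kernel directly. A minimal sketch of that pattern, with a toy copy kernel standing in for the attention kernel (shapes, names, and the kernel body are illustrative only):

```python
import tilelang
import tilelang.language as T


@tilelang.jit(out_idx=[1])  # treat the second argument (B) as the kernel output
def copy_kernel(M=1024, N=1024, blk=128, dtype="float16"):

    @T.prim_func
    def main(A: T.Tensor((M, N), dtype), B: T.Tensor((M, N), dtype)):
        with T.Kernel(T.ceildiv(M, blk), T.ceildiv(N, blk), threads=128) as (bx, by):
            for i, j in T.Parallel(blk, blk):
                B[bx * blk + i, by * blk + j] = A[bx * blk + i, by * blk + j]

    return main


# Calling the decorated function compiles and returns the kernel, replacing the
# older two-step flow of building the program and then calling tilelang.compile.
kernel = copy_kernel()
profiler = kernel.get_profiler(tensor_supply_type=tilelang.TensorSupplyType.Randn)
```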
1 change: 1 addition & 0 deletions requirements-build.txt
@@ -1,4 +1,5 @@
# Should be mirrored in pyproject.toml
Cython
build
cmake>=3.26
packaging
1 change: 1 addition & 0 deletions requirements-test.txt
@@ -1,6 +1,7 @@
# lint requirements
-r requirements-lint.txt
# build requirements
Cython
cmake>=3.26
# runtime requirements
cffi
2 changes: 1 addition & 1 deletion setup.py
@@ -815,7 +815,7 @@ def build_cmake(self, ext):
# -DCMAKE_LIBRARY_OUTPUT_DIRECTORY sets where built libraries go
# -DPYTHON_EXECUTABLE ensures that the correct Python is used
cmake_args = [
f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={extdir}", f"-DPYTHON_EXECUTABLE={sys.executable}",
f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={extdir}", f"-DPython_EXECUTABLE={sys.executable}",
f"-DCMAKE_BUILD_TYPE={'Debug' if DEBUG_MODE else 'Release'}"
]
if not USE_ROCM: