Introduction
The TVM community has worked since the v0.16.0 release to deliver the following exciting new improvements!
The main tags are listed below (bold text indicates areas with lots of progress):
- Community, RFCs
- AOT, Hexagon, OpenCL & CLML, Web, Metal
- Relax, Dlight, Disco
- TIR, TVMScript
- Docs, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.17.dev0...v0.17.0.rc0.
Community
- #17018 - New committer: Balint Cristian
RFCs
This RFC adds a frontend for NNEF, an open, standardized format for neural network exchange developed by the Khronos Group since 2018 (https://www.khronos.org/nnef). NNEF is aimed at deploying trained neural networks from deep learning frameworks to the proprietary inference engines of neural network hardware vendors.
- #108 - [RFC] Add NNEF frontend
AOT
- #17077 - Correctly calculate workspace for vector types
Adreno
- #16927 - [SCRIPT] Fix in build config for adreno
BYOC
- #16895 - Add layout check and update shape check for cublas FP8 BYOC
BugFix
- #17138 - [Fix][TIR] Fix outdated call to create extern buffer in make_extern
- #17132 - Restrict CopyOnWrite to _type_final
- #17096 - Update FAttrsGetter to return Map<String, ObjectRef>
- #17078 - [NCCL] Release NCCL thread_local resources in destructor
- #17044 - [Support] Fix copy constructor for support::OrderedSet
- #17000 - [MSC] split name_string with index by colon from the right
- #16923 - [Fix][Dlight] Fix GeneralReduction for log-sum-exp
- #16924 - [Fix] Fix SSA conversion for SizeVar retention
- #16903 - CudaDeviceAPI::GetAttr may check kExist when GPUs absent
- #16901 - rocm shared memory issue on MI250
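Among the fixes above, #16923 repairs Dlight's GeneralReduction schedule for log-sum-exp. For reference, the numerically stable two-pass formulation that such a kernel computes can be sketched in pure Python (illustrative only, not the TVM implementation):

```python
import math

def logsumexp(xs):
    # Two-pass, numerically stable form: subtract the running max
    # before exponentiating so large inputs do not overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

A naive `log(sum(exp(x)))` overflows for inputs near 1000; the max-subtracted form returns the exact same mathematical value without overflow.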
CI
- #17055 - [SME][Test] Add additional conv2d tests for asymmetric parameters
- #17007 - [TOPI][Testing] Enable conv2d NHWC fp16 topi testing for `arm_cpu`
- #16930 - [UnitTest] Use pytest's scope='session' for tvm.testing.parameter
- #16948 - Update image tag to 20240428-060115-0b09ed018
- #16931 - Use LLVM17 for tests on `ci_cpu`
- #16942 - Enable Conda setup v3
- #16939 - Upgrade CUDA to 12.4
CRT
- #17097 - [Bugfix] Return error code on error from ModuleGetFunction
Disco
- #17035 - [QoL] Implement broadcast/scatter methods for Session
- #16992 - [Bugfix] Handle NDArray larger than OS buffer for pipe
- #16978 - Implement `num_workers` property for `disco.Session`
- #16989 - Treat hangup of disco worker process as kShutdown
- #16993 - Allow allocation that only exists on worker0
- #16979 - Expose disco.Session.shutdown through the python API
- #16919 - Improve error message for CallPacked
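The broadcast/scatter methods added to Session in #17035 follow the usual collective-communication semantics: broadcast copies one array to every worker, while scatter splits it into one shard per worker. A minimal pure-Python sketch of scatter's shard layout (function name hypothetical, not the disco API):

```python
def scatter(array, num_workers):
    # Worker 0 splits the input into equal per-worker shards along
    # the first axis; each worker receives exactly one shard.
    n = len(array)
    assert n % num_workers == 0, "shards must divide evenly"
    step = n // num_workers
    return [array[i * step:(i + 1) * step] for i in range(num_workers)]
```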
Dlight
- #17082 - Use 16x32 spatial x reduction thread extents in GEMV scheduling
- #17052 - Skip GEMV rules when more than one vector
- #17026 - Perf improvement for low_batch_gemv on Metal
- #17016 - Update Adreno GEMV Rules
- #16972 - [GPU] Enhance opencl thread limit for schedules
- #16973 - [GPU] Improved gemv outer fallback schedule
- #16958 - Check for target in function attributes
- #16894 - Enhance vectorization for gpu matmul
- #16884 - Add check for matmul dtype and fix reduction rule
Docs
- #17146 - [DOC] Fix typo for the "We utilize the intermediate representation of nn.Graph to convert the OneFlow model to Reley."
- #17015 - [DOC] Update Model Links to Include Commit
Frontend
- #17014 - [ArgParse] Pass default values to target compiler([Bug] Default option is not passed by TVMC Front end #13264)
- #16961 - [Bugfix][ONNX] Improve broadcast and batch_matmul conversion
- #16936 - [TFLite] Add support for GELU conversion
Hexagon
- #17123 - Add support for v75
LLVM
- #17046 - [Arith][SVE] Add rewrite rules for indices split by scalable expressions
- #16966 - [SVE] Add support for representing and creating buffer-level predicates
- #17001 - [SVE] Use only powers of two as possible vscale values
- #16962 - [SVE] Add codegen support for `vscale_range()` function attribute
- #16968 - Stringref API deprecation fixes
- #16965 - [SVE] Add get_active_lane_mask builtin
- #16899 - [SVE][TOPI] Add conv2d NHWC hybrid SVE schedule for `arm_cpu`
- #16893 - [SVE] Check for SVE target in VectorizeLoop
- #16862 - [SVE] Support splitting by vscale in `tir::split` and `te::split`
MetaSchedule
- #17012 - [BugFix] MultiLevelTilingTensorCore generates inconsistent thread-binding sketch for batched matmul
- #17066 - [BugFix] Fix TensorIntrin `dot_4x4_i8i8s32_sdot` is not registered
Metal
OpenCL & CLML
- #16933 - [CLML] Fix in clml pattern check condition
- #16929 - [VM][OPENCL] Take advantage of OpenCL host ptr for improved copy
ROCm
- #17141 - [Backend] Fix error when building TVM with LLVM 19
Relax
- #17139 - Fix cublas dispatch for corner cases
- #17127 - [KVCache] Support fork in sliding window sink part
- #17115 - Support `input_axis_separator` to allow 2D to 1D conversion
- #17119 - [Bugfix] Set purity=false for LazySetOutput
- #17118 - [VM] Improved error messages for mismatched parameter count
- #17110 - Alloc BYOC workspace with R.builtin.alloc_tensor
- #17089 - [ONNX] Add support for HardSigmoid
- #17100 - [KVCache] Unlimited depth blocks
- #17075 - [Transform] Modify FuseTIR pass to propagate buffer attributes
- #17088 - [ONNX] Add support for HardSwish
- #17085 - [PyTorch] Add support for torch.nn.Hardsigmoid
- #17083 - [TVMScript] Preserve tir.SizeVar through TVMScript round-trip
- #17086 - Ignore dynamic parameters in RewriteDataflowReshape
- #17084 - [PyTorch] Add support for torch.nn.Hardswish
- #17074 - [KVCache][Test] Fix TIR attn kernels for uncommon group size
- #17067 - Add missing white spaces in error messages
- #17061 - [Frontend][Onnx] Cast Op special handling for ShapeExpr input
- #17033 - [Bugfix] Apply FuseOps to nested DataflowBlock
- #17032 - [Bugfix] Annotate ComputePrimValue output as host function
- #17034 - [Bugfix] Bind symbolic variables in R.match_cast
- #16960 - [UnitTest] Validate IRModule with multiple targets
- #16995 - [KVCache] Support KVCache decode from forked sequence and pop more tokens
- #16959 - [Transform] Handle identical PrimFunc with distinct VDevice
- #16589 - [Unity] Check for transpose and dynamic shape in AdjustMatmulOrder
- #16988 - [KVCache] Fix the aux data syncing order of paged KV cache
- #16922 - [BugFix] Change FuseOpsByPattern strategy to pattern-match maximal subgraph
- #16982 - [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas
- #16955 - Implement relax.op.view
- #16971 - Support nested ModuleList in nn.Module
- #16826 - Express dynamic arguments of strided_slice as arguments
- #16476 - [Unity][Cutlass] Fix C source generation of dense operation
- #16940 - Allow PrimValue as index in relax.op.take
- #16934 - [TIR] Introduce new `cumsum` op for gpu
- #16859 - [QoL] Use SeqExpr in IR types when SeqExpr is required
- #16904 - Prevent to generate duplicate func in dispatch_sort_scan
- #16905 - [Bugfix] Raise exception for OOM allocation
- #16827 - Handle binary operations between Tensor and PrimValue
- #16902 - Allow specifying entry_funcs for BYOC
- #16860 - [QoL] Infer StructInfo for relax::Tuple on construction
- #16861 - [QoL] Return well-formed IR from relax::Function::CreateEmpty
- #16886 - [Frontend] Fix sort, argsort and topk in nn module
- #16883 - Stabilize relax pass mutation order
Relay
- #16983 - [BugFix] Skip leaf args when matching 'path' part for dominator pattern
- #16996 - Fix TupleGetItem to inherit the previous span
Runtime
- #17057 - Stateless interface of PagedKVCache leaf node commit
- #17049 - Support PagedKVCache with tree attention
- #17045 - Fix PagedKVCache for PopN and enhance tests
- #16998 - Compatibility with dmlc::Stream API changes
- #17037 - [ROCm] Enable ROCm host memory support
- #17036 - Use preferred host memory (pinned memory) in KV cache
- #16994 - Allow query of available device memory through DeviceAPI
- #16997 - [Disco] Restore checks for hangup of disco pipe
- #16938 - Allow offset to be specified in NDArray::CreateView
- #16890 - [VULKAN] Support total_global_memory
- #16880 - Implemented Datatype.itemsize()
TIR
- #17134 - [Schedule] Remove `@type_check` for `set_axis_separator`
- #17112 - [DLight] Enable SimdGroup op for Metal
- #17098 - [RPC] Allow RPC calls to compiled PrimFuncs with no arguments
- #17039 - Fix Bug in VectorizeLoop
- #17030 - Fix Shuffle rewrite
- #16947 - Support narrow dtype for let binding
- #16952 - Enhance CLZ intrinsic support
- #16945 - [Compute-at] Make compute-ated block simple when the predicate could be merged
- #16879 - Make T.reinterpret nop when dtype is the same
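#16952 enhances support for the CLZ (count leading zeros) intrinsic. Its semantics for a 32-bit value can be stated as a one-line pure-Python reference (illustrative only; the convention that clz(0) = 32 is an assumption here, matching the common LLVM `ctlz` zero-is-defined behavior):

```python
def clz32(x):
    # Count leading zero bits of a 32-bit unsigned value.
    # clz32(0) is defined as 32 under the zero-is-defined convention.
    assert 0 <= x < (1 << 32)
    return 32 - x.bit_length()
```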
TOPI
- #17091 - Add dense schedule for fp16 and fp32 using gemm
- #17048 - [SME] Add conv2d NHWC SME fp16->fp32 schedule
- #17040 - Fix SME conv2d schedule import and intrin argument
- #17003 - [SME] Add conv2d NHWC SME fp32 schedule
- #16977 - Remove `blockIdx.z` in topi sort
- #16951 - Revert unification of conv2d NHWC hybrid scheduling for `arm_cpu` targets
TVMScript
- #17107 - Better Type Annotation for TIR OP
- #16967 - Fix error reporting inside Macro func
- #16916 - Support `T.launch_thread` with i64 dtype
- #16876 - Optionally use `ruff format` instead of `black`
- #16877 - [Bug] Add test case for missing symbolic bounds
cuda & cutlass & tensorrt
- #16980 - [Cuda] Skip FreeDataSpace when CUDA driver is in inconsistent state
web
- #17031 - Fix string to uint8 array for special characters
- #17028 - Add dtype and offset for CreateView in runtime
- #16910 - Support string[] in setPackedFunc() and exceptionally long arrays
Misc
- #17135 - [QoL][IR] Provide default constructor for NameSupply/GlobalVarSupply
- #17125 - [Utils] Define line-length for "ruff format"
- #17152 - GraphExecutor: Fix wild pointer assign when input and output are reshape
- #17150 - [WebGPU] Fall back to 256MB for maxBufferSize if needed
- #17128 - [Compute-inline] Prefer T.where for reverse compute-inlined block with predicate
- #16976 - [WebGPU] Implement `tir.dp4a` with WGSL built-in function `dot4I8Packed`
- #17124 - [WebGPU] Add `tir.dp4a`
- #17113 - [CudaGraph] Handle exceptions thrown while capturing cuda graph
- #17094 - [Utility][Container] Support non-nullable types in Array::Map
- #17101 - [RPC] Raise error if server process terminated
- #17092 - [UnitTests] Use tvm.ir.assert_structural_equal whenever possible
- #17054 - [SME] Utilize predication in fp32 matmul and conv2d schedules
- #17079 - [CMake] Show NVCC include directories in compile_commands.json
- #17076 - [SME] Extract gemm block correctly when fused with bias
- #17071 - [WebGPU] Translate `int8x4` into `u32`
- #17065 - [FP8][Codegen] Add make_fp8 vector constructors
- #17064 - Add docs of v0.15.0 and v0.16.0
- #16985 - [CODEGEN] Vector-Codegen support for llvm-pure-intrin
- #17058 - Introduce outer reduction for metal
- #17051 - Use adapter.info when available instead of requestAdapterInfo
- #16981 - [SME] Add scalable fp16->fp32 dense schedule
- #17029 - [Contrib] Implement NDArray cache update
- #17027 - [picojson] Let objects be ordered when serializing
- #17021 - [WebGPU] Update error messages to be more user-friendly
- #17010 - Support multinomial_from_uniform dispatch
- #16999 - [USMP] add missing const specifier for global_const_workspace
- #17005 - [WebGPU] Handle device OOM in createBuffer
- #16921 - [SME] Introduce scalable fp32 dense schedule
- #16957 - chore: remove repetitive words
- #16909 - [QoL][IR] Provide std::hash and std::equal_to for IR Variable types
- #16987 - [JVM] Automatic Compatibility of JVM AttachCurrentThread
- #16974 - [CUBLAS][FP8] Enable R.matmul + R.multiply offloading
- #16896 - [CUBLAS] Enable offloading of R.matmul + R.dequantize
- #16956 - Add script for testing release package
- #16908 - Overriding the StructuralEqual() for easy usage
- #16932 - Enable gemv schedule for adreno
- #16935 - [3rdparty] Bump FlashInfer for sampling functions
- #16937 - [Thrust] Increase static workspace size
- #16915 - [Marvell BYOC]: Marvell AI Accelerator Integration - Phase 2
- #16741 - Restore "pytest.mark.gpu" for RELAX tests
- #16914 - [CMAKE] Make LOG_BEFORE_THROW explicit
- #16913 - Enhance Release Note Script and Remove Useless File
- #16907 - [Upd] Fixed lld search in rocm
- #16900 - [CMAKE] Misc improvement of Util
- #16897 - [Target] Don't register AArch64 target tags without LLVM compiler support
- #16892 - [CUBLAS] Set fp32 compute and scale dtypes in fp16 matmul
- #16888 - [CUBLAS][FP8] Support e4m3 gemm in cuBLAS BYOC
- #16887 - [Contrib] Enable fp16 for thrust sort
- #16881 - [release][Dont Squash] Update version to 0.16.0 and 0.17.0.dev on main branch