Introduction
The TVM community has worked since the v0.16.0 release to deliver the following exciting new improvements!
The main tags are listed below (bold text indicates areas with lots of progress):
- Community, RFCs
- AOT, Hexagon, OpenCL & CLML, Web, Metal
- Relax, Dlight, Disco
- TIR, TVMScript
- Docs, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.17.dev0...v0.17.0.rc0.
Community
- #17018 - New committer: Balint Cristian
RFCs
This RFC adds a frontend for NNEF, an open, standardized format for neural network exchange developed by the Khronos Group since 2018 (https://www.khronos.org/nnef). NNEF is aimed at deploying trained neural networks from deep learning frameworks to the proprietary inference engines of neural network hardware vendors.
- #108 - [RFC] Add NNEF frontend
AOT
- #17077 - Correctly calculate workspace for vector types
Adreno
- #16927 - [SCRIPT] Fix in build config for adreno
BYOC
- #16895 - Add layout check and update shape check for cublas FP8 BYOC
BugFix
- #17138 - [Fix][TIR] Fix outdated call to create extern buffer in make_extern
- #17132 - Restrict CopyOnWrite to _type_final
- #17096 - Update FAttrsGetter to return Map<String, ObjectRef>
- #17078 - [NCCL] Release NCCL thread_local resources in destructor
- #17044 - [Support] Fix copy constructor for support::OrderedSet
- #17000 - [MSC] split name_string with index by colon from the right
- #16923 - [Fix][Dlight] Fix GeneralReduction for log-sum-exp
- #16924 - [Fix] Fix SSA conversion for SizeVar retention
- #16903 - CudaDeviceAPI::GetAttr may check kExist when GPUs absent
- #16901 - rocm shared memory issue on MI250
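Among the fixes above, #16923 repairs Dlight's GeneralReduction schedule for log-sum-exp. For reference, the numerically stable two-pass formulation that such a kernel computes can be sketched in pure Python (illustrative only, not the TVM implementation):

```python
import math

def logsumexp(xs):
    # Two-pass, numerically stable form: subtract the running max
    # before exponentiating so large inputs do not overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

A naive `log(sum(exp(x)))` overflows for inputs near 1000; the max-subtracted form returns the exact same mathematical value without overflow.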
CI
- #17055 - [SME][Test] Add additional conv2d tests for asymmetric parameters
- #17007 - [TOPI][Testing] Enable conv2d NHWC fp16 topi testing for `arm_cpu`
- #16930 - [UnitTest] Use pytest's scope='session' for tvm.testing.parameter
- #16948 - Update image tag to 20240428-060115-0b09ed018
- #16931 - Use LLVM17 for tests on `ci_cpu`
- #16942 - Enable Conda setup v3
- #16939 - Upgrade CUDA to 12.4
CRT
- #17097 - [Bugfix] Return error code on error from ModuleGetFunction
Disco
- #17035 - [QoL] Implement broadcast/scatter methods for Session
- #16992 - [Bugfix] Handle NDArray larger than OS buffer for pipe
- #16978 - Implement `num_workers` property for `disco.Session`
- #16989 - Treat hangup of disco worker process as kShutdown
- #16993 - Allow allocation that only exists on worker0
- #16979 - Expose disco.Session.shutdown through the python API
- #16919 - Improve error message for CallPacked
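The broadcast/scatter methods added to Session in #17035 follow the usual collective-communication semantics: broadcast copies one array to every worker, while scatter splits it into one shard per worker. A minimal pure-Python sketch of scatter's shard layout (function name hypothetical, not the disco API):

```python
def scatter(array, num_workers):
    # Worker 0 splits the input into equal per-worker shards along
    # the first axis; each worker receives exactly one shard.
    n = len(array)
    assert n % num_workers == 0, "shards must divide evenly"
    step = n // num_workers
    return [array[i * step:(i + 1) * step] for i in range(num_workers)]
```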
Dlight
- #17082 - Use 16x32 spatial x reduction thread extents in GEMV scheduling
- #17052 - Skip GEMV rules when more than one vector
- #17026 - Perf improvement for low_batch_gemv on Metal
- #17016 - Update Adreno GEMV Rules
- #16972 - [GPU] Enhance opencl thread limit for schedules
- #16973 - [GPU] Improved gemv outer fallback schedule
- #16958 - Check for target in function attributes
- #16894 - Enhance vectorization for gpu matmul
- #16884 - Add check for matmul dtype and fix reduction rule
Docs
- #17146 - [DOC] Fix typo for the "We utilize the intermediate representation of nn.Graph to convert the OneFlow model to Reley."
- #17015 - [DOC] Update Model Links to Include Commit
Frontend
- #17014 - [ArgParse] Pass default values to target compiler([Bug] Default option is not passed by TVMC Front end #13264)
- #16961 - [Bugfix][ONNX] Improve broadcast and batch_matmul conversion
- #16936 - [TFLite] Add support for GELU conversion
Hexagon
- #17123 - Add support for v75
LLVM
- #17046 - [Arith][SVE] Add rewrite rules for indices split by scalable expressions
- #16966 - [SVE] Add support for representing and creating buffer-level predicates
- #17001 - [SVE] Use only powers of two as possible vscale values
- #16962 - [SVE] Add codegen support for `vscale_range()` function attribute
- #16968 - Stringref API deprecation fixes
- #16965 - [SVE] Add get_active_lane_mask builtin
- #16899 - [SVE][TOPI] Add conv2d NHWC hybrid SVE schedule for `arm_cpu`
- #16893 - [SVE] Check for SVE target in VectorizeLoop
- #16862 - [SVE] Support splitting by vscale in `tir::split` and `te::split`
MetaSchedule
- #17012 - [BugFix] MultiLevelTilingTensorCore generates inconsistent thread-binding sketch for batched matmul
- #17066 - [BugFix] Fix TensorIntrin `dot_4x4_i8i8s32_sdot` is not registered
Metal
OpenCL & CLML
- #16933 - [CLML] Fix in clml pattern check condition
- #16929 - [VM][OPENCL] Take advantage of OpenCL host ptr for improved copy
ROCm
- #17141 - [Backend] Fix error when building TVM with LLVM 19
Relax
- #17139 - Fix cublas dispatch for corner cases
- #17127 - [KVCache] Support fork in sliding window sink part
- #17115 - Support `input_axis_separator` to allow 2D to 1D conversion
- #17119 - [Bugfix] Set purity=false for LazySetOutput
- #17118 - [VM] Improved error messages for mismatched parameter count
- #17110 - Alloc BYOC workspace with R.builtin.alloc_tensor
- #17089 - [ONNX] Add support for HardSigmoid
- #17100 - [KVCache] Unlimited depth blocks
- #17075 - [Transform] Modify FuseTIR pass to propagate buffer attributes
- #17088 - [ONNX] Add support for HardSwish
- #17085 - [PyTorch] Add support for torch.nn.Hardsigmoid
- #17083 - [TVMScript] Preserve tir.SizeVar through TVMScript round-trip
- #17086 - Ignore dynamic parameters in RewriteDataflowReshape
- #17084 - [PyTorch] Add support for torch.nn.Hardswish
- #17074 - [KVCache][Test] Fix TIR attn kernels for uncommon group size
- #17067 - Add missing white spaces in error messages
- #17061 - [Frontend][Onnx] Cast Op special handling for ShapeExpr input
- #17033 - [Bugfix] Apply FuseOps to nested DataflowBlock
- #17032 - [Bugfix] Annotate ComputePrimValue output as host function
- #17034 - [Bugfix] Bind symbolic variables in R.match_cast
- #16960 - [UnitTest] Validate IRModule with multiple targets
- #16995 - [KVCache] Support KVCache decode from forked sequence and pop more tokens
- #16959 - [Transform] Handle identical PrimFunc with distinct VDevice
- #16589 - [Unity] Check for transpose and dynamic shape in AdjustMatmulOrder
- #16988 - [KVCache] Fix the aux data syncing order of paged KV cache
- #16922 - [BugFix] Change FuseOpsByPattern strategy to pattern-match maximal subgraph
- #16982 - [Unity][BYOC] Use arith.Analyzer to check batch equality of matmul in cublas
- #16955 - Implement relax.op.view
- #16971 - Support nested ModuleList in nn.Module
- #16826 - Express dynamic arguments of strided_slice as arguments
- #16476 - [Unity][Cutlass] Fix C source generation of dense operation
- #16940 - Allow PrimValue as index in relax.op.take
- #16934 - [TIR] Introduce new `cumsum` op for gpu
- #16859 - [QoL] Use SeqExpr in IR types when SeqExpr is required
- #16904 - Prevent to generate duplicate func in dispatch_sort_scan
- #16905 - [Bugfix] Raise exception for OOM allocation
- #16827 - Handle binary operations between Tensor and PrimValue
- #16902 - Allow specifying entry_funcs for BYOC
- #16860 - [QoL] Infer StructInfo for relax::Tuple on construction
- #16861 - [QoL] Return well-formed IR from relax::Function::CreateEmpty
- #16886 - [Frontend] Fix sort, argsort and topk in nn module
- #16883 - Stabilize relax pass mutation order
Relay
- #16983 - [BugFix] Skip leaf args when matching 'path' part for dominator pattern
- #16996 - Fix TupleGetItem to inherit the previous span
Runtime
- #17057 - Stateless interface of PagedKVCache leaf node commit
- #17049 - Support PagedKVCache with tree attention
- #17045 - Fix PagedKVCache for PopN and enhance tests
- #16998 - Compatibility with dmlc::Stream API changes
- #17037 - [ROCm] Enable ROCm host memory support
- #17036 - Use preferred host memory (pinned memory) in KV cache
- #16994 - Allow query of available device memory through DeviceAPI
- #16997 - [Disco] Restore checks for hangup of disco pipe
- #16938 - Allow offset to be specified in NDArray::CreateView
- #16890 - [VULKAN] Support total_global_memory
- #16880 - Implemented Datatype.itemsize()
TIR
- #17134 - [Schedule] Remove `@type_check` for `set_axis_separator`
- #17112 - [DLight] Enable SimdGroup op for Metal
- #17098 - [RPC] Allow RPC calls to compiled PrimFuncs with no arguments
- #17039 - Fix Bug in VectorizeLoop
- #17030 - Fix Shuffle rewrite
- #16947 - Support narrow dtype for let binding
- #16952 - Enhance CLZ intrinsic support
- #16945 - [Compute-at] Make compute-ated block simple when the predicate could be merged
- #16879 - Make T.reinterpret nop when dtype is the same
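#16952 enhances support for the CLZ (count leading zeros) intrinsic. Its semantics for a 32-bit value can be stated as a one-line pure-Python reference (illustrative only; the convention that clz(0) = 32 is an assumption here, matching the common LLVM `ctlz` zero-is-defined behavior):

```python
def clz32(x):
    # Count leading zero bits of a 32-bit unsigned value.
    # clz32(0) is defined as 32 under the zero-is-defined convention.
    assert 0 <= x < (1 << 32)
    return 32 - x.bit_length()
```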
TOPI
- #17091 - Add dense schedule for fp16 and fp32 using gemm
- #17048 - [SME] Add conv2d NHWC SME fp16->fp32 schedule
- #17040 - Fix SME conv2d schedule import and intrin argument
- #17003 - [SME] Add conv2d NHWC SME fp32 schedule
- #16977 - Remove `blockIdx.z` in topi sort
- #16951 - Revert unification of conv2d NHWC hybrid scheduling for `arm_cpu` targets
TVMScript
- #17107 - Better Type Annotation for TIR OP
- #16967 - Fix error reporting inside Macro func
- #16916 - Support `T.launch_thread` with i64 dtype
- #16876 - Optionally use `ruff format` instead of `black`
- #16877 - [Bug] Add test case for missing symbolic bounds
cuda & cutlass & tensorrt
- #16980 - [Cuda] Skip FreeDataSpace when CUDA driver is in inconsistent state
web
- #17031 - Fix string to uint8 array for special characters
- #17028 - Add dtype and offset for CreateView in runtime
- #16910 - Support string[] in setPackedFunc() and exceptionally long arrays
Misc
- #17135 - [QoL][IR] Provide default constructor for NameSupply/GlobalVarSupply
- #17125 - [Utils] Define line-length for "ruff format"
- #17152 - GraphExecutor: Fix wild pointer assign when input and output are reshape
- #17150 - [WebGPU] Fall back to 256MB for maxBufferSize if needed
- #17128 - [Compute-inline] Prefer T.where for reverse compute-inlined block with predicate
- #16976 - [WebGPU] Implement `tir.dp4a` with WGSL built-in function `dot4I8Packed`
- #17124 - [WebGPU] Add `tir.dp4a`
- #17113 - [CudaGraph] Handle exceptions thrown while capturing cuda graph
- #17094 - [Utility][Container] Support non-nullable types in Array::Map
- #17101 - [RPC] Raise error if server process terminated
- #17092 - [UnitTests] Use tvm.ir.assert_structural_equal whenever possible
- #17054 - [SME] Utilize predication in fp32 matmul and conv2d schedules
- #17079 - [CMake] Show NVCC include directories in compile_commands.json
- #17076 - [SME] Extract gemm block correctly when fused with bias
- #17071 - [WebGPU] Translate `int8x4` into `u32`
- #17065 - [FP8][Codegen] Add make_fp8 vector constructors
- #17064 - Add docs of v0.15.0 and v0.16.0
- #16985 - [CODEGEN] Vector-Codegen support for llvm-pure-intrin
- #17058 - Introduce outer reduction for metal
- #17051 - Use adapter.info when available instead of requestAdapterInfo
- #16981 - [SME] Add scalable fp16->fp32 dense schedule
- #17029 - [Contrib] Implement NDArray cache update
- #17027 - [picojson] Let objects be ordered when serializing
- #17021 - [WebGPU] Update error messages to be more user-friendly
- #17010 - Support multinomial_from_uniform dispatch
- #16999 - [USMP] add missing const specifier for global_const_workspace
- #17005 - [WebGPU] Handle device OOM in createBuffer
- #16921 - [SME] Introduce scalable fp32 dense schedule
- #16957 - chore: remove repetitive words
- #16909 - [QoL][IR] Provide std::hash and std::equal_to for IR Variable types
- #16987 - [JVM] Automatic Compatibility of JVM AttachCurrentThread
- #16974 - [CUBLAS][FP8] Enable R.matmul + R.multiply offloading
- #16896 - [CUBLAS] Enable offloading of R.matmul + R.dequantize
- #16956 - Add script for testing release package
- #16908 - Overriding the StructuralEqual() for easy usage
- #16932 - Enable gemv schedule for adreno
- #16935 - [3rdparty] Bump FlashInfer for sampling functions
- #16937 - [Thrust] Increase static workspace size
- #16915 - [Marvell BYOC]: Marvell AI Accelerator Integration - Phase 2
- #16741 - Restore "pytest.mark.gpu" for RELAX tests
- #16914 - [CMAKE] Make LOG_BEFORE_THROW explicit
- #16913 - Enhance Release Note Script and Remove Useless File
- #16907 - [Upd] Fixed lld search in rocm
- #16900 - [CMAKE] Misc improvement of Util
- #16897 - [Target] Don't register AArch64 target tags without LLVM compiler support
- #16892 - [CUBLAS] Set fp32 compute and scale dtypes in fp16 matmul
- #16888 - [CUBLAS][FP8] Support e4m3 gemm in cuBLAS BYOC
- #16887 - [Contrib] Enable fp16 for thrust sort
- #16881 - [release][Dont Squash] Update version to 0.16.0 and 0.17.0.dev on main branch