# [AMD][MLA] Fix mla autotune for rocm #861
## Conversation
…ize in benchmark script
…nt defaults

- Introduced a new `get_configs` function to generate autotuning configurations for the benchmark.
- Updated the default batch size and kv context length in the argument parser for improved performance.
- Renamed the `--auto_tune` argument to `--autotune` for consistency.
- Modified the kernel invocation logic to support autotuning based on the new configurations.
> ⚠️ **Caution:** Review failed — the pull request is closed.

**Walkthrough**

Introduced a new autotuning configuration generator and applied `tilelang.autotune` to the `flashmla_decode` entrypoint. Updated CLI defaults and renamed an autotune flag. Simplified the benchmark's autotune path. In `AutoTuner`, added function-parameter awareness, updated parameter setting, and adjusted cache key and compilation logic to consider function parameters.
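As a rough sketch of the decorator-based pattern described above (the parameter names, candidate values, and kernel signature are assumptions for illustration; only `get_configs`, `flashmla_decode`, and the `configs=get_configs()` entry come from this PR's excerpts):

```python
import itertools
import tilelang

def get_configs():
    # Hypothetical candidate values; the PR defines its own search space.
    _configs = itertools.product(
        [32, 64, 128],   # block_N
        [32, 64],        # block_H
        [1, 2, 4],       # num_split
        [128, 256],      # threads
    )
    return [{
        "block_N": c[0],
        "block_H": c[1],
        "num_split": c[2],
        "threads": c[3],
    } for c in _configs]

@tilelang.autotune(configs=get_configs())
def flashmla_decode(batch, heads, kv_ctx, dim,
                    block_N=64, block_H=64, num_split=2, threads=256):
    # Kernel construction elided; the tuner sweeps the tunable keyword
    # arguments above and caches the best-performing configuration.
    ...
```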
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as CLI (benchmark_mla_decode_amd_tilelang.py)
    participant Kernel as flashmla_decode
    participant Tuner as tilelang.autotune

    User->>CLI: Run with --autotune [on/off]
    alt Autotune enabled
        CLI->>Kernel: Call with minimal args
        Kernel->>Tuner: Autotune entry (configs=get_configs())
        Tuner->>Tuner: Evaluate configs, select best
        Tuner-->>CLI: Return tuned kernel result
    else Autotune disabled
        CLI->>Kernel: Call with explicit BLOCK_N/BLOCK_H/num_split/threads
        Kernel-->>CLI: Execute with provided params
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Decorator as @autotune
    participant Tuner as AutoTuner
    participant Fn as Kernel Function

    Decorator->>Fn: Wrap kernel function
    Decorator->>Tuner: set_kernel_parameters(k_parameters, f_parameters=inspect.signature(Fn).parameters)
    note right of Tuner: Store _kernel_parameters and _function_parameters
    Fn->>Tuner: invoke with args/kwargs
    Tuner->>Tuner: generate_cache_key(args, kwargs, function params)
    alt Tunable args already provided
        Tuner-->>Fn: Skip compilation, use provided params
    else Need tuning/compile
        Tuner->>Tuner: Compile/evaluate configs
    end
    Tuner-->>Fn: Selected variant executes
```
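The positional-argument handling the diagram alludes to can be sketched with `inspect.signature`; the helper below is hypothetical (the real logic lives inside TileLang's `AutoTuner`), but it shows how binding call arguments back to parameter names lets a tuner notice explicitly supplied tunables even when they arrive positionally:

```python
import inspect

def tunables_provided(fn, tunable_names, args, kwargs):
    # Bind the call's positional and keyword arguments to the function's
    # signature so positional values get their parameter names back.
    bound = inspect.signature(fn).bind_partial(*args, **kwargs)
    # Tuning can be skipped only if every tunable parameter was supplied.
    return all(name in bound.arguments for name in tunable_names)

def kernel(batch, heads, block_N=64, threads=256):
    ...

# block_N passed positionally is still detected by name.
print(tunables_provided(kernel, ("block_N", "threads"), (1, 16, 128), {"threads": 256}))  # True
print(tunables_provided(kernel, ("block_N", "threads"), (1, 16), {}))                     # False
```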
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~25 minutes
> 👋 Hi! Thank you for contributing to the TileLang project. Please remember to run … We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
**Summary of Changes**

Hello @LeiWang1999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on improving the autotuning experience for Multi-head Latent Attention (MLA) decode benchmarks on AMD platforms using TileLang. It streamlines the autotuning setup by adopting a decorator-based approach and enhances the underlying autotuner.
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**Code Review**

This pull request refactors the autotuning logic in the MLA benchmark for ROCm to use the `@tilelang.autotune` decorator, which is a great improvement for clarity and simplicity. The changes also include a fix in the core autotuner to correctly detect when tunable parameters are provided as positional arguments, making the autotuner more robust. My feedback includes a few suggestions to improve code style and maintainability.
```python
def get_configs():
    import itertools
    # ... (_configs construction elided in this review excerpt)
    return [{
        "block_N": c[0],
        "block_H": c[1],
        "num_split": c[2],
        "threads": c[3],
    } for c in _configs]
```
The creation of configuration dictionaries can be made more concise and maintainable. Using `zip` with explicit keys avoids relying on hardcoded indices like `c[0]`, `c[1]`, etc., which is less error-prone if the order of parameters in `itertools.product` changes.

```python
return [dict(zip(("block_N", "block_H", "num_split", "threads"), c)) for c in _configs]
```
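For illustration, a self-contained comparison of the two constructions (the candidate values below are invented; only the key names come from the diff):

```python
import itertools

keys = ("block_N", "block_H", "num_split", "threads")
# Hypothetical candidate values, just to demonstrate the construction.
_configs = list(itertools.product([64, 128], [64], [2, 4], [256]))

by_index = [{"block_N": c[0], "block_H": c[1], "num_split": c[2], "threads": c[3]}
            for c in _configs]
by_zip = [dict(zip(keys, c)) for c in _configs]

assert by_index == by_zip  # identical results, but by_zip has no hardcoded indices
```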
```diff
 def set_kernel_parameters(self, k_parameters: Tuple[str, ...], f_parameters: Dict[str, Any]):
     # for cache key generation
-    self._kernel_parameters = parameters
+    self._kernel_parameters = k_parameters
+    self._function_parameters = f_parameters
```
The parameter names `k_parameters` and `f_parameters` are a bit cryptic. Using more descriptive names like `kernel_params_key` and `function_parameters` would improve readability and make the code easier to understand and maintain.
**Suggested change:**

```diff
-def set_kernel_parameters(self, k_parameters: Tuple[str, ...], f_parameters: Dict[str, Any]):
+def set_kernel_parameters(self, kernel_params_key: Tuple[str, ...], function_parameters: Dict[str, Any]):
     # for cache key generation
-    self._kernel_parameters = k_parameters
-    self._function_parameters = f_parameters
+    self._kernel_parameters = kernel_params_key
+    self._function_parameters = function_parameters
```
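To make the cache-key change from the walkthrough concrete, here is a hedged sketch of how bound function parameters could feed a cache key; the class and method bodies are assumptions for illustration, not TileLang's actual `AutoTuner`:

```python
import inspect

class AutoTunerSketch:
    """Toy stand-in for the cache-key logic; not TileLang's real AutoTuner."""

    def __init__(self, fn, kernel_params_key):
        self._fn = fn
        self._kernel_parameters = kernel_params_key
        self._function_parameters = inspect.signature(fn).parameters

    def generate_cache_key(self, args, kwargs):
        # Bind positional args to names so identical calls hash identically
        # whether values were passed positionally or by keyword.
        bound = inspect.signature(self._fn).bind_partial(*args, **kwargs)
        bound.apply_defaults()
        items = tuple(sorted(bound.arguments.items()))
        return hash((self._fn.__qualname__, items))

def kernel(batch, heads, block_N=64, threads=256):
    ...

tuner = AutoTunerSketch(kernel, ("block_N", "threads"))
# Same logical call in two spellings -> same cache key.
assert tuner.generate_cache_key((8, 16, 64), {}) == tuner.generate_cache_key((8, 16), {"block_N": 64})
```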