
Conversation

@LeiWang1999 (Member) commented Sep 22, 2025

Summary by CodeRabbit

  • New Features
    • Added an --autotune flag for automatic configuration selection during decoding.
    • Smarter autotuning avoids unnecessary recompilation when parameters are provided, improving performance.
  • Changes
    • CLI updates: renamed --auto_tune to --autotune.
    • Default values updated: --batch now 128 (was 1); --kv_ctx now 8192 (was 1024).
  • Refactor
    • Streamlined autotuning logic and removed legacy tuning paths for a cleaner, more reliable experience.

…nt defaults

- Introduced a new `get_configs` function to generate autotuning configurations for the benchmark.
- Updated the default batch size and kv context length in the argument parser for improved performance.
- Renamed the `--auto_tune` argument to `--autotune` for consistency.
- Modified the kernel invocation logic to support autotuning based on the new configurations.
coderabbitai bot (Contributor) commented Sep 22, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Introduced a new autotuning configuration generator and applied tilelang.autotune to the flashmla_decode entrypoint. Updated CLI defaults and renamed an autotune flag. Simplified the benchmark’s autotune path. In AutoTuner, added function-parameter awareness, updated parameter setting, and adjusted cache key and compilation logic to consider function parameters.

Changes

**AMD MLA decode benchmark updates** (`examples/deepseek_mla/amd/benchmark_mla_decode_amd_tilelang.py`): Added `get_configs()` producing autotune grids; decorated `flashmla_decode` with `@tilelang.autotune(configs=get_configs())`. CLI changes: default `batch=128`, `kv_ctx=8192`, flag renamed to `--autotune`. Conditional invocation: minimal args when autotune is enabled, explicit `BLOCK_N`/`BLOCK_H`/`num_split`/`threads` otherwise. Removed legacy manual autotuner scaffolding.
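The config generator described above can be sketched roughly as follows. The candidate value lists here are illustrative placeholders, not the actual values from the PR:

```python
import itertools

def get_configs():
    # Cartesian product of tunable candidates; the real value lists in the
    # PR may differ -- these are illustrative.
    block_N = [32, 64, 128]
    block_H = [32, 64]
    num_split = [1, 2, 4]
    threads = [128, 256]
    _configs = list(itertools.product(block_N, block_H, num_split, threads))
    return [{
        "block_N": c[0],
        "block_H": c[1],
        "num_split": c[2],
        "threads": c[3],
    } for c in _configs]

# The kernel entrypoint is then decorated so the tuner sweeps this grid, e.g.:
# @tilelang.autotune(configs=get_configs())
# def flashmla_decode(...): ...
```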
**AutoTuner function-parameter integration** (`tilelang/autotuner/tuner.py`): Added `_function_parameters` state. `set_kernel_parameters` now accepts `k_parameters` and `f_parameters`. `generate_cache_key` and tunable-argument checks now consult function parameters to validate and skip redundant compilation. Autotuning setup passes `inspect.signature(fn).parameters` into `set_kernel_parameters`.
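The parameter-awareness idea can be sketched minimally as below. This is not the real tilelang `AutoTuner`; class and method bodies are simplified stand-ins showing how a stored signature lets the tuner detect tunables passed positionally as well as by keyword:

```python
import inspect

class AutoTunerSketch:
    """Simplified sketch of the function-parameter awareness described above."""

    def set_kernel_parameters(self, k_parameters, f_parameters):
        self._kernel_parameters = k_parameters      # names of tunable params
        self._function_parameters = f_parameters    # full signature parameters

    def tunable_args_provided(self, args, kwargs):
        # Map positional args onto parameter names via the stored signature,
        # so tunables supplied positionally are detected too.
        names = list(self._function_parameters)
        bound = dict(zip(names, args))
        bound.update(kwargs)
        return all(p in bound and bound[p] is not None
                   for p in self._kernel_parameters)

# Hypothetical kernel signature for illustration only.
def flashmla_decode(batch, kv_ctx, block_N=None, block_H=None):
    return (block_N, block_H)

tuner = AutoTunerSketch()
tuner.set_kernel_parameters(("block_N", "block_H"),
                            inspect.signature(flashmla_decode).parameters)
```

When `tunable_args_provided` returns `True`, tuning and recompilation can be skipped and the provided values used directly.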

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI (benchmark_mla_decode_amd_tilelang.py)
  participant Kernel as flashmla_decode
  participant Tuner as tilelang.autotune

  User->>CLI: Run with --autotune [on/off]
  alt Autotune enabled
    CLI->>Kernel: Call with minimal args
    Kernel->>Tuner: Autotune entry (configs=get_configs())
    Tuner->>Tuner: Evaluate configs, select best
    Tuner-->>CLI: Return tuned kernel result
  else Autotune disabled
    CLI->>Kernel: Call with explicit BLOCK_N/BLOCK_H/num_split/threads
    Kernel-->>CLI: Execute with provided params
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant Decorator as @autotune
  participant Tuner as AutoTuner
  participant Fn as Kernel Function

  Decorator->>Fn: Wrap kernel function
  Decorator->>Tuner: set_kernel_parameters(k_parameters, f_parameters=inspect.signature(Fn).parameters)
  note right of Tuner: Store _kernel_parameters and _function_parameters
  Fn->>Tuner: invoke with args/kwargs
  Tuner->>Tuner: generate_cache_key(args, kwargs, function params)
  alt Tunable args already provided
    Tuner-->>Fn: Skip compilation, use provided params
  else Need tuning/compile
    Tuner->>Tuner: Compile/evaluate configs
  end
  Tuner-->>Fn: Selected variant executes
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I tuned my ears to threads that hum,
Block_Ns hop—bun to crumb.
Functions whisper param tales,
Cache keys dance on windy trails.
With sleek new flags, I thump in time—
Autotune sings; performance climbs.
— A happy rabbit, in compile-time rhymes 🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9a51c4 and 6b78083.

📒 Files selected for processing (2)
  • examples/deepseek_mla/amd/benchmark_mla_decode_amd_tilelang.py (2 hunks)
  • tilelang/autotuner/tuner.py (4 hunks)



👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `bash format.sh` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀


Summary of Changes

Hello @LeiWang1999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the autotuning experience for Multi-head Latent Attention (MLA) decode benchmarks on AMD platforms using TileLang. It streamlines the autotuning setup by adopting a decorator-based approach and enhances the underlying AutoTuner to intelligently handle and validate tunable parameters, preventing unnecessary recompilation. The changes also update default benchmark parameters to better align with current testing requirements.

Highlights

  • Refactored Autotuning Mechanism: The autotuning configuration for the flashmla_decode kernel has been moved from an imperative block within the main function to a declarative @tilelang.autotune decorator. This integrates the tuning process more tightly with the JIT compilation, improving code readability and maintainability.
  • Enhanced Autotuner Parameter Handling: The AutoTuner class now stores the full function signature parameters, allowing it to more robustly detect when tunable arguments are already provided, whether as keyword or positional arguments. This prevents redundant compilation and ensures correct behavior when parameters are pre-specified.
  • Updated Benchmark Defaults: Default values for the benchmark script's batch size and kv_ctx (KV context length) have been increased from 1 to 128 and from 1024 to 8192, respectively, adjusting the benchmark toward larger, more realistic workloads.
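The conditional invocation the highlights describe can be sketched as follows. The kernel here is a stub, and the explicit `block_N`/`block_H`/`num_split`/`threads` values are illustrative, not the PR's actual code:

```python
import argparse

def flashmla_decode(batch, kv_ctx, block_N=None, block_H=None,
                    num_split=None, threads=None):
    # Stand-in for the real kernel, which is decorated with @tilelang.autotune.
    return {"batch": batch, "kv_ctx": kv_ctx, "block_N": block_N}

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch", type=int, default=128)
    parser.add_argument("--kv_ctx", type=int, default=8192)
    parser.add_argument("--autotune", action="store_true")
    args = parser.parse_args(argv)

    if args.autotune:
        # Minimal call: the tuner selects block_N/block_H/num_split/threads.
        return flashmla_decode(args.batch, args.kv_ctx)
    # Explicit call bypasses tuning; these values are illustrative.
    return flashmla_decode(args.batch, args.kv_ctx,
                           block_N=64, block_H=64, num_split=1, threads=256)
```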

@LeiWang1999 LeiWang1999 merged commit 3b21a67 into tile-ai:main Sep 22, 2025
4 of 5 checks passed
@gemini-code-assist bot left a comment

Code Review

This pull request refactors the autotuning logic in the MLA benchmark for ROCm to use the @tilelang.autotune decorator, which is a great improvement for clarity and simplicity. The changes also include a fix in the core autotuner to correctly detect when tunable parameters are provided as positional arguments, making the autotuner more robust. My feedback includes a few suggestions to improve code style and maintainability.



```python
def get_configs():
    import itertools
```

medium

For better code organization and to follow PEP 8 guidelines, it's recommended to place all imports at the top of the file. Please move this import to the top-level of the module.

Comment on lines +20 to +25
```python
    return [{
        "block_N": c[0],
        "block_H": c[1],
        "num_split": c[2],
        "threads": c[3],
    } for c in _configs]
```

medium

The creation of configuration dictionaries can be made more concise and maintainable. Using zip with explicit keys avoids relying on hardcoded indices like c[0], c[1], etc., which is less error-prone if the order of parameters in itertools.product changes.

```python
    return [dict(zip(("block_N", "block_H", "num_split", "threads"), c)) for c in _configs]
```
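The two forms build identical dicts; a quick equivalence check (candidate values below are illustrative):

```python
import itertools

_configs = list(itertools.product([64, 128], [32, 64], [1, 2], [128, 256]))

# Index-based form, as in the PR.
explicit = [{"block_N": c[0], "block_H": c[1], "num_split": c[2], "threads": c[3]}
            for c in _configs]
# zip-based form suggested in the review; no hard-coded indices.
zipped = [dict(zip(("block_N", "block_H", "num_split", "threads"), c))
          for c in _configs]
```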

Comment on lines +226 to +229
```diff
 def set_kernel_parameters(self, k_parameters: Tuple[str, ...], f_parameters: Dict[str, Any]):
     # for cache key generation
-    self._kernel_parameters = parameters
+    self._kernel_parameters = k_parameters
+    self._function_parameters = f_parameters
```

medium

The parameter names k_parameters and f_parameters are a bit cryptic. Using more descriptive names like kernel_params_key and function_parameters would improve readability and make the code easier to understand and maintain.

Suggested change

```diff
-def set_kernel_parameters(self, k_parameters: Tuple[str, ...], f_parameters: Dict[str, Any]):
-    # for cache key generation
-    self._kernel_parameters = k_parameters
-    self._function_parameters = f_parameters
+def set_kernel_parameters(self, kernel_params_key: Tuple[str, ...], function_parameters: Dict[str, Any]):
+    # for cache key generation
+    self._kernel_parameters = kernel_params_key
+    self._function_parameters = function_parameters
```
