# [AMD][MLA] Fix mla autotune for rocm #861
## Conversation
…ize in benchmark script
…nt defaults

- Introduced a new `get_configs` function to generate autotuning configurations for the benchmark.
- Updated the default batch size and kv context length in the argument parser for improved performance.
- Renamed the `--auto_tune` argument to `--autotune` for consistency.
- Modified the kernel invocation logic to support autotuning based on the new configurations.
> ⚠️ **Caution:** Review failed — the pull request is closed.

**Walkthrough**

Introduced a new autotuning configuration generator and applied `tilelang.autotune` to the `flashmla_decode` entrypoint. Updated CLI defaults and renamed an autotune flag. Simplified the benchmark's autotune path. In `AutoTuner`, added function-parameter awareness, updated parameter setting, and adjusted cache key and compilation logic to consider function parameters.
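As a rough sketch of the decorator-based pattern described above (the parameter names, candidate values, and kernel signature are assumptions for illustration; only `get_configs`, `flashmla_decode`, and the `configs=get_configs()` entry come from this PR's excerpts):

```python
import itertools
import tilelang

def get_configs():
    # Hypothetical candidate values; the PR defines its own search space.
    _configs = itertools.product(
        [32, 64, 128],   # block_N
        [32, 64],        # block_H
        [1, 2, 4],       # num_split
        [128, 256],      # threads
    )
    return [{
        "block_N": c[0],
        "block_H": c[1],
        "num_split": c[2],
        "threads": c[3],
    } for c in _configs]

@tilelang.autotune(configs=get_configs())
def flashmla_decode(batch, heads, kv_ctx, dim,
                    block_N=64, block_H=64, num_split=2, threads=256):
    # Kernel construction elided; the tuner sweeps the tunable keyword
    # arguments above and caches the best-performing configuration.
    ...
```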
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as CLI (benchmark_mla_decode_amd_tilelang.py)
    participant Kernel as flashmla_decode
    participant Tuner as tilelang.autotune

    User->>CLI: Run with --autotune [on/off]
    alt Autotune enabled
        CLI->>Kernel: Call with minimal args
        Kernel->>Tuner: Autotune entry (configs=get_configs())
        Tuner->>Tuner: Evaluate configs, select best
        Tuner-->>CLI: Return tuned kernel result
    else Autotune disabled
        CLI->>Kernel: Call with explicit BLOCK_N/BLOCK_H/num_split/threads
        Kernel-->>CLI: Execute with provided params
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Decorator as @autotune
    participant Tuner as AutoTuner
    participant Fn as Kernel Function

    Decorator->>Fn: Wrap kernel function
    Decorator->>Tuner: set_kernel_parameters(k_parameters, f_parameters=inspect.signature(Fn).parameters)
    note right of Tuner: Store _kernel_parameters and _function_parameters
    Fn->>Tuner: invoke with args/kwargs
    Tuner->>Tuner: generate_cache_key(args, kwargs, function params)
    alt Tunable args already provided
        Tuner-->>Fn: Skip compilation, use provided params
    else Need tuning/compile
        Tuner->>Tuner: Compile/evaluate configs
    end
    Tuner-->>Fn: Selected variant executes
```
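The positional-argument handling the diagram alludes to can be sketched with `inspect.signature`; the helper below is hypothetical (the real logic lives inside TileLang's `AutoTuner`), but it shows how binding call arguments back to parameter names lets a tuner notice explicitly supplied tunables even when they arrive positionally:

```python
import inspect

def tunables_provided(fn, tunable_names, args, kwargs):
    # Bind the call's positional and keyword arguments to the function's
    # signature so positional values get their parameter names back.
    bound = inspect.signature(fn).bind_partial(*args, **kwargs)
    # Tuning can be skipped only if every tunable parameter was supplied.
    return all(name in bound.arguments for name in tunable_names)

def kernel(batch, heads, block_N=64, threads=256):
    ...

# block_N passed positionally is still detected by name.
print(tunables_provided(kernel, ("block_N", "threads"), (1, 16, 128), {"threads": 256}))  # True
print(tunables_provided(kernel, ("block_N", "threads"), (1, 16), {}))                     # False
```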
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~25 minutes
> 👋 Hi! Thank you for contributing to the TileLang project. Please remember to run … We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
**Summary of Changes**

Hello @LeiWang1999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on improving the autotuning experience for Multi-head Latent Attention (MLA) decode benchmarks on AMD platforms using TileLang. It streamlines the autotuning setup by adopting a decorator-based approach and enhances the underlying autotuner.
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**Code Review**

This pull request refactors the autotuning logic in the MLA benchmark for ROCm to use the `@tilelang.autotune` decorator, which is a great improvement for clarity and simplicity. The changes also include a fix in the core autotuner to correctly detect when tunable parameters are provided as positional arguments, making the autotuner more robust. My feedback includes a few suggestions to improve code style and maintainability.
```python
def get_configs():
    import itertools
    # ... (_configs construction elided in this review excerpt)
    return [{
        "block_N": c[0],
        "block_H": c[1],
        "num_split": c[2],
        "threads": c[3],
    } for c in _configs]
```
The creation of configuration dictionaries can be made more concise and maintainable. Using `zip` with explicit keys avoids relying on hardcoded indices like `c[0]`, `c[1]`, etc., which is less error-prone if the order of parameters in `itertools.product` changes.

```python
return [dict(zip(("block_N", "block_H", "num_split", "threads"), c)) for c in _configs]
```
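For illustration, a self-contained comparison of the two constructions (the candidate values below are invented; only the key names come from the diff):

```python
import itertools

keys = ("block_N", "block_H", "num_split", "threads")
# Hypothetical candidate values, just to demonstrate the construction.
_configs = list(itertools.product([64, 128], [64], [2, 4], [256]))

by_index = [{"block_N": c[0], "block_H": c[1], "num_split": c[2], "threads": c[3]}
            for c in _configs]
by_zip = [dict(zip(keys, c)) for c in _configs]

assert by_index == by_zip  # identical results, but by_zip has no hardcoded indices
```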
```diff
 def set_kernel_parameters(self, k_parameters: Tuple[str, ...], f_parameters: Dict[str, Any]):
     # for cache key generation
-    self._kernel_parameters = parameters
+    self._kernel_parameters = k_parameters
+    self._function_parameters = f_parameters
```
The parameter names `k_parameters` and `f_parameters` are a bit cryptic. Using more descriptive names like `kernel_params_key` and `function_parameters` would improve readability and make the code easier to understand and maintain.
**Suggested change:**

```diff
-def set_kernel_parameters(self, k_parameters: Tuple[str, ...], f_parameters: Dict[str, Any]):
+def set_kernel_parameters(self, kernel_params_key: Tuple[str, ...], function_parameters: Dict[str, Any]):
     # for cache key generation
-    self._kernel_parameters = k_parameters
-    self._function_parameters = f_parameters
+    self._kernel_parameters = kernel_params_key
+    self._function_parameters = function_parameters
```
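To make the cache-key change from the walkthrough concrete, here is a hedged sketch of how bound function parameters could feed a cache key; the class and method bodies are assumptions for illustration, not TileLang's actual `AutoTuner`:

```python
import inspect

class AutoTunerSketch:
    """Toy stand-in for the cache-key logic; not TileLang's real AutoTuner."""

    def __init__(self, fn, kernel_params_key):
        self._fn = fn
        self._kernel_parameters = kernel_params_key
        self._function_parameters = inspect.signature(fn).parameters

    def generate_cache_key(self, args, kwargs):
        # Bind positional args to names so identical calls hash identically
        # whether values were passed positionally or by keyword.
        bound = inspect.signature(self._fn).bind_partial(*args, **kwargs)
        bound.apply_defaults()
        items = tuple(sorted(bound.arguments.items()))
        return hash((self._fn.__qualname__, items))

def kernel(batch, heads, block_N=64, threads=256):
    ...

tuner = AutoTunerSketch(kernel, ("block_N", "threads"))
# Same logical call in two spellings -> same cache key.
assert tuner.generate_cache_key((8, 16, 64), {}) == tuner.generate_cache_key((8, 16), {"block_N": 64})
```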