[None][fix] Refactoring input prep to allow out-of-tree models #6497
Conversation
📝 Walkthrough

This change introduces a centralized multimodal placeholder registry system, migrating all model-specific placeholder logic from static mappings to a dynamic registry. It updates model input processor registrations to include explicit placeholder metadata, enhances CLI and data utilities for multimodal flexibility, and enables dynamic importing of custom modules in both the quickstart and benchmarking scripts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI_Script as CLI Script (quickstart/throughput)
    participant ModuleLoader as Custom Module Loader
    participant Registry as MultimodalPlaceholderRegistry
    participant Model as Model Input Processor

    User->>CLI_Script: Run with --custom_module_dirs, --model_type, etc.
    CLI_Script->>ModuleLoader: Import custom modules from dirs
    ModuleLoader-->>CLI_Script: Custom modules registered
    CLI_Script->>Registry: Query supported model types
    CLI_Script->>Model: Instantiate input processor with placeholder_metadata
    Model->>Registry: Register placeholder metadata for model_type
    CLI_Script->>Registry: Retrieve placeholder/placement for model_type & modality
    Registry-->>CLI_Script: Return placeholder metadata
    CLI_Script->>Model: Process input using placeholders
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers
📜 Recent review details

Configuration used: .coderabbit.yaml
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (1)
/bot run
Actionable comments posted: 3
🧹 Nitpick comments (2)
tensorrt_llm/bench/benchmark/throughput.py (1)

111-116: Consider renaming the flag for clarity.

The double-negative naming (`no_skip_tokenizer_init`) could be confusing. Consider using a positive flag name like `--init_tokenizer` or `--enable_tokenizer_init` with an appropriate default value.

tensorrt_llm/inputs/registry.py (1)

142-168: Consider refactoring repeated getter patterns.

The getter methods for different modalities follow the same pattern. Consider a helper method to reduce duplication.

```diff
+def _get_model_types_by_modality(self, modality: str) -> List[str]:
+    return [
+        model_type
+        for model_type in self._multimodal_placeholder_by_model_type
+        if modality in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
+    ]
+
 def get_registered_image_model_types(self) -> List[str]:
-    return [
-        model_type
-        for model_type in self._multimodal_placeholder_by_model_type
-        if "image" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
-    ]
+    return self._get_model_types_by_modality("image")

 def get_registered_video_model_types(self) -> List[str]:
-    return [
-        model_type
-        for model_type in self._multimodal_placeholder_by_model_type
-        if "video" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
-    ]
+    return self._get_model_types_by_modality("video")

 def get_registered_audio_model_types(self) -> List[str]:
-    return [
-        model_type
-        for model_type in self._multimodal_placeholder_by_model_type
-        if "audio" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
-    ]
+    return self._get_model_types_by_modality("audio")
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- examples/llm-api/quickstart_multimodal.py (5 hunks)
- tensorrt_llm/__init__.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_gemma3vl.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_hyperclovax.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_llama.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_mistral.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_phi4mm.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_qwen2vl.py (3 hunks)
- tensorrt_llm/_torch/models/modeling_vila.py (2 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (9 hunks)
- tensorrt_llm/bench/utils/data.py (2 hunks)
- tensorrt_llm/inputs/__init__.py (2 hunks)
- tensorrt_llm/inputs/registry.py (4 hunks)
- tensorrt_llm/inputs/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/__init__.py
tensorrt_llm/_torch/models/modeling_llava_next.py
tensorrt_llm/_torch/models/modeling_llama.py
tensorrt_llm/inputs/__init__.py
tensorrt_llm/_torch/models/modeling_vila.py
tensorrt_llm/_torch/models/modeling_hyperclovax.py
tensorrt_llm/_torch/models/modeling_phi4mm.py
examples/llm-api/quickstart_multimodal.py
tensorrt_llm/bench/utils/data.py
tensorrt_llm/_torch/models/modeling_gemma3vl.py
tensorrt_llm/_torch/models/modeling_qwen2vl.py
tensorrt_llm/inputs/utils.py
tensorrt_llm/_torch/models/modeling_mistral.py
tensorrt_llm/inputs/registry.py
tensorrt_llm/bench/benchmark/throughput.py
**/*.{cpp,h,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/__init__.py
tensorrt_llm/_torch/models/modeling_llava_next.py
tensorrt_llm/_torch/models/modeling_llama.py
tensorrt_llm/inputs/__init__.py
tensorrt_llm/_torch/models/modeling_vila.py
tensorrt_llm/_torch/models/modeling_hyperclovax.py
tensorrt_llm/_torch/models/modeling_phi4mm.py
examples/llm-api/quickstart_multimodal.py
tensorrt_llm/bench/utils/data.py
tensorrt_llm/_torch/models/modeling_gemma3vl.py
tensorrt_llm/_torch/models/modeling_qwen2vl.py
tensorrt_llm/inputs/utils.py
tensorrt_llm/_torch/models/modeling_mistral.py
tensorrt_llm/inputs/registry.py
tensorrt_llm/bench/benchmark/throughput.py
🧠 Learnings (12)

Learnt from: yechank-nvidia (PR #6254, tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204, 2025-07-22T09:22:14.726Z):
In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files: tensorrt_llm/_torch/models/modeling_llava_next.py, tensorrt_llm/_torch/models/modeling_llama.py, tensorrt_llm/inputs/__init__.py, tensorrt_llm/_torch/models/modeling_hyperclovax.py, examples/llm-api/quickstart_multimodal.py, tensorrt_llm/_torch/models/modeling_gemma3vl.py, tensorrt_llm/_torch/models/modeling_qwen2vl.py, tensorrt_llm/inputs/utils.py, tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/inputs/registry.py

Learnt from: moraxu (PR #6303, tests/integration/test_lists/qa/examples_test_list.txt:494-494, 2025-07-28T17:06:08.621Z):
In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files: tensorrt_llm/__init__.py, examples/llm-api/quickstart_multimodal.py, tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/bench/benchmark/throughput.py
🪛 Ruff (0.12.2)
tensorrt_llm/__init__.py
66-66: Module level import not at top of file
(E402)
tensorrt_llm/_torch/models/modeling_hyperclovax.py
966-966: Line too long (132 > 120)
(E501)
970-970: Line too long (146 > 120)
(E501)
971-971: Line too long (160 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (40)

tensorrt_llm/__init__.py (2)

65-66: LGTM! Lazy import pattern correctly implemented. The lazy import with the `isort:skip` directive is properly used to avoid circular dependency issues with `lora_manager`. The placement after other imports is intentional and well-documented.

88-88: LGTM! Public API exposure is consistent. Adding `torch_models` to `__all__` properly exposes the lazily imported module in the package's public API, enabling external access to the updated model definitions and multimodal placeholder capabilities.

tensorrt_llm/inputs/__init__.py (2)
4-6: LGTM! Registry imports properly added. The new imports `MultimodalPlaceholderMetadata` and `MultimodalPlaceholderPlacement` from the registry module are correctly integrated, supporting the transition from static placeholder mappings to the centralized registry system.

24-25: LGTM! Public API updated consistently. Adding the new registry classes to `__all__` properly exposes them for use across the codebase, enabling model input processors to register with explicit placeholder metadata.

tensorrt_llm/_torch/models/modeling_gemma3vl.py (2)
16-16: LGTM! Required import added for registry system. The `MultimodalPlaceholderMetadata` import is correctly added to support the new placeholder metadata registration pattern.

86-89: LGTM! Placeholder metadata properly configured. The `@register_input_processor` decorator is correctly updated with explicit `placeholder_metadata` using `MultimodalPlaceholderMetadata`. The placeholder mapping `{"image": "<start_of_image>"}` correctly specifies the image placeholder token for the Gemma3VL model.
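For readers following along, here is a minimal, self-contained sketch of the registration pattern discussed above: a decorator records placeholder metadata in a global registry at class-definition time. The names mirror the ones in this PR, but the signatures and bodies are simplified assumptions, not the actual TensorRT-LLM implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Type


@dataclass
class MultimodalPlaceholderMetadata:
    # modality name ("image", "video", "audio") -> placeholder token
    placeholder_map: Dict[str, str] = field(default_factory=dict)


# Stand-in for MULTIMODAL_PLACEHOLDER_REGISTRY: model_type -> metadata.
G_PLACEHOLDER_REGISTRY: Dict[str, MultimodalPlaceholderMetadata] = {}


def register_input_processor(processor_cls: Type, model_type: str,
                             placeholder_metadata: MultimodalPlaceholderMetadata):
    """Record placeholder metadata for a model type, then attach the processor."""
    if placeholder_metadata is None:
        raise ValueError(f"placeholder_metadata is required for {model_type}")

    def decorator(model_cls: Type) -> Type:
        # Registration happens as a side effect of importing the model module.
        G_PLACEHOLDER_REGISTRY[model_type] = placeholder_metadata
        model_cls.input_processor_cls = processor_cls
        return model_cls

    return decorator


class Gemma3VLInputProcessor:  # simplified stand-in
    pass


@register_input_processor(
    Gemma3VLInputProcessor,
    model_type="gemma3vl",
    placeholder_metadata=MultimodalPlaceholderMetadata(
        placeholder_map={"image": "<start_of_image>"}),
)
class Gemma3VLModel:
    pass


print(G_PLACEHOLDER_REGISTRY["gemma3vl"].placeholder_map["image"])  # <start_of_image>
```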
tensorrt_llm/_torch/models/modeling_llama.py (2)

23-23: LGTM! Required import added for registry system. The `MultimodalPlaceholderMetadata` import is correctly added to support the new placeholder metadata registration pattern.

1168-1171: LGTM! Placeholder metadata correctly configured for Llama4. The `@register_input_processor` decorator is properly updated with explicit `placeholder_metadata` using `MultimodalPlaceholderMetadata`. The placeholder mapping `{"image": "<|image|>"}` correctly specifies the Llama4-specific image placeholder token, which appropriately differs from other models' placeholder conventions.

tensorrt_llm/_torch/models/modeling_hyperclovax.py (1)
18-21: LGTM! The import additions for `MultimodalPlaceholderMetadata` and `MultimodalPlaceholderPlacement` are correctly structured and align with the centralized multimodal placeholder registry system being introduced across the codebase.

tensorrt_llm/_torch/models/modeling_vila.py (2)

38-40: LGTM! The import addition for `MultimodalPlaceholderMetadata` is correctly integrated into the existing import statement and maintains proper namespace.

1122-1128: LGTM! The decorator modification correctly implements placeholder metadata registration with proper formatting. The placeholder tokens align with the `MEDIA_TOKENS` constants defined earlier in the file (lines 56-59), ensuring consistency.

tensorrt_llm/_torch/models/modeling_llava_next.py (2)

14-16: LGTM! The import addition for `MultimodalPlaceholderMetadata` is correctly integrated and maintains proper namespace according to the coding guidelines.

266-269: LGTM! The decorator modification correctly implements placeholder metadata registration with clean formatting. The simple `{"image": "<image>"}` mapping is appropriate for the LlavaNext model's image-focused functionality.
tensorrt_llm/_torch/models/modeling_qwen2vl.py (3)

14-16: LGTM! Import statement correctly updated. The addition of the `MultimodalPlaceholderMetadata` import aligns with the centralized multimodal placeholder registry system and follows proper namespace conventions.

626-633: LGTM! Placeholder metadata registration is correctly implemented. The `@register_input_processor` decorator properly defines multimodal placeholder metadata with appropriate tokens for the Qwen2VL model. The vision token structure follows the expected format with `<|vision_start|>`, modality-specific padding, and `<|vision_end|>` markers.

645-652: LGTM! Consistent placeholder metadata for Qwen2_5_VL. The placeholder metadata registration for `Qwen2_5_VLModel` correctly uses the same token structure as `Qwen2VLModel`, maintaining consistency between model variants while integrating with the centralized multimodal placeholder registry.

tensorrt_llm/bench/utils/data.py (2)
44-45: LGTM! Well-designed parameter additions. The new `image_data_format` and `data_device` parameters enhance multimodal benchmarking flexibility with sensible defaults. The parameter names are descriptive and the default values ("pt" for tensors, "cuda" for GPU) align with typical TensorRT-LLM usage patterns.

135-137: LGTM! Parameters correctly passed to multimodal loader. The new parameters are properly passed through to `default_multimodal_input_loader` when multimodal processing is required. The parameter mapping is clear and the conditional usage ensures they're only applied when relevant.

tensorrt_llm/_torch/models/modeling_mistral.py (2)
32-34: LGTM! Import statement properly updated. The addition of the `MultimodalPlaceholderMetadata` and `MultimodalPlaceholderPlacement` imports correctly supports the enhanced input processor registration while maintaining proper namespace conventions.

274-287: LGTM! Excellent implementation with thorough documentation. The placeholder metadata registration is well-designed with several strengths:

- Simple and appropriate "[IMG]" placeholder token for Mistral3
- Well-justified `AFTER_TEXT` placement with performance impact data (~30% quality difference)
- Comprehensive comments explaining design decisions with external references
- Proper integration with the centralized multimodal placeholder registry

The documentation provides valuable context for future maintainers.
tensorrt_llm/_torch/models/modeling_phi4mm.py (2)

13-16: LGTM! Import statement consistent with refactoring pattern. The addition of multimodal placeholder imports follows the same correct pattern used across other model files, properly supporting the enhanced input processor registration.

143-153: LGTM! Comprehensive dual-modality placeholder registration. The placeholder metadata correctly supports both image and audio modalities with:

- Indexed placeholder tokens (`<|image_{0}|>`, `<|audio_{0}|>`) suitable for multiple instances
- Appropriate `BEFORE_TEXT` placement for the Phi4MM model
- An empty separator for seamless placeholder concatenation
- Proper integration with the centralized registry system
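The indexed-token scheme can be illustrated with a short sketch. This is not the Phi4MM processor itself; `expand_placeholders` is an assumed helper shown only to demonstrate how a `{0}`-style template and a separator could expand for multiple media items.

```python
from typing import List


def expand_placeholders(placeholder_template: str, num_items: int,
                        separator: str = "") -> str:
    """Expand an indexed placeholder template once per media item.

    For example, "<|image_{0}|>" with num_items=2 and an empty separator
    yields "<|image_0|><|image_1|>".
    """
    tokens: List[str] = [
        placeholder_template.format(i) for i in range(num_items)
    ]
    return separator.join(tokens)


print(expand_placeholders("<|image_{0}|>", 2))  # <|image_0|><|image_1|>
print(expand_placeholders("<|audio_{0}|>", 1))  # <|audio_0|>
```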
examples/llm-api/quickstart_multimodal.py (5)

7-9: LGTM! Improved imports support dynamic model loading. The import changes enhance extensibility by:

- Replacing static model lists with the dynamic `MULTIMODAL_PLACEHOLDER_REGISTRY`
- Adding `import_custom_module_from_dir` for out-of-tree model support
- Enabling centralized model type management through the registry system

83-87: LGTM! Dynamic model type choices improve maintainability. The argument parser now uses the registry to determine valid model types, ensuring CLI options automatically stay synchronized with registered models. This eliminates the need to manually update static lists when new models are added.

113-124: LGTM! Well-designed custom module directory support. The new `--custom_module_dirs` argument effectively enables out-of-tree model loading with:

- Multiple directory support for flexibility
- Clear documentation of the expected directory structure
- Backward compatibility through its optional nature
- Descriptive naming and comprehensive help text

157-159: LGTM! Clean and effective dynamic import implementation. The dynamic module import logic is well-implemented:

- Proper None check for optional functionality
- A simple loop handles multiple directories efficiently
- Early execution ensures custom models are available for validation
- Appropriate delegation to the utility function

176-180: LGTM! Enhanced validation with improved user experience. The model type validation improvements provide:

- Registry-based validation ensuring consistency with registered models
- Enhanced error messages that dynamically list supported model types
- Explicit string conversion for model directory paths (defensive programming)
- Better user guidance when unsupported model types are encountered

Also applies to: 188-188
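For context on the dynamic-import step, the sketch below shows one standard-library way to import a package directory by path, which is roughly the job of a helper like `import_custom_module_from_dir`. The function name and error handling here are assumptions for illustration, not the PR's actual code.

```python
import importlib.util
import sys
from pathlib import Path


def import_package_from_dir(package_dir: Path):
    """Import a package directory (one containing __init__.py) by path.

    Registering custom models works by side effect: importing the package
    runs its module-level @register_input_processor decorators.
    """
    init_file = package_dir / "__init__.py"
    if not init_file.is_file():
        raise FileNotFoundError(f"{package_dir} is not a Python package")
    spec = importlib.util.spec_from_file_location(
        package_dir.name,
        init_file,
        submodule_search_locations=[str(package_dir)])
    module = importlib.util.module_from_spec(spec)
    # Register in sys.modules so relative imports inside the package resolve.
    sys.modules[package_dir.name] = module
    spec.loader.exec_module(module)
    return module


# Hypothetical usage mirroring --custom_module_dirs:
# for dir_path in args.custom_module_dirs:
#     import_package_from_dir(Path(dir_path))
```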
tensorrt_llm/bench/benchmark/throughput.py (5)

15-15: LGTM! The import for dynamic module loading is appropriate for supporting custom multimodal models.

52-61: LGTM! The `--custom_module_dirs` option follows the established CLI pattern and provides flexibility for loading custom model implementations.

317-318: LGTM! The comment clarification helps distinguish between tokenizer usage for data preparation vs. model initialization.

425-425: LGTM! The logic correctly inverts the `no_skip_tokenizer_init` flag to `skip_tokenizer_init` for the runtime arguments.

329-330: LGTM! The multimodal data format and device parameters are properly passed to the dataset creation function.
tensorrt_llm/inputs/utils.py (3)

20-21: LGTM! The imports align with the new registry-based architecture for managing multimodal placeholders.

228-238: LGTM! The simplified placeholder retrieval logic properly delegates to the centralized registry while maintaining the existing formatting behavior.

306-314: LGTM! The refactored placeholder placement logic cleanly uses the registry's enum-based placement configuration and separator settings.
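As a rough illustration of the placement logic just described, the sketch below shows how an enum-based placement plus a separator could decide where placeholders land relative to the text prompt. The enum member names follow the ones mentioned in this review; `assemble_prompt` and the enum values are simplified assumptions.

```python
import enum


class MultimodalPlaceholderPlacement(enum.Enum):
    BEFORE_TEXT = "before_text"
    AFTER_TEXT = "after_text"


def assemble_prompt(text: str, placeholders: str,
                    placement: MultimodalPlaceholderPlacement,
                    separator: str = "\n") -> str:
    """Place multimodal placeholders before or after the text prompt."""
    if placement is MultimodalPlaceholderPlacement.BEFORE_TEXT:
        return placeholders + separator + text
    return text + separator + placeholders


# Phi4MM-style placement (before text, empty separator) vs.
# Mistral3-style placement (after text):
print(assemble_prompt("Describe the image.", "<|image_0|>",
                      MultimodalPlaceholderPlacement.BEFORE_TEXT, separator=""))
print(assemble_prompt("Describe the image.", "[IMG]",
                      MultimodalPlaceholderPlacement.AFTER_TEXT))
```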
tensorrt_llm/inputs/registry.py (5)

1-2: LGTM! The enum and dataclass provide a clean, type-safe structure for multimodal placeholder configuration with sensible defaults. Also applies to: 83-94

183-206: LGTM! The updated decorator properly enforces placeholder metadata registration for multimodal models.

256-283: LGTM! The extracted `multimodal_hashing_process` function improves code organization and separation of concerns.

288-310: LGTM! The caching mechanism for the `multimodal_hashing_supported` attribute efficiently avoids repeated attempts at multimodal processing for models that don't support it.

311-315: LGTM! Appropriate error handling that logs with traceback for debugging while re-raising the exception to maintain error propagation.
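The validate-before-access pattern these registry comments refer to can be sketched as follows. This is a cut-down stand-in for `MultimodalPlaceholderRegistry`, with assumed method bodies rather than the PR's exact code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PlaceholderMetadata:
    placeholder_map: Dict[str, str] = field(default_factory=dict)


class PlaceholderRegistry:
    """Cut-down stand-in illustrating validated placeholder lookup."""

    def __init__(self) -> None:
        self._by_model_type: Dict[str, PlaceholderMetadata] = {}

    def register(self, model_type: str, metadata: PlaceholderMetadata) -> None:
        self._by_model_type[model_type] = metadata

    def is_valid(self, model_type: str, modality: str) -> bool:
        # Check existence before any dictionary access to avoid KeyError.
        return (model_type in self._by_model_type
                and modality in self._by_model_type[model_type].placeholder_map)

    def get_placeholder(self, model_type: str, modality: str) -> str:
        if not self.is_valid(model_type, modality):
            raise ValueError(
                f"No placeholder registered for {model_type}/{modality}")
        return self._by_model_type[model_type].placeholder_map[modality]


registry = PlaceholderRegistry()
registry.register("llava_next", PlaceholderMetadata({"image": "<image>"}))
print(registry.get_placeholder("llava_next", "image"))  # <image>
```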
PR_Github #13584 [ run ] triggered by Bot
PR_Github #13584 [ run ] completed with state
/bot run
Actionable comments posted: 8
🔭 Outside diff range comments (1)
tensorrt_llm/tools/importlib_utils.py (1)

58-76: Add docstring and remove unreachable return statement.

The function is missing a required docstring and has unreachable code at line 75.

```diff
 def import_custom_module(custom_module_path: Union[str, Path]):
+    """Import a custom module from either a file or directory.
+
+    Args:
+        custom_module_path: Path to the module file or package directory.
+
+    Returns:
+        The imported module or package object.
+
+    Raises:
+        FileNotFoundError: If the path does not exist or is neither a file nor directory.
+        ImportError: If the module/package cannot be imported.
+    """
     if isinstance(custom_module_path, str):
         custom_module_path = Path(custom_module_path)
     print(f"Importing custom module from: {custom_module_path}")
     if custom_module_path.exists():
         if custom_module_path.is_file():
             return import_custom_module_from_file(custom_module_path)
         elif custom_module_path.is_dir():
             return import_custom_module_from_dir(custom_module_path)
         else:
             raise FileNotFoundError(
                 f"Custom module path {custom_module_path} is neither a file nor a directory."
             )
     else:
         raise FileNotFoundError(
             f"Custom module path {custom_module_path} does not exist.")
-    return None
```
🧹 Nitpick comments (3)
tensorrt_llm/tools/importlib_utils.py (1)

1-5: Minor formatting nitpick: excessive blank lines.

Consider removing one of the consecutive empty lines (lines 4-5) to follow standard Python formatting conventions.
tensorrt_llm/inputs/registry.py (2)

292-297: Simplify attribute access.

Using `__getattribute__` is unnecessarily complex when direct attribute access would work and be more readable.

```diff
 if hasattr(input_processor, "multimodal_hashing_supported"):
-    if input_processor.__getattribute__(
-            "multimodal_hashing_supported"):
+    if input_processor.multimodal_hashing_supported:
         return multimodal_hashing_process(inputs, sampling_params)
     else:
         return input_processor(inputs, sampling_params)
```

299-313: Consider reducing verbosity of fallback logging.

The traceback printing in the warning case might be too verbose for normal operation. Consider using `logger.debug` for the traceback, or only printing it in debug mode.

```diff
 except Exception as e:
-    import traceback
-    traceback.print_exc()
+    logger.debug(f"Multimodal hashing traceback: {e}", exc_info=True)
     logger.warning(
         f"Multimodal hashing failed: {e}. Falling back to basic input processor."
     )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- examples/llm-api/quickstart_multimodal.py (5 hunks)
- tensorrt_llm/_torch/models/modeling_hyperclovax.py (2 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (9 hunks)
- tensorrt_llm/inputs/registry.py (4 hunks)
- tensorrt_llm/tools/importlib_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tensorrt_llm/bench/benchmark/throughput.py
- examples/llm-api/quickstart_multimodal.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/tools/importlib_utils.py
tensorrt_llm/inputs/registry.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/tools/importlib_utils.py
tensorrt_llm/inputs/registry.py
🧠 Learnings (1)
📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files:
tensorrt_llm/inputs/registry.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (7)
tensorrt_llm/inputs/registry.py (7)
1-2: LGTM! The new imports for `enum` and `dataclasses` are appropriate and necessary for the multimodal placeholder registry implementation.

83-94: Well-designed data structures. The enum and dataclass are cleanly implemented with appropriate defaults and follow Python naming conventions. The use of `field(default_factory=dict)` correctly avoids mutable default arguments.

129-135: Good improvement on defensive validation. The `get_placeholder` method now properly validates inputs before accessing dictionary keys, which addresses the previous review feedback about preventing KeyError exceptions.

174-174: LGTM! The global registry instance follows the established pattern and naming conventions.

260-287: Well-structured refactoring of multimodal hashing logic. Extracting the multimodal hashing logic into a nested function improves code organization and readability. The logic correctly handles multimodal data processing and validation.

288-320: Excellent caching mechanism for multimodal hashing support. The dynamic detection and caching of multimodal hashing support is well-designed. It attempts hashing once, caches the result, and provides appropriate fallback behavior with good error logging.

154-160: Fix syntax error caused by line break. The line break in the middle of the attribute access creates a syntax error.

```diff
 def get_registered_video_model_types(self) -> List[str]:
     return [
         model_type
         for model_type in self._multimodal_placeholder_by_model_type
-        if "video" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
+        if "video" in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
     ]
```

Likely an incorrect or invalid review comment.
PR_Github #13717 [ run ] triggered by Bot
PR_Github #13717 [ run ] completed with state
/bot run
PR_Github #13734 [ run ] triggered by Bot
Thx for the work! Left some comments below.
PR_Github #13734 [ run ] completed with state
/bot run
Actionable comments posted: 2
♻️ Duplicate comments (2)
tensorrt_llm/inputs/registry.py (2)
152-158: Fix syntax error in line continuation.

There's a syntax error caused by an improper line break in the attribute access. Apply this fix:

```diff
 def get_registered_image_model_types(self) -> List[str]:
     return [
         model_type
         for model_type in self._multimodal_placeholder_by_model_type
-        if "image" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
+        if "image" in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
     ]
```

168-174: Fix syntax error in line continuation.

There's a syntax error caused by an improper line break in the attribute access. Apply this fix:

```diff
 def get_registered_audio_model_types(self) -> List[str]:
     return [
         model_type
         for model_type in self._multimodal_placeholder_by_model_type
-        if "audio" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
+        if "audio" in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
     ]
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)

- tensorrt_llm/_torch/models/modeling_gemma3vl.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_hyperclovax.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_llama.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_mistral.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_qwen2vl.py (3 hunks)
- tensorrt_llm/_torch/models/modeling_vila.py (2 hunks)
- tensorrt_llm/bench/utils/data.py (2 hunks)
- tensorrt_llm/inputs/__init__.py (2 hunks)
- tensorrt_llm/inputs/registry.py (4 hunks)
- tensorrt_llm/inputs/utils.py (4 hunks)
- tensorrt_llm/tools/importlib_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/_torch/models/modeling_gemma3vl.py
- tensorrt_llm/_torch/models/modeling_llama.py
- tensorrt_llm/bench/utils/data.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
- tensorrt_llm/_torch/models/modeling_mistral.py
- tensorrt_llm/tools/importlib_utils.py
- tensorrt_llm/_torch/models/modeling_vila.py
- tensorrt_llm/_torch/models/modeling_qwen2vl.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/inputs/utils.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a class in the constructor in Python.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/inputs/registry.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/inputs/registry.py
🧠 Learnings (1)
📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files:
tensorrt_llm/inputs/registry.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (13)
tensorrt_llm/inputs/registry.py (13)
83-87: LGTM! Well-structured enum definition. The `MultimodalPlaceholderPlacement` enum provides clear semantics for placeholder positioning relative to text content.

89-94: LGTM! Clean dataclass with sensible defaults. The `MultimodalPlaceholderMetadata` dataclass is well-structured with appropriate default values and type hints.

96-118: LGTM! Registry initialization and string representation are well-implemented. The constructor and `__str__` method provide clean initialization and useful debugging output formatting.

119-122: LGTM! Defensive validation logic. The `is_valid` method properly checks both model type and modality existence before access.

123-130: LGTM! Proper exception handling implemented. The defensive validation with `ValueError` exceptions is correctly implemented, addressing previous review feedback.

131-137: LGTM! Consistent defensive validation. The method properly uses the `is_valid` helper and raises appropriate exceptions, following the same pattern as other registry methods.

139-144: LGTM! Defensive validation added. The method now includes proper input validation before dictionary access, preventing KeyError exceptions.

146-150: LGTM! Consistent validation pattern. The method follows the same defensive validation pattern as other registry methods.

176-180: LGTM! Clean implementation of registry access methods. The `get_registered_model_types` method and global registry instance provide clean access to the centralized placeholder metadata.

193-216: LGTM! Improved decorator with proper validation. The updated `register_input_processor` decorator now properly validates the required `placeholder_metadata` parameter and registers it with the global registry, replacing assertions with proper exception handling.

269-295: LGTM! Well-structured multimodal hashing logic. The nested `multimodal_hashing_process` function provides clean separation of multimodal hashing logic with proper error handling and validation.

301-322: LGTM! Robust fallback mechanism with caching. The enhanced logic properly detects multimodal hashing support, caches the capability, and provides graceful fallback with comprehensive error handling.

324-328: LGTM! Comprehensive error handling. The outer exception handling provides good debugging information with traceback printing and proper error propagation.
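The try-once-and-cache behavior described in the 301-322 comment can be sketched like this. The function and attribute names are modeled on the review text and are assumptions, not the exact implementation.

```python
import logging

logger = logging.getLogger(__name__)


def process_with_optional_hashing(input_processor, inputs, sampling_params,
                                  hashing_process):
    """Try multimodal hashing once, cache the outcome, then fall back.

    The first call probes `hashing_process`; on success the processor is
    marked as supporting hashing, on failure it is marked as unsupported and
    the basic processor is used from then on.
    """
    supported = getattr(input_processor, "multimodal_hashing_supported", None)
    if supported is True:
        return hashing_process(inputs, sampling_params)
    if supported is False:
        return input_processor(inputs, sampling_params)

    # Capability unknown: probe once and cache the result on the processor.
    try:
        result = hashing_process(inputs, sampling_params)
    except Exception as e:
        input_processor.multimodal_hashing_supported = False
        logger.warning(
            "Multimodal hashing failed: %s. Falling back to basic input processor.", e)
        return input_processor(inputs, sampling_params)
    else:
        input_processor.multimodal_hashing_supported = True
        return result
```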
… quickstart, trtllm-bench, etc. (Signed-off-by: Rakib Hasan <[email protected]>)
Force-pushed from 57d68bc to b23e977 (Compare)
/bot run
PR_Github #14521 [ run ] triggered by Bot
Once rebased on the circular import changes, LGTM
PR_Github #14521 [ run ] completed with state
/bot run
PR_Github #14865 [ run ] triggered by Bot
PR_Github #14865 [ run ] completed with state
/bot run
PR_Github #14898 [ run ] triggered by Bot
PR_Github #14898 [ run ] completed with state
/bot run
PR_Github #14998 [ run ] triggered by Bot
PR_Github #14998 [ run ] completed with state
Input processors are now tracked through a registry, together with their placeholder metadata.
Placeholders are specified explicitly within the individual model definitions.
[Will add some documentation for out-of-tree example once initial review is complete.]
Summary by CodeRabbit
New Features
Improvements
Bug Fixes