
Conversation

@rakib-hasan (Collaborator) commented Jul 31, 2025

The input processor is now tracked using a registry, along with placeholder metadata.
Placeholders are now specified explicitly from within the individual model definitions.

[Will add some documentation for the out-of-tree example once the initial review is complete.]

Summary by CodeRabbit

  • New Features

    • Added support for dynamic registration and management of multimodal placeholders, enabling flexible handling of image, audio, and video modalities across different model types.
    • Introduced new CLI options for benchmarking and quickstart scripts, including custom module import directories, multimodal data format/device selection, and tokenizer initialization control.
    • Enabled dynamic loading of custom multimodal model modules at runtime.
    • Added utilities for dynamic import of custom Python modules from files or directories.
  • Improvements

    • Centralized multimodal placeholder logic into a registry system, enhancing maintainability and extensibility.
    • Enhanced error handling and metadata management for multimodal input processing.
    • Updated input processor registrations with explicit multimodal placeholder metadata for multiple models.
    • Added lazy import of model-related modules to avoid circular dependencies.
    • Refined multimodal hashing support with caching and robust error handling.
    • Simplified multimodal placeholder retrieval and insertion by delegating to the centralized registry.
  • Bug Fixes

    • Improved validation and error messages for model type selection and multimodal placeholder usage.

@rakib-hasan requested review from a team as code owners July 31, 2025 00:16
@coderabbitai bot (Contributor) commented Jul 31, 2025

📝 Walkthrough

This change introduces a centralized multimodal placeholder registry system, migrating all model-specific placeholder logic from static mappings to a dynamic registry. It updates model input processor registrations to include explicit placeholder metadata, enhances CLI and data utilities for multimodal flexibility, and enables dynamic importing of custom modules in both quickstart and benchmarking scripts. A short sketch of the registration pattern follows the change summary table below.

Changes

Cohort / File(s) Change Summary
Multimodal Placeholder Registry & Metadata System
tensorrt_llm/inputs/registry.py
Introduces MultimodalPlaceholderRegistry, MultimodalPlaceholderMetadata, and MultimodalPlaceholderPlacement. Refactors register_input_processor to require and register placeholder metadata. Refactors create_input_processor_with_hash for enhanced multimodal hashing support and error handling.
Centralization of Placeholder Logic
tensorrt_llm/inputs/utils.py
Removes all static multimodal placeholder and placement mappings. Refactors placeholder retrieval and insertion to use the new registry. Removes now-obsolete enum and dictionary.
Input Processor Registration with Placeholder Metadata
tensorrt_llm/_torch/models/modeling_gemma3vl.py, tensorrt_llm/_torch/models/modeling_hyperclovax.py, tensorrt_llm/_torch/models/modeling_llama.py, tensorrt_llm/_torch/models/modeling_llava_next.py, tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/_torch/models/modeling_phi4mm.py, tensorrt_llm/_torch/models/modeling_qwen2vl.py, tensorrt_llm/_torch/models/modeling_vila.py
Updates all multimodal model input processor registrations to include placeholder_metadata with modality-to-placeholder mappings and placement details.
API and CLI Enhancements for Multimodal and Custom Modules
examples/llm-api/quickstart_multimodal.py, tensorrt_llm/bench/benchmark/throughput.py, tensorrt_llm/bench/utils/data.py
Adds CLI options for dynamic importing of custom modules, multimodal data format/device configuration, and tokenizer initialization control. Updates dataset creation and argument parsing accordingly.
Public API and Export List Updates
tensorrt_llm/inputs/__init__.py, tensorrt_llm/__init__.py
Updates __all__ and imports to expose new registry and metadata classes, removes obsolete exports, and adds lazy import for torch_models.
Dynamic Custom Module Import Utilities
tensorrt_llm/tools/importlib_utils.py
Adds utility functions for importing custom Python modules dynamically from files or directories, with error handling and logging.
Input Processor Multimodal Hashing Enhancements
tensorrt_llm/inputs/registry.py (part of registry changes)
Adds caching and error handling to multimodal hashing support in input processor wrappers.
Throughput Benchmark Enhancements
tensorrt_llm/bench/benchmark/throughput.py
Adds CLI options for custom module imports, tokenizer initialization control, and multimodal image data format/device configuration.
Dataset Creation Enhancements
tensorrt_llm/bench/utils/data.py
Extends dataset creation function to accept multimodal image data format and device parameters, forwarding them to the multimodal input loader.
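
A minimal sketch of the registration pattern summarized above. The decorator, metadata class, and placeholder_map keyword follow the walkthrough; the processor class, model type, and placeholder token are illustrative assumptions, not code from the PR.

from tensorrt_llm.inputs import (MultimodalPlaceholderMetadata,
                                 register_input_processor)


class MyInputProcessor:
    # Hypothetical input processor, for illustration only.
    def __call__(self, inputs, sampling_params):
        return inputs


@register_input_processor(
    MyInputProcessor,
    model_type="my_vlm",  # registry key for the placeholder metadata
    placeholder_metadata=MultimodalPlaceholderMetadata(
        placeholder_map={"image": "<image>"}),
)
class MyMultimodalModel:
    pass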

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI_Script as CLI Script (quickstart/throughput)
    participant ModuleLoader as Custom Module Loader
    participant Registry as MultimodalPlaceholderRegistry
    participant Model as Model Input Processor

    User->>CLI_Script: Run with --custom_module_dirs, --model_type, etc.
    CLI_Script->>ModuleLoader: Import custom modules from dirs
    ModuleLoader-->>CLI_Script: Custom modules registered
    CLI_Script->>Registry: Query supported model types
    CLI_Script->>Model: Instantiate input processor with placeholder_metadata
    Model->>Registry: Register placeholder metadata for model_type
    CLI_Script->>Registry: Retrieve placeholder/placement for model_type & modality
    Registry-->>CLI_Script: Return placeholder metadata
    CLI_Script->>Model: Process input using placeholders

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • lfr-0531
  • symphonylyh
  • shaharmor98
  • amitz-nv
  • byshiue



📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ef1fd and 90f1b6d.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check


@rakib-hasan marked this pull request as draft July 31, 2025 00:17
@rakib-hasan (Collaborator, Author):

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
tensorrt_llm/bench/benchmark/throughput.py (1)

111-116: Consider renaming the flag for clarity.

The double-negative naming (no_skip_tokenizer_init) could be confusing. Consider using a positive flag name like --init_tokenizer or --enable_tokenizer_init with appropriate default value.
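
If the rename is adopted, one possible shape, assuming the benchmark CLI keeps using click boolean flags (the option name, default, and help text here are illustrative, not from the PR):

@click.option(
    "--init_tokenizer/--no_init_tokenizer",
    default=True,
    help="Whether to initialize the tokenizer when loading the model.")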

tensorrt_llm/inputs/registry.py (1)

142-168: Consider refactoring repeated getter patterns.

The getter methods for different modalities follow the same pattern. Consider a helper method to reduce duplication.

+def _get_model_types_by_modality(self, modality: str) -> List[str]:
+    return [
+        model_type
+        for model_type in self._multimodal_placeholder_by_model_type
+        if modality in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
+    ]
+
 def get_registered_image_model_types(self) -> List[str]:
-    return [
-        model_type
-        for model_type in self._multimodal_placeholder_by_model_type
-        if "image" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
-    ]
+    return self._get_model_types_by_modality("image")

 def get_registered_video_model_types(self) -> List[str]:
-    return [
-        model_type
-        for model_type in self._multimodal_placeholder_by_model_type
-        if "video" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
-    ]
+    return self._get_model_types_by_modality("video")

 def get_registered_audio_model_types(self) -> List[str]:
-    return [
-        model_type
-        for model_type in self._multimodal_placeholder_by_model_type
-        if "audio" in self.
-        _multimodal_placeholder_by_model_type[model_type].placeholder_map
-    ]
+    return self._get_model_types_by_modality("audio")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e16d1f and 3c49dcd.

📒 Files selected for processing (15)
  • examples/llm-api/quickstart_multimodal.py (5 hunks)
  • tensorrt_llm/__init__.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_llama.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_mistral.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_phi4mm.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py (3 hunks)
  • tensorrt_llm/_torch/models/modeling_vila.py (2 hunks)
  • tensorrt_llm/bench/benchmark/throughput.py (9 hunks)
  • tensorrt_llm/bench/utils/data.py (2 hunks)
  • tensorrt_llm/inputs/__init__.py (2 hunks)
  • tensorrt_llm/inputs/registry.py (4 hunks)
  • tensorrt_llm/inputs/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/__init__.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
  • tensorrt_llm/_torch/models/modeling_llama.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/_torch/models/modeling_vila.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
  • tensorrt_llm/_torch/models/modeling_phi4mm.py
  • examples/llm-api/quickstart_multimodal.py
  • tensorrt_llm/bench/utils/data.py
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py
  • tensorrt_llm/inputs/utils.py
  • tensorrt_llm/_torch/models/modeling_mistral.py
  • tensorrt_llm/inputs/registry.py
  • tensorrt_llm/bench/benchmark/throughput.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/__init__.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
  • tensorrt_llm/_torch/models/modeling_llama.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/_torch/models/modeling_vila.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
  • tensorrt_llm/_torch/models/modeling_phi4mm.py
  • examples/llm-api/quickstart_multimodal.py
  • tensorrt_llm/bench/utils/data.py
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py
  • tensorrt_llm/inputs/utils.py
  • tensorrt_llm/_torch/models/modeling_mistral.py
  • tensorrt_llm/inputs/registry.py
  • tensorrt_llm/bench/benchmark/throughput.py
🧠 Learnings (12)

Learnt from: yechank-nvidia (PR #6254, File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204, Timestamp: 2025-07-22T09:22:14.726Z)
In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to: tensorrt_llm/_torch/models/modeling_llava_next.py, tensorrt_llm/_torch/models/modeling_llama.py, tensorrt_llm/inputs/__init__.py, tensorrt_llm/_torch/models/modeling_hyperclovax.py, examples/llm-api/quickstart_multimodal.py, tensorrt_llm/_torch/models/modeling_gemma3vl.py, tensorrt_llm/_torch/models/modeling_qwen2vl.py, tensorrt_llm/inputs/utils.py, tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/inputs/registry.py

Learnt from: moraxu (PR #6303, File: tests/integration/test_lists/qa/examples_test_list.txt:494-494, Timestamp: 2025-07-28T17:06:08.621Z)
In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to: tensorrt_llm/__init__.py, examples/llm-api/quickstart_multimodal.py, tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/bench/benchmark/throughput.py

🪛 Ruff (0.12.2)
tensorrt_llm/__init__.py

  • 66-66: Module level import not at top of file (E402)

tensorrt_llm/_torch/models/modeling_hyperclovax.py

  • 966-966: Line too long (132 > 120) (E501)
  • 970-970: Line too long (146 > 120) (E501)
  • 971-971: Line too long (160 > 120) (E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (40)
tensorrt_llm/__init__.py (2)

65-66: LGTM! Lazy import pattern correctly implemented.

The lazy import with isort:skip directive is properly used to avoid circular dependency issues with lora_manager. The placement after other imports is intentional and well-documented.
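
For reference, a lazy import of this shape might look like the following; the module path matches the repository layout, while the exact placement and comment are assumptions:

# Imported after the other modules to avoid a circular dependency with lora_manager.
from tensorrt_llm._torch import models as torch_models  # isort: skip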


88-88: LGTM! Public API exposure is consistent.

Adding torch_models to __all__ properly exposes the lazily imported module in the package's public API, enabling external access to the updated model definitions and multimodal placeholder capabilities.

tensorrt_llm/inputs/__init__.py (2)

4-6: LGTM! Registry imports properly added.

The new imports MultimodalPlaceholderMetadata and MultimodalPlaceholderPlacement from the registry module are correctly integrated, supporting the transition from static placeholder mappings to the centralized registry system.


24-25: LGTM! Public API updated consistently.

Adding the new registry classes to __all__ properly exposes them for use across the codebase, enabling model input processors to register with explicit placeholder metadata.

tensorrt_llm/_torch/models/modeling_gemma3vl.py (2)

16-16: LGTM! Required import added for registry system.

The MultimodalPlaceholderMetadata import is correctly added to support the new placeholder metadata registration pattern.


86-89: LGTM! Placeholder metadata properly configured.

The @register_input_processor decorator is correctly updated with explicit placeholder_metadata using MultimodalPlaceholderMetadata. The placeholder mapping {"image": "<start_of_image>"} correctly specifies the image placeholder token for the Gemma3VL model.

tensorrt_llm/_torch/models/modeling_llama.py (2)

23-23: LGTM! Required import added for registry system.

The MultimodalPlaceholderMetadata import is correctly added to support the new placeholder metadata registration pattern.


1168-1171: LGTM! Placeholder metadata correctly configured for Llama4.

The @register_input_processor decorator is properly updated with explicit placeholder_metadata using MultimodalPlaceholderMetadata. The placeholder mapping {"image": "<|image|>"} correctly specifies the Llama4-specific image placeholder token, which appropriately differs from other models' placeholder conventions.

tensorrt_llm/_torch/models/modeling_hyperclovax.py (1)

18-21: LGTM!

The import additions for MultimodalPlaceholderMetadata and MultimodalPlaceholderPlacement are correctly structured and align with the centralized multimodal placeholder registry system being introduced across the codebase.

tensorrt_llm/_torch/models/modeling_vila.py (2)

38-40: LGTM!

The import addition for MultimodalPlaceholderMetadata is correctly integrated into the existing import statement and maintains proper namespace.


1122-1128: LGTM!

The decorator modification correctly implements placeholder metadata registration with proper formatting. The placeholder tokens align with the MEDIA_TOKENS constants defined earlier in the file (lines 56-59), ensuring consistency.

tensorrt_llm/_torch/models/modeling_llava_next.py (2)

14-16: LGTM!

The import addition for MultimodalPlaceholderMetadata is correctly integrated and maintains proper namespace according to the coding guidelines.


266-269: LGTM!

The decorator modification correctly implements placeholder metadata registration with clean formatting. The simple "image": "<image>" mapping is appropriate for the LlavaNext model's image-focused functionality.

tensorrt_llm/_torch/models/modeling_qwen2vl.py (3)

14-16: LGTM! Import statement correctly updated.

The addition of MultimodalPlaceholderMetadata import aligns with the centralized multimodal placeholder registry system and follows proper namespace conventions.


626-633: LGTM! Placeholder metadata registration is correctly implemented.

The @register_input_processor decorator properly defines multimodal placeholder metadata with appropriate tokens for Qwen2VL model. The vision token structure follows the expected format with <|vision_start|>, modality-specific padding, and <|vision_end|> markers.


645-652: LGTM! Consistent placeholder metadata for Qwen2_5_VL.

The placeholder metadata registration for Qwen2_5_VLModel correctly uses the same token structure as Qwen2VLModel, maintaining consistency between model variants while integrating with the centralized multimodal placeholder registry.

tensorrt_llm/bench/utils/data.py (2)

44-45: LGTM! Well-designed parameter additions.

The new image_data_format and data_device parameters enhance multimodal benchmarking flexibility with sensible defaults. The parameter names are descriptive and the default values ("pt" for tensors, "cuda" for GPU) align with typical TensorRT-LLM usage patterns.


135-137: LGTM! Parameters correctly passed to multimodal loader.

The new parameters are properly passed through to default_multimodal_input_loader when multimodal processing is required. The parameter mapping is clear and the conditional usage ensures they're only applied when relevant.

tensorrt_llm/_torch/models/modeling_mistral.py (2)

32-34: LGTM! Import statement properly updated.

The addition of MultimodalPlaceholderMetadata and MultimodalPlaceholderPlacement imports correctly supports the enhanced input processor registration while maintaining proper namespace conventions.


274-287: LGTM! Excellent implementation with thorough documentation.

The placeholder metadata registration is well-designed with several strengths:

  • Simple and appropriate "[IMG]" placeholder token for Mistral3
  • Well-justified AFTER_TEXT placement with performance impact data (~30% quality difference)
  • Comprehensive comments explaining design decisions with external references
  • Proper integration with the centralized multimodal placeholder registry

The documentation provides valuable context for future maintainers.

tensorrt_llm/_torch/models/modeling_phi4mm.py (2)

13-16: LGTM! Import statement consistent with refactoring pattern.

The addition of multimodal placeholder imports follows the same correct pattern used across other model files, properly supporting the enhanced input processor registration.


143-153: LGTM! Comprehensive dual-modality placeholder registration.

The placeholder metadata correctly supports both image and audio modalities with:

  • Indexed placeholder tokens (<|image_{0}|>, <|audio_{0}|>) suitable for multiple instances
  • Appropriate BEFORE_TEXT placement for Phi4MM model
  • Empty separator for seamless placeholder concatenation
  • Proper integration with the centralized registry system (see the sketch below)
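
A sketch of what such a dual-modality registration could look like; the token strings and placement come from the review comment above, while any field name beyond placeholder_map is an assumption:

placeholder_metadata = MultimodalPlaceholderMetadata(
    placeholder_map={
        "image": "<|image_{0}|>",  # indexed so multiple images can be numbered
        "audio": "<|audio_{0}|>",
    },
    placeholder_placement=MultimodalPlaceholderPlacement.BEFORE_TEXT,
    placeholders_separator="",  # empty separator concatenates placeholders seamlessly
)
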
examples/llm-api/quickstart_multimodal.py (5)

7-9: LGTM! Improved imports support dynamic model loading.

The import changes enhance extensibility by:

  • Replacing static model lists with the dynamic MULTIMODAL_PLACEHOLDER_REGISTRY
  • Adding import_custom_module_from_dir for out-of-tree model support
  • Enabling centralized model type management through the registry system

83-87: LGTM! Dynamic model type choices improve maintainability.

The argument parser now uses the registry to determine valid model types, ensuring CLI options automatically stay synchronized with registered models. This eliminates the need to manually update static lists when new models are added.


113-124: LGTM! Well-designed custom module directory support.

The new --custom_module_dirs argument effectively enables out-of-tree model loading with:

  • Multiple directory support for flexibility
  • Clear documentation of expected directory structure
  • Backward compatibility through optional nature
  • Descriptive naming and comprehensive help text

157-159: LGTM! Clean and effective dynamic import implementation.

The dynamic module import logic is well-implemented (sketched below):

  • Proper None check for optional functionality
  • Simple loop handles multiple directories efficiently
  • Early execution ensures custom models are available for validation
  • Delegates appropriately to utility function
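
A sketch of the loop described above; the argument and helper names follow the review text, and the surrounding code is assumed:

if args.custom_module_dirs is not None:
    for module_dir in args.custom_module_dirs:
        import_custom_module_from_dir(module_dir)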

176-180: LGTM! Enhanced validation with improved user experience.

The model type validation improvements provide:

  • Registry-based validation ensuring consistency with registered models
  • Enhanced error messages that dynamically list supported model types
  • Explicit string conversion for model directory paths (defensive programming)
  • Better user guidance when unsupported model types are encountered

Also applies to: 188-188

tensorrt_llm/bench/benchmark/throughput.py (5)

15-15: LGTM!

The import for dynamic module loading is appropriate for supporting custom multimodal models.


52-61: LGTM!

The --custom_module_dirs option follows the established CLI pattern and provides flexibility for loading custom model implementations.


317-318: LGTM!

The comment clarification helps distinguish between tokenizer usage for data preparation vs. model initialization.


425-425: LGTM!

The logic correctly inverts the no_skip_tokenizer_init flag to skip_tokenizer_init for the runtime arguments.


329-330: LGTM!

The multimodal data format and device parameters are properly passed to the dataset creation function.

tensorrt_llm/inputs/utils.py (3)

20-21: LGTM!

The imports align with the new registry-based architecture for managing multimodal placeholders.


228-238: LGTM!

The simplified placeholder retrieval logic properly delegates to the centralized registry while maintaining the existing formatting behavior.


306-314: LGTM!

The refactored placeholder placement logic cleanly uses the registry's enum-based placement configuration and separator settings.
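
The rough shape of that placement logic, as a sketch; MultimodalPlaceholderPlacement and its BEFORE_TEXT/AFTER_TEXT variants appear in the PR, while the getter name and separator handling are assumptions:

placement = MULTIMODAL_PLACEHOLDER_REGISTRY.get_placeholder_placement(model_type)
if placement == MultimodalPlaceholderPlacement.BEFORE_TEXT:
    prompt = placeholders + separator + text
else:  # MultimodalPlaceholderPlacement.AFTER_TEXT
    prompt = text + separator + placeholders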

tensorrt_llm/inputs/registry.py (5)

1-2: LGTM!

The enum and dataclass provide a clean, type-safe structure for multimodal placeholder configuration with sensible defaults.

Also applies to: 83-94


183-206: LGTM!

The updated decorator properly enforces placeholder metadata registration for multimodal models.


256-283: LGTM!

The extracted multimodal_hashing_process function improves code organization and separation of concerns.


288-310: LGTM!

The caching mechanism for multimodal_hashing_supported attribute efficiently avoids repeated attempts at multimodal processing for models that don't support it.


311-315: LGTM!

Appropriate error handling that logs with traceback for debugging while re-raising the exception to maintain error propagation.

@tensorrt-cicd (Collaborator):

PR_Github #13584 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #13584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10184 completed with status: 'FAILURE'

@rakib-hasan (Collaborator, Author):

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

🔭 Outside diff range comments (1)
tensorrt_llm/tools/importlib_utils.py (1)

58-76: Add docstring and remove unreachable return statement.

The function is missing a required docstring and has unreachable code at line 75.

 def import_custom_module(custom_module_path: Union[str, Path]):
+    """Import a custom module from either a file or directory.
+
+    Args:
+        custom_module_path: Path to the module file or package directory.
+
+    Returns:
+        The imported module or package object.
+
+    Raises:
+        FileNotFoundError: If the path does not exist or is neither a file nor a directory.
+        ImportError: If the module/package cannot be imported.
+    """
     if isinstance(custom_module_path, str):
         custom_module_path = Path(custom_module_path)
     print(f"Importing custom module from: {custom_module_path}")

     if custom_module_path.exists():
         if custom_module_path.is_file():
             return import_custom_module_from_file(custom_module_path)
         elif custom_module_path.is_dir():
             return import_custom_module_from_dir(custom_module_path)
         else:
             raise FileNotFoundError(
                 f"Custom module path {custom_module_path} is neither a file nor a directory."
             )
     else:
         raise FileNotFoundError(
             f"Custom module path {custom_module_path} does not exist.")
-    return None
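
With the docstring fix above in place, a call site could look like this; the path is illustrative:

from tensorrt_llm.tools.importlib_utils import import_custom_module

# Accepts either a single .py file or a package directory.
import_custom_module("/path/to/custom_models")
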
🧹 Nitpick comments (3)
tensorrt_llm/tools/importlib_utils.py (1)

1-5: Minor formatting nitpick: Excessive blank lines.

Consider removing one of the consecutive empty lines (lines 4-5) to follow standard Python formatting conventions.

tensorrt_llm/inputs/registry.py (2)

292-297: Simplify attribute access.

Using __getattribute__ is unnecessarily complex when direct attribute access would work and be more readable.

-            if hasattr(input_processor, "multimodal_hashing_supported"):
-                if input_processor.__getattribute__(
-                        "multimodal_hashing_supported"):
+            if hasattr(input_processor, "multimodal_hashing_supported"):
+                if input_processor.multimodal_hashing_supported:
                     return multimodal_hashing_process(inputs, sampling_params)
                 else:
                     return input_processor(inputs, sampling_params)

299-313: Consider reducing verbosity of fallback logging.

The traceback printing in the warning case might be too verbose for normal operation. Consider using logger.debug for the traceback or only printing it in debug mode.

                 except Exception as e:
-                    import traceback
-                    traceback.print_exc()
+                    logger.debug(f"Multimodal hashing traceback: {e}", exc_info=True)
                     logger.warning(
                         f"Multimodal hashing failed: {e}. Falling back to basic input processor."
                     )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c49dcd and 1300867.

📒 Files selected for processing (5)
  • examples/llm-api/quickstart_multimodal.py (5 hunks)
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py (2 hunks)
  • tensorrt_llm/bench/benchmark/throughput.py (9 hunks)
  • tensorrt_llm/inputs/registry.py (4 hunks)
  • tensorrt_llm/tools/importlib_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tensorrt_llm/bench/benchmark/throughput.py
  • examples/llm-api/quickstart_multimodal.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/tools/importlib_utils.py
  • tensorrt_llm/inputs/registry.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/tools/importlib_utils.py
  • tensorrt_llm/inputs/registry.py
🧠 Learnings (1)
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

  • tensorrt_llm/inputs/registry.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (7)
tensorrt_llm/inputs/registry.py (7)

1-2: LGTM!

The new imports for enum and dataclasses are appropriate and necessary for the multimodal placeholder registry implementation.


83-94: Well-designed data structures.

The enum and dataclass are cleanly implemented with appropriate defaults and follow Python naming conventions. The use of field(default_factory=dict) correctly avoids mutable default arguments.


129-135: Good improvement on defensive validation.

The get_placeholder method now properly validates inputs before accessing dictionary keys, which addresses the previous review feedback about preventing KeyError exceptions.


174-174: LGTM!

The global registry instance follows the established pattern and naming conventions.


260-287: Well-structured refactoring of multimodal hashing logic.

Extracting the multimodal hashing logic into a nested function improves code organization and readability. The logic correctly handles multimodal data processing and validation.


288-320: Excellent caching mechanism for multimodal hashing support.

The dynamic detection and caching of multimodal hashing support is well-designed. It attempts hashing once, caches the result, and provides appropriate fallback behavior with good error logging.


154-160: Fix syntax error caused by line break.

The line break in the middle of the attribute access creates a syntax error.

     def get_registered_video_model_types(self) -> List[str]:
         return [
             model_type
             for model_type in self._multimodal_placeholder_by_model_type
-            if "video" in self.
-            _multimodal_placeholder_by_model_type[model_type].placeholder_map
+            if "video" in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
         ]

Likely an incorrect or invalid review comment.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13717 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13717 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10306 completed with status: 'FAILURE'

@rakib-hasan (Collaborator, Author):

/bot run

@tensorrt-cicd (Collaborator):

PR_Github #13734 [ run ] triggered by Bot

@yechank-nvidia (Collaborator) left a comment

Thx for the work! Left some comments below.

@tensorrt-cicd (Collaborator):

PR_Github #13734 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10322 completed with status: 'FAILURE'

@rakib-hasan marked this pull request as ready for review August 1, 2025 18:24
@rakib-hasan (Collaborator, Author):

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
tensorrt_llm/inputs/registry.py (2)

152-158: Fix syntax error in line continuation.

There's a syntax error caused by an improper line break in the attribute access.

Apply this fix:

     def get_registered_image_model_types(self) -> List[str]:
         return [
             model_type
             for model_type in self._multimodal_placeholder_by_model_type
-            if "image" in self.
-            _multimodal_placeholder_by_model_type[model_type].placeholder_map
+            if "image" in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
         ]

168-174: Fix syntax error in line continuation.

There's a syntax error caused by an improper line break in the attribute access.

Apply this fix:

     def get_registered_audio_model_types(self) -> List[str]:
         return [
             model_type
             for model_type in self._multimodal_placeholder_by_model_type
-            if "audio" in self.
-            _multimodal_placeholder_by_model_type[model_type].placeholder_map
+            if "audio" in self._multimodal_placeholder_by_model_type[model_type].placeholder_map
         ]
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8039de2 and fb23e9b.

📒 Files selected for processing (12)
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_llama.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_mistral.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py (3 hunks)
  • tensorrt_llm/_torch/models/modeling_vila.py (2 hunks)
  • tensorrt_llm/bench/utils/data.py (2 hunks)
  • tensorrt_llm/inputs/__init__.py (2 hunks)
  • tensorrt_llm/inputs/registry.py (4 hunks)
  • tensorrt_llm/inputs/utils.py (4 hunks)
  • tensorrt_llm/tools/importlib_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/_torch/models/modeling_gemma3vl.py
  • tensorrt_llm/_torch/models/modeling_llama.py
  • tensorrt_llm/bench/utils/data.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
  • tensorrt_llm/_torch/models/modeling_mistral.py
  • tensorrt_llm/tools/importlib_utils.py
  • tensorrt_llm/_torch/models/modeling_vila.py
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
  • tensorrt_llm/inputs/utils.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a class in the constructor in Python.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/inputs/registry.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/inputs/registry.py
🧠 Learnings (1)
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

  • tensorrt_llm/inputs/registry.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (13)
tensorrt_llm/inputs/registry.py (13)

83-87: LGTM! Well-structured enum definition.

The MultimodalPlaceholderPlacement enum provides clear semantics for placeholder positioning relative to text content.


89-94: LGTM! Clean dataclass with sensible defaults.

The MultimodalPlaceholderMetadata dataclass is well-structured with appropriate default values and type hints.


96-118: LGTM! Registry initialization and string representation are well-implemented.

The constructor and __str__ method provide clean initialization and useful debugging output formatting.


119-122: LGTM! Defensive validation logic.

The is_valid method properly checks both model type and modality existence before access.


123-130: LGTM! Proper exception handling implemented.

The defensive validation with ValueError exception is correctly implemented, addressing previous review feedback.


131-137: LGTM! Consistent defensive validation.

The method properly uses the is_valid helper and raises appropriate exceptions, following the same pattern as other registry methods.


139-144: LGTM! Defensive validation added.

The method now includes proper input validation before dictionary access, preventing KeyError exceptions.


146-150: LGTM! Consistent validation pattern.

The method follows the same defensive validation pattern as other registry methods.


176-180: LGTM! Clean implementation of registry access methods.

The get_registered_model_types method and global registry instance provide clean access to the centralized placeholder metadata.


193-216: LGTM! Improved decorator with proper validation.

The updated register_input_processor decorator now properly validates the required placeholder_metadata parameter and registers it with the global registry, replacing assertions with proper exception handling.


269-295: LGTM! Well-structured multimodal hashing logic.

The nested multimodal_hashing_process function provides clean separation of multimodal hashing logic with proper error handling and validation.


301-322: LGTM! Robust fallback mechanism with caching.

The enhanced logic properly detects multimodal hashing support, caches the capability, and provides graceful fallback with comprehensive error handling.


324-328: LGTM! Comprehensive error handling.

The outer exception handling provides good debugging information with traceback printing and proper error propagation.

… quickstart, trtllm-bench, etc.

Signed-off-by: Rakib Hasan <[email protected]>
@rakib-hasan force-pushed the fix/input_prep_for_OOT_models branch from 57d68bc to b23e977 on August 7, 2025 17:33
@rakib-hasan (Collaborator, Author):

/bot run

@rakib-hasan requested review from hypdeb and amukkara August 8, 2025 00:19
@tensorrt-cicd (Collaborator):

PR_Github #14521 [ run ] triggered by Bot

@hypdeb (Collaborator) left a comment

Once rebased on the circular import changes, LGTM

@tensorrt-cicd (Collaborator):

PR_Github #14521 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10969 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@rakib-hasan (Collaborator, Author):

/bot run

@rakib-hasan enabled auto-merge (squash) August 12, 2025 00:33
@tensorrt-cicd (Collaborator):

PR_Github #14865 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #14865 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #11222 completed with status: 'FAILURE'

@rakib-hasan (Collaborator, Author):

/bot run

@tensorrt-cicd (Collaborator):

PR_Github #14898 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #14898 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11242 completed with status: 'FAILURE'

@rakib-hasan (Collaborator, Author):

/bot run

@tensorrt-cicd (Collaborator):

PR_Github #14998 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator):

PR_Github #14998 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11331 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@rakib-hasan merged commit 2923eb8 into NVIDIA:main Aug 13, 2025
4 checks passed