
Conversation

wsttiger
Collaborator

Add TensorRT Decoder Plugin for Quantum Error Correction

Overview

This PR introduces a TensorRT-based decoder plugin that uses NVIDIA TensorRT to accelerate neural-network inference for quantum error correction (QEC) decoding.

Key Features

  • TensorRT Integration: Full TensorRT runtime integration with support for both ONNX model loading and pre-built engine loading (a rough sketch of the two paths follows this list)
  • Flexible Precision Support: Configurable precision modes (fp16, bf16, int8, fp8, tf32, best) with automatic hardware capability detection
  • Memory Management: Efficient CUDA memory allocation and stream-based execution
  • Parameter Validation: Comprehensive input validation with clear error messages
  • Python Utilities: ONNX to TensorRT engine conversion script for model preprocessing
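
The following sketch is illustrative only and is not the plugin source: it shows how the two loading paths above could be wired to the TensorRT C++ API. The function name, shape, and all variable names are assumptions.

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Illustrative sketch of the two model-loading paths; the runtime is passed
// in because it must outlive the engine it deserializes.
nvinfer1::ICudaEngine *make_engine(const std::string &path, bool from_onnx,
                                   nvinfer1::IRuntime &runtime,
                                   nvinfer1::ILogger &logger) {
  if (from_onnx) {
    // "onnx_load_path": parse the ONNX graph and build an engine on the fly.
    std::unique_ptr<nvinfer1::IBuilder> builder(
        nvinfer1::createInferBuilder(logger));
    std::unique_ptr<nvinfer1::INetworkDefinition> network(
        builder->createNetworkV2(0));
    std::unique_ptr<nvonnxparser::IParser> parser(
        nvonnxparser::createParser(*network, logger));
    parser->parseFromFile(path.c_str(),
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));
    std::unique_ptr<nvinfer1::IBuilderConfig> config(
        builder->createBuilderConfig());
    std::unique_ptr<nvinfer1::IHostMemory> plan(
        builder->buildSerializedNetwork(*network, *config));
    return runtime.deserializeCudaEngine(plan->data(), plan->size());
  }
  // "engine_load_path": deserialize a pre-built engine file directly.
  std::ifstream in(path, std::ios::binary);
  std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
  return runtime.deserializeCudaEngine(blob.data(), blob.size());
}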

Technical Implementation

  • Core Decoder Class: trt_decoder, implementing the decoder interface with a TensorRT backend
  • Hardware Detection: Automatic GPU capability detection for optimal precision selection (illustrated after this list)
  • Error Handling: Robust error handling with graceful fallbacks and informative error messages
  • Plugin Architecture: CMake-based plugin system with conditional TensorRT linking
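
As a rough illustration of capability-based precision selection (the actual rules in trt_decoder.cpp may differ; the compute-capability thresholds below are general GPU-architecture facts, and the fallback policy is an assumption):

#include <cuda_runtime_api.h>
#include <string>

// Hypothetical policy: pick a precision the current GPU can actually run.
// FP8 needs SM 8.9+ (Ada/Hopper), BF16 needs SM 8.0+ (Ampere); FP16 is
// broadly available on the GPUs supported by TensorRT 10.
std::string select_precision(const std::string &requested) {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, /*device=*/0);
  int sm = prop.major * 10 + prop.minor;
  if (requested == "best")
    return sm >= 89 ? "fp8" : (sm >= 80 ? "bf16" : "fp16");
  if (requested == "fp8" && sm < 89)
    return "fp16"; // graceful fallback rather than a hard failure
  if (requested == "bf16" && sm < 80)
    return "fp16";
  return requested;
}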

Files Added/Modified

  • libs/qec/include/cudaq/qec/trt_decoder_internal.h - Internal API declarations
  • libs/qec/lib/decoders/plugins/trt_decoder/trt_decoder.cpp - Main decoder implementation
  • libs/qec/lib/decoders/plugins/trt_decoder/CMakeLists.txt - Plugin build configuration
  • libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py - Python utility
  • libs/qec/unittests/test_trt_decoder.cpp - Comprehensive unit tests
  • Updated CMakeLists.txt files for integration

Testing

  • ✅ All 8 unit tests passing
  • Parameter validation tests (a representative case is sketched after this list)
  • File loading utility tests
  • Edge case handling tests
  • Error condition tests
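
The individual tests are not reproduced in this description; purely as an illustration, a parameter-validation case could look like the following. The test name, the tensor construction, and the expected exception type are assumptions, not copied from test_trt_decoder.cpp.

#include "cudaq/qec/trt_decoder_internal.h"
#include <gtest/gtest.h>

TEST(TRTDecoderTest, RejectsMissingModelPath) {
  // Placeholder parity check matrix; shape and type chosen for illustration.
  cudaqx::tensor<uint8_t> H({2, 4});
  cudaqx::heterogeneous_map params; // no onnx_load_path or engine_load_path
  params.insert("precision", "fp16");
  // Constructing the decoder without a model source should fail loudly.
  EXPECT_THROW(trt_decoder(H, params), std::runtime_error);
}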

Usage Example

// Load from an ONNX model (H is the code's parity check matrix)
cudaqx::heterogeneous_map params;
params.insert("onnx_load_path", "model.onnx");
params.insert("precision", "fp16");
auto decoder = std::make_unique<trt_decoder>(H, params);

// Or load a pre-built engine
params.clear();
params.insert("engine_load_path", "model.trt");
auto engine_decoder = std::make_unique<trt_decoder>(H, params);
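
After construction, decoding goes through the common decoder interface. Assuming the usual decode() entry point, a syndrome passed as a vector of doubles, and the common decoder_result field names, a call might look like:

// Detector/syndrome measurements for one round (placeholder values).
std::vector<double> syndrome = {1.0, 0.0, 0.0, 1.0};
auto result = decoder->decode(syndrome);
// result.converged and result.result would carry the convergence flag and the
// predicted error estimate, per the usual decoder_result layout.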

Dependencies

  • TensorRT 10.13.3.9+
  • CUDA 12.0+
  • NVIDIA GPU with appropriate compute capability

Performance Benefits

  • GPU-accelerated inference for QEC decoding
  • Optimized precision selection based on hardware capabilities
  • Efficient memory usage with CUDA streams (see the sketch after this list)
  • Reduced latency compared to CPU-based decoders
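
To make the stream-based execution point concrete, here is a condensed, illustrative sequence using the TensorRT 10 execution API. Buffer setup, tensor names, and sizes are placeholders, not the plugin's actual code.

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Assumes device buffers d_in/d_out and host buffers h_in/h_out of
// in_bytes/out_bytes were allocated during decoder construction.
void infer_async(nvinfer1::IExecutionContext *context, cudaStream_t stream,
                 void *d_in, void *d_out, const void *h_in, void *h_out,
                 size_t in_bytes, size_t out_bytes) {
  cudaMemcpyAsync(d_in, h_in, in_bytes, cudaMemcpyHostToDevice, stream);
  context->setTensorAddress("input", d_in);   // tensor names are placeholders
  context->setTensorAddress("output", d_out);
  context->enqueueV3(stream);                 // run inference on the stream
  cudaMemcpyAsync(h_out, d_out, out_bytes, cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);              // block only when results are needed
}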

This implementation provides a production-ready TensorRT decoder plugin that can significantly accelerate quantum error correction workflows while maintaining compatibility with the existing CUDA-Q QEC framework.


copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Add trt_decoder class implementing TensorRT-accelerated inference
- Support both ONNX model loading and pre-built engine loading
- Include precision configuration (fp16, bf16, int8, fp8, tf32, best)
- Add hardware platform detection for capability-based precision selection
- Implement CUDA memory management and stream-based execution
- Add Python utility script for ONNX to TensorRT engine conversion
- Update CMakeLists.txt to build TensorRT decoder plugin
- Add comprehensive parameter validation and error handling
Signed-off-by: Scott Thornton <[email protected]>
import tensorrt as trt


def build_engine(onnx_file,
Collaborator


Is this file exposed as part of the wheel such that regular users will be able to use this file?

@wsttiger
Collaborator Author

/ok to test fb16b36


copy-pr-bot bot commented Oct 16, 2025

/ok to test fb16b36

@wsttiger, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@wsttiger
Collaborator Author

/ok to test c9e563f
