feat: ORT GenAI Stateful Compilation changes #676

ankitm3k · 2025-04-25T07:53:20Z

Description

This PR adds the features needed to run ORT GenAI with OVEP using stateful compilation of ov::Model, inspired by the OV GenAI pipeline flow.

I have introduced a new provider option, enable_causallm, which can be set to True in the custom config file genai_config.json to enable the ORT GenAI pipeline with causal LLM models that are fully supported on OVEP.

Sample genai_config.json:

"provider_options": [
    {
        // Key "OpenVINO" is case-sensitive and must be used as below, as it is defined by MSFT
        "OpenVINO":
        {
            "device_type": "NPU",
            // Mandatory provider option; must always be set when using ORT GenAI
            "enable_causallm": "True",
            // (NPU only) Optional setting for compilation with custom MAX_PROMPT_LEN & MIN_RESPONSE_LEN
            "load_config": "{\"NPU\":{\"MAX_PROMPT_LEN\":\"2048\",\"MIN_RESPONSE_LEN\":\"512\"}}"
        }
    }
]
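As a sketch of how the values above fit together, assuming a C++ caller that assembles the same "OpenVINO" options as a flat map of string keys to string values: note that load_config is itself a JSON object serialized into a string, which is why its inner quotes are escaped in genai_config.json. MakeNpuLoadConfig and MakeOvepOptions are hypothetical helpers for illustration, not ORT or OVEP APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: serializes the NPU compilation settings into the string
// value of "load_config". The value is a JSON object stored inside a string,
// so in genai_config.json its inner quotes have to be escaped.
std::string MakeNpuLoadConfig(int max_prompt_len, int min_response_len) {
  assert(max_prompt_len > 0 && min_response_len > 0);  // sketch: positive lengths only
  return "{\"NPU\":{\"MAX_PROMPT_LEN\":\"" + std::to_string(max_prompt_len) +
         "\",\"MIN_RESPONSE_LEN\":\"" + std::to_string(min_response_len) +
         "\"}}";
}

// Hypothetical helper: the "OpenVINO" options from the sample above as a flat
// string-to-string map (every value, including the boolean flag, is a string).
std::unordered_map<std::string, std::string> MakeOvepOptions() {
  return {
      {"device_type", "NPU"},
      {"enable_causallm", "True"},
      {"load_config", MakeNpuLoadConfig(2048, 512)},
  };
}
```

Building load_config programmatically avoids hand-escaping the nested quotes, which is the easiest part of the sample to get wrong.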

Note that GenAI models in ONNX format are usually stateless and require compilation with dynamic shapes.
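To illustrate why dynamic shapes are needed: in a stateless decoder the KV cache is fed back in as an input on every call, so its sequence dimension changes from call to call. The hypothetical helper PastSeqLens below assumes one prefill call over the whole prompt followed by single-token decode steps, and lists the past sequence length each call would see.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration: the past sequence length (the growing dimension of
// the past KV-cache inputs) that a stateless decoder sees on each call, assuming
// one prefill call over the prompt followed by single-token decode steps.
std::vector<int64_t> PastSeqLens(int64_t prompt_len, int64_t decode_steps) {
  assert(prompt_len > 0 && decode_steps >= 0);
  std::vector<int64_t> lens;
  lens.push_back(0);  // prefill: no past KV yet, the whole prompt is the input
  for (int64_t step = 0; step < decode_steps; ++step) {
    // each decode step sees the prompt plus every token generated before it
    lens.push_back(prompt_len + step);
  }
  return lens;
}
```

For a 5-token prompt and 3 decode steps this yields 0, 5, 6, 7: the KV inputs have a different shape on every call, so a model compiled for one fixed shape could not accept them as-is.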

onnxruntime/core/providers/openvino/backends/basic_backend.cc

onnxruntime/test/contrib_ops/attention_op_test.cc

onnxruntime/core/providers/openvino/backend_manager.cc

onnxruntime/core/providers/openvino/ibackend.h

preetha-intel · 2025-06-03T05:01:18Z

onnxruntime/core/providers/openvino/backend_manager.cc

@@ -106,7 +106,8 @@ BackendManager::BackendManager(SessionContext& session_context,
    subgraph_context_.has_dynamic_input_shape = true;
    LOGS_DEFAULT(INFO) << "[OpenVINO-EP] Model has symbolic input dims";
    if ((session_context_.device_type.find("CPU") != std::string::npos ||
-         session_context_.device_type.find("GPU") != std::string::npos) &&
+         session_context_.device_type.find("GPU") != std::string::npos ||
+         (session_context_.device_type.find("NPU") != std::string::npos && session_context_.enable_causallm)) &&


Can this condition be simplified, since this is not valid only for a dynamic model on NPU?
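One possible way to make the condition easier to read is to pull the device check into a named predicate. This is a hypothetical refactoring covering only the part of the condition visible in the hunk; SessionContextSketch and KeepsDynamicShapes are illustrative names, not OVEP code, and whatever follows the trailing && in the diff is not modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down stand-in for the SessionContext fields used here.
struct SessionContextSketch {
  std::string device_type;
  bool enable_causallm = false;
};

// True when device_type names a device for which the diff keeps symbolic
// (dynamic) input shapes: CPU or GPU always, NPU only with enable_causallm.
bool KeepsDynamicShapes(const SessionContextSketch& ctx) {
  const auto has = [&ctx](const char* device) {
    return ctx.device_type.find(device) != std::string::npos;
  };
  return has("CPU") || has("GPU") || (has("NPU") && ctx.enable_causallm);
}
```

Naming the predicate keeps the NPU special case in one place, so the surrounding if only has to combine it with the remaining (not shown) conditions.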

onnxruntime/core/providers/openvino/ov_interface.cc

RyanMetcalfeInt8 · 2025-06-04T19:34:08Z

onnxruntime/core/providers/openvino/ov_interface.cc

+  ovInfReq.set_tensor(tensor_name, tensor);
+}
+
+void StatefulOVInferRequest::CacheTensor(const std::string& tensor_name, std::vector<int64_t>& cache) {


Note that we may eventually need to support caching position_ids / logits, which have more complicated shapes than just [1, <num_input_tokens>].

Will handle that logic in the next PR.
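To make the shape concern concrete, here is a minimal sketch of caching a tensor of shape [1, <num_input_tokens>] (for example input_ids) into a running std::vector<int64_t>. CacheTokens is a hypothetical stand-in, not the PR's StatefulOVInferRequest::CacheTensor; it works on plain vectors rather than ov::Tensor and simply rejects any other shape, which is exactly where tensors with richer layouts would need their own handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for tensor caching: appends the values of a tensor of
// shape [1, num_input_tokens] to a running cache.
void CacheTokens(const std::vector<int64_t>& shape,
                 const std::vector<int64_t>& data,
                 std::vector<int64_t>& cache) {
  // Only the simple [1, N] layout is handled here; tensors with more complicated
  // shapes (the review mentions position_ids / logits) would need another rule.
  if (shape.size() != 2 || shape[0] != 1 ||
      static_cast<std::size_t>(shape[1]) != data.size()) {
    throw std::invalid_argument("expected a tensor of shape [1, num_input_tokens]");
  }
  cache.insert(cache.end(), data.begin(), data.end());
}
```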

onnxruntime/core/providers/openvino/backends/basic_backend.h

preetha-intel

LGTM

Copilot

Pull Request Overview

This PR introduces stateful compilation support for ORT GenAI using OpenVINO by integrating a new provider option, enable_causallm, along with several supporting changes in compilation, inference request handling, and backend communication. Key changes include adding stateful model transformation utilities, updating the OpenVINO interface to support causal LM functionality, and modifying test cases and backend management to incorporate KV cache rewind and dynamic shapes handling.
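The overview mentions KV cache rewind. As a rough sketch of the idea only, assuming the backend keeps a host-side history of token ids: rewinding means trimming that history back to an earlier length so generation can resume from there. RewindTo is a hypothetical name; a real stateful backend would also have to bring the OpenVINO KV-cache state back in line with the trimmed history, which is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: trims the cached token history back to new_length tokens.
// Only the host-side history is handled; the device-side KV state is not.
void RewindTo(std::vector<int64_t>& token_history, std::size_t new_length) {
  if (new_length > token_history.size()) {
    throw std::out_of_range("cannot rewind past the end of the cached history");
  }
  token_history.resize(new_length);  // drops the most recent tokens
}
```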

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

File	Description
onnxruntime/test/perftest/ort_test_session.cc	Extended option parsing for enable_causallm with boolean value checking and error messaging
onnxruntime/core/providers/openvino/ov_interface.{h,cc}	Added StatefulCompileModel mechanism and updated OVExeNetwork to carry extra stateful attributes
onnxruntime/core/providers/openvino/openvino_provider_factory.cc	Integrated parsing logic for enable_causallm and adjusted dynamic shapes flags for NPU devices
onnxruntime/* (multiple backend and execution provider files)	Updated backend and infer request handling to support KV cache operations, stateful inference, and additional configuration for ORT GenAI
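A minimal sketch of the kind of boolean check described for ort_test_session.cc, assuming the accepted spellings are "True"/"true"/"False"/"false". ParseBoolOption is a hypothetical name, and the exact spellings and error text used in ORT may differ.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts the string value of a boolean provider option
// such as "enable_causallm" into a bool, rejecting anything else with an error.
bool ParseBoolOption(const std::string& key, const std::string& value) {
  if (value == "True" || value == "true") return true;
  if (value == "False" || value == "false") return false;
  throw std::invalid_argument(key + " should be a boolean (True/False), got: " + value);
}
```

Failing loudly on an unrecognized value avoids silently running the non-stateful path when the flag is misspelled.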

Comments suppressed due to low confidence (2)

onnxruntime/core/providers/openvino/backends/basic_backend.h:54

It would be helpful to add a comment explaining why tensor names such as 'beam_idx', 'past_key_values', and 'present' are being skipped when session_context.enable_causallm is true. This aids future maintainers in understanding the rationale behind bypassing KV cache tensor mapping in stateful model scenarios.

if ((onnx_name.empty() || onnx_name == "beam_idx" ||
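The rationale Copilot asks for can be sketched as a predicate. Our reading, which is an assumption and not confirmed by the PR text, is that after stateful compilation the KV cache lives in OpenVINO internal state, so the ONNX past_key_values / present tensors have no matching model ports, while beam_idx is an input introduced by the stateful transformation with no ONNX counterpart. IsStatefulKvName and SkipTensorMapping are hypothetical names, and always skipping an empty name is a guess at how the truncated condition groups.

```cpp
#include <cassert>
#include <string>

// Hypothetical: names treated in this sketch as KV-cache or beam_idx tensors,
// which a stateful model handles internally rather than through ONNX I/O.
bool IsStatefulKvName(const std::string& name) {
  return name == "beam_idx" ||
         name.find("past_key_values") != std::string::npos ||
         name.find("present") != std::string::npos;
}

// Hypothetical predicate: always skip empty names; skip KV-cache / beam_idx
// names only when enable_causallm is set.
bool SkipTensorMapping(const std::string& onnx_name, bool enable_causallm) {
  return onnx_name.empty() || (enable_causallm && IsStatefulKvName(onnx_name));
}
```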

onnxruntime/core/providers/openvino/ov_interface.h:90

[nitpick] Consider renaming the member variable 'compiled_model_obj' to simply 'compiled_model' for improved clarity and consistency with standard naming conventions.

ov::CompiledModel compiled_model_obj;

* feat: ORT GenAI Stateful Compilation changes
* fix: Disabled UT as testdata/attention_past_state.onnx model is invalid
* fix: lint fixes
* fix: refactor tensor caching
* update: Fix optional position ids caching logic
* fix: remove unwanted comment

ankitm3k added the ort_genai_ovep label Apr 25, 2025

ankitm3k requested review from sfatimar and vthaniel April 25, 2025 07:53

ankitm3k self-assigned this Apr 25, 2025

ankitm3k requested a review from jatinwadhwa921 April 25, 2025 09:06

preetha-intel reviewed Apr 25, 2025

onnxruntime/core/providers/openvino/backends/basic_backend.cc (outdated, resolved)

ankitm3k force-pushed the ort_genai_features branch 3 times, most recently from 46802e5 to f137407, May 19, 2025 08:11

ankitm3k force-pushed the ort_genai_features branch from f137407 to 8a9fca1, May 20, 2025 07:04

ankitm3k force-pushed the ort_genai_features branch 2 times, most recently from b672820 to 775c27e, June 2, 2025 06:19

vthaniel reviewed Jun 2, 2025

onnxruntime/test/contrib_ops/attention_op_test.cc (resolved)

preetha-intel reviewed Jun 3, 2025

onnxruntime/core/providers/openvino/backend_manager.cc (resolved)

preetha-intel reviewed Jun 3, 2025

onnxruntime/core/providers/openvino/ibackend.h (outdated, resolved)

preetha-intel reviewed Jun 3, 2025

MayureshV1 requested a review from Copilot June 3, 2025 23:58

This comment was marked as outdated.

ankitm3k requested a review from Copilot June 4, 2025 12:20

This comment was marked as outdated.

ankitm3k force-pushed the ort_genai_features branch 2 times, most recently from 845f903 to 833cff9, June 4, 2025 13:02

RyanMetcalfeInt8 reviewed Jun 4, 2025

preetha-intel approved these changes Jun 5, 2025

ankitm3k force-pushed the ort_genai_features branch from 97775fd to a5ac79d, June 5, 2025 10:00

ankitm3k added 5 commits June 5, 2025 15:47

* feat: ORT GenAI Stateful Compilation changes (b78043b)
* fix: Disabled UT as testdata/attention_past_state.onnx model is invalid (fab0072)
* fix: lint fixes (60b08be)
* fix: refactor tensor caching (9cab433)
* update: Fix optional position ids caching logic (3de8201)
* fix: remove unwanted comment (a9b1f9d)

ankitm3k force-pushed the ort_genai_features branch from a5ac79d to a9b1f9d, June 5, 2025 10:47

ankitm3k requested a review from Copilot June 5, 2025 12:01

Copilot AI reviewed Jun 5, 2025

ankitm3k merged commit 660adfc into ovep-develop Jun 5, 2025
6 of 8 checks passed
