A comprehensive C# wrapper for OpenVINO and OpenVINO GenAI, providing idiomatic .NET APIs for AI inference and generative AI tasks. Bindings for the WhisperPipeline C API are currently in progress.

For LLM workloads, Microsoft's Foundry Local C# package is an alternative worth considering if you only need to run inference on GPU.
- OpenVINO.NET.Core: Core OpenVINO functionality for model inference
- OpenVINO.NET.GenAI: Generative AI capabilities including LLM pipelines
- OpenVINO.NET.Native: Native library management and deployment
- Modern C# API: Async/await, IAsyncEnumerable, SafeHandle resource management
- Windows x64 Support: Optimized for Windows deployment scenarios
- .NET 8.0 or later
- Windows x64
- OpenVINO GenAI 2025.2.0.0 runtime
The easiest way to get started is with the QuickDemo application that automatically downloads a model:
For Linux (the download script defaults to the Ubuntu 24 runtime; if you run another version, change it in the script):

```bash
scripts/download-openvino-runtime.sh
OPENVINO_RUNTIME_PATH=/home/brandon/OpenVINO.GenAI.NET/build/native/runtimes/linux-x64/native dotnet run --project samples/QuickDemo/ --configuration Release -- --device CPU
```

Adjust `OPENVINO_RUNTIME_PATH` to the `build/native/runtimes/linux-x64/native` directory inside your own checkout.
For Windows:

```powershell
.\scripts\download-openvino-runtime.ps1
$env:OPENVINO_RUNTIME_PATH = "C:\Users\brand\code\OpenVINO.GenAI.NET\build\native\runtimes\win-x64\native"
dotnet run --project samples/QuickDemo/ --configuration Release -- --device CPU
```

Again, point `OPENVINO_RUNTIME_PATH` at the `win-x64\native` directory of your own checkout.
Sample Output:

```
OpenVINO.NET Quick Demo
=======================
Model: Qwen3-0.6B-fp16-ov
Temperature: 0.7, Max Tokens: 100
✓ Model found at: ./Models/Qwen3-0.6B-fp16-ov
Device: CPU

Prompt 1: "Explain quantum computing in simple terms:"
Response: "Quantum computing is a revolutionary technology that uses quantum mechanics principles..."
Performance: 12.4 tokens/sec, First token: 450ms
```
For integrating into your own applications:
Basic text generation:

```csharp
using OpenVINO.NET.GenAI;

using var pipeline = new LLMPipeline("path/to/model", "CPU");
var config = GenerationConfig.Default.WithMaxTokens(100).WithTemperature(0.7f);
string result = await pipeline.GenerateAsync("Hello, world!", config);
Console.WriteLine(result);
```
Streaming generation:

```csharp
using OpenVINO.NET.GenAI;

using var pipeline = new LLMPipeline("path/to/model", "CPU");
var config = GenerationConfig.Default.WithMaxTokens(100);
await foreach (var token in pipeline.GenerateStreamAsync("Tell me a story", config))
{
    Console.Write(token);
}
```
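Streaming composes with cancellation through the standard `IAsyncEnumerable` machinery. A minimal sketch; note that `WithCancellation` only stops generation early if the library's iterator forwards the token internally, which is an assumption about this API:

```csharp
using OpenVINO.NET.GenAI;

// Cancel generation after 10 seconds. WithCancellation is the standard
// BCL mechanism for IAsyncEnumerable; whether the producer actually stops
// early depends on the library forwarding the token (assumed here).
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
using var pipeline = new LLMPipeline("path/to/model", "CPU");
var config = GenerationConfig.Default.WithMaxTokens(500);

try
{
    await foreach (var token in pipeline
        .GenerateStreamAsync("Tell me a very long story", config)
        .WithCancellation(cts.Token))
    {
        Console.Write(token);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine();
    Console.WriteLine("[generation cancelled]");
}
```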
- OpenVINO.NET.Core - Core OpenVINO wrapper
- OpenVINO.NET.GenAI - GenAI functionality
- OpenVINO.NET.Native - Native library management
- QuickDemo - Quick start demo with automatic model download
- TextGeneration.Sample - Basic text generation example
- StreamingChat.Sample - Streaming chat application
```
┌─────────────────────────────────────────────────────────────┐
│                       Your Application                       │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                      OpenVINO.NET.GenAI                      │
│  • LLMPipeline (High-level API)                              │
│  • GenerationConfig (Fluent configuration)                   │
│  • ChatSession (Conversation management)                     │
│  • IAsyncEnumerable streaming                                │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                      OpenVINO.NET.Core                       │
│  • Core OpenVINO functionality                               │
│  • Model loading and inference                               │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                     OpenVINO.NET.Native                      │
│  • P/Invoke declarations                                     │
│  • SafeHandle resource management                            │
│  • MSBuild targets for DLL deployment                        │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                     OpenVINO GenAI C API                     │
│  • Native OpenVINO GenAI runtime                             │
│  • Version: 2025.2.0.0                                       │
└─────────────────────────────────────────────────────────────┘
```
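The ChatSession component shown in the GenAI layer manages multi-turn conversation state on top of LLMPipeline. The member names in this sketch (`StartChat`, `SendAsync`) are hypothetical placeholders for illustration; see the StreamingChat.Sample project for the actual surface:

```csharp
using OpenVINO.NET.GenAI;

using var pipeline = new LLMPipeline("path/to/model", "CPU");

// Hypothetical API: StartChat and SendAsync are illustrative names only.
using var session = pipeline.StartChat();
string first = await session.SendAsync("What is OpenVINO?");
string second = await session.SendAsync("How does it compare to ONNX Runtime?");
// The session retains earlier turns, so the second answer can build on the first.
```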
- Memory Safe: SafeHandle pattern for automatic resource cleanup (see the sketch after this list)
- Async/Await: Full async support with cancellation tokens
- Streaming: Real-time token generation with IAsyncEnumerable<string>
- Fluent API: Chainable configuration methods
- Error Handling: Comprehensive exception handling and device fallbacks
- Performance: Optimized for both throughput and latency
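The memory-safety guarantee comes from wrapping each native pointer in a SafeHandle subclass whose ReleaseHandle calls the matching native destroy function, so the resource is reclaimed even if Dispose is never called. A minimal sketch of the pattern; the entry-point and library names below are assumptions, not the verified internals of OpenVINO.NET.Native:

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative SafeHandle for a native pipeline. The native function and
// library names are placeholders, not confirmed OpenVINO.NET.Native code.
internal sealed class LLMPipelineHandle : SafeHandle
{
    public LLMPipelineHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        NativeMethods.ov_genai_llm_pipeline_free(handle); // placeholder name
        return true;
    }
}

internal static class NativeMethods
{
    [DllImport("openvino_genai_c")] // assumed native library name
    internal static extern void ov_genai_llm_pipeline_free(IntPtr pipeline);
}
```

Because cleanup runs through the SafeHandle finalization path, a leaked wrapper still releases its native memory during garbage collection.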
1. Install .NET 8.0 SDK or later
   - Download from: https://dotnet.microsoft.com/download
2. Install OpenVINO GenAI Runtime 2025.2.0.0
   - Download from: https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/
   - Extract to a directory in your PATH, or place the DLLs in your application's output directory
```bash
# Compare all available devices
dotnet run --project samples/QuickDemo -- --benchmark
```
Error: The specified module could not be found. (Exception from HRESULT: 0x8007007E)
Solution: Ensure OpenVINO GenAI runtime DLLs are in your PATH or application directory.
Error: Failed to create LLM pipeline on GPU: Device GPU is not supported
Solutions:
- Check device availability: `dotnet run --project samples/QuickDemo -- --benchmark`
- Use CPU fallback: `dotnet run --project samples/QuickDemo -- --device CPU` (see the sketch after this list)
- Install the appropriate drivers (Intel GPU driver for GPU support, Intel NPU driver for NPU)
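The CPU fallback can also be applied in code by catching the pipeline-creation failure and retrying on CPU. A minimal sketch; catching `Exception` is a deliberate simplification, since the library's specific exception type is not shown here:

```csharp
using OpenVINO.NET.GenAI;

// Try GPU first and fall back to CPU when pipeline creation fails.
// Catching Exception is a simplification; prefer the library's own
// exception type if one is exposed.
LLMPipeline CreatePipeline(string modelPath)
{
    try
    {
        return new LLMPipeline(modelPath, "GPU");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"GPU unavailable ({ex.Message}); falling back to CPU.");
        return new LLMPipeline(modelPath, "CPU");
    }
}

using var pipeline = CreatePipeline("./Models/Qwen3-0.6B-fp16-ov");
```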
Error: Failed to download model files from HuggingFace
Solutions:
- Check internet connectivity
- Verify HuggingFace is accessible
- Manually download the model files to `./Models/Qwen3-0.6B-fp16-ov/`
Error: Insufficient memory to load model
Solutions:
- Use a smaller model
- Reduce max_tokens parameter
- Close other memory-intensive applications
- Consider using INT4 quantized models
Enable detailed logging by setting an environment variable:

```bash
# Windows
set OPENVINO_LOG_LEVEL=DEBUG

# Linux/macOS
export OPENVINO_LOG_LEVEL=DEBUG
```
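The variable can also be set from managed code before the first pipeline is created. This sketch assumes the native runtime reads OPENVINO_LOG_LEVEL from the process environment at load time, which is worth verifying for your setup:

```csharp
using System;
using OpenVINO.NET.GenAI;

// Assumption: the native runtime picks this up when it is first loaded,
// so set it before any OpenVINO call in the process.
Environment.SetEnvironmentVariable("OPENVINO_LOG_LEVEL", "DEBUG");

using var pipeline = new LLMPipeline("path/to/model", "CPU");
```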
1. Install Prerequisites
   - Visual Studio 2022 or VS Code with the C# extension
   - .NET 9.0 SDK
   - OpenVINO GenAI runtime
2. Build and Test

```bash
dotnet build OpenVINO.NET.sln
dotnet test tests/OpenVINO.NET.GenAI.Tests/
```
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenVINO GenAI Documentation
- OpenVINO GenAI C API Reference
- .NET P/Invoke Documentation
- HuggingFace Model Hub