docs: Possible update to injection detection (#1144)

mikemckiernan · web-flow · commit 4bcedf52cda3 · 2025-05-28T18:58:32.000+02:00
Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
diff --git a/docs/user-guides/guardrails-library.md b/docs/user-guides/guardrails-library.md
@@ -952,29 +952,38 @@ Times reported below in are **averages** and are reported in milliseconds.
 | Docker     | 2057  | 115 |
 | In-Process | 3227  | 157 |
 
-
 ### Injection Detection
-NeMo Guardrails offers detection of potential injection attempts (_e.g._ code injection, cross-site scripting, SQL injection, template injection) using [YARA rules](https://yara.readthedocs.io/en/stable/index.html), a technology familiar to many security teams.
-NeMo Guardrails ships with some basic rules for the following categories:
-* Code injection (Python)
-* Cross-site scripting (Markdown and Javascript)
-* SQL injection
-* Template injection (Jinja)
 
-Additional rules can be added by including them in the `library/injection_detection/yara_rules` folder or specifying a `yara_path` with all the rules.
+NeMo Guardrails can detect potential exploitation attempts that rely on injection, such as code injection, cross-site scripting, SQL injection, and template injection.
+Injection detection is primarily intended to be used in agentic systems to enhance other security controls as part of a defense-in-depth strategy.
+
+The first part of injection detection is [YARA rules](https://yara.readthedocs.io/en/stable/index.html).
+A YARA rule specifies a set of strings--text or binary patterns--to match and a Boolean expression that specifies the logic of the rule.
+YARA rules are a technology that is familiar to many security teams.
+
+The second part of injection detection is specifying the action to take when a rule is triggered.
+You can *reject* the text, which returns "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}."
+Rejecting the output is the safest action and most appropriate for production deployments.
+As an alternative to rejecting the output, you can *omit* the triggering text from the response.
+
+#### About the Default Rules
 
-Injection detection has a number of action options that indicate what to do when potential exploitation is detected.
-Two options are currently available: `reject` and `omit`, with `sanitize` planned for a future release.
+By default, NeMo Guardrails provides the following rules:
 
-* `reject` will return a message to the user indicating that their query could not be handled and they should try again.
-* `omit` will return the model's output, removing the offending detected content.
-* `sanitize` attempts to "de-fang" the malicious content, returning the output in a way that is less likely to result exploitation. This action is generally considered unsuitable for production use.
+- Code injection (Python): Recommended if the LLM output is used as an argument to downstream functions or passed to a code interpreter.
+- SQL injection: Recommended if the LLM output is used as part of a SQL query to a database.
+- Template injection (Jinja): Recommended if the LLM output is rendered using the Jinja templating language.
+  This rule is usually paired with code injection rules.
+- Cross-site scripting (Markdown and JavaScript): Recommended if the LLM output is rendered directly in HTML or Markdown.
+
+You can view the default rules in the [yara_rules directory](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/injection_detection/yara_rules) of the GitHub repository.
 
 #### Configuring Injection Detection
-To activate injection detection, you must include the `injection detection` output flow.
+
+To activate injection detection, you must specify the rules to apply and the action to take, and include the `injection detection` output flow.
 As an example config:
 
-```colang
+```yaml
 rails:
   config:
     injection_detection:
@@ -991,14 +1000,89 @@ rails:
       - injection detection
 ```
 
-**SECURITY WARNING:** It is _strongly_ advised that the `sanitize` action not be used in production systems, as there is no guarantee of its efficacy, and it may lead to adverse security outcomes.
+The following table describes the fields under `rails.config.injection_detection`:
+
+```{list-table}
+:header-rows: 1
+
+* - Field
+  - Description
+  - Default Value
+
+* - `injections`
+  - Specifies the injection detection rules to use.
+    The following injections are part of the library:
+
+    - `code` for Python code injection
+    - `sqli` for SQL injection
+    - `template` for Jinja template injection
+    - `xss` for cross-site scripting
+  - None (required)
+
+* - `action`
+  - Specifies the action to take when injection is detected.
+    The following actions are available:
+
+    - `reject` returns a message to the user indicating that the query could not be handled and they should try again.
+    - `omit` returns the model response, removing the offending detected content.
+  - None (required)
+
+* - `yara_path`
+  - Specifies the path to a directory that contains custom YARA rules.
+  - `library/injection_detection/yara_rules` in the NeMo Guardrails package.
+
+* - `yara_rules`
+  - Specifies inline YARA rules.
+    The field is a dictionary that maps rule names to the rules.
+    The rules use the string data type.
+
+    ```yaml
+    yara_rules:
+      <inline-rule-name>: |-
+        <inline-rule-content>
+    ```
+
+    If specified, these inline rules override the rules found in the `yara_path` field.
+  - None
+```
+
+For information about writing YARA rules, refer to the [YARA documentation](https://yara.readthedocs.io/en/stable/index.html).
+
+#### Example
+
+Before you begin, install the `yara-python` package, or install the NeMo Guardrails package with `pip install nemoguardrails[jailbreak]`.
+
+1. Set your NVIDIA API key as an environment variable:
+
+   ```console
+   $ export NVIDIA_API_KEY=<nvapi-...>
+   ```
+
+1. Create a configuration directory, such as `config`, and add a `config.yml` file with contents like the following:
+
+   ```{literalinclude} ../../examples/configs/injection_detection/config/config.yml
+   :language: yaml
+   ```
+
+1. Load the guardrails configuration:
+
+   ```{literalinclude} ../../examples/configs/injection_detection/demo.py
+   :language: python
+   :start-after: "# start-load-config"
+   :end-before: "# end-load-config"
+   ```
+
+1. Send a possibly unsafe request:
+
+   ```{literalinclude} ../../examples/configs/injection_detection/demo.py
+   :language: python
+   :start-after: "# start-unsafe-response"
+   :end-before: "# end-unsafe-response"
+   ```
 
-This rail is primarily intended to be used in agentic systems to _enhance_ other security controls as part of a defense in depth strategy.
-The provided rules are recommended to be used in the following settings:
-* `code`: Recommended if the LLM's output will be used as an argument to downstream functions or passed to a code interpreter.
-* `sqli`: Recommended if the LLM's output will be used as part of a SQL query to a database
-* `template`: Recommended for use if LLM output is rendered using templating languages like Jinja. This rule should usually be paired with `code` rules.
-* `xss`: Recommended if LLM output will be rendered directly in HTML or Markdown
+   *Example Output*
 
-The included rules are in no way comprehensive.
-They can and should be extended by security teams for use in your application's particular context and paired with additional security controls.
+   ```{literalinclude} ../../examples/configs/injection_detection/demo-out.txt
+   :start-after: "# start-unsafe-response"
+   :end-before: "# end-unsafe-response"
+   ```
diff --git a/examples/configs/injection_detection/config/config.yml b/examples/configs/injection_detection/config/config.yml
@@ -0,0 +1,14 @@
+models:
+  - type: main
+    engine: nvidia_ai_endpoints
+    model: meta/llama-3.3-70b-instruct
+
+rails:
+  config:
+    injection_detection:
+      injections:
+        - code
+        - sqli
+        - template
+        - xss
+      action: reject
diff --git a/examples/configs/injection_detection/demo-out.txt b/examples/configs/injection_detection/demo-out.txt
@@ -0,0 +1,3 @@
+# start-unsafe-response
+{'role': 'assistant', 'content': '**Getting the Weather in Santa Clara using Python**\n=====================================================\n\nTo get the weather in Santa Clara, we can use the OpenWeatherMap API, which provides current and forecasted weather conditions. We will use the `requests` library to make an HTTP request to the API and the `json` library to parse the response.\n\n**Prerequisites**\n---------------\n\n* Python 3.x\n* `requests` library (`pip install requests`)\n* OpenWeatherMap API key (sign up for free at [OpenWeatherMap](https://home.openweathermap.org/users/sign_up))\n\n**Code**\n-----\n\n```python\nimport requests\nimport json\n\ndef get_weather(api_key, city, units=\'metric\'):\n    """\n    Get the current weather in a city.\n\n    Args:\n        api_key (str): OpenWeatherMap API key\n        city (str): City name\n        units (str, optional): Units of measurement (default: \'metric\')\n\n    Returns:\n        dict: Weather data\n    """\n    base_url = \'http://api.openweathermap.org/data/2.5/weather\'\n    params = {\n        \'q\': city,\n        \'units\': units,\n        \'appid\': api_key\n    }\n    response = requests.get(base_url, params=params)\n    response.raise_for_status()\n    return response.json()\n\ndef main():\n    api_key = \'YOUR_API_KEY\'  # replace with your OpenWeatherMap API key\n    city = \'Santa Clara\'\n    weather_data = get_weather(api_key, city)\n    print(\'Weather in {}:\'.format(city))\n    print(\'Temperature: {}°C\'.format(weather_data[\'main\'][\'temp\']))\n    print(\'Humidity: {}%\'.format(weather_data[\'main\'][\'humidity\']))\n    print(\'Conditions: {}\'.format(weather_data[\'weather\'][0][\'description\']))\n\nif __name__ == \'__main__\':\n    main()\n```\n\n**Explanation**\n--------------\n\n1. We import the required libraries: `requests` for making HTTP requests and `json` for parsing the response.\n2. We define a function `get_weather` that takes the API key, city name, and units of measurement as arguments.\n3. We construct the API URL and parameters using the `base_url` and `params` dictionary.\n4. We make a GET request to the API using `requests.get` and raise an exception if the response status code is not 200 (OK).\n5. We parse the response data using `response.json()` and return the result as a dictionary.\n6. In the `main` function, we set the API key and city name, call `get_weather`, and print the weather data.\n\n**Notes**\n-------\n\n* Replace `YOUR_API_KEY` with your actual OpenWeatherMap API key.\n* You can adjust the units of measurement by passing a different value for the `units` parameter (e.g., \'imperial\' for Fahrenheit).\n* This script only retrieves the current weather conditions. For forecasted weather, you can use the [OpenWeatherMap forecast API](https://openweathermap.org/forecast5).\n\nI hope this helps! Let me know if you have any questions or need further assistance.'}
+# end-unsafe-response
diff --git a/examples/configs/injection_detection/demo.py b/examples/configs/injection_detection/demo.py
@@ -0,0 +1,55 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# fmt: off
+
+import atexit
+import os
+import sys
+from pathlib import Path
+
+curdir = os.getcwd()
+
+@atexit.register
+def cleanup():
+    os.chdir(curdir)
+
+os.chdir(Path(__file__).parent)
+
+# start-load-config
+from nemoguardrails import LLMRails, RailsConfig
+
+config = RailsConfig.from_path("./config")
+rails = LLMRails(config)
+# end-load-config
+
+# start-unsafe-response
+messages = [
+    {
+        "role": "user",
+        "content": "Write a Python script that uses the requests library, and maybe urllib, to get the weather in Santa Clara.",
+    },
+]
+
+response = rails.generate(messages=messages)
+print(response)
+# end-unsafe-response
+
+stdout = sys.stdout
+with open("demo-out.txt", "w") as sys.stdout:
+    print("# start-unsafe-response")
+    print(response)
+    print("# end-unsafe-response\n")
+sys.stdout = stdout
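
The `reject` and `omit` actions documented in this change can be illustrated with a small, self-contained sketch. This is not the NeMo Guardrails implementation: the rail matches YARA rules, whereas the sketch below substitutes two toy regular expressions so that it runs with the standard library alone. The names `TOY_RULES`, `REJECT_TEMPLATE`, and `apply_action` are hypothetical, and joining the detected rule names with commas is an assumption about how `{detections}` might be filled in.

```python
import re

# Hypothetical stand-in patterns. The real rail uses YARA rules, not these regexes.
TOY_RULES = {
    "sqli": re.compile(r"(?i)\bunion\s+select\b|;\s*drop\s+table\b"),
    "xss": re.compile(r"(?i)<script\b[^>]*>.*?</script>", re.DOTALL),
}

# Message text taken from the documentation in this change.
REJECT_TEMPLATE = (
    "I'm sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of {detections}."
)


def apply_action(text: str, injections: list[str], action: str) -> str:
    """Return the text to show the user after a toy injection check."""
    # Collect the names of the selected rules that match the text.
    detections = [name for name in injections if TOY_RULES[name].search(text)]
    if not detections:
        return text
    if action == "reject":
        # Assumption: detections are reported as a comma-separated list.
        return REJECT_TEMPLATE.format(detections=", ".join(detections))
    if action == "omit":
        # Remove only the matched spans and keep the rest of the output.
        for name in detections:
            text = TOY_RULES[name].sub("", text)
        return text
    raise ValueError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    output = "Here is your page: <script>alert(1)</script> Enjoy!"
    print(apply_action(output, ["sqli", "xss"], "reject"))
    print(apply_action(output, ["sqli", "xss"], "omit"))
```

The sketch mirrors the `injections` and `action` fields from the configuration table: the first selects which rule sets run, and the second decides whether a match replaces the whole response or only strips the matched text.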
