Commit 4bcedf5

docs: Possible update to injection detection (#1144)
Signed-off-by: Mike McKiernan <[email protected]>
1 parent 5c5261c commit 4bcedf5

4 files changed: +180 -24 lines changed

docs/user-guides/guardrails-library.md

Lines changed: 108 additions & 24 deletions
@@ -952,29 +952,38 @@ Times reported below in are **averages** and are reported in milliseconds.
952952
| Docker | 2057 | 115 |
953953
| In-Process | 3227 | 157 |
954954

955-
956955
### Injection Detection
957-
NeMo Guardrails offers detection of potential injection attempts (_e.g._ code injection, cross-site scripting, SQL injection, template injection) using [YARA rules](https://yara.readthedocs.io/en/stable/index.html), a technology familiar to many security teams.
958-
NeMo Guardrails ships with some basic rules for the following categories:
959-
* Code injection (Python)
960-
* Cross-site scripting (Markdown and Javascript)
961-
* SQL injection
962-
* Template injection (Jinja)
963956

964-
Additional rules can be added by including them in the `library/injection_detection/yara_rules` folder or specifying a `yara_path` with all the rules.
957+
NeMo Guardrails can detect potential exploitation attempts that rely on injection, such as code injection, cross-site scripting, SQL injection, and template injection.
958+
Injection detection is primarily intended to be used in agentic systems to enhance other security controls as part of a defense-in-depth strategy.
959+
960+
The first part of injection detection is [YARA rules](https://yara.readthedocs.io/en/stable/index.html).
961+
A YARA rule specifies a set of strings--text or binary patterns--to match and a Boolean expression that specifies the logic of the rule.
962+
YARA rules are a technology that is familiar to many security teams.
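
To make the structure concrete, the following plain-Python analogy models a rule as a set of named patterns plus a Boolean condition over the patterns that matched. It is a sketch only and does not use YARA; the `Rule` class, the identifiers, and the sample patterns are hypothetical and are not the rules that ship with NeMo Guardrails.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Rule:
    """Toy stand-in for a YARA rule: named patterns plus a condition."""

    name: str
    strings: Dict[str, str]  # identifier -> regular expression
    condition: Callable[[Set[str]], bool]  # decides from the matched identifiers

    def matches(self, text: str) -> bool:
        # Collect the identifiers of every pattern found in the text.
        found = {
            ident
            for ident, pattern in self.strings.items()
            if re.search(pattern, text)
        }
        return self.condition(found)


# Hypothetical rule: flag text that imports `os` AND calls `os.system(...)`.
toy_rule = Rule(
    name="toy_python_shell_call",
    strings={"$import": r"\bimport\s+os\b", "$call": r"\bos\.system\s*\("},
    condition=lambda found: {"$import", "$call"} <= found,
)

print(toy_rule.matches("import os\nos.system('ls')"))  # True
print(toy_rule.matches("import os\nprint(os.getcwd())"))  # False
```

A real YARA rule expresses the same two parts in its `strings` and `condition` sections.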
963+
964+
The second part of injection detection is specifying the action to take when a rule is triggered.
965+
You can *reject* the text, in which case the rail returns the message "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}."
966+
Rejecting the output is the safest action and most appropriate for production deployments.
967+
Alternatively, you can *omit* the triggering text from the response and return the rest.
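
The difference between the two actions can be sketched as follows. This is an illustration, not the library's implementation: `apply_action`, its parameters, and the way detections are represented and joined are assumptions made for the example. Only the rejection message is taken from the text above.

```python
from typing import List

REJECT_TEMPLATE = (
    "I'm sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of {detections}."
)


def apply_action(
    text: str, matched_spans: List[str], detections: List[str], action: str
) -> str:
    """Hypothetical helper: apply `reject` or `omit` to text that triggered rules."""
    if not matched_spans:
        return text  # nothing was detected, so the text passes through unchanged
    if action == "reject":
        # Discard the whole text and return the rejection message instead.
        # Joining rule names with ", " is an assumption made for this sketch.
        return REJECT_TEMPLATE.format(detections=", ".join(detections))
    if action == "omit":
        # Keep the text but remove each substring that triggered a rule.
        for span in matched_spans:
            text = text.replace(span, "")
        return text
    raise ValueError(f"unsupported action: {action!r}")


sample = "Here you go: <script>alert(1)</script> Enjoy!"
spans = ["<script>alert(1)</script>"]
print(apply_action(sample, spans, ["xss"], "reject"))
print(apply_action(sample, spans, ["xss"], "omit"))
```

With `reject`, nothing from the original text survives. With `omit`, the surrounding text is kept, which is why rejecting is the more conservative choice.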
968+
969+
#### About the Default Rules
965970

966-
Injection detection has a number of action options that indicate what to do when potential exploitation is detected.
967-
Two options are currently available: `reject` and `omit`, with `sanitize` planned for a future release.
971+
By default, NeMo Guardrails provides the following rules:
968972

969-
* `reject` will return a message to the user indicating that their query could not be handled and they should try again.
970-
* `omit` will return the model's output, removing the offending detected content.
971-
* `sanitize` attempts to "de-fang" the malicious content, returning the output in a way that is less likely to result exploitation. This action is generally considered unsuitable for production use.
973+
- Code injection (Python): Recommended if the LLM output is used as an argument to downstream functions or passed to a code interpreter.
974+
- SQL injection: Recommended if the LLM output is used as part of a SQL query to a database.
975+
- Template injection (Jinja): Recommended if the LLM output is rendered using the Jinja templating language.
976+
This rule is usually paired with code injection rules.
977+
- Cross-site scripting (Markdown and JavaScript): Recommended if the LLM output is rendered directly in HTML or Markdown.
978+
979+
You can view the default rules in the [yara_rules directory](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/injection_detection/yara_rules) of the GitHub repository.
972980

973981
#### Configuring Injection Detection
974-
To activate injection detection, you must include the `injection detection` output flow.
982+
983+
To activate injection detection, specify the rules to apply and the action to take, and include the `injection detection` output flow.
975984
As an example config:
976985

977-
```colang
986+
```yaml
978987
rails:
979988
  config:
980989
    injection_detection:
@@ -991,14 +1000,89 @@ rails:
9911000
      - injection detection
9921001
```
9931002
994-
**SECURITY WARNING:** It is _strongly_ advised that the `sanitize` action not be used in production systems, as there is no guarantee of its efficacy, and it may lead to adverse security outcomes.
1003+
The following table describes the fields of `rails.config.injection_detection`:
1004+
1005+
```{list-table}
1006+
:header-rows: 1
1007+
1008+
* - Field
1009+
  - Description
1010+
  - Default Value
1011+
1012+
* - `injections`
1013+
  - Specifies the injection detection rules to use.
1014+
    The following injections are part of the library:
1015+
1016+
    - `code` for Python code injection
1017+
    - `sqli` for SQL injection
1018+
    - `template` for Jinja template injection
1019+
    - `xss` for cross-site scripting
1020+
  - None (required)
1021+
1022+
* - `action`
1023+
  - Specifies the action to take when injection is detected.
1024+
    Refer to the following actions:
1025+
1026+
    - `reject` returns a message to the user indicating that the query could not be handled and they should try again.
1027+
    - `omit` returns the model response, removing the offending detected content.
1028+
  - None (required)
1029+
1030+
* - `yara_path`
1031+
  - Specifies the path to a directory that contains custom YARA rules.
1032+
  - `library/injection_detection/yara_rules` in the NeMo Guardrails package.
1033+
1034+
* - `yara_rules`
1035+
  - Specifies inline YARA rules.
1036+
    The field is a dictionary that maps rule names to the rules.
1037+
    The rules use the string data type.
1038+
1039+
    ```yaml
1040+
    yara_rules:
1041+
      <inline-rule-name>: |-
1042+
        <inline-rule-content>
1043+
    ```
1044+
1045+
    If specified, these inline rules override the rules found in the `yara_path` field.
1046+
  - None
1047+
```
1048+
1049+
For information about writing YARA rules, refer to the [YARA documentation](https://yara.readthedocs.io/en/stable/index.html).
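
As a quick sanity check before loading a configuration, the fields in the table above can be validated with a few lines of Python. The helper below is hypothetical and is not part of NeMo Guardrails. It encodes only what the table states, plus one labeled assumption: names outside the library set are flagged only when no custom rules are supplied.

```python
from typing import Any, Dict, List

LIBRARY_INJECTIONS = {"code", "sqli", "template", "xss"}
ACTIONS = {"reject", "omit"}


def check_injection_detection(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    injections = cfg.get("injections")
    if not isinstance(injections, list) or not injections:
        problems.append("`injections` is required and must be a non-empty list")
    elif "yara_path" not in cfg and "yara_rules" not in cfg:
        # Assumption: without custom rules, only the library's names make sense.
        unknown = sorted(set(injections) - LIBRARY_INJECTIONS)
        if unknown:
            problems.append(f"unknown injections: {unknown}")
    if cfg.get("action") not in ACTIONS:
        problems.append("`action` is required and must be `reject` or `omit`")
    yara_rules = cfg.get("yara_rules", {})
    if not isinstance(yara_rules, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in yara_rules.items()
    ):
        problems.append("`yara_rules` must map rule names to rule text (strings)")
    return problems


print(check_injection_detection({"injections": ["code", "sqli"], "action": "reject"}))
print(check_injection_detection({"injections": ["shell"], "action": "sanitize"}))
```

The first call reports no problems. The second reports both the unknown injection name and the unsupported action.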
1050+
1051+
#### Example
1052+
1053+
Before you begin, install the `yara-python` package. Alternatively, install the NeMo Guardrails package with `pip install nemoguardrails[jailbreak]`.
1054+
1055+
1. Set your NVIDIA API key as an environment variable:
1056+
1057+
```console
1058+
$ export NVIDIA_API_KEY=<nvapi-...>
1059+
```
1060+
1061+
1. Create a configuration directory, such as `config`, and add a `config.yml` file with contents like the following:
1062+
1063+
```{literalinclude} ../../examples/configs/injection_detection/config/config.yml
1064+
:language: yaml
1065+
```
1066+
1067+
1. Load the guardrails configuration:
1068+
1069+
```{literalinclude} ../../examples/configs/injection_detection/demo.py
1070+
:language: python
1071+
:start-after: "# start-load-config"
1072+
:end-before: "# end-load-config"
1073+
```
1074+
1075+
1. Send a possibly unsafe request:
1076+
1077+
```{literalinclude} ../../examples/configs/injection_detection/demo.py
1078+
:language: python
1079+
:start-after: "# start-unsafe-response"
1080+
:end-before: "# end-unsafe-response"
1081+
```
9951082

996-
This rail is primarily intended to be used in agentic systems to _enhance_ other security controls as part of a defense in depth strategy.
997-
The provided rules are recommended to be used in the following settings:
998-
* `code`: Recommended if the LLM's output will be used as an argument to downstream functions or passed to a code interpreter.
999-
* `sqli`: Recommended if the LLM's output will be used as part of a SQL query to a database
1000-
* `template`: Recommended for use if LLM output is rendered using templating languages like Jinja. This rule should usually be paired with `code` rules.
1001-
* `xss`: Recommended if LLM output will be rendered directly in HTML or Markdown
1083+
*Example Output*
10021084

1003-
The included rules are in no way comprehensive.
1004-
They can and should be extended by security teams for use in your application's particular context and paired with additional security controls.
1085+
```{literalinclude} ../../examples/configs/injection_detection/demo-out.txt
1086+
:start-after: "# start-unsafe-response"
1087+
:end-before: "# end-unsafe-response"
1088+
```
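
When the `reject` action fires, the response content is the rejection message quoted earlier rather than the model's answer. A minimal sketch for telling the two cases apart follows. `was_rejected` is a hypothetical helper, and matching on the message prefix is an assumption that holds only while the message wording is unchanged.

```python
REJECTION_PREFIX = "I'm sorry, the desired output triggered rule(s)"


def was_rejected(response: dict) -> bool:
    """Hypothetical check: does the response carry the rejection message?"""
    content = response.get("content", "")
    return isinstance(content, str) and content.startswith(REJECTION_PREFIX)


# Illustrative responses; the rule name at the end of the message is assumed.
blocked = {
    "role": "assistant",
    "content": "I'm sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of code.",
}
answered = {
    "role": "assistant",
    "content": "**Getting the Weather in Santa Clara using Python**",
}

print(was_rejected(blocked))  # True
print(was_rejected(answered))  # False
```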
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
models:
2+
  - type: main
3+
    engine: nvidia_ai_endpoints
4+
    model: meta/llama-3.3-70b-instruct
5+
6+
rails:
7+
  config:
8+
    injection_detection:
9+
      injections:
10+
        - code
11+
        - sqli
12+
        - template
13+
        - xss
14+
      action: reject
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# start-unsafe-response
2+
{'role': 'assistant', 'content': '**Getting the Weather in Santa Clara using Python**\n=====================================================\n\nTo get the weather in Santa Clara, we can use the OpenWeatherMap API, which provides current and forecasted weather conditions. We will use the `requests` library to make an HTTP request to the API and the `json` library to parse the response.\n\n**Prerequisites**\n---------------\n\n* Python 3.x\n* `requests` library (`pip install requests`)\n* OpenWeatherMap API key (sign up for free at [OpenWeatherMap](https://home.openweathermap.org/users/sign_up))\n\n**Code**\n-----\n\n```python\nimport requests\nimport json\n\ndef get_weather(api_key, city, units=\'metric\'):\n """\n Get the current weather in a city.\n\n Args:\n api_key (str): OpenWeatherMap API key\n city (str): City name\n units (str, optional): Units of measurement (default: \'metric\')\n\n Returns:\n dict: Weather data\n """\n base_url = \'http://api.openweathermap.org/data/2.5/weather\'\n params = {\n \'q\': city,\n \'units\': units,\n \'appid\': api_key\n }\n response = requests.get(base_url, params=params)\n response.raise_for_status()\n return response.json()\n\ndef main():\n api_key = \'YOUR_API_KEY\' # replace with your OpenWeatherMap API key\n city = \'Santa Clara\'\n weather_data = get_weather(api_key, city)\n print(\'Weather in {}:\'.format(city))\n print(\'Temperature: {}°C\'.format(weather_data[\'main\'][\'temp\']))\n print(\'Humidity: {}%\'.format(weather_data[\'main\'][\'humidity\']))\n print(\'Conditions: {}\'.format(weather_data[\'weather\'][0][\'description\']))\n\nif __name__ == \'__main__\':\n main()\n```\n\n**Explanation**\n--------------\n\n1. We import the required libraries: `requests` for making HTTP requests and `json` for parsing the response.\n2. We define a function `get_weather` that takes the API key, city name, and units of measurement as arguments.\n3. 
We construct the API URL and parameters using the `base_url` and `params` dictionary.\n4. We make a GET request to the API using `requests.get` and raise an exception if the response status code is not 200 (OK).\n5. We parse the response data using `response.json()` and return the result as a dictionary.\n6. In the `main` function, we set the API key and city name, call `get_weather`, and print the weather data.\n\n**Notes**\n-------\n\n* Replace `YOUR_API_KEY` with your actual OpenWeatherMap API key.\n* You can adjust the units of measurement by passing a different value for the `units` parameter (e.g., \'imperial\' for Fahrenheit).\n* This script only retrieves the current weather conditions. For forecasted weather, you can use the [OpenWeatherMap forecast API](https://openweathermap.org/forecast5).\n\nI hope this helps! Let me know if you have any questions or need further assistance.'}
3+
# end-unsafe-response
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2+
# SPDX-License-Identifier: Apache-2.0
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.
15+
16+
# fmt: off
17+
18+
import atexit
19+
import os
20+
import sys
21+
from pathlib import Path
22+
23+
curdir = os.getcwd()
24+
25+
@atexit.register
26+
def cleanup():
27+
    os.chdir(curdir)
28+
29+
os.chdir(Path(__file__).parent)
30+
31+
# start-load-config
32+
from nemoguardrails import LLMRails, RailsConfig
33+
34+
config = RailsConfig.from_path("./config")
35+
rails = LLMRails(config)
36+
# end-load-config
37+
38+
# start-unsafe-response
39+
messages = [
40+
    {
41+
        "role": "user",
42+
        "content": "Write a Python script that uses the requests library, and maybe urllib, to get the weather in Santa Clara.",
43+
    },
44+
]
45+
46+
response = rails.generate(messages=messages)
47+
print(response)
48+
# end-unsafe-response
49+
50+
stdout = sys.stdout
51+
with open("demo-out.txt", "w") as sys.stdout:
52+
    print("# start-unsafe-response")
53+
    print(response)
54+
    print("# end-unsafe-response\n")
55+
sys.stdout = stdout
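
The script restores `sys.stdout` by hand after writing `demo-out.txt`. An alternative sketch using `contextlib.redirect_stdout` from the standard library is shown below; it restores the stream even if a `print` call raises. The `response` value is a placeholder so that the snippet runs without a model, and `write_demo_output` is a hypothetical name.

```python
import contextlib
import tempfile
from pathlib import Path


def write_demo_output(response: dict, path: Path) -> None:
    """Write the response between the markers that the docs include from."""
    with path.open("w") as out, contextlib.redirect_stdout(out):
        print("# start-unsafe-response")
        print(response)
        print("# end-unsafe-response\n")


# Placeholder response; in the script this comes from `rails.generate(...)`.
response = {"role": "assistant", "content": "example"}
out_path = Path(tempfile.mkdtemp()) / "demo-out.txt"
write_demo_output(response, out_path)
print(out_path.read_text().splitlines()[0])
```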
