feat: improved XMLArgs processing #3358

airween · 2025-04-07T13:34:00Z

what

This PR adds a new feature within XML processing.

Old (current) behavior: with the XML:/* target, the body processor concatenates the text content of every node in the XML payload. E.g.:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
</root>

will produce this value:

[/post][9] Target value: "  foo1  bar1  foo2  bar2"

In this case, there is no way to exclude an individual node. For example, if a node contains a term that a rule is looking for, the administrator cannot create an exclusion for that node. The only solution is to exclude the whole rule.

New behavior: there is a new configuration directive, SecParseXMLintoArgs, with possible values On, Off and OnlyArgs. The default value is Off, which changes nothing. If the administrator sets it to On, the engine parses the XML into ARGS, and the XML:/* target still contains the concatenated text content as before. If the value is OnlyArgs, the parsed content appears only in the ARGS target; the XML:/* target no longer contains it.

With On, the node values appear in ARGS, so it is easy to write an exclusion against the named target.
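As an illustration of such an exclusion, here is a minimal configuration sketch. It assumes the XML request body processor is already active for the request, uses the directive spelling from this description, and takes the argument name from the debug log later in this thread. The rule ID 942100 is only a placeholder for whichever rule produces the false positive.

```apache
# Parse XML node values into ARGS; XML:/* keeps its previous content.
SecParseXMLintoArgs On

# Hypothetical exclusion (example rule ID): stop this one rule from
# inspecting the parsed <node> values. All other targets of the rule
# remain in place. This must be loaded after the rule it modifies.
SecRuleUpdateTargetById 942100 "!ARGS:xml.root.level1.level2.node"
```

Because the target is named, the exclusion affects only that node path rather than disabling the rule for the whole XML body.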

why

A customer request has been received to solve this.

references

See #3178.

RedXanadu · 2025-04-25T12:50:04Z

This is a great new feature. This will open up ModSecurity to anyone who needs to do serious processing of XML APIs (lots of legacy and current applications!). Especially with pre-written rule sets like CRS, this makes the task of handling false positives possible.

Thank you for the work that has gone into this 🚀

dune73 · 2025-04-25T12:55:30Z

@airween Could you share how the new option parses / advertises multi-level documents with multiple leaves carrying the same name? Is the hierarchy part of the name or is that hidden?

Needless to say, I really like this option.

airween · 2025-04-25T13:08:56Z

@dune73,

I hope I understood your question correctly 😄, so consider this file:

cat test.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
</root>

and this request:

curl -v -H "Content-Type: application/xml" -X POST -d @test.xml http://localhost/post.php

This generates the following arguments (exactly the same as in the JSON case):

Adding XML argument 'xml.root.level1.level2.node' with value 'foo1'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar1'
Adding XML argument 'xml.root.level1.level2.node' with value 'foo2'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar2'
Adding XML argument 'xml.root.level1.level2.node' with value 'foo1'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar1'
Adding XML argument 'xml.root.level1.level2.node' with value 'foo2'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar2'
Expanded "REQUEST_URI_RAW|REQUEST_HEADERS|ARGS|ARGS_NAMES" to "REQUEST_URI_RAW|REQUEST_HEADERS:Host|REQUEST_HEADERS:User-Agent|REQUEST_HEADERS:Accept|REQUEST_HEADERS:Content-Type|REQUEST_HEADERS:Content-Length|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node".
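The argument names in the log above are the dotted element path with an `xml` prefix, so sibling leaves with the same name share one argument name. The following is a minimal sketch of how such a name could be maintained while elements open and close. It is not the PR's code: the type `xml_path`, its functions, and the size limits are hypothetical, and plain C strings stand in for libxml2's `xmlChar`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XML_PATH_CAP 256   /* illustrative limits, not taken from the PR */
#define XML_PATH_DEPTH 32

/* Hypothetical state: dotted path of the element that is currently open. */
typedef struct {
    char path[XML_PATH_CAP];
    size_t saved_len[XML_PATH_DEPTH]; /* path length before each push */
    int depth;
} xml_path;

/* Start with the fixed "xml" prefix seen in the debug log. */
static void xml_path_init(xml_path *p) {
    strcpy(p->path, "xml");
    p->depth = 0;
}

/* Element opened: append ".<localname>".
 * Returns 0 on success, -1 if the path would be too long or too deep. */
static int xml_path_push(xml_path *p, const char *localname) {
    size_t used = strlen(p->path);
    size_t name_len = strlen(localname);
    if (p->depth >= XML_PATH_DEPTH || used + 1 + name_len >= XML_PATH_CAP) {
        return -1;
    }
    p->saved_len[p->depth++] = used;
    p->path[used] = '.';
    memcpy(p->path + used + 1, localname, name_len + 1); /* name + NUL */
    return 0;
}

/* Element closed: cut the path back to its length before the last push.
 * Remembering the length (instead of searching for the last '.') keeps
 * this correct for element names that themselves contain a dot. */
static void xml_path_pop(xml_path *p) {
    assert(p->depth > 0); /* a pop without a matching push is a bug */
    p->path[p->saved_len[--p->depth]] = '\0';
}
```

Calling `xml_path_push` from a start-element handler and `xml_path_pop` from an end-element handler leaves `path` holding the name to use whenever a text value is added to ARGS.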

dune73 · 2025-04-25T13:45:05Z

This was what I expected. Thanks for the confirmation. Very good.

fzipi

No tests?

fzipi · 2025-04-26T12:41:05Z

apache2/msc_xml.c

+static void msc_xml_on_start_elementns(
+    void *ctx,
+    const xmlChar *localname,
+    const xmlChar *prefix,
+    const xmlChar *URI,
+    int nb_namespaces,
+    const xmlChar **namespaces,
+    int nb_attributes,
+    int nb_defaulted,
+    const xmlChar **attributes
+) {


Is this a new formatting style for the code, or has it already been adopted as the standard?

Mixed parameter formatting generally makes the code harder to read. So my suggestion would be:

- use the same format as all other files
- propose a new standard
- once accepted, apply it to all files once and for all
- enforce the new format in the pipeline.

What do you mean by "new"? AFAICT, there are already multiple different formats for long function signatures (e.g., static void sec_guardian_logger, static void copy_rules_phase).

Sorry, could you show me an example of what I can treat as the existing "standard"?

fzipi · 2025-04-26T12:41:24Z

apache2/msc_xml.c

+static void msc_xml_on_end_elementns(
+    void* ctx,
+    const xmlChar* localname,
+    const xmlChar* prefix,
+    const xmlChar* URI
+) {


apache2/msc_xml.c

apache2/msc_xml.h

apache2/apache2_config.c

theseion · 2025-04-26T17:33:56Z

apache2/msc_xml.c

+static void msc_xml_on_start_elementns(
+    void *ctx,
+    const xmlChar *localname,
+    const xmlChar *prefix,
+    const xmlChar *URI,
+    int nb_namespaces,
+    const xmlChar **namespaces,
+    int nb_attributes,
+    int nb_defaulted,
+    const xmlChar **attributes
+) {


What do you mean by "new"? AFAICT, there are already multiple different formats for long function signatures (e.g., static void sec_guardian_logger, static void copy_rules_phase).

apache2/msc_xml.c

apache2/re_actions.c

apache2/modsecurity.h

airween · 2025-04-26T17:58:07Z

> No tests?

I know we should start adding tests at some point, but unfortunately almost none of the directives and features have tests. With the MRTS project, I really hope we will eventually reach the point where we can cover all directives, arguments, operators and actions. But not yet.

NB: the libmodsecurity3 implementation has many new tests, but it's easier there.


theseion · 2025-04-27T06:29:08Z

> I know we should start adding tests at some point, but unfortunately almost none of the directives and features have tests. With the MRTS project, I really hope we will eventually reach the point where we can cover all directives, arguments, operators and actions. But not yet.
>
> NB: the libmodsecurity3 implementation has many new tests, but it's easier there.

I have only a rough working knowledge of C/C++, so it's very likely that I missed stuff during the review. Knowing how many mistakes I make in languages that I'm proficient in I feel rather uncomfortable saying that code is good. Tests would give me much more confidence. I do understand your situation though. Ultimately, it's up to you (you would probably have to carry the weight of bug fixes and user inquiries).

airween · 2025-04-27T09:22:43Z

> I have only a rough working knowledge of C/C++, so it's very likely that I missed stuff during the review. Knowing how many mistakes I make in languages that I'm proficient in I feel rather uncomfortable saying that code is good. Tests would give me much more confidence. I do understand your situation though. Ultimately, it's up to you (you would probably have to carry the weight of bug fixes and user inquiries).

I completely agree with you. Reviewing without tests is very hard.

Fortunately, the code in the two PRs (this one and #3363) is almost the same. But yes, we can't claim that they are identical, or that the other PR's test cases cover this PR too.

My problem in this case is that a great many tests are missing. Adding a one-off test for this PR would just make the CI more complicated. The solution is to work out a framework in which we can write tests against directives and their arguments. (This is why I started the MRTS project.)


airween · 2025-04-28T19:17:54Z

I added two new commits:

4c043a0 - changed the directive SecXMLintoArgs to SecXmlIntoArgs, as we discussed here
19b7e98 - there is a small change in the parsing of node values: if the node value contains a multi-byte character, the SAX parser calls the function multiple times, so we need to concatenate those substrings instead of creating a new string every time
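To illustrate the second commit: a SAX text callback receives a pointer and a length and may be invoked several times for a single node, so each chunk has to be appended to what was collected before. The sketch below shows that append step in plain C. It is not the code from 19b7e98: the `node_text` type and its functions are hypothetical, and it uses `realloc` rather than whatever allocator the module actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the text of the element being parsed. */
typedef struct {
    char *data;  /* NUL-terminated text collected so far, or NULL */
    size_t len;  /* length of data, not counting the NUL */
} node_text;

/* Append one chunk as delivered by a SAX "characters" style callback.
 * The chunk is not NUL-terminated; only `len` bytes are valid.
 * Returns 0 on success, -1 on allocation failure (buffer unchanged). */
static int node_text_append(node_text *t, const char *chunk, size_t len) {
    char *grown;
    assert(t != NULL && (chunk != NULL || len == 0));
    grown = realloc(t->data, t->len + len + 1);
    if (grown == NULL) {
        return -1;
    }
    if (len > 0) {
        memcpy(grown + t->len, chunk, len);
    }
    t->len += len;
    grown[t->len] = '\0';
    t->data = grown;
    return 0;
}

/* Called when the element ends, after the value has been stored. */
static void node_text_reset(node_text *t) {
    free(t->data);
    t->data = NULL;
    t->len = 0;
}
```

Overwriting the buffer on every call would keep only the last chunk; appending keeps the full value regardless of how the parser splits it.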


sonarqubecloud · 2025-04-30T06:52:04Z

Quality Gate failed

Failed conditions
6 Security Hotspots
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


Finish XMLArgs processing

0c8cc6e

airween changed the title ~~Finish XMLArgs processing~~ feat: improved XMLArgs processing Apr 7, 2025

airween added the 2.x Related to ModSecurity version 2.x label Apr 20, 2025

airween mentioned this pull request Apr 20, 2025

feat: improved XMLArgs processing #3363

Open

Add nullptr check conditions

1953e37

airween requested review from theseion and fzipi April 26, 2025 09:40

fzipi reviewed Apr 26, 2025

theseion requested changes Apr 26, 2025

airween and others added 12 commits April 26, 2025 20:01

Remove unnecessary comment.

c24ad68

Co-authored-by: Felipe Zipitría <[email protected]>

Format comment

78ca32f

Co-authored-by: Felipe Zipitría <[email protected]>

Comment clarification

055aec7

Co-authored-by: Felipe Zipitría <[email protected]>

Debug message clarification

3829d65

Co-authored-by: Max Leske <[email protected]>

Typo fix.

321c554

Co-authored-by: Max Leske <[email protected]>

Typo fix.

c3ab480

Co-authored-by: Max Leske <[email protected]>

Typo fix.

c19f901

Co-authored-by: Max Leske <[email protected]>

Typo fix.

bfe8047

Co-authored-by: Max Leske <[email protected]>

Typo fix.

21d71bb

Co-authored-by: Max Leske <[email protected]>

Typo fix.

f1ecdb1

Co-authored-by: Max Leske <[email protected]>

Typo fix.

b518823

Co-authored-by: Max Leske <[email protected]>

Typo fix.

ed24e70

Co-authored-by: Max Leske <[email protected]>

airween and others added 2 commits April 27, 2025 11:25

Fix retval logic explanation

c11bd6c

Fix error message explanation.

bd45108

Co-authored-by: Max Leske <[email protected]>

theseion previously approved these changes Apr 27, 2025

Change directive format to strict camel case

4c043a0

airween dismissed theseion’s stale review via 4c043a0 April 28, 2025 19:05

Change node value's parsing to concatenate instead of duplicate it every time

19b7e98

airween requested review from theseion and fzipi April 28, 2025 19:18

theseion requested changes Apr 30, 2025

Update explanation

87cbf9e

Co-authored-by: Max Leske <[email protected]>

theseion approved these changes May 1, 2025