feat: improved XMLArgs processing #3358

Open: airween wants to merge 19 commits into v2/master

airween
Member

@airween airween commented Apr 7, 2025

what

This PR adds a new feature to XML processing.

Old (current) behavior: for the XML:/* target, the body processor expands the node values from the XML payload. For example, this payload:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
</root>

will produce this value:

[/post][9] Target value: "  foo1  bar1  foo2  bar2"

In this case, there is no way to exclude an individual node. For example, if a node contains a term that a rule is looking for, the administrator cannot create an exclusion for that node. The only option is to exclude the whole rule.

New behavior: there is a new configuration keyword, SecParseXMLintoArgs, with possible values On, Off and OnlyArgs. The default value is Off, which leaves the current behavior unchanged. If the administrator sets it to On, the engine parses the XML into ARGS, and the XML:/* target still contains only the text content, as before. If the value is OnlyArgs, the parsed content appears only in the ARGS target; the XML:/* target no longer contains it.

With On, the node values appear in ARGS, so it is easy to write an exclusion against the named target.

why

A customer asked for a solution to this.

references

See #3178.

@airween airween changed the title from "Finish XMLArgs processing" to "feat: improved XMLArgs processing" Apr 7, 2025
@airween airween added the 2.x (Related to ModSecurity version 2.x) label Apr 20, 2025
@RedXanadu

This is a great new feature. This will open up ModSecurity to anyone who needs to do serious processing of XML APIs (lots of legacy and current applications!). Especially with pre-written rule sets like CRS, this makes the task of handling false positives possible.

Thank you for the work that has gone into this 🚀

@dune73
Member

dune73 commented Apr 25, 2025

@airween Could you share how the new option parses / advertises multi-level documents with multiple leaves carrying the same name? Is the hierarchy part of the name or is that hidden?

Needless to say, I really like this option.

@airween
Member Author

airween commented Apr 25, 2025

@dune73,

I hope I understand your question correctly 😄. Consider this file:

cat test.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
</root>

and this request:

curl -v -H "Content-Type: application/xml" -X POST -d @test.xml http://localhost/post.php

This generates the following arguments (exactly as in the JSON case):

Adding XML argument 'xml.root.level1.level2.node' with value 'foo1'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar1'
Adding XML argument 'xml.root.level1.level2.node' with value 'foo2'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar2'
Adding XML argument 'xml.root.level1.level2.node' with value 'foo1'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar1'
Adding XML argument 'xml.root.level1.level2.node' with value 'foo2'
Adding XML argument 'xml.root.level1.level2.node' with value 'bar2'
Expanded "REQUEST_URI_RAW|REQUEST_HEADERS|ARGS|ARGS_NAMES" to "REQUEST_URI_RAW|REQUEST_HEADERS:Host|REQUEST_HEADERS:User-Agent|REQUEST_HEADERS:Accept|REQUEST_HEADERS:Content-Type|REQUEST_HEADERS:Content-Length|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node|ARGS_NAMES:xml.root.level1.level2.node".
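
The naming scheme visible in the log above can be pictured as a path that grows when an element opens and shrinks when it closes, so sibling nodes end up with the same dotted name. The following is only a minimal sketch in plain C, not the code from this PR: xml_path, xml_path_init, xml_path_push and xml_path_pop are hypothetical names, and the fixed buffer limits are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XML_PATH_MAX  256   /* assumed limit for the whole dotted name */
#define XML_DEPTH_MAX 32    /* assumed limit for element nesting */

/* Hypothetical helper: tracks the dotted name of the element being parsed. */
typedef struct {
    char   path[XML_PATH_MAX];     /* e.g. "xml.root.level1" */
    size_t marks[XML_DEPTH_MAX];   /* path length before each push */
    int    depth;                  /* number of currently open elements */
} xml_path;

/* Start with the "xml" prefix seen in the debug log. */
static void xml_path_init(xml_path *p) {
    assert(p != NULL);
    strcpy(p->path, "xml");
    p->depth = 0;
}

/* Element opened: append ".<name>". Returns 0 on success, -1 if it does not fit. */
static int xml_path_push(xml_path *p, const char *name) {
    size_t cur = strlen(p->path);
    size_t add = strlen(name);
    if (p->depth >= XML_DEPTH_MAX || cur + 1 + add >= XML_PATH_MAX) {
        return -1;
    }
    p->marks[p->depth++] = cur;                 /* remember where to cut on pop */
    p->path[cur] = '.';
    memcpy(p->path + cur + 1, name, add + 1);   /* copies the terminating NUL too */
    return 0;
}

/* Element closed: cut the path back to its length before the matching push. */
static void xml_path_pop(xml_path *p) {
    if (p->depth > 0) {
        p->path[p->marks[--p->depth]] = '\0';
    }
}
```

In a SAX-style parser, xml_path_push would be called from the start-element callback and xml_path_pop from the end-element callback; the current value of path is then the argument name used when a node's text is stored.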

@dune73
Member

dune73 commented Apr 25, 2025

This was what I expected. Thanks for the confirmation. Very good.

@airween airween requested review from theseion and fzipi April 26, 2025 09:40
Contributor

@fzipi fzipi left a comment

No tests?

Comment on lines +17 to +27
static void msc_xml_on_start_elementns(
    void *ctx,
    const xmlChar *localname,
    const xmlChar *prefix,
    const xmlChar *URI,
    int nb_namespaces,
    const xmlChar **namespaces,
    int nb_attributes,
    int nb_defaulted,
    const xmlChar **attributes
) {
Contributor

Is this a new formatting style for the code, or has it already been adopted as the standard?

Mixing formats for parameters generally makes the code harder to read. So my suggestion would be:

  • use the same format as all other files
  • propose a new standard
  • once accepted, apply to all files once and for all
  • enforce the new format in the pipeline.

Collaborator

What do you mean by "new"? AFAICT, there are already multiple different formats for long function signatures (e.g., static void sec_guardian_logger, static void copy_rules_phase).

Member Author

Sorry, could you show me an example of what I can treat as the existing "standard"?

Comment on lines +52 to +57
static void msc_xml_on_end_elementns(
    void* ctx,
    const xmlChar* localname,
    const xmlChar* prefix,
    const xmlChar* URI
) {
Contributor

Same here.

@airween
Member Author

airween commented Apr 26, 2025

> No tests?

I know we should start adding tests at some point, but unfortunately almost none of the directives and features have tests. I really hope that with the MRTS project we will eventually reach the point where we can cover all directives, arguments, operators and actions. But not yet.

NB: the libmodsecurity3 implementation has many new tests, but it's easier there.

airween and others added 12 commits April 26, 2025 20:01
Co-authored-by: Felipe Zipitría
Co-authored-by: Felipe Zipitría
Co-authored-by: Felipe Zipitría
Co-authored-by: Max Leske
Co-authored-by: Max Leske
Co-authored-by: Max Leske
Co-authored-by: Max Leske
Co-authored-by: Max Leske
Co-authored-by: Max Leske
Co-authored-by: Max Leske
Co-authored-by: Max Leske
@theseion
Collaborator

> I know we should start adding tests at some point, but unfortunately almost none of the directives and features have tests. I really hope that with the MRTS project we will eventually reach the point where we can cover all directives, arguments, operators and actions. But not yet.
>
> NB: the libmodsecurity3 implementation has many new tests, but it's easier there.

I have only a rough working knowledge of C/C++, so it's very likely that I missed stuff during the review. Knowing how many mistakes I make in languages that I'm proficient in I feel rather uncomfortable saying that code is good. Tests would give me much more confidence. I do understand your situation though. Ultimately, it's up to you (you would probably have to carry the weight of bug fixes and user inquiries).

@airween
Member Author

airween commented Apr 27, 2025

> I have only a rough working knowledge of C/C++, so it's very likely that I missed stuff during the review. Knowing how many mistakes I make in languages that I'm proficient in I feel rather uncomfortable saying that code is good. Tests would give me much more confidence. I do understand your situation though. Ultimately, it's up to you (you would probably have to carry the weight of bug fixes and user inquiries).

I completely agree with you. Making a review without tests is very hard.

Fortunately, the code in the two PRs (this one and #3363) is almost the same, but yes, we can't claim that they are identical or that the other PR's test cases cover this PR too.

My problem in this case is that a great many tests are missing. Adding a one-off test for this PR just makes the CI more complicated. The solution is to work out how to build a framework in which we can write tests against directives and their arguments. (This is why I started the MRTS project.)

theseion previously approved these changes Apr 27, 2025
@airween
Member Author

airween commented Apr 28, 2025

I added two new commits:

  • 4c043a0 - changed the directive SecXMLintoArgs to SecXmlIntoArgs, as we discussed here
  • 19b7e98 - a small change in how node values are parsed: if the node value contains a multi-byte character, the SAX parser calls the callback multiple times, so we need to concatenate those substrings instead of creating a new string each time
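
The concatenation described in the second commit can be illustrated with a small accumulator that appends every chunk it receives until the element ends. This is a hedged sketch in plain C under stated assumptions, not the code from the commit: node_value, node_value_append and node_value_reset are hypothetical names, and realloc stands in for whatever memory management the module actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the text of the node currently being parsed. */
typedef struct {
    char  *data;   /* NUL-terminated text collected so far, or NULL */
    size_t len;    /* length of data, not counting the terminator */
} node_value;

/*
 * Mimics what a SAX "characters" callback would do with each chunk.
 * The chunk is not NUL-terminated, so its length is passed explicitly.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int node_value_append(node_value *v, const char *chunk, size_t chunk_len) {
    assert(v != NULL);
    char *grown = realloc(v->data, v->len + chunk_len + 1);
    if (grown == NULL) {
        return -1;              /* the old buffer is still valid and unchanged */
    }
    memcpy(grown + v->len, chunk, chunk_len);   /* append instead of replacing */
    v->len += chunk_len;
    grown[v->len] = '\0';
    v->data = grown;
    return 0;
}

/* Called once the element has ended and its value has been stored. */
static void node_value_reset(node_value *v) {
    free(v->data);
    v->data = NULL;
    v->len = 0;
}
```

If the parser delivers a value in three pieces, for example the bytes before a multi-byte character, the character itself, and the bytes after it, three calls to node_value_append leave the complete value in data, whereas overwriting the buffer on each call would keep only the last piece.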

@airween airween requested review from theseion and fzipi April 28, 2025 19:18
Co-authored-by: Max Leske

Quality Gate failed

Failed conditions
6 Security Hotspots
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Labels
2.x Related to ModSecurity version 2.x
5 participants