Skip to content

Conversation

mhucka
Copy link
Contributor

@mhucka mhucka commented Sep 24, 2025

CodeQL security scan report #567 flagged a data loading operation in src/openfermion/utils/operator_utils.py as being usafe due because it uses a user-provided value. The warning is about lines 282-283, involving the code

with open(file_path, 'rb') as f:
    data = marshal.load(f)

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code.

The problem here is that loading serialized Python objects using marshal is inherently unsafe, because the format is not designed to be secure against malicious or corrupted data. Unfortunately, we probably can't simply switch to a safer serialization method (e.g., JSON or protobufs) because users may already have saved files in this format. We should maintain backward compatibility with people's existing files.

This PR adds more checks on the data read by load_operator(), including checks on maximum file size.

Note: the maximum file size values are hardwired as constants near the top of this file. I don't have much evidence for what would be reasonable max values, so made some guesses based on other files found in the repository. Someone with longer experiencing using OpenFermion should double-check the values.

CodeQL security [scan report #567](https://github.com/quantumlib/OpenFermion/security/code-scanning/567) flagged a data loading operation in `src/openfermion/utils/operator_utils.py` as being usafe due because it uses a user-provided value. The warning is about lines 282-283, involving the code

```python
with open(file_path, 'rb') as f:
    data = marshal.load(f)
```

> Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code.

This changes the code to use the `json` package instead of the `marshal` package and extracts the data more carefully.
@mhucka mhucka changed the title Fix security scan warning about deserialization of data Fix #1119: avoid unsafe approach to deserializing data Sep 25, 2025
@mhucka mhucka added the area/health Involves code and/or project health label Sep 26, 2025
@mhucka mhucka force-pushed the mh-fix-deserialization branch from 2fa6f58 to aa8e819 Compare September 26, 2025 03:58
@mhucka mhucka force-pushed the mh-fix-deserialization branch from aa8e819 to c8fab8a Compare September 26, 2025 04:08
Python pytest coverage complains about a line not being covered where a
function is being mocked. I don't see a way to actually test this, so
for now I added a pragma no cover.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/health Involves code and/or project health
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant