Fix #1119: avoid unsafe approach to deserializing data #1141

mhucka · 2025-09-24T23:27:56Z

CodeQL security scan report #567 flagged a data loading operation in src/openfermion/utils/operator_utils.py as being usafe due because it uses a user-provided value. The warning is about lines 282-283, involving the code

with open(file_path, 'rb') as f:
    data = marshal.load(f)

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code.

The problem here is that loading serialized Python objects using marshal is inherently unsafe, because the format is not designed to be secure against malicious or corrupted data. Unfortunately, we probably can't simply switch to a safer serialization method (e.g., JSON or protobufs) because users may already have saved files in this format. We should maintain backward compatibility with people's existing files.

This PR adds more checks on the data read by load_operator(), including checks on maximum file size.

Note: the maximum file size values are hardwired as constants near the top of this file. I don't have much evidence for what would be reasonable max values, so made some guesses based on other files found in the repository. Someone with longer experiencing using OpenFermion should double-check the values.

CodeQL security [scan report #567](https://github.com/quantumlib/OpenFermion/security/code-scanning/567) flagged a data loading operation in `src/openfermion/utils/operator_utils.py` as being usafe due because it uses a user-provided value. The warning is about lines 282-283, involving the code ```python with open(file_path, 'rb') as f: data = marshal.load(f) ``` > Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code. This changes the code to use the `json` package instead of the `marshal` package and extracts the data more carefully.

src/openfermion/utils/operator_utils.py

Python pytest coverage complains about a line not being covered where a function is being mocked. I don't see a way to actually test this, so for now I added a pragma no cover.

mhucka changed the title ~~Fix security scan warning about deserialization of data~~ Fix #1119: avoid unsafe approach to deserializing data Sep 25, 2025

mhucka added the area/health Involves code and/or project health label Sep 26, 2025

github-advanced-security bot found potential problems Sep 26, 2025

View reviewed changes

src/openfermion/utils/operator_utils.py Show resolved Hide resolved

src/openfermion/utils/operator_utils.py Fixed Show fixed Hide fixed

mhucka force-pushed the mh-fix-deserialization branch from 2fa6f58 to aa8e819 Compare September 26, 2025 03:58

Resolve conflicts during merge

c8fab8a

mhucka force-pushed the mh-fix-deserialization branch from aa8e819 to c8fab8a Compare September 26, 2025 04:08

mhucka added 5 commits September 26, 2025 04:13

Move imports of marshal to the top of the file

bb19678

Remove unneeded import of built-in functions zip and map

7edf089

Fix formatting of imports

0648c39

Merge branch 'master' into mh-fix-deserialization

875b6d9

Silence a difficult coverage warning

a5c7098

Python pytest coverage complains about a line not being covered where a function is being mocked. I don't see a way to actually test this, so for now I added a pragma no cover.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix #1119: avoid unsafe approach to deserializing data #1141

Fix #1119: avoid unsafe approach to deserializing data #1141

Uh oh!

mhucka commented Sep 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Fix #1119: avoid unsafe approach to deserializing data #1141

Are you sure you want to change the base?

Fix #1119: avoid unsafe approach to deserializing data #1141

Uh oh!

Conversation

mhucka commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mhucka commented Sep 24, 2025 •

edited

Loading