Bug report
Bug description:
This is a minimal reproduction of this downstream bug report: astral-sh/uv#10619
The output of compileall.compile_file depends on the order in which the files are compiled. This means compilation is non-deterministic if builds are distributed over a process pool.
This becomes a problem when building Docker images, where you usually compile bytecode ahead of time for faster startup, and where the hash of the image depends on all files in the image, including the .pyc files.
Specifically, the output of
a = {"foo", 2, 3}
def f():
    b = {"foo", 2, 3}
is different if we previously compiled another file with
import foo
Reproducer script:
#!/bin/bash
set -e
script=$(cat << EOF
import compileall
import sys
for path in sys.argv[1:]:
    compileall.compile_file(path)
EOF
)
cat << EOF > a.py
import foo
EOF
cat << EOF > b.py
a = {"foo", 2, 3}
def f():
    b = {"foo", 2, 3}
EOF
# Both files
rm -rf __pycache__
python3.14 -c "$script" a.py b.py
sha256sum __pycache__/b.cpython-314.pyc
# For debugging
cp __pycache__/b.cpython-314.pyc b1.cpython-314.pyc
# Single file only
rm -rf __pycache__
python3.14 -c "$script" b.py
sha256sum __pycache__/b.cpython-314.pyc
# For debugging
cp __pycache__/b.cpython-314.pyc b2.cpython-314.pyc
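To see where the two saved copies differ, rather than only that their hashes differ, something like the following could be used. This is a minimal sketch: the helper names `diff_offsets` and `diff_pyc` are made up for illustration, and it assumes the 16-byte .pyc header that the debugging snippet in this report also skips.

```python
from pathlib import Path

HEADER_SIZE = 16  # .pyc header length, same value the debugging snippet skips


def diff_offsets(data1: bytes, data2: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, byte1, byte2) for every position where the inputs differ.

    Only the common prefix length is compared; a length mismatch has to be
    checked separately by the caller.
    """
    return [
        (i, b1, b2)
        for i, (b1, b2) in enumerate(zip(data1, data2))
        if b1 != b2
    ]


def diff_pyc(path1: str, path2: str) -> list[tuple[int, int, int]]:
    """Compare the marshalled payloads of two .pyc files, skipping the header.

    Offsets in the result are relative to the start of the payload.
    """
    payload1 = Path(path1).read_bytes()[HEADER_SIZE:]
    payload2 = Path(path2).read_bytes()[HEADER_SIZE:]
    if len(payload1) != len(payload2):
        print(f"payload lengths differ: {len(payload1)} vs {len(payload2)}")
    return diff_offsets(payload1, payload2)
```

With the files from the reproducer, `diff_pyc("b1.cpython-314.pyc", "b2.cpython-314.pyc")` would list the differing payload bytes.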
This is caused by different refcounts in the marshalled files:
import marshal
import sys
# Note: the reproducer above writes *.cpython-314.pyc copies; adjust the
# file names below to match the interpreter that produced the files.
with open("b1.cpython-313.pyc", "rb") as f:
    f.read(16)  # Skip header
    pyc1 = marshal.load(f)
with open("b2.cpython-313.pyc", "rb") as f:
    f.read(16)  # Skip header
    pyc2 = marshal.load(f)
print(sys.getrefcount(pyc1.co_consts[0]))
print(sys.getrefcount(pyc2.co_consts[0]))
This prints 2 and 3.
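The check above only looks at the first module-level constant. To compare every constant, including those of nested functions such as f, a sketch along these lines might help. The helper names (`iter_consts`, `load_pyc`, `refcounts`, `print_refcount_diffs`) are hypothetical, and the absolute numbers include temporary references held by the helper itself, so only differences between the two files are meaningful. On recent CPython versions, immortal objects such as None report very large refcounts.

```python
import marshal
import sys
import types


def iter_consts(code, prefix=""):
    """Yield (path, constant) pairs, recursing into nested code objects."""
    path = f"{prefix}.{code.co_name}" if prefix else code.co_name
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_consts(const, path)
        else:
            yield path, const


def load_pyc(filename):
    """Load the code object from a .pyc file."""
    with open(filename, "rb") as f:
        f.read(16)  # Skip header
        return marshal.load(f)


def refcounts(code):
    """Return (path, repr(constant), refcount) for every constant in code."""
    return [(path, repr(c), sys.getrefcount(c)) for path, c in iter_consts(code)]


def print_refcount_diffs(file1, file2):
    """Print constants whose refcount differs between two .pyc files.

    Assumes both files were compiled from the same source, so their
    constants appear in the same order.
    """
    rows1 = refcounts(load_pyc(file1))
    rows2 = refcounts(load_pyc(file2))
    for (path, value, rc1), (_, _, rc2) in zip(rows1, rows2):
        if rc1 != rc2:
            print(f"{path}: {value} refcount {rc1} vs {rc2}")
```

For the files saved by the reproducer, this would be called as `print_refcount_diffs("b1.cpython-314.pyc", "b2.cpython-314.pyc")`, using the same interpreter version that wrote them.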
The original report is from 3.13; I've reproduced it with 3.14.0a4. It happens at least on Linux and Windows.
CPython versions tested on:
3.14
Operating systems tested on:
Linux