Skip to content

Parallel build_ext might fail due to file collisions #30873

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Dr-Irv opened this issue Jan 9, 2020 · 17 comments
Closed

Parallel build_ext might fail due to file collisions #30873

Dr-Irv opened this issue Jan 9, 2020 · 17 comments
Assignees
Labels
Build Library building on various platforms Windows Windows OS

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 9, 2020

Code Sample, a copy-pastable example if possible

python setup.py build_ext --inplace -j 8

Problem description

See some discussion in #30862 . I have 8 cores on Windows. When I use -j 8, in the middle of the build, I will get this error message:

c:\Code\pandas_dev\pandas\pandas\_libs\tslibs\src\datetime\np_datetime.c : fatal error C1083: Cannot open compiler generated file: 'c:\Code\pandas_dev\pandas\build\temp.win-amd64-3.7\Release\pandas\_libs\tslibs\src\datetime\np_datetime.obj': Permission denied

What I think is happening is that in setup.py, _libs.tslibs.conversion, libs.tslibs.np_datetime and _libs.tslibs.period all have to compile pandas/_libs/tslibs/src/datetime/np_datetime.c . If the parallel build is timed in such a way that two of those extensions are being built at the same time, then they can conflict in writing np_datetime.obj. A similar thing can happen with _libs.lib, _libs.parsers, and _libs.tslibs.parsing all having to compile pandas/_libs/src/parser/tokenizer.c .

If I do two python setup.py build_ext --inplace -j 8 steps in a row, then everything is fine. Also things are fine with -j 4. But I can imagine that even with -j 4, you could end up with the same kind of issue, even in CI.

I don't know enough about how extensions are built, distutils and setuptools in order to avoid this potential collision during the build process.

Output of pd.show_versions()

c:\Code\pandas_dev\pandas\pandas\core\index.py:29: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
FutureWarning,

INSTALLED VERSIONS

commit : 13858f6
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.26.0.dev0+1734.g13858f6e1
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191201
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.57.1
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.46.0

@jbrockmendel jbrockmendel added the Build Library building on various platforms label Jan 10, 2020
@WillAyd
Copy link
Member

WillAyd commented Jan 10, 2020

You are using the MSVC compiler right? Not sure if it matters just asking

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 10, 2020

You are using the MSVC compiler right? Not sure if it matters just asking

Yes.

@alimcmaster1
Copy link
Member

I sometimes do pandas dev on windows and recall running into this exact problem with -j 4

@alimcmaster1
Copy link
Member

Maybe @scoder or someone from Cython can help?

@mroeschke mroeschke added the Windows Windows OS label Apr 19, 2020
@adamjstewart
Copy link
Contributor

Was this fixed by #30862? Can this be closed now?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jan 7, 2021

@adamjstewart Don't think so. I opened this one while #30862 was being developed.

@mgorny
Copy link
Contributor

mgorny commented Mar 7, 2021

This is actually a pretty awful bug in distutils. Basically, if you use the same source file in two extensions, distutils can hit pretty nasty race conditions compiling the file twice simultaneously. This is especially bad on Linux because there's no immediate error, but the installed extensions are simply broken (e.g. missing symbols). I'm going to submit a PR shortly.

@leycec
Copy link

leycec commented May 27, 2021

I'm equally astonished, appalled, and disappointed that Pandas developers like @jreback pushed back against #40285 with nonsensical commentary like "why is j more than like 4 actually useful?" I mean, come on. Scalability is an unconditional good. This is not up for meaningful debate. 🤦

The trivial resolution here is to do what everyone else in Python's standard scientific stack does: disable parallel building entirely by forcing -j 1 in setup.py. Yes, that's demonstrably awful – but there are no working alternatives on the table. Forcing -j 1 at C extension build time is exactly what SciPy and everyone else does. That's the MostlyRightThing™ to do, because setuptools upstream has no interest or intention of resolving this within the expected lifetime of our Universe.

Fortunately, this is trivial. Just override the finalize_options() method of your custom build_ext class in setup.py to resemble:

class build_ext(_build_ext):
    def finalize_options(self):
        super().finalize_options()

        # Disable distutils parallel build due to upstream race conditions. See
        # pandas-dev/pandas#30873
        if self.parallel:
            print("NOTE: -j build option not supported.")
        self.parallel = None

Boom! Issue trivially solved. Pandas then happily munch on bamboo shoots as a show of appreciation. 🎍 🐼

@jonashaag
Copy link
Contributor

I can reliably reproduce this on Windows, see e.g. #46611

@WillAyd
Copy link
Member

WillAyd commented Apr 4, 2022

Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows

@WillAyd
Copy link
Member

WillAyd commented Apr 4, 2022

Also does stdout show the MSVC as actually using the -j flag during compilation or is it mapped to something else? MSVC should be using /MP as the flag, so curious if that gets translated by setuptools along the way

https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=msvc-170

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 6, 2022

Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows

Given the discussion above, I don't think it's a compiler issue. It has to do with how setuptools has no intention of fixing this.

@WillAyd
Copy link
Member

WillAyd commented Apr 6, 2022 via email

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 6, 2022

Might be overlooking but where is the setuptools discussion?

See above, particularly #30873 (comment) and #30873 (comment)

@WillAyd
Copy link
Member

WillAyd commented Apr 6, 2022

I checked setuptools github but didn't find any bug report. Can someone link what is alluded to in those comments?

The closest I could find is this. It is similar but not exactly the same:

pypa/setuptools#2442

From personal experience I have found the MSVC to be pretty lacking compared to other compilers, from standards complaince, feature, and error / warning reporting perspectives. So it would be good to at least try some other compilers out and figure out if this is an OS issue or a compiler issue

Depending on what we can figure out there we can try to report upstream in the appropriate channel. I think we all agree this isn't necessarily a pandas issue but if its something we want addressed we can likely push this along forward process of elimination

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Apr 6, 2022

I checked setuptools github but didn't find any bug report. Can someone link what is alluded to in those comments?

See my original description at the top of this issue here; #30873 (comment)

The parallelization of the process is done by setuptools where it takes each extension that we want to build (see setup.py starting at line 441), and then just fires of parallel builds, one for each extension up to the limit specified. Since some of the extensions are dependent on the same C file, it could be the case that two parallel compiles are trying to write the same OBJ file at the same time. I'm guessing that MSVC is putting a lock on the OBJ file (probably smart to do), so it then reports an error.

Based on comments above, it could happen in Linux too.

Probably the right solution for us is to split the building of the extensions in two parts. One part where there are no conflicts, and you can use the -j 8 flag, and the other where the extensions must be built sequentially, so you force "-j 1" on that part.

@lithomas1
Copy link
Member

closed by #51525.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms Windows Windows OS
Projects
None yet
10 participants