ENH, TST: improvements to CI testing #2108
Comments
You can use Intel SDE. The BLIS project uses SDE to check different arches (https://github.com/flame/blis/blob/master/travis/do_sde.sh).
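As a rough illustration of the SDE suggestion above, here is a minimal sketch of wrapping a test command so that it runs under the emulator. The launcher name `sde64`, the helper `sde_command`, and the example test command are assumptions for illustration and are not taken from this thread; `-skx` is the SDE chip flag for emulating a Skylake server CPU.

```python
import shlex

# Assumption: the SDE launcher is installed as "sde64" and is on PATH.
SDE_BINARY = "sde64"


def sde_command(chip_flag, command, sde_binary=SDE_BINARY):
    """Return an argv list that runs `command` under SDE with `chip_flag`.

    `chip_flag` is an SDE chip selector such as "-skx" or "-hsw".
    `command` is the argv list of the program to emulate.
    """
    if not chip_flag.startswith("-"):
        raise ValueError("chip_flag should look like '-skx' or '-hsw'")
    # SDE separates its own options from the emulated program with "--".
    return [sde_binary, chip_flag, "--", *command]


# Hypothetical example: run a Python one-liner under emulated SkylakeX.
argv = sde_command("-skx", ["python", "-c", "import numpy; numpy.show_config()"])
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where SDE is actually installed.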
It would be good if you could integrate your ideas, namely manylinux1 as the oldest imaginable glibc, into the current CI config.
With SDE I can reproduce the issue mentioned at numpy/numpy#13401 (on CoffeeLake emulating SkylakeX) with 0.3.5.
@tylerjereddy, was numpy/numpy#13401 fixed with 0.3.6? I still see it with SDE.
@isuruf I believe the "fix" is actually a disabling of the kernel in OpenBLAS for now: #2061. I've just started playing with the SDE in CI on my fork. I suppose the fail before / succeed after scenario is actually for SDE set to skx and OpenBLAS not using the kernel by default? I don't know if that "regression" test is easy to do from NumPy's perspective, and it will of course change when the instructions can be used some day.
What I'm trying to say is SKYLAKEX is still problematic when used with SDE.
So, if I tell the SDE to use skx, but I don't tell OPENBLAS which coretype it should use, should the linked patch prevent OPENBLAS from using the problematic Skylake kernel by default? It would fall back to some other kernel? End users can solve the problem by using
Yes, even if you tell OPENBLAS to use SKYLAKEX, the problematic kernel will not be used in OpenBLAS 0.3.6. So the numpy test should pass for any combination of SDE cpuid and OPENBLAS core type (given that the SDE cpuid is newer than or equal to the OPENBLAS core type).
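To make the "any combination" condition above concrete, here is a small sketch that enumerates the (emulated CPU, OpenBLAS core type) pairs satisfying the stated constraint. The list `ARCH_ORDER` and its oldest-to-newest ordering are assumptions chosen for illustration, and `valid_combinations` is a hypothetical helper, not something used in this thread.

```python
# Assumption: a hypothetical oldest-to-newest ordering of x86_64 targets,
# written with OpenBLAS-style core type names.
ARCH_ORDER = ["NEHALEM", "SANDYBRIDGE", "HASWELL", "SKYLAKEX"]


def valid_combinations(arch_order=ARCH_ORDER):
    """Return (emulated_cpu, openblas_coretype) pairs worth testing.

    A pair is kept only when the emulated CPU is at least as new as the
    core type, i.e. the kernels selected by the core type can actually
    run on the emulated CPU.
    """
    pairs = []
    for cpu_index, emulated_cpu in enumerate(arch_order):
        # Every core type up to and including the emulated CPU is allowed.
        for coretype in arch_order[: cpu_index + 1]:
            pairs.append((emulated_cpu, coretype))
    return pairs


for emulated_cpu, coretype in valid_combinations():
    print(f"emulate {emulated_cpu:<12} with core type {coretype}")
```

In a DYNAMIC_ARCH build of OpenBLAS, the core type for each run would typically be forced through the `OPENBLAS_CORETYPE` environment variable; that mechanism is an assumption here, since the thread does not name it.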
It was noted with 0.3.5-dev, which is some development version from this year. Most likely the problem was introduced with 0.3.3, certainly not earlier. @isuruf could you confirm 0.3.6 solves the issue over 0.3.5?
0.3.6 doesn't. I tested with SDE, not actual hardware.
The 0.3.6 "fix" specifically disables the SkylakeX DGEMM kernel in response to reports in #1955 (a Julia issue) and #2029 (with a computational chemistry package) but none of the other SkylakeX kernels contributed by fenrus75. There is no guarantee that this will also solve any of the problems you appear to be seeing with numpy.
Noted. I'll likely end up running the full NumPy linear algebra test suite with the currently distributed vs. the latest develop version of OpenBLAS linked, to see if it helps, once I'm set up with the emulation stuff.
Hmm, I'm still stuck getting
I must be doing something wrong since I've not used the SDE before today. Can I not use those
@tylerjereddy, did you build with
Most recent log should be visible here: https://dev.azure.com/tylerjereddy/numpy-test/_build/results?buildId=821 My source for the CI runs in my fork: https://github.com/tylerjereddy/numpy/blob/intel-sde-azure/azure-pipelines.yml#L42 The OpenBLAS is pre-built from the MacPython ecosystem, but I believe we tend to use
@tylerjereddy, manylinux1 gcc doesn't support AVX512, so the openblas library in numpy wheels can only support up to haswell, even on newer cpus like skylakex.
@isuruf So the original issue was reported on Mac, using pip/wheels & OpenBLAS built for Mac, presumably with clang--so I'll have to switch to Mac then I suppose. Perhaps I was confused because you mentioned reproduction with the linux SDE of the issue above in your examples -- I assume you built NumPy locally with your own OpenBLAS builds in those cases?
No, I used conda's openblas and numpy.
The log shows OpenBLAS 0.3.3? That version includes the quite experimental USE_TLS=1 in Makefile.rule, which you must patch out of the file, as it cannot be overridden with parameters later.
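One way to "patch out" the setting before building is to comment out the assignment in the file text. The sketch below assumes the line appears as an uncommented `USE_TLS = 1` assignment on its own line in Makefile.rule; `disable_use_tls` is a hypothetical helper, and the exact layout of the file in any given OpenBLAS version should be checked first.

```python
import re

# Matches an active (uncommented) "USE_TLS = 1" assignment on its own line,
# with optional horizontal whitespace around it.
_USE_TLS_LINE = re.compile(
    r"^([ \t]*)(USE_TLS[ \t]*=[ \t]*1[ \t]*)$", flags=re.MULTILINE
)


def disable_use_tls(makefile_text):
    """Return `makefile_text` with any active USE_TLS = 1 line commented out.

    Lines that are already commented are left unchanged, so the function
    can safely be applied more than once.
    """
    return _USE_TLS_LINE.sub(r"\1# \2", makefile_text)


# Hypothetical fragment standing in for the real Makefile.rule contents.
sample = "VERSION = 0.3.3\nUSE_TLS = 1\nNUM_THREADS = 24\n"
print(disable_use_tls(sample))
```

To apply it to a real checkout, read Makefile.rule, pass the text through `disable_use_tls`, and write the result back before running `make`.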
@tylerjereddy - maybe we should start building manylinux2010 wheels to test with? It's a fairly easy set of fixes to our current setup: https://github.com/matthew-brett/multibuild/issues/238
Or build clang 8.0.0 on manylinux1, which I do for some wheels. (Can't do it on CI except Azure)
Yeah, that's not true, but admittedly confusing--the version numbers on OpenBLAS file names produced in the "MacPython" ecosystem are a little problematic. Luckily, the file names do usually contain a hash for the OpenBLAS develop commit used for the build if it is not a stable release. So, that 0.3.3 file name is actually some commit on the 0.3.5.dev OpenBLAS. People don't believe me when I say we need better version detection & auditing infrastructure, but anyway we now have
Good idea--for now, I'm just trying to put out the current fire for NumPy / SciPy wheels.
Hi all, I just set up Azure pipeline. How could I configure yaml? Is it similar to travis-ci?
@xianyi Awesome! You added the Azure pipelines app for the project? The YML config is a little different, but pretty well documented now. Many downstream projects like NumPy & SciPy have pretty detailed examples, etc., so could be slowly built up as long as the integration / PR hooks are active now. Usually there are one or two tricky steps in the setup to get it just right.
Yes! Azure pipelines now running in #2121. It can also be useful to turn on boards in the admin config, since they're currently needed for showing some things like coverage reports:
I enabled the board. Please use the following URL.
There have been some discussions downstream of OpenBLAS about how we might help catch issues sooner, perhaps by contributing to your CI. I have a few thoughts here, which you may very well reject, but better to know that before I try to put time in or open a PR.
ppc64le / POWER8 build, which we use downstream for native testing on that platform in CI.
Are you open to some of these ideas--could it be helpful? The worst part is probably that a maintainer would likely have to set up the app integrations for those CI services, but that's not too bad if we can submit the PRs to help build up the testing a bit more.
This may also help change the dialogue from us just showing up and asking for bug fixes to a way to help out a little bit so that both sides avoid rushing new releases for suddenly-discovered issues at or just after release time.
There's definitely some contentious debate about what the appropriate medium for the testing is. There's also the concern about being able to test newer archs like SkylakeX with the recent issues--that seems harder / not something more CI can easily solve unless the VMs will support the instructions somehow.
I think there's also a buildbot that tests OpenBLAS on some archs, though I'm not sure how regularly, and I believe that, e.g., Skylake may not be available there.