Add ARM tests on Travis #1864


Merged
merged 1 commit into OpenMathLib:develop from aytekinar:patch-1 on Nov 12, 2018

Conversation

aytekinar
Contributor

I have updated Travis' YAML file to add emulated tests for the ARMV6 and ARMV8 architectures using Alpine Docker images.

The idea came from @brada4 (see the comment in #1861). Because the emulated builds take a lot of time, I have allowed those jobs to fail.

This is just an initial attempt at providing such support in Travis. Please feel free to comment, improve/modify, or even close the PR as you wish.
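
For context, a job of this kind looks roughly like the sketch below. This is only an illustration: the image name, the env strings and the make flags are placeholders, not the exact contents of this PR's .travis.yml.

matrix:
  include:
    - os: linux
      services: docker
      env: TARGET=ARMV8 IMAGE=example/openblas-alpine:arm64v8   # placeholder image name
      script:
        # run the build inside the (qemu-emulated) ARM container
        - docker run --rm -v "$TRAVIS_BUILD_DIR":/openblas -w /openblas "$IMAGE" make TARGET="$TARGET"
  allow_failures:
    - env: TARGET=ARMV8 IMAGE=example/openblas-alpine:arm64v8   # must repeat the job's env exactly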

@martin-frbg
Collaborator

Looks like it almost made it in time; a pity qemu is so slow when it has to emulate a CPU architecture.
Do you see the same arm_neon.h failure in local builds of ARMV8? I wonder whether this failure is real, as there was some recent controversy over how much of the thunderx2 optimizations are generic enough to allow using them for the ARMV8 target.

@brada4
Contributor

brada4 commented Nov 9, 2018

Maybe it is possible to cut the slow tests short by not installing gfortran, so the builds finish in time? Judging by an x86_64 build, roughly 10% of the time is still left to do at the point where it fails.
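
As a sketch of what that suggestion amounts to (these are standard OpenBLAS Makefile options, assumed here rather than copied from this PR's configuration), the script could build only the BLAS part so that no Fortran compiler is needed:

script:
  # Assumed OpenBLAS flags: skip Fortran, LAPACK and the CBLAS interface.
  - make TARGET=ARMV6 NOFORTRAN=1 NO_LAPACK=1 NO_CBLAS=1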

@martin-frbg
Collaborator

martin-frbg commented Nov 9, 2018

Doubt it - he already dropped LAPACK and CBLAS from the build if I read that correctly.

@aytekinar
Contributor Author

aytekinar commented Nov 9, 2018

Do you see the same arm_neon.h failure in local builds of ARMV8? I wonder whether this failure is real, as there was some recent controversy over how much of the thunderx2 optimizations are generic enough to allow using them for the ARMV8 target.

Uhm... I can try, when I have time. However, I will do an emulated build locally, as I do not have an ARM64 device at hand. I do use my RPi2s for ARM32v7 builds when needed for my project. As for the other thing, i.e., the controversy, I do not have much information (and I am not knowledgeable at all when it comes to architectures). But what I can see there is that it might be a clang-related thing (the build starts and proceeds with gcc). It might be worth trying with some debian:stretch-slim image --- I have had some problems with musl libc and its transition packages and headers in Alpine. Maybe I am missing some package there, too.

Maybe it is possible to cut the slow tests short by not installing gfortran, so the builds finish in time? Judging by an x86_64 build, roughly 10% of the time is still left to do at the point where it fails.

I had thought about the same thing. I have created an openblas organization at Docker Hub, whose ownership I can happily hand over to any of the contributors here, and I will create and push Alpine images that have the dependencies preinstalled. Then, changing the Dockerfile to have as few layers as possible could at least make the builds finish.

EDIT. Well, it indeed helped --- now it times out at 99% 🤣 Anyways... It was a nice try, IMO.

Doubt it - he already dropped LAPACK and CBLAS from the build if I read that correctly.

Correct. That was intentional, as I had thought that the BLAS part would be more than enough to test whether the builds proceed.

@brada4
Contributor

brada4 commented Nov 9, 2018

musl is already covered by the other Alpine Linux tests, but it does not hurt to test it some more. Debian (and glibc) might be a bit too big to stuff into the 50 minutes we have... I think the first thing that does not time out will already be a revolution...

@aytekinar
Contributor Author

Well, the base image for Alpine is ~5 MB whereas Debian's slim images are around 17 MB. But you are probably right that once we bundle them together with glibc and friends, they will eventually add up and exceed the size of Alpine by far.

Anyways... It is a pity that the builds time out at 99% :(

@martin-frbg
Collaborator

Maybe restricting the build to the static libopenblas would help (though this would drop the completeness check with gensymbol/linktest.c)

@brada4
Contributor

brada4 commented Nov 9, 2018

Either way, it is worth keeping as an open PR as a reminder, and for retrying when the ARM code changes a lot.

@martin-frbg
Collaborator

BUILD_SHARED=NO does not get us past 98/99 percent either. Trying with make -j 3 now in my fork although I doubt it is i/o-bound.

@martin-frbg
Collaborator

According to pkgs.alpinelinux.org, arm_neon.h should be available in /usr/lib/clang/5.0.1/include as part of the clang-dev package on aarch64.

@aytekinar
Contributor Author

According to pkgs.alpinelinux.org, arm_neon.h should be available in /usr/lib/clang/5.0.1/include as part of the clang-dev package on aarch64.

I have updated the images --- now they have clang-dev. Please feel free to make PRs to aytekinar/openblas-alpine to modify the Docker image as needed. In fact, I have also invited you, @martin-frbg, as a collaborator to the repo.

@martin-frbg
Collaborator

Thank you. The clang-aarch64 build no longer errors, though obviously it still times out like the others. I am still experimenting with running more than 2 concurrent make jobs in my fork - make -j 3 looked like it made the two gcc-based builds run to completion, but I am no longer sure if it was actually using 3 threads or if the setting was overruled by getarch and I was just lucky.

@aytekinar
Contributor Author

How about -j$(nproc)? Do you think overloading is better?

@martin-frbg
Collaborator

It will choose -j 2 automatically for the two cores provided by Travis; I am still testing whether -j3 or -j4 would really help.
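
For illustration, the job count can be pinned in the Travis script regardless of what the two-core VM reports; whether OpenBLAS's own Makefiles then honour or override it is what is being tested here:

script:
  # Sketch only: force three make jobs; -j"$(nproc)" would instead match the detected core count.
  - make -j3 TARGET=ARMV8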

@brada4
Contributor

brada4 commented Nov 10, 2018

Given the way the emulator works, JIT-ing chains of ARM code, doubling ncpus will keep the compiler code in RAM and JIT-ed for longer, while the tests are highly undesirable because they are fresh code that has to be JIT-ed all over again.

@martin-frbg
Collaborator

Seems 3 concurrent make jobs do not help. With the latest changes to the docker environment the ARM32 gcc job manages to finish in a bit over 47 minutes. ARM64 gcc sometimes barely makes it as well while the two clang-based jobs consistently time out at around 90 percent.

@martin-frbg reopened this Nov 11, 2018
@brada4
Contributor

brada4 commented Nov 11, 2018

In a year it will fit within the timeout...

@martin-frbg
Collaborator

We could look into moving the ARM tests to CircleCI, which appears to have a per-month limit on CPU time rather than a per-task one. (Though with the cap at 1000 minutes we would only get 4 or 5 commits checked per month, so it is probably not worth the trouble.)
Perhaps we could just live with the incomplete builds for now, but it looks to me as if the allow_failures is not parsed as intended right now - I do not see it mentioned in the log, and the entire CI run gets flagged as failed which is probably not what we want.

@aytekinar
Contributor Author

Perhaps we could just live with the incomplete builds for now, but it looks to me as if the allow_failures is not parsed as intended right now - I do not see it mentioned in the log, and the entire CI run gets flagged as failed which is probably not what we want.

I have tried adding matrix: in front of allow_failures since, per the documentation, Travis seems to check for matrix.allow_failures, which we did not have before (the existing naming convention has jobs instead of matrix in the top part). Let's see if it is parsed properly this time. Then we can merge this PR?
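
The shape being tried is roughly the following; the env string is a placeholder, and Travis only treats a job as allowed to fail when the listed attributes match the job's definition exactly:

matrix:
  allow_failures:
    - env: TARGET=ARMV8 IMAGE=example/openblas-alpine:arm64v8   # placeholder, must match the job's env verbatim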

@martin-frbg
Collaborator

The current setup uses something called "build stages", but as I understand allow_failures should work there as well.

@brada4
Contributor

brada4 commented Nov 11, 2018

I think it depends on which way the wind blows...
Dropping gfortran and the dynamic library can probably get the gcc version through most of the time; I think this should be rested until Travis gets speedier computers...

@martin-frbg
Collaborator

martin-frbg commented Nov 11, 2018

@aytekinar looks like the "env" is not matched for some reason, but matching on "name" and/or "services" appears to work. (I tried both at the same time, so I cannot say for sure which one it is.)

@aytekinar
Contributor Author

Trying either does not work. I do not get it :(

Normally, it should separate as in this, right? Why do I not see this in this build? Is this a bug?

@martin-frbg
Collaborator

martin-frbg commented Nov 11, 2018

The lack of separation seems to be a Travis bug (or missing feature) for "build stages" (I saw a bug open for it but did not keep the number or link). I haven't tried with "matrix", but it looks like I got it to work with stages (no separation, but a line acknowledging that the jobs were allowed to fail, and the overall status set to "passed"). Now to see if it is "-name" or "-services" that actually did the trick.

@aytekinar
Contributor Author

aytekinar commented Nov 11, 2018

I have removed stage: test from the YAML file, which seems to have fixed the problem. Finally, it is working. I am not sure whether you definitely need the stage: test tag, as the YAML file seems to contain only one stage anyway. Then, we might try appending

stages:
  - test

to specify exactly which stages there are.

EDIT. I have removed services: docker, and listed env:s under matrix.allow_failures. This way, some other build/test can still use services: docker and will not be allowed to fail.
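
In outline, the arrangement described in that edit is something like this, with one env entry per emulated job that is tolerated to fail and no services key on the matching entries (the env strings are placeholders):

matrix:
  allow_failures:
    - env: IMAGE=arm32v6 COMPILER=clang   # placeholder env strings
    - env: IMAGE=arm64v8 COMPILER=clang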

Updated `.travis.yml` file to add emulated tests for `ARMV6` and `ARMV8`
architectures with `gcc` and `clang`.  Created prebuilt images with
required dependencies. Squashed layers into one.
@martin-frbg
Collaborator

The alternative would appear to be to keep using "jobs" where you switched to "matrix" at some point

@aytekinar
Contributor Author

The alternative would appear to be to keep using "jobs" where you switched to "matrix" at some point

... while keeping stage: test everywhere?

@martin-frbg
Collaborator

The alternative would appear to be to keep using "jobs" where you switched to "matrix" at some point

... while keeping stage: test everywhere?

Yes, at least that works for me - I copied your original PR to my fork for experimenting, and what I now have is just your original additions with "-name" in place of "-env" in the allow_failures. (The job is still running, so I am not sure whether it was the name or the services keyword that did the trick - though we might as well use both.)
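
That variant would look roughly like this, keying the allowed failures on the jobs' name (and optionally services) instead of their env strings; the job name is a placeholder:

matrix:
  allow_failures:
    - name: "Emulated ARMV8 clang build"   # placeholder, must match the job's name
      services: docker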

@aytekinar
Contributor Author

Weird. I could not get them working without removing stage: test from everywhere. Apparently, jobs and matrix are simply aliases, so it should not change much. However, I could only get the separation between build jobs and allowed failures by removing the stage tag.

It's your call. You can simply add your version here from your branch.

@martin-frbg merged commit 2c5725c into OpenMathLib:develop on Nov 12, 2018
@martin-frbg
Collaborator

Misunderstanding actually - "my" version does not produce the nice separation, it just adds a comment naming the allowed failures. As the "stage: test" does not appear to serve any purpose (now, at least), removing it to get a cleaner look for the results panel is alright, I think.

@martin-frbg added this to the 0.3.4 milestone on Nov 12, 2018
@aytekinar deleted the patch-1 branch on November 12, 2018 13:55