Open MPI parsing of proc mounts needs help #1822

Closed
hppritcha opened this issue Jun 28, 2016 · 10 comments
@hppritcha
Member

Open MPI's own code (not the hwloc part) parses /proc/mounts, and the parse fails when an entry is longer than the buffer used to read a line from the file. The technique hwloc uses, parsing /proc/mounts with setmntent and getmntent, is probably the way to go.

This problem shows up regularly on Cray XC systems running CLE 6.
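
For readers unfamiliar with the setmntent/getmntent approach mentioned above, here is a minimal sketch of it. It is not the hwloc or Open MPI code: the function name `count_mounts_of_type` and its parameters are made up for illustration, and how very long entries are handled is left to the C library (that behavior may differ between libc versions).

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the entries in a mount table (for example
 * "/proc/mounts") whose filesystem type equals fstype.
 * Returns -1 if the table cannot be opened. */
int count_mounts_of_type(const char *mounts_path, const char *fstype)
{
    assert(mounts_path != NULL && fstype != NULL);

    /* setmntent opens the table; getmntent returns one parsed entry per
     * call, so the caller never sizes or splits a line buffer itself. */
    FILE *fp = setmntent(mounts_path, "r");
    if (fp == NULL) {
        return -1;
    }

    int count = 0;
    struct mntent *ent;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, fstype) == 0) {
            count++;
        }
    }

    endmntent(fp);  /* closes the stream opened by setmntent */
    return count;
}
```

Calling `count_mounts_of_type("/proc/mounts", "overlay")` would report how many overlay mounts are visible to the process; the fields `mnt_fsname`, `mnt_dir` and `mnt_opts` of `struct mntent` are available in the same loop.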

@hppritcha hppritcha added the bug label Jun 28, 2016
@hppritcha hppritcha added this to the v2.0.1 milestone Jun 28, 2016
@hjelmn
Member

hjelmn commented Jul 1, 2016

I think I fixed this on master with the rcache/mpool rewrite. Will need to make a 2.0.x specific fix.

@ggouaillardet
Contributor

@hjelmn
I made #1846 for master.
A similar fix should be done for v2.x.

@hppritcha
The hwloc in v1.10 parses /proc/mounts manually. Is a fix required for this branch too?

@jsquyres
Member

@hppritcha Per @ggouaillardet's question, do you only care about v2.x for this fix, or do you need a fix for v1.10.x as well? If you only need v2.x, then is open-mpi/ompi-release#1298 sufficient?

@hppritcha
Member Author

I think the problematic points in recent Open MPI releases have all been addressed, so this can be closed.

@AlexanderKurtz

Since I was just bitten by this, here is an example error message when running OpenMPI under docker. This hopefully improves the odds of $search_engine finding this bug report!

Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/P6LJKWCEKPXJWROJMJK3TEQEQ6:/var/lib/docker/overlay2/l/AUWTYHXASAF36QV5RKRCMUSFKR:/var/lib/docker/overlay2/l/R6UEDWX7LPYJAAG2FU3ATF2V3C:/var/lib/docker/overlay2/l/VOL35J5R5MJ3IT24W6CNYJTL4O:/var/lib/docker/overlay2/l/Z2MK7LR3EUI55CT2NUQPOTE2XI:/var/lib/docker/overlay2/l/KV24PDZFQBZCG5RDHRDAOCKWYH:/var/lib/docker/overlay2/l/A5XWI2JFDDU35WYIE4IJKKHHLD:/var/lib/docker/overlay2/l/IEQ77LFHHVS3JHGRXBTRTIC4TU:/var/lib/docker/overlay2/l/4HJEKXYLDUYTM'
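
A message like the one above is what a fixed-size line buffer tends to produce when an entry does not fit. The following sketch shows how such truncation can be detected after `fgets`; it is only an illustration of the failure mode, not the actual Open MPI code, and the helper name `line_was_truncated` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: after fgets(buf, bufsize, fp) succeeds, report
 * whether the line was cut off, i.e. the buffer is full and the last
 * character stored is not a newline. */
bool line_was_truncated(const char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);

    size_t len = strlen(buf);
    return len > 0 && len == bufsize - 1 && buf[len - 1] != '\n';
}
```

A parser that sees `true` here can either keep reading into a larger buffer or switch to an interface that does not expose line buffers to the caller, instead of treating the partial line as a complete entry.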

@jsquyres
Member

@AlexanderKurtz What version of Open MPI were you using?

It looks like the fix for this went into Open MPI v2.0.1 and beyond.

@AlexanderKurtz

@jsquyres: I am using Ubuntu Xenial which only has OpenMPI 1.10.2 [0].

[0] https://packages.ubuntu.com/source/xenial/openmpi

@jsquyres
Member

@AlexanderKurtz Gotcha. Your best solution will be to upgrade to a later version of Open MPI, then. Sorry! ☹️

More complete answer: the latest version of the Open MPI v1.10 series is v1.10.7, which was released in May of 2017 (v1.10.2 is from Jan of 2016). There are no more planned v1.10.x releases. As of this writing (Jan 2018), Open MPI's current release is v3.0.0, with v3.0.1 and v3.1.0 coming out soon.

@ggouaillardet
Contributor

@jsquyres www.open-mpi.org states in the Download section that v2.1 is still supported.
The Documentation section states that even v1.10 is in bug-fix-only mode (which is clearly out of date).

@jsquyres
Member

Yes, I'm sorry if I was not clear: the most recent release is v3.0.0 (with v3.0.1 and v3.1 coming soon). But @ggouaillardet is correct -- v2.1.x is still supported; all releases beyond v2.0.1 should contain the fix.

But IMHO: If you're going to upgrade from v1.10.x, you might as well upgrade to v3.something to give you the latest/greatest/supported-for-the-longest-period-of-time-in-the-future.
