Skip to content

Consolidated: fix OFI configury / linking issues #6363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 7, 2019

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Feb 6, 2019

This PR is a consolidation of several existing PRs -- this is simpler than having a bunch of individual PRs that may end up having merge ordering issues:

Also includes some commits to remove pesky tabs. Down with tabs!

FYI @rashikakheria

@jsquyres
Copy link
Member Author

jsquyres commented Feb 6, 2019

@rashikakheria Could you please try this PR and see if it resolves the issue for you?

@jsquyres jsquyres added the NEWS label Feb 6, 2019
Copy link
Contributor

@ggouaillardet ggouaillardet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, why did you choose to use global variables such as opal_ofi_CPPFLAGS instead of doing things a la verbs in which each component has its own variables (e.g. btl_ofi_CPPFLAGS, mtl_ofi_CPPFLAGS and btl_usnic_CPPFLAGS) ?

@jsquyres
Copy link
Member Author

jsquyres commented Feb 6, 2019

Because that lets us run the OFI checks just once (vs. run the tests again for each component).

@ggouaillardet
Copy link
Contributor

From config/opal_check_openfabrics.m4, the core of the test is protected by a global if test -z "$opal_check_openib_happy" so it is ran only once. Only trivial stuff is ran once per component.

@jsquyres
Copy link
Member Author

jsquyres commented Feb 6, 2019

I somewhat prefer that the results end up in a documented single set of variables instead of 3 sets of variables (one each for btl/ofi, btl/usnic, and mtl/ofi). But I don't feel strongly about this.

I know that other components do it the other way -- but this morning when I was coding it up:

  • it seemed kinda silly to have multiple sets of variables that all have identical values (i.e., one for each component).
  • it felt like having globally-scoped names really reflected what was being tested. The 3 components have their needs, but they're all testing the same back-end thing: whether OFI/libfabric is available. Hence, having the var names reflect the common back-end thing (vs. the components) just felt a little more natural to me this morning.

But again, I don't have strong feelings either way. If you'd prefer that I put it back to having the output names reflect the components, I can do that. Lemme know. 😄

@rashikakheria
Copy link

@rashikakheria Could you please try this PR and see if it resolves the issue for you?

I can confirm that this fixes my linking issue. Thanks for all of your assistance!

@jsquyres
Copy link
Member Author

jsquyres commented Feb 6, 2019

@rashikakheria Awesome; thanks. I think @ggouaillardet and I might iterate a little more to get the solution in its final state (he's many timezones away / asleep right now), but we're darn close.

@matcabral
Copy link
Contributor

@jsquyres, I did effectively start looking at this but got pulled by other stuff (apologize). In any case, I also tested this PR and is working for me. thanks,

@ggouaillardet
Copy link
Contributor

@jsquyres for maintainability purpose, I think verbs and libfabric should use a consistent approach. And I have no preference whether we should do things a la verbs or a la libfabric. But this is something we can change later, so for now, I am fine with this PR.

@bwbarrett do you want to comment on that PR ?

@jsquyres
Copy link
Member Author

jsquyres commented Feb 7, 2019

@ggouaillardet Ah -- there's an easy way to fix that: merge #6270 (I had forgotten that it wasn't already merged! 😜) . That gets rid of all the verbs stuff, and all we're left with is the "new" style libfabric stuff.

I've just rebased #6270; I'll merge it when I finish a few sanity tests and CI completes.

@jsquyres jsquyres force-pushed the pr/fix-ofi-configury branch from f37efe8 to 39e0d54 Compare February 7, 2019 14:28
ggouaillardet and others added 5 commits February 7, 2019 06:29
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <[email protected]>
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <[email protected]>
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - Subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Signed-off-by: Jeff Squyres <[email protected]>
It never lived up to its purpose (and has caused amorphous indirect
errors such as open-mpi#2519), so
delete it.

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres jsquyres force-pushed the pr/fix-ofi-configury branch from 39e0d54 to dd20174 Compare February 7, 2019 14:30
@jsquyres
Copy link
Member Author

jsquyres commented Feb 7, 2019

I rebased this on the HEAD of master, just for good measure. After CI finishes and this PR is merged:

  1. I have a v4.0.x cherry pick branch (with all the commits from this PR) ready to make a new v4.0.x PR.
  2. Similarly, I have a v3.1.x cherry pick branch (with only f5e1a67 and dd20174 from this branch) ready to make a new v3.1.x PR.
  3. I tried to make a v3.0.x cherry pick branch (with only the same 2 commits: f5e1a67 and dd20174), but it generated a ton of conflicts.

@matcabral I've sunk a lot of time in this already -- can I ask you to make the v3.0.x PR? (i.e., cherry pick just those 2 commits and fix all the conflicts)

@jsquyres jsquyres merged commit 8451cd7 into open-mpi:master Feb 7, 2019
@jsquyres jsquyres deleted the pr/fix-ofi-configury branch February 7, 2019 15:21
This was referenced Feb 7, 2019
@matcabral
Copy link
Contributor

@jsquyres, sure, I'll look at 3.0.x, either today or tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Patch for linking libfabric
4 participants