hook/prot: Connectivity Map #2825

Closed
jjhursey wants to merge 2 commits from the topic/hook-fwk-w-prot branch

jjhursey
Member

@jjhursey jjhursey commented Jan 25, 2017

@artpol84
Contributor

bot:mellanox:retest

@jjhursey
Member Author

Some notes on the tunable options and output.

  • -mca hook_prot_verbose VALUE
    • General component verbosity
  • -mca hook_prot_enable_mpi_init BOOL
    • Enable map display at the bottom of MPI_Init
  • -mca hook_prot_enable_mpi_finalize BOOL
    • Enable map display at the top of MPI_Finalize
  • -mca hook_prot_platform_prot VALUE
    • Alias environment variable: MPI_PROT
    • 1 : Same as -mca hook_prot_enable_mpi_init t
    • 2 : Same as -mca hook_prot_enable_mpi_finalize t
  • Environment variable: MPI_PROT_MAX
    • When to stop printing the prot table
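
The mapping from the `MPI_PROT` alias to the two display switches can be sketched in C as follows. This is only an illustration of the option semantics listed above, not the component's code: `struct prot_display`, `prot_apply_platform_value`, and `prot_apply_env` are hypothetical names, and the handling of values other than 1 and 2 (ignored here) is an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical holder for the two display switches described above. */
struct prot_display {
    bool at_mpi_init;     /* mirrors hook_prot_enable_mpi_init */
    bool at_mpi_finalize; /* mirrors hook_prot_enable_mpi_finalize */
};

/* Apply a hook_prot_platform_prot / MPI_PROT value:
 *   1 enables the map at the bottom of MPI_Init,
 *   2 enables the map at the top of MPI_Finalize.
 * Assumption: any other value leaves both switches unchanged. */
void prot_apply_platform_value(int value, struct prot_display *d)
{
    if (value == 1) {
        d->at_mpi_init = true;
    } else if (value == 2) {
        d->at_mpi_finalize = true;
    }
}

/* Read the MPI_PROT alias from the environment, if it is set. */
void prot_apply_env(struct prot_display *d)
{
    const char *text = getenv("MPI_PROT");
    if (text != NULL) {
        prot_apply_platform_value(atoi(text), d);
    }
}
```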

Output for host counts lower than or equal to MPI_PROT_MAX:

     host | 0    1    2    3    4    5    6    7    8
    ======|==============================================
        0 : shm  ib   ib   ib   ib   ib   ib   ib   ib
        1 : ib   shm  ib   ib   ib   ib   ib   ib   ib
        2 : ib   ib   self ib   ib   ib   ib   ib   ib
        3 : ib   ib   ib   self ib   ib   ib   ib   ib
        4 : ib   ib   ib   ib   self ib   ib   ib   ib
        5 : ib   ib   ib   ib   ib   self ib   ib   ib
        6 : ib   ib   ib   ib   ib   ib   self ib   ib
        7 : ib   ib   ib   ib   ib   ib   ib   self ib
        8 : ib   ib   ib   ib   ib   ib   ib   ib   self

Output above MPI_PROT_MAX, up to 3*MPI_PROT_MAX:

     host | 0 1 2 3 4       8
    ======|====================
        0 : A C C C C C C C C
        1 : C A C C C C C C C
        2 : C C B C C C C C C
        3 : C C C B C C C C C
        4 : C C C C B C C C C
        5 : C C C C C B C C C
        6 : C C C C C C B C C
        7 : C C C C C C C B C
        8 : C C C C C C C C B
    key: A == shm
    key: B == self
    key: C == ib
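
A rough C sketch of the two ideas above: choosing a table format from the host count, and replacing protocol names with single-letter keys. It is not the PR's implementation. All names (`prot_choose_format`, `prot_key_for`, `prot_print_brief`, `PROT_MAX_KEYS`) are hypothetical, the behavior beyond 3*MPI_PROT_MAX (no table) is an assumption, the column header is omitted, and letters are assigned in first-seen order, which need not match the assignment in the output above.

```c
#include <stdio.h>
#include <string.h>

enum prot_format { PROT_FULL, PROT_BRIEF, PROT_NONE };
enum { PROT_MAX_KEYS = 26 }; /* one key per letter A..Z */

/* Full table up to prot_max hosts, brief table up to 3*prot_max.
 * Assumption: no table at all beyond that. */
enum prot_format prot_choose_format(int nhosts, int prot_max)
{
    if (nhosts <= prot_max) {
        return PROT_FULL;
    }
    if (nhosts <= 3 * prot_max) {
        return PROT_BRIEF;
    }
    return PROT_NONE;
}

/* Return the key letter for a protocol name, assigning the next free
 * letter the first time a name is seen. Returns '?' once all letters
 * are taken. keys[] must have room for PROT_MAX_KEYS entries. */
char prot_key_for(const char *name, const char *keys[], int *nkeys)
{
    for (int i = 0; i < *nkeys; i++) {
        if (strcmp(keys[i], name) == 0) {
            return (char)('A' + i);
        }
    }
    if (*nkeys >= PROT_MAX_KEYS) {
        return '?';
    }
    keys[*nkeys] = name;
    return (char)('A' + (*nkeys)++);
}

/* Print the rows of an n-by-n map (row-major array of protocol names)
 * in brief form, followed by the key lines. */
void prot_print_brief(FILE *out, int n, const char *const *map)
{
    const char *keys[PROT_MAX_KEYS];
    int nkeys = 0;

    for (int r = 0; r < n; r++) {
        fprintf(out, "%5d :", r);
        for (int c = 0; c < n; c++) {
            fprintf(out, " %c", prot_key_for(map[r * n + c], keys, &nkeys));
        }
        fputc('\n', out);
    }
    for (int i = 0; i < nkeys; i++) {
        fprintf(out, "key: %c == %s\n", 'A' + i, keys[i]);
    }
}
```

With a threshold of 12, `prot_choose_format(13, 12)` selects the brief format and `prot_choose_format(37, 12)` returns `PROT_NONE`.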

}

hostidprotptr = getenv("MPI_PROT_BRIEF");
if (hostidprotptr) { hostidprotbrief = atoi(hostidprotptr); }
Member Author

@markalle It looks like MPI_PROT_BRIEF value is not used. Was the intention to force the "brief" output for all -prot output?
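
Separately from the open question of how the value should be used, the snippet reads `MPI_PROT_BRIEF` with `atoi`, which returns 0 when no conversion can be performed, so a malformed value looks the same as an explicit 0. A minimal sketch of a stricter parse with `strtol` is shown below. `prot_parse_int` and `prot_env_int` are hypothetical helpers, not functions from the PR, and falling back to a caller-supplied default is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a base-10 integer from text. Returns fallback when text is
 * NULL, empty, not fully numeric, or outside the range of int. */
int prot_parse_int(const char *text, int fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return fallback;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' ||
        value < INT_MIN || value > INT_MAX) {
        return fallback;
    }
    return (int)value;
}

/* Read an integer setting from the environment, e.g.
 * prot_env_int("MPI_PROT_BRIEF", 0). */
int prot_env_int(const char *name, int fallback)
{
    return prot_parse_int(getenv(name), fallback);
}
```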

@jjhursey
Member Author

jjhursey commented Jan 26, 2017

@markalle @gpaulsen We need to review PR #1974

  • During Open MPI face-to-face
    • Try changing the discovery mechanism to use the model seen in PR #1974 (MCA component filter)
    • Once that change is in, the community would like to review it again, and is likely to accept it for the v3.x timeframe if it is ready in time.

@gpaulsen
Member

Also, there are some more details from the face-to-face at the bottom of the face-to-face minutes.

@markalle
Contributor

markalle commented Jan 26, 2017 via email

@jjhursey
Member Author

I think we might be able to use something like PR #1974 to get the string associated with the component. However, I think we might need some discussion about how the community feels about the section of this PR where we get the list of components actually being used between rank pairs.

Currently, we added a call to pml/ob1 and pml/cm to get that information. Do folks think that is fine, or are there alternative suggestions?

@hppritcha
Member

bot:lanl:retest

@hppritcha
Member

the LANL dlopen-disable seems to have found a legit problem with this PR:

make[2]: Entering directory `/home/hppritcha/jenkins/workspace/ompi_master_pr_disable_dlopen/ompi/tools/ompi_info'
  CC       ompi_info.o
  CC       param.o
  GENERATE ompi_info.1
  CCLD     ompi_info
../../../ompi/.libs/libmpi.so: undefined reference to `dlsym'
collect2: error: ld returned 1 exit status
make[2]: *** [ompi_info] Error 1
make[2]: Leaving directory `/home/hppritcha/jenkins/workspace/ompi_master_pr_disable_dlopen/ompi/tools/ompi_info'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/home/hppritcha/jenkins/workspace/ompi_master_pr_disable_dlopen/ompi'
make: *** [install-recursive] Error 1

@jjhursey
Member Author

jjhursey commented Feb 1, 2017

@hppritcha Yeah, you are right. We need to do some work on this PR anyway before it's ready to review. We'll note that error for fixing during the next revision.

Member

@gpaulsen gpaulsen left a comment

reports are that --disable-dlopen fails with this PR, but upon code inspection, it's unclear how the two can be related.

@hppritcha
Member

bot:lanl:retest

@jjhursey
Member Author

jjhursey commented Feb 6, 2017

This PR needs some significant work before it's ready to merge. So I wouldn't worry too much about the CI until that work has started.

@hppritcha
Member

bot:lanl:retest

@jjhursey
Member Author

jjhursey commented Mar 7, 2017

Now that the hook framework has been committed to master I need to update this branch. I'll do that today.

markalle added 2 commits March 7, 2017 11:32
 * `-mca hook_prot_verbose VALUE`
   * General component verbosity
 * `-mca hook_prot_enable_mpi_init BOOL`
   * Enable map display at the bottom of `MPI_Init`
 * `-mca hook_prot_enable_mpi_finalize BOOL`
   * Enable map display at the top of `MPI_Finalize`
 * `-mca hook_prot_platform_prot VALUE`
   * Alias environment variable: `MPI_PROT`
   * `1 : Same as -mca hook_prot_enable_mpi_init t`
   * `2 : Same as -mca hook_prot_enable_mpi_finalize t`

Signed-off-by: Joshua Hursey <[email protected]>
Normally we print a -prot table up to 16 hosts that looks like this,
where 16 can be changed via MPI_PROT_MAX:

```
 host | 0    1    2    3    4    5    6    7    8
======|==============================================
    0 : shm  ib   ib   ib   ib   ib   ib   ib   ib
    1 : ib   shm  ib   ib   ib   ib   ib   ib   ib
    2 : ib   ib   self ib   ib   ib   ib   ib   ib
    3 : ib   ib   ib   self ib   ib   ib   ib   ib
    4 : ib   ib   ib   ib   self ib   ib   ib   ib
    5 : ib   ib   ib   ib   ib   self ib   ib   ib
    6 : ib   ib   ib   ib   ib   ib   self ib   ib
    7 : ib   ib   ib   ib   ib   ib   ib   self ib
    8 : ib   ib   ib   ib   ib   ib   ib   ib   self
```

This checkin reduces MPI_PROT_MAX to 12 but adds a shorter table output
that looks like this:

```
 host | 0 1 2 3 4       8
======|====================
    0 : A C C C C C C C C
    1 : C A C C C C C C C
    2 : C C B C C C C C C
    3 : C C C B C C C C C
    4 : C C C C B C C C C
    5 : C C C C C B C C C
    6 : C C C C C C B C C
    7 : C C C C C C C B C
    8 : C C C C C C C C B
key: A == shm
key: B == self
key: C == ib
```

That is used from 13 up to 36 ranks (or 3*MPI_PROT_MAX).

Signed-off-by: Joshua Hursey <[email protected]>
@jjhursey jjhursey force-pushed the topic/hook-fwk-w-prot branch from 685176a to a760be1 Compare March 7, 2017 17:33
@jjhursey
Member Author

jjhursey commented Mar 7, 2017

The branch has been rebased onto master. The two commits represent the prot component.

There is still work to do on this feature - so I'm keeping the WIP label on it.

@gpaulsen
Member

gpaulsen commented Mar 9, 2017

:retest:

@bwbarrett
Member

@jjhursey are you looking to add this to the 3.0 release?

@jjhursey
Member Author

It still needs some work. I don't think it'll make it for v3.0, but should be ready for the release that follows.

@hppritcha hppritcha added this to the Future milestone Mar 21, 2017
@gpaulsen
Member

The community decided on a call two weeks ago NOT to take this for v3.0, but to aim for v3.1.

@bwbarrett
Member

Thanks, @gpaulsen. Apparently, I have no short term memory, hence writing everything down...

@ibm-ompi

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/73ca9ce0ceed7b37f7d60da3250d7259

@jjhursey
Member Author

I'm going to close this PR for now. @gpaulsen @markalle Let's re-open a new one once we have finished refining. We should restart that conversation as I forget what is left to do.

@jjhursey jjhursey closed this Aug 31, 2017
@jjhursey jjhursey deleted the topic/hook-fwk-w-prot branch August 31, 2017 15:08
@gpaulsen
Member

gpaulsen commented Aug 5, 2020

This functionality was later re-implemented for the v5.0 timeframe in #5507.
