Correct the handling of "prefix" #2154

Open
wants to merge 1 commit into
base: master

rhc54 (Contributor) commented Mar 6, 2025

According to the docs, we are supposed to check the cmd
line for "prefix" entries, generating an error if we
find multiple entries that are not all identical.
The cmd line prefix specification overrules all else.

Assuming nothing was found there, then we must check
the environment for PRTE_PREFIX. We check this before
the absolute path to allow for the possibility that
the remote node installation is somewhere other than
where the local executable is located. Thus, someone
might give an absolute path to "prterun" while still
needing to set the remote prefix - but do it in the
environment instead of on the cmd line.

If nothing was found there, then we check the cmd to
see if we were given an absolute path to "prte" (or
whatever proxy name was used).
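
The lookup order described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PRRTE C code: the function name, the exception type, and the assumption that an absolute launcher path has the form `<prefix>/bin/<tool>` are all hypothetical.

```python
import posixpath


class PrefixError(ValueError):
    """Raised when the cmd line carries prefix entries that disagree."""


def resolve_daemon_prefix(cli_prefixes, environ, argv0):
    """Return the prefix to use for the daemon job, or None if none applies.

    cli_prefixes: every value given to a "prefix" option on the cmd line
    environ:      mapping of environment variables
    argv0:        the command as invoked, e.g. "/opt/prrte/bin/prterun"
    """
    # 1. The cmd line overrules all else; repeated entries must be identical.
    unique = set(cli_prefixes)
    if len(unique) > 1:
        raise PrefixError(f"conflicting prefix values: {sorted(unique)}")
    if unique:
        return unique.pop()

    # 2. Next, the environment. This is checked before the absolute path so
    #    a remote install location can differ from the local executable's.
    env_prefix = environ.get("PRTE_PREFIX")
    if env_prefix:
        return env_prefix

    # 3. Finally, an absolute path to the launcher.
    #    Assumption: the tool lives in <prefix>/bin/<tool>.
    if posixpath.isabs(argv0):
        return posixpath.dirname(posixpath.dirname(argv0))

    return None
```

For example, invoking `/opt/local/bin/prterun` with `PRTE_PREFIX=/opt/remote` set and no prefix on the cmd line would yield `/opt/remote` under this sketch, which is the relocated-remote-install case described above.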

We were incorrectly looking in the first application
specification for this prefix value, even though the
comments acknowledged that "prefix" only applies to
the DAEMON job. Fix that confusion by correctly
assigning the attribute to the daemon job, and correct
all the PLM components to look in the correct place.

Fix the "tm" component which was actually ignoring
the prefix completely.

Ensure we don't forward the entire environment as it
can contain directives that conflict with the remote
daemon. Use the pristine prte_launch_environment
instead and then forward only those values that are
needed, or are directed by the user.
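
A minimal sketch of that forwarding policy, in Python for illustration only (the real code is C and lives in the PLM components). The function name, the parameter names, and the example variable names in the usage note are hypothetical; only the idea of starting from the pristine launch environment and adding selected values comes from the description above.

```python
def build_daemon_env(launch_env, current_env, needed, user_forwarded):
    """Build the environment handed to a remote daemon.

    launch_env:     pristine snapshot of the environment taken at startup
    current_env:    the environment as it looks now (may hold directives
                    that would conflict with the remote daemon)
    needed:         names the launcher itself requires on the remote side
    user_forwarded: names the user explicitly asked to forward
    """
    # Start from the pristine snapshot, never from the full current env.
    env = dict(launch_env)

    # Overlay only the selected values, skipping names that are not set.
    for name in list(needed) + list(user_forwarded):
        if name in current_env:
            env[name] = current_env[name]

    return env
```

With `launch_env = {"PATH": "/usr/bin"}` and a current environment that has since gained `PRTE_PREFIX`, a user-requested `MY_VAR`, and an unrelated internal directive, only `PATH`, `PRTE_PREFIX`, and `MY_VAR` reach the daemon when `needed=["PRTE_PREFIX"]` and `user_forwarded=["MY_VAR"]`.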

Handle PMIX_PREFIX the same way as above since
that might also have been relocated.

Add a new --app-pmix-prefix option as the application
might be linked against a different PMIx that is
located somewhere else. If nothing is given, default
to a PMIX_PREFIX value given to PRRTE.

Add a new --no-app-prefix option so the application
can specify that it does NOT want any PRRTE-level
value to be applied to it.

Add some text to the help system to explain
the new options and update how "prefix" works.

rhc54 (Contributor, Author) commented Mar 6, 2025

@janjust Turned out to be a tad more involved as we were actually not correctly handling "prefix" per the docs. This should fix the problem you uncovered and bring us into compliance with the docs.

rhc54 force-pushed the topic/pfx branch 3 times, most recently from 9a9343e to 24e0ac6 on March 13, 2025 12:11

According to the docs, we are supposed to check the cmd
line for "prefix" entries, generating an error if we
find multiple entries that are not all identical.
The cmd line prefix specification overrules all else.

Assuming nothing was found there, then we must check
the environment for PRTE_PREFIX. We check this before
the absolute path to allow for the possibility that
the remote node installation is somewhere other than
where the local executable is located. Thus, someone
might give an absolute path to "prterun" while still
needing to set the remote prefix - but do it in the
environment instead of on the cmd line.

If nothing was found there, then we check the cmd to
see if we were given an absolute path to "prte" (or
whatever proxy name was used).

We were incorrectly looking in the first application
specification for this prefix value, even though the
comments acknowledged that "prefix" only applies to
the DAEMON job. Fix that confusion by correctly
assigning the attribute to the daemon job, and correct
all the PLM components to look in the correct place.

Fix the "tm" component which was actually ignoring
the prefix completely.

Ensure we don't forward the entire environment as it
can contain directives that conflict with the remote
daemon.  Use the pristine prte_launch_environment
instead and then forward only those values that are
needed, or are directed by the user.

Handle PMIX_PREFIX the same way as above since
that might also have been relocated.

Add a new --app-pmix-prefix option as the application
might be linked against a different PMIx that is
located somewhere else. If nothing is given, default
to a PMIX_PREFIX value given to PRRTE. Support a
PMIX_APP_PREFIX envar equivalent.

Add a new --no-app-prefix option so the application
can specify that it does NOT want any PRRTE-level
value to be applied to it. Support a PMIX_APP_NO_PREFIX
envar equivalent. Generate a show-help error if
both app-prefix and no-app-prefix directives are provided.
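
The interaction of the two new options might look something like the sketch below. It is Python for illustration, not the PRRTE implementation, and it rests on several assumptions that the text above does not settle: that a cmd line value beats the envar equivalent, that merely setting PMIX_APP_NO_PREFIX counts as the directive, and that "app-prefix directive" here means the --app-pmix-prefix value. The function and exception names are hypothetical.

```python
class DirectiveConflict(ValueError):
    """Stands in for the show-help error on contradictory directives."""


def resolve_app_pmix_prefix(cli_app_pmix_prefix, cli_no_app_prefix,
                            environ, prrte_pmix_prefix):
    """Return the PMIx prefix to apply to the application, or None.

    cli_app_pmix_prefix: value of --app-pmix-prefix, or None
    cli_no_app_prefix:   True if --no-app-prefix was given
    environ:             mapping of environment variables
    prrte_pmix_prefix:   PMIX_PREFIX value given to PRRTE, or None
    """
    # Assumption: the cmd line takes precedence over the envar equivalent.
    app_prefix = cli_app_pmix_prefix or environ.get("PMIX_APP_PREFIX")
    # Assumption: presence of the envar is enough to request "no prefix".
    no_prefix = cli_no_app_prefix or "PMIX_APP_NO_PREFIX" in environ

    # Asking for a prefix and for no prefix at the same time is an error.
    if app_prefix and no_prefix:
        raise DirectiveConflict("both app-prefix and no-app-prefix given")

    if no_prefix:
        return None  # apply no PRRTE-level value to the application

    # If nothing was given, default to the PMIX_PREFIX given to PRRTE.
    return app_prefix or prrte_pmix_prefix
```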

Bring the various help output up to date with the
revised and new options. Update the RST versions
as well, though I'm not quite clear how OMPI picks
those up.

Signed-off-by: Ralph Castain <[email protected]>