Commit 6cf97e9

(torchx/components) Fix entrypoint loading to deal with deferred loading of modules to enable component registration to work properly
1 parent c74bb9c commit 6cf97e9

12 files changed

+545
-198
lines changed

docs/source/advanced.rst

+98-14
@@ -156,7 +156,7 @@ resource can then be used in the following manner:
156156

157157
Registering Custom Components
158158
-------------------------------
159-
It is possible to author and register a custom set of components with the
159+
You can author and register a custom set of components with the
160160
``torchx`` CLI as builtins to the CLI. This makes it possible to customize
161161
a set of components most relevant to your team or organization and support
162162
it as a CLI ``builtin``. This way users will see your custom components
@@ -166,7 +166,63 @@ when they run
166166
167167
$ torchx builtins
168168
169-
Custom components can be registered via the following modification of the ``entry_points``:
169+
Custom components can be registered via ``[torchx.components]`` entrypoints.
170+
If ``my_project.bar`` had the following directory structure:
171+
172+
::
173+
174+
$PROJECT_ROOT/my_project/bar/
175+
|- baz.py
176+
177+
And ``baz.py`` had a single component (function) called ``trainer``:
178+
179+
::
180+
181+
# baz.py
182+
import torchx.specs as specs
183+
184+
def trainer(...) -> specs.AppDef: ...
185+
186+
187+
And the entrypoints were added as:
188+
189+
.. testcode::
190+
191+
# setup.py
192+
...
193+
entry_points={
194+
"torchx.components": [
195+
"foo = my_project.bar",
196+
],
197+
}
198+
199+
TorchX will search the module ``my_project.bar`` for all defined components and group the found
200+
components under the ``foo.*`` prefix. In this case, the component ``my_project.bar.baz.trainer``
201+
would be registered with the name ``foo.baz.trainer``.
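
The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the prefix substitution, not TorchX's actual implementation; ``registered_name`` is a hypothetical helper introduced here for the example.

```python
def registered_name(alias: str, module: str, component: str) -> str:
    """Hypothetical helper (not part of TorchX): swap the entrypoint's
    module path for its alias in a component's fully qualified name."""
    prefix = module + "."
    if not component.startswith(prefix):
        raise ValueError(f"{component!r} is not defined under {module!r}")
    # Keep the part of the path below the registered module and
    # put the entrypoint alias in front of it.
    return f"{alias}.{component[len(prefix):]}"


# "foo = my_project.bar" applied to my_project.bar.baz.trainer
print(registered_name("foo", "my_project.bar", "my_project.bar.baz.trainer"))
# prints: foo.baz.trainer
```
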
202+
203+
.. note::
204+
Only Python packages (directories with an ``__init__.py`` file)
205+
are searched; TorchX makes no attempt to recurse into namespace packages
206+
(directories without an ``__init__.py`` file).
207+
However, you may register a top-level namespace package.
208+
209+
``torchx`` CLI will display registered components via:
210+
211+
.. code-block:: shell-session
212+
213+
$ torchx builtins
214+
Found 1 builtin components:
215+
1. foo.baz.trainer
216+
217+
The custom component can then be used as:
218+
219+
.. code-block:: shell-session
220+
221+
$ torchx run foo.baz.trainer -- --name "test app"
222+
223+
224+
When you register your own components, TorchX does not include its own builtins. To add TorchX's
225+
builtin components back, specify another entry:
170226

171227

172228
.. testcode::
@@ -176,32 +232,60 @@ Custom components can be registered via the following modification of the ``entr
176232
entry_points={
177233
"torchx.components": [
178234
"foo = my_project.bar",
235+
"torchx = torchx.components",
179236
],
180237
}
181238

182-
The line above registers a group ``foo`` that is associated with the module ``my_project.bar``.
183-
TorchX will recursively traverse lowest level dir associated with the ``my_project.bar`` and will find
184-
all defined components.
239+
This will add back the TorchX builtins but with a ``torchx.*`` component name prefix (e.g. ``torchx.dist.ddp``
240+
versus the default ``dist.ddp``).
241+
242+
Two registry entries can point to the same component, for instance:
185243

186-
.. note:: If there are two registry entries, e.g. ``foo = my_project.bar`` and ``test = my_project``
187-
there will be two sets of overlapping components with different aliases.
244+
.. testcode::
188245

246+
# setup.py
247+
...
248+
entry_points={
249+
"torchx.components": [
250+
"foo = my_project.bar",
251+
"test = my_project",
252+
],
253+
}
189254

190-
After registration, torchx cli will display registered components via:
255+
256+
The components in ``my_project.bar`` are then registered twice, under two different
257+
prefix aliases: ``foo.*`` and ``test.bar.*``. Concretely,
191258

192259
.. code-block:: shell-session
193260
194261
$ torchx builtins
262+
Found 2 builtin components:
263+
1. foo.baz.trainer
264+
2. test.bar.baz.trainer
195265
196-
If ``my_project.bar`` had the following directory structure:
266+
To omit groupings and shorten the component names, use an underscore as the entry name (e.g. ``_``, or ``_0``, ``_1``, etc.).
267+
For example:
197268

198-
::
269+
.. testcode::
199270

200-
$PROJECT_ROOT/my_project/bar/
201-
|- baz.py
271+
# setup.py
272+
...
273+
entry_points={
274+
"torchx.components": [
275+
"_0 = my_project.bar",
276+
"_1 = torchx.components",
277+
],
278+
}
202279

203-
And ``baz.py`` defines a component (function) called ``trainer``. Then the component can be run as a job in the following manner:
280+
This exposes the trainer component as ``baz.trainer`` (as opposed to ``foo.baz.trainer``)
281+
and adds back the builtin components as in a vanilla installation of torchx, without the ``torchx.*`` prefix.
204282

205283
.. code-block:: shell-session
206284
207-
$ torchx run foo.baz.trainer -- --name "test app"
285+
$ torchx builtins
286+
Found 11 builtin components:
287+
1. baz.trainer
288+
2. dist.ddp
289+
3. utils.python
290+
4. ... <more builtins from torchx.components.* ...>
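
As a rough illustration of how the aliases above map to listed names, the following sketch reproduces the listings from a table of entrypoints. It is not TorchX's implementation: ``list_components`` is a hypothetical helper, and treating any alias that starts with ``_`` as "no prefix" is an assumption based on the ``_``, ``_0``, ``_1`` examples in the text.

```python
from typing import Dict, List


def list_components(entry_points: Dict[str, str], components: List[str]) -> List[str]:
    """Hypothetical helper (not part of TorchX): compute the names that
    would be listed for each component found under each entrypoint."""
    names = []
    for alias, module in entry_points.items():
        prefix = module + "."
        for fqn in components:
            if not fqn.startswith(prefix):
                continue  # component does not live under this entrypoint
            relative = fqn[len(prefix):]
            # Assumption: an underscore alias drops the group prefix entirely.
            names.append(relative if alias.startswith("_") else f"{alias}.{relative}")
    return names


found = [
    "my_project.bar.baz.trainer",
    "torchx.components.dist.ddp",
    "torchx.components.utils.python",
]
print(list_components({"_0": "my_project.bar", "_1": "torchx.components"}, found))
# prints: ['baz.trainer', 'dist.ddp', 'utils.python']
```

Passing ``{"foo": "my_project.bar", "test": "my_project"}`` with the same trainer component yields the overlapping names ``foo.baz.trainer`` and ``test.bar.baz.trainer`` described earlier.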
291+

scripts/kube_dist_trainer.py

+1-1
@@ -42,7 +42,7 @@ def register_gpu_resource() -> None:
4242

4343
def build_and_push_image() -> BuildInfo:
4444
build = build_images()
45-
push_images(build)
45+
push_images(build, container_repo="localhost")
4646
return build
4747

4848
