[bug] Can't build containers locally #4653

Open
1 of 6 tasks
jbohnslav opened this issue Mar 19, 2025 · 2 comments

@jbohnslav

Checklist

Concise Description:
I want to build the PyTorch containers locally so that I can modify them and verify that everything (patching, etc.) runs correctly.

Steps to reproduce, straight from the README:

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com
python3 -m venv dlc
source dlc/bin/activate
pip install -r src/requirements.txt
pip install -e .
bash src/setup.sh pytorch
python src/main.py --buildspec pytorch/training/buildspec-2-5-sm.yml \
                   --framework pytorch \
                   --image_types training \
                   --device_types gpu \
                   --py_versions py3

Error message:

Traceback (most recent call last):
  File "/path/to/deep-learning-containers/src/main.py", line 140, in <module>
    main()
  File "/path/to/deep-learning-containers/src/main.py", line 136, in main
    image_builder(buildspec_file, image_types, device_types)
  File "/path/to/deep-learning-containers/src/image_builder.py", line 378, in image_builder
    patch_helper.initiate_multithreaded_autopatch_prep(
  File "/path/to/deep-learning-containers/src/patch_helper.py", line 352, in initiate_multithreaded_autopatch_prep
    run(f"aws s3 cp s3://patch-dlc {download_path} --recursive", hide=True)
  File "/path/to/deep-learning-containers/dlc/lib/python3.11/site-packages/invoke/__init__.py", line 50, in run
    return Context().run(command, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/deep-learning-containers/dlc/lib/python3.11/site-packages/invoke/context.py", line 104, in run
    return self._run(runner, command, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/deep-learning-containers/dlc/lib/python3.11/site-packages/invoke/context.py", line 113, in _run
    return runner.run(command, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/deep-learning-containers/dlc/lib/python3.11/site-packages/invoke/runners.py", line 395, in run
    return self._run_body(command, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/deep-learning-containers/dlc/lib/python3.11/site-packages/invoke/runners.py", line 451, in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
                                                          ^^^^^^^^^^^^^^
  File "/path/to/deep-learning-containers/dlc/lib/python3.11/site-packages/invoke/runners.py", line 518, in _finish
    raise UnexpectedExit(result)
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!

Command: 'aws s3 cp s3://patch-dlc /path/to/patch-dlc --recursive'

Exit code: 1

Do I need to be on EC2 to run this? Is it not possible for anyone other than AWS employees to access the s3://patch-dlc bucket?
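
The traceback as posted stops at the exit code, so the underlying AWS CLI message is not visible. A minimal sketch for rerunning a command and capturing its stderr is below; the helper name `run_and_report` is hypothetical and not part of the repository, and the example command in the comment only lists the bucket instead of copying it.

```python
import subprocess


def run_and_report(args):
    """Run a command (list of arguments) and return (exit_code, stderr_text)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()


# Example usage (assumes the AWS CLI is installed and on PATH):
#   code, err = run_and_report(["aws", "s3", "ls", "s3://patch-dlc"])
#   print(code, err)
```

If the exit code is non-zero, the captured stderr should show whether the failure is a permissions problem or something else, which would help answer the question above.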

@BaiqingL

Also running into this issue. With Python 3.12.9, following the README gives me this error:

Traceback (most recent call last):
  File "/home/andy/sage/deep-learning-containers/src/main.py", line 10, in <module>
    from image_builder import image_builder
  File "/home/andy/sage/deep-learning-containers/src/image_builder.py", line 27, in <module>
    import patch_helper
  File "/home/andy/sage/deep-learning-containers/src/patch_helper.py", line 9, in <module>
    from output import OutputFormatter
  File "/home/andy/sage/deep-learning-containers/src/output.py", line 21, in <module>
    import pyfiglet
  File "/home/andy/sage/deep-learning-containers/dlc/lib/python3.12/site-packages/pyfiglet/__init__.py", line 11, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
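
A possible explanation, offered as an assumption and not confirmed in this thread: virtual environments created with `venv` on Python 3.12 no longer pre-install setuptools, which is the package that has historically provided `pkg_resources`. The sketch below checks which modules are importable in the active environment; the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example usage inside the activated venv:
#   print(missing_modules(["pkg_resources", "pyfiglet"]))
```

If `pkg_resources` is reported missing, installing setuptools into the venv may resolve this particular import error, though that is untested here.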

@Yadan-Wei
Contributor

The package missing here is used for patching CVEs. You can build the image without patching by disabling this flag in the buildspec: https://github.com/aws/deep-learning-containers/blob/master/pytorch/training/buildspec-2-5-sm.yml#L8
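
As a rough sketch of that suggestion, the helper below rewrites a top-level key in buildspec text to `"False"`. Both the key name `autopatch_build` and the idea that `"False"` is the value that disables patching are assumptions based on the function names in the traceback; check the linked line of the buildspec for the actual key and value. The helper name `disable_flag` is hypothetical.

```python
import re


def disable_flag(buildspec_text, key="autopatch_build"):
    """Return buildspec text with the top-level `key` set to "False".

    The default key name is an assumption, not confirmed by the thread.
    """
    # Match "key: <anything>" at the start of a line, keeping the "key: " prefix.
    pattern = re.compile(rf"^({re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    return pattern.sub(r'\g<1>"False"', buildspec_text)


# Example usage (edits the file in place; the path is taken from the issue):
#   path = "pytorch/training/buildspec-2-5-sm.yml"
#   with open(path) as f:
#       text = f.read()
#   with open(path, "w") as f:
#       f.write(disable_flag(text))
```

Editing the line by hand works just as well; the function only makes the change repeatable across several buildspec files.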
