
Fix sagemaker-entrypoint* & remove SageMaker and Vertex from Dockerfile* #699


Open

wants to merge 5 commits into main

Conversation

alvarobartt (Member)

What does this PR do?

This PR fixes the AWS SageMaker entrypoints in `sagemaker-entrypoint.sh` and `sagemaker-entrypoint-cuda-all.sh`: both were missing `exec` when running the `text-embeddings-router` process, so the router was assigned a PID other than 1 and signals such as Ctrl + C for graceful shutdown were not propagated to it, meaning the process was never stopped.
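As a minimal sketch of the fix (the environment-variable handling below is illustrative, not the actual script contents), the entrypoint now hands control to the router via `exec` instead of leaving it running as a child of the shell:

```bash
#!/bin/bash
# Sketch of sagemaker-entrypoint.sh: map a SageMaker-style env var onto
# the router's configuration (variable names here are illustrative).
if [[ -n "${HF_MODEL_ID}" ]]; then
    export MODEL_ID="${HF_MODEL_ID}"
fi

# Without `exec`, text-embeddings-router runs as a child of this shell
# (PID != 1), so SIGTERM / Ctrl + C reach the shell instead of the
# router. With `exec`, the shell is replaced by the router, which then
# runs as PID 1 and receives termination signals directly.
exec text-embeddings-router --port 8080 "$@"
```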

Besides that, this PR removes the AWS SageMaker stages from both `Dockerfile` (CPU) and `Dockerfile-cuda-all` (NVIDIA GPU), as they are not used: those images are ported as-is into https://github.com/awslabs/llm-hosting-container/tree/main/huggingface/pytorch/tei/docker and rebuilt from scratch there, so there is no clear benefit to including the stage in the build here.
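For context, a dedicated stage in a multi-stage Dockerfile is only ever built when something explicitly selects it, e.g. via `--target`; the stage name below is an assumption used for illustration:

```bash
# Hypothetical: building the SageMaker-specific stage from this repo
# would require targeting it explicitly, which the downstream AWS build
# in llm-hosting-container does not do (it rebuilds its own copy).
docker build --target sagemaker -t tei:sagemaker -f Dockerfile .

# The regular build just uses the final stage and is unaffected:
docker build -t tei:latest -f Dockerfile .
```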

Finally, this PR also removes the `BUILD_ARG` for Google Cloud Vertex AI, as well as the conditional if-else in `Dockerfile-cuda-all` that built `text-embeddings-router` with `--features google`. This is not required because that Dockerfile is also ported as-is into https://github.com/huggingface/Google-Cloud-Containers/tree/main/containers/tei, meaning the `BUILD_ARG` is not being used here.
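The removed conditional followed roughly the pattern sketched below (the exact cargo flags are assumptions for illustration); since nothing sets the `VERTEX` build argument in this repository, the step collapses to the plain build:

```bash
# Sketch of the removed RUN step in Dockerfile-cuda-all: the VERTEX
# build argument toggled the optional `google` feature of the router.
if [ "${VERTEX}" = "true" ]; then
    cargo build --release --bin text-embeddings-router --features google
else
    cargo build --release --bin text-embeddings-router
fi

# After this PR the step is simply the unconditional build:
cargo build --release --bin text-embeddings-router
```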

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

  • @Narsil to ensure the aforementioned changes are fair and make sense
  • @arjkesh and @pagezyhf (feel free to ping anyone else for confirmation if needed) to verify that there is no issue with removing the AWS SageMaker stage from `Dockerfile` and `Dockerfile-cuda-all` here, as those are ported and rebuilt for AWS anyway

Finally, regarding the Google Cloud changes, I can confirm from a technical standpoint, based on the containers released on Google Cloud, that this is not an issue, since those are ported and rebuilt elsewhere.

Use `exec` so that the process runs with PID 1, allowing it to receive
signals directly and therefore be gracefully shut down
Removed since it's no longer directly used: the `Dockerfile` is ported as-is into
https://github.com/awslabs/llm-hosting-container/tree/main/huggingface/pytorch/tei/docker
along with its respective entrypoint, and rebuilt there, so there is no need
for it to live here (most likely the entrypoint could also be removed)
Neither of those is required, since both Google Cloud and AWS SageMaker
port the `Dockerfile-cuda-all` as-is and then rebuild it there,
meaning that the `VERTEX` BUILD_ARG is not being used at all,
nor is the AWS SageMaker stage. For more information, check the
repositories
https://github.com/huggingface/Google-Cloud-Containers/tree/main/containers/tei
and
https://github.com/awslabs/llm-hosting-container/tree/main/huggingface/pytorch/tei/docker,
for Google Cloud and AWS SageMaker respectively.
alvarobartt marked this pull request as ready for review on August 14, 2025 at 10:18