
Add Dockerfile-cpu-amd for AMD CPU compatibility #638


Open: wants to merge 1 commit into base: main


randomm (Contributor) commented Jun 14, 2025

What does this PR do?

This PR adds a new Dockerfile-cpu-amd to resolve Intel MKL compatibility issues on AMD processors, specifically addressing the SGEMM errors reported when running Qwen3 embedding models on AMD CPUs.

Problem

Users running text-embeddings-inference on AMD processors encounter Intel MKL errors:

  • "Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM"
  • "Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM"

This occurs because Intel MKL is optimized for Intel processors, and its SGEMM calls fail with these parameter errors on AMD CPUs.
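To check whether a given host is one of the affected AMD machines, one can read the CPU vendor string. The following is a minimal sketch, assuming a Linux host whose /proc/cpuinfo exposes a vendor_id field (AuthenticAMD on AMD, GenuineIntel on Intel). The helper names cpu_vendor and is_amd are hypothetical and not part of this PR.

```python
"""Sketch: detect whether a Linux host reports an AMD CPU.

Assumption: /proc/cpuinfo contains lines such as "vendor_id : AuthenticAMD".
"""
from pathlib import Path
from typing import Optional


def cpu_vendor(cpuinfo_text: str) -> Optional[str]:
    """Return the first vendor_id value found in /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def is_amd(cpuinfo_text: str) -> bool:
    """True when the vendor string identifies an AMD processor."""
    return cpu_vendor(cpuinfo_text) == "AuthenticAMD"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        text = cpuinfo.read_text()
        print("vendor:", cpu_vendor(text), "| AMD:", is_amd(text))
    else:
        # Non-Linux hosts have no /proc/cpuinfo; this sketch does not handle them.
        print("/proc/cpuinfo not found (not a Linux host?)")
```

On an AMD host (for example an AWS t3a instance) this is expected to report AuthenticAMD, which is the case the new Dockerfile targets.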

Solution

  • Adds Dockerfile-cpu-amd: A new specialized Dockerfile following the project's existing pattern (similar to Dockerfile-cuda, Dockerfile-intel)
  • Removes Intel MKL dependencies: Builds without Intel MKL and installs libomp-dev (the LLVM OpenMP runtime) instead
  • Maintains compatibility: No changes to existing Dockerfiles or functionality
  • Clean and minimal: Only adds the essential Dockerfile without additional workflow files
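Building and running the new image follows the usual Docker workflow. The sketch below only assembles the commands. The image tag tei-cpu-amd, host port 8080, container port 80 (as in the project's documented docker run examples) and the model id Qwen/Qwen3-Embedding-0.6B are assumptions, not taken from this PR.

```python
"""Sketch: assemble docker build/run commands for the AMD CPU image.

Assumptions (not from the PR): tag "tei-cpu-amd", host port 8080,
container port 80, and the Qwen3 embedding model id passed via --model-id.
"""
import shlex
from typing import List


def build_cmd(dockerfile: str = "Dockerfile-cpu-amd", tag: str = "tei-cpu-amd") -> List[str]:
    # Run from the repository root so the build context "." contains the sources.
    return ["docker", "build", "-f", dockerfile, "-t", tag, "."]


def run_cmd(model_id: str, tag: str = "tei-cpu-amd", host_port: int = 8080) -> List[str]:
    # Map a host port to container port 80 and select the model with --model-id.
    return ["docker", "run", "--rm", "-p", f"{host_port}:80", tag, "--model-id", model_id]


if __name__ == "__main__":
    # shlex.join renders each argument list as a copy-pasteable shell command.
    print(shlex.join(build_cmd()))
    print(shlex.join(run_cmd("Qwen/Qwen3-Embedding-0.6B")))
```

Keeping the commands as argument lists means they can also be passed directly to subprocess.run without shell quoting issues.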

Key Changes

  • New file: Dockerfile-cpu-amd - AMD-compatible CPU Dockerfile without Intel MKL

Testing

Successfully tested on an AMD server: the Intel MKL errors no longer occur.

  • Image builds successfully
  • Qwen3 embedding models now work correctly on AMD processors
  • Performance is modest (Qwen3 0.6B model), but the model now runs on AMD chips. Tested on AWS t3a.2xlarge instances (AMD EPYC).

Testing Results (from a simple shell script)

Run with a concurrency of 3 (a fairly light load).

Total Requests: 100
Successful: 100
Failed: 0
Success Rate: 100.0%

Overall Performance (successful requests):
Average Response Time: 3.690 seconds
Median Response Time: 4.000 seconds
Min Response Time: 1.000 seconds
Max Response Time: 6.000 seconds

Performance by Text Length:

Short (1 word, 1 token):
Count: 20, Avg: 4.350s, Min: 3.000s, Max: 5.000s
Medium (3 words, 3 tokens):
Count: 20, Avg: 3.600s, Min: 1.000s, Max: 4.000s
Question (13 words, 12 tokens):
Count: 20, Avg: 2.600s, Min: 2.000s, Max: 4.000s
Paragraph (47 words, 42 tokens):
Count: 20, Avg: 3.100s, Min: 2.000s, Max: 6.000s
Long text (95 words, 74 tokens):
Count: 20, Avg: 4.800s, Min: 4.000s, Max: 6.000s

Total Test Duration: 126.00 seconds
Requests per Second: 0.79
Total Tokens Processed: 2640
Tokens per Second: 20.95

Note: most of the response time is spent queueing.
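The shell script itself is not included in the PR. As a rough Python equivalent, the sketch below sends texts to TEI's /embed endpoint with at most three requests in flight and derives the same headline metrics. The URL, the JSON body shape {"inputs": ...}, the 60-second timeout and the caller-supplied token counts are assumptions; this is not the author's script.

```python
"""Sketch: concurrency-limited load test against an /embed endpoint.

Assumptions: TEI listens at e.g. http://localhost:8080/embed and accepts
{"inputs": "<text>"}; token counts are supplied by the caller.
"""
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def summarize(results: List[Tuple[bool, float, int]], duration_s: float) -> Dict[str, float]:
    """Headline metrics from (success, latency_s, tokens) tuples."""
    ok = [r for r in results if r[0]]
    latencies = [r[1] for r in ok]
    return {
        "total": len(results),
        "successful": len(ok),
        "success_rate_pct": 100.0 * len(ok) / len(results) if results else 0.0,
        "avg_s": statistics.mean(latencies) if latencies else 0.0,
        "median_s": statistics.median(latencies) if latencies else 0.0,
        "requests_per_s": len(results) / duration_s,
        # Only tokens from successful requests count as processed.
        "tokens_per_s": sum(r[2] for r in ok) / duration_s,
    }


def embed_once(url: str, text: str) -> Tuple[bool, float]:
    """POST one text; return (success, latency in seconds)."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            ok = resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(url: str, items: List[Tuple[str, int]], concurrency: int = 3) -> Dict[str, float]:
    """Send each (text, token_count) pair with at most `concurrency` requests in flight."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(lambda item: embed_once(url, item[0]), items))
    duration = time.perf_counter() - start
    results = [(ok, lat, tokens) for (ok, lat), (_, tokens) in zip(outcomes, items)]
    return summarize(results, duration)
```

As a sanity check on the reported figures, 2640 tokens / 126 s ≈ 20.95 tokens/s and 100 requests / 126 s ≈ 0.79 requests/s, matching the summary above. A hypothetical call would be run_load_test("http://localhost:8080/embed", [("hello", 1)] * 20).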

Fixes #636

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@OlivierDehaene @Narsil

This follows the project's existing pattern of specialized Dockerfiles and provides a clean solution for AMD CPU compatibility without affecting existing functionality.

…dependencies that cause SGEMM errors on AMD processors
  • Uses generic BLAS (libomp-dev) instead of Intel MKL
  • Follows project's existing pattern of specialized Dockerfiles
  • Resolves issue huggingface#636
randomm force-pushed the fix-dockerfile-issue branch from 296396e to c0a00f1 on June 14, 2025 18:01

Successfully merging this pull request may close these issues.

TEI CPU inference fails with Intel MKL errors on AMD processors when running Qwen3 embedding models