Merge pull request opendatalab#1513 from opendatalab/dev
docs(faq): add troubleshooting guide for old GPUs encountering CUDA errors
myhloli authored Jan 11, 2025
2 parents 5363e61 + e41d7be commit f635681
Showing 2 changed files with 42 additions and 1 deletion.
20 changes: 20 additions & 0 deletions docs/FAQ_en_us.md
@@ -73,3 +73,23 @@ pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhlol
```

Reference: https://github.com/opendatalab/MinerU/issues/1004

### 9. Old Graphics Cards Such as M40 Encounter "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED"

The following error occurs at runtime when using CUDA:
```
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
```
Graphics cards older than the Turing architecture do not support BF16 precision, and some of these cards are not correctly detected by torch, so BF16 precision must be disabled manually.
Modify lines 287-290 of `pdf_parse_union_core_v2.py` (note that the location may vary in different versions):
```python
if torch.cuda.is_bf16_supported():
supports_bfloat16 = True
else:
supports_bfloat16 = False
```
Change it to:
```python
supports_bfloat16 = False
```
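
Before (or after) applying this change, you can confirm whether the card actually lacks BF16 support. The following is a minimal sketch using standard PyTorch APIs, run in the same environment as magic-pdf:
```python
import torch

if torch.cuda.is_available():
    # Turing (compute capability 7.5) and newer cards generally support BF16;
    # the M40 (Maxwell, compute capability 5.2) does not.
    major, minor = torch.cuda.get_device_capability()
    print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
    print("torch reports BF16 support:", torch.cuda.is_bf16_supported())
else:
    print("CUDA is not available in this environment.")
```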
Reference: https://github.com/opendatalab/MinerU/issues/1508
23 changes: 22 additions & 1 deletion docs/FAQ_zh_cn.md
@@ -57,7 +57,6 @@ CUDA 11 has poor compatibility with newer graphics cards, so the CUDA version used by paddle needs to be upgraded
```bash
pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
```
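
After upgrading, a quick way to verify that the new PaddlePaddle build can use the GPU is paddle's built-in check. A minimal sketch, assuming the package installed above:
```python
import paddle

# Confirms the build is CUDA-enabled and runs a small computation on the GPU.
print("Compiled with CUDA:", paddle.device.is_compiled_with_cuda())
paddle.utils.run_check()
```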

Reference: https://github.com/opendatalab/MinerU/issues/558

### 7. On some Linux servers, the program fails immediately on startup with `非法指令 (核心已转储)` / `Illegal instruction (core dumped)`
@@ -74,3 +73,25 @@ pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhlol
```

Reference: https://github.com/opendatalab/MinerU/issues/1004

### 9. Old graphics cards such as the M40 encounter "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED"

The following error occurs at runtime (when using CUDA):
```
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
```
Graphics cards older than the Turing architecture do not support BF16 precision, and some of these cards are not correctly recognized by PyTorch, so BF16 precision must be disabled manually.

Locate and modify lines 287-290 of `pdf_parse_union_core_v2.py` (note: the location may vary between versions); the original code is:
```python
if torch.cuda.is_bf16_supported():
supports_bfloat16 = True
else:
supports_bfloat16 = False
```
Change it to:
```python
supports_bfloat16 = False
```

Reference: https://github.com/opendatalab/MinerU/issues/1508
