[PaddleV3] Update the Dockerfile to Paddle 3.0.0beta and add a blacklist for CI tests #1061

Merged
merged 6 commits into from
Oct 17, 2024


megemini
Contributor

@megemini megemini commented Oct 9, 2024

Create A Good Pull Request

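  1. Update the Dockerfile to Paddle 3.0.0beta, and bump PyTorch, ONNX, etc. to their latest versions
  2. test_benchmark uses black.list to filter the tests
  • By default, no models are tested
  • Later, each time a model is modified, its entry is removed from black.list so that CI tests it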
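
The filtering idea above could be sketched as follows. This is only an illustration, not the script in this PR: it assumes black.list holds one model name per line, with blank lines and `#` comments ignored, and the helper names `load_blacklist` and `select_models` as well as the model names are hypothetical.

```python
def load_blacklist(text):
    """Parse blacklist text: one model name per line (assumed format)."""
    names = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            names.add(entry)
    return names


def select_models(all_models, blacklist):
    """Return the models that CI should test, preserving input order."""
    return [m for m in all_models if m not in blacklist]


# Hypothetical model names, for illustration only.
models = ["model_a", "model_b", "model_c"]
blacklist = load_blacklist("model_a\n# model_b was adapted, so it is tested\nmodel_c\n")
print(select_models(models, blacklist))  # ['model_b']
```

Starting with every model listed and deleting entries one by one keeps CI green while models are adapted incrementally.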

The Dockerfile builds fine for me locally:

λ 483e70b23ef6 /home python
Python 3.9.18 (main, Aug 25 2023, 13:20:04) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> import torch
>>> import tensorflow
2024-10-09 06:23:12.329805: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-09 06:23:12.360176: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-09 06:23:12.815496: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
>>> import onnx
>>> paddle.__version__
'3.0.0-beta1'
>>> torch.__version__
'2.4.1+cu121'
>>> tensorflow.__version__
'2.16.1'
>>> onnx.__version__
'1.17.0'
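
The manual check in the session above could also be scripted. The sketch below is an assumption about how one might do it, not part of this PR: it reads installed distribution versions with the standard library's `importlib.metadata` and compares them by prefix (a simplification, since `2.4.1` would also match `2.4.10`). The helper names `installed_version` and `check_versions` are hypothetical; the distribution names and version prefixes come from the pip commands and the session in this PR.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, get_version=installed_version):
    """Map each distribution to (found_version, ok) using prefix matching."""
    report = {}
    for name, prefix in expected.items():
        found = get_version(name)
        report[name] = (found, found is not None and found.startswith(prefix))
    return report


# Version prefixes taken from the pip commands and session in this PR.
EXPECTED = {
    "paddlepaddle-gpu": "3.0.0",
    "torch": "2.4.1",
    "tensorflow": "2.16.1",
    "onnx": "1.17.0",
}

if __name__ == "__main__":
    for name, (found, ok) in check_versions(EXPECTED).items():
        print(f"{name}: {found} ({'ok' if ok else 'MISMATCH'})")
```

The version lookup is injectable through `get_version`, so the comparison logic can be exercised without any of the frameworks installed.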

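One small issue: the Python in the base Docker image is 3.9, not 3.8, the minimum version Paddle supports. It is not a big deal, so I kept it as is.

However, Caffe is not configured in the Dockerfile. How is it set up?

In CI, I see that PyTorch is tested separately from the other frameworks. I am not sure whether my script has any problems, so I am submitting it first to see what happens.

Also, for later changes, there is no guarantee that the latest versions of all the frameworks above will pass. If adapting to them proves really difficult along the way, the Dockerfile may need to be modified again.

Related: #1060

@luotao1 please review.

Please keep the text below at the very end of the PR description and, after submitting the PR, tick the items that apply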

Please check the following steps before merging this pull request

  • Python code style verification
  • Review all the code diff by yourself
  • All models (TensorFlow/Caffe/ONNX/PyTorch) testing passed
  • Details about your pull request, related issues

If this PR adds new model support, please update model_zoo.md and add the model to our test model zoos (@luotao1)

  • New Model Supported
  • No New Model Supported

@megemini
Contributor Author

Update 20241011

  • Added a Dockerfile step that installs protobuf 3.20.2

    For Caffe conversion, the protobuf version in the original Docker image is too new, so it has to be downgraded

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Oct 11, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation Oct 11, 2024
@megemini
Contributor Author

Update 20241016

  • Changed the CUDA version in the Dockerfile to 11.2

The CUDA version on the CI server appears to be 11.2, judging from an earlier log:

2024-10-13 20:08:04 LD_LIBRARY_PATH=/usr/local/cuda-11.2/targets/x86_64-linux/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64

Could you please try building the image again? 😅😅😅
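
The inference from the log line above can be made mechanical. The following is a minimal sketch, assuming that a `cuda-X.Y` directory name in `LD_LIBRARY_PATH` reflects the toolkit version (a hint, not proof of what is actually installed); the function name `cuda_versions_in_path` is hypothetical.

```python
import re


def cuda_versions_in_path(ld_library_path):
    """Return CUDA versions named in 'cuda-X.Y' path components, in order."""
    versions = []
    for entry in ld_library_path.split(":"):
        match = re.search(r"cuda-(\d+\.\d+)", entry)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))
    return versions


# Value copied from the CI log line quoted above.
logged = "/usr/local/cuda-11.2/targets/x86_64-linux/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
print(cuda_versions_in_path(logged))  # ['11.2']
```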

@luotao1

python -m pip install torchmetrics==0.10.2 pytorch_lightning==1.5.3 kornia==0.5.11 hypothesis pre-commit==2.17.0 && \
python -m pip install wget timm transformers pandas nose pytest opencv-python==4.6.0.66 allure-pytest && \
python -m pip install torch==2.4.1 torchvision torchaudio tensorflow==2.16.1 onnx==1.17.0 onnxruntime && \
python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ && \
Collaborator


https://www.paddlepaddle.org.cn/packages/stable/cu118/

Can 3.0.0b1 be installed on the 11.2 image?

@luotao1
Collaborator

luotao1 commented Oct 17, 2024

You can roll back to the previous commit; exporting the following path is enough:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/compat

  • before: (screenshot)
  • after: (screenshot)

I will not regenerate the image for now; I will add this line to the CI configuration instead.
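
If the tests are launched from a Python driver rather than a shell, the same effect as the `export` line could be approximated as below. This is a sketch under assumptions, not the CI configuration: the helper name `with_extra_library_path` is hypothetical, and unlike the shell form it skips empty entries, so an unset variable does not produce a leading colon. The variable is prepared for a child process because the dynamic loader generally reads it when a process starts.

```python
import os


def with_extra_library_path(env, extra_dir):
    """Return a copy of env with extra_dir appended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(":") if p]  # ignore empty entries
    if extra_dir not in parts:
        parts.append(extra_dir)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


# Path taken from the export line quoted above.
env = with_extra_library_path(os.environ, "/usr/local/cuda-11.8/compat")
# Pass env=env to subprocess.run(...) when starting the test command.
```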

Collaborator

@luotao1 luotao1 left a comment


LGTM. The version check will be completed in the next PR.

@luotao1 luotao1 merged commit d432e06 into PaddlePaddle:develop Oct 17, 2024
3 of 4 checks passed
@luotao1 luotao1 added the contributor External developers label Oct 17, 2024