
Commit e4f9f9e

G1017yang.geng and yang.geng authored
[iluvatar] Update new FlagGems operator (#755)
* add base
* add base p2p
* add new op and modify ops config
* modify base p2p

---------

Co-authored-by: yang.geng <[email protected]>
1 parent acc49a9 commit e4f9f9e


136 files changed: +1060 / -87 lines

@@ -0,0 +1,45 @@
+# Evaluated AI chip information
+
+* Vendor: ILUVATAR
+
+## Server 1
+
+- Product name: BI150
+- Product model: BI150
+- TDP: 350W
+
+# Server configuration
+
+* Number of servers: 2
+
+## Server 1
+
+* Cards used per server: 8
+* Server model: R5300 G5
+* OS version: Ubuntu 20.04.6 LTS
+* OS kernel: linux5.4.0-148-generic
+* CPU: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
+* Docker version: 20.10.25
+* Memory: 512GiB
+* RDMA NIC:
+
+# Evaluation results
+
+## Core evaluation results
+
+| Item | Inter-server P2P interconnect bandwidth, measured (2-card avg) | Inter-server P2P interconnect bandwidth, reference (2-card avg) | Measured/reference ratio (2-card avg) |
+| ---- | -------------- | -------------- | ------------ |
+| Result | / | / | / |
+
+## Power monitoring results
+
+| Monitored item | System avg power | System max power | System power std dev | Server TDP | Per-card avg power (2-card avg) | Per-card max power (2-card max) | Per-card power std dev (2-card max) | Per-card TDP |
+| ---- | ------- | ------- | ------- | ----- | ------------ | ------------ | ------------- | ----- |
+| Result | 1794.0W | 1794.0W | 0.0W | / | 82.08W | 88.08W | 11.35W | 400W |
+
+## Other key monitoring results
+
+| Monitored item | System avg CPU usage | System avg memory usage | Per-card avg temperature (2-card avg) | Per-card avg device memory usage (2-card avg) |
+| ---- | --------- | -------- | ------------ | -------------- |
+| Result | 3.5% | 1.594% | 33.54°C | 77.965% |
+
@@ -0,0 +1,4 @@
+Melements: 1024
+WARMUP: 100
+ITERS: 1000
+DIST_BACKEND: "nccl"
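The timed loop in `main.py` sends one tensor of shape `(Melements, 1024, 1024)` per iteration, so `Melements` and `ITERS` fix how much data crosses the link. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the 4-byte element size (float32) is an assumption, since the tensor dtype is not visible in this commit.

```python
# Hypothetical helpers; the 4-byte element size (float32) is an assumption.
BYTES_PER_ELEMENT = 4


def payload_bytes(melements: int, bytes_per_element: int = BYTES_PER_ELEMENT) -> int:
    """Size in bytes of one tensor of shape (Melements, 1024, 1024)."""
    return melements * 1024 * 1024 * bytes_per_element


def timed_bytes(case_config: dict) -> int:
    """Bytes sent by rank 0 during the timed ITERS loop (WARMUP is excluded)."""
    return payload_bytes(case_config["Melements"]) * case_config["ITERS"]


interserver = {"Melements": 1024, "WARMUP": 100, "ITERS": 1000, "DIST_BACKEND": "nccl"}
print(payload_bytes(interserver["Melements"]) / 2**30)  # 4.0 (GiB per send)
print(timed_bytes(interserver) / 2**30)  # 4000.0 (GiB over the timed loop)
```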
@@ -0,0 +1,3 @@
+export PYTHONPATH=/usr/local/corex/lib64/python3/dist-packages:$PYTHONPATH
+export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH
+export PATH=/usr/local/corex/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/corex/lib64/python3/dist-packages/bin:$PATH
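These three exports put the CoreX toolchain under `/usr/local/corex` ahead of the system paths. If a harness launched the benchmark from Python instead of sourcing the script, the same environment could be built as below. This is a sketch only: `corex_env` is a hypothetical helper, and unlike the shell version it avoids leaving a trailing `:` when a variable starts out unset.

```python
import os

COREX = "/usr/local/corex"


def _prepend(value: str, existing: str) -> str:
    # Same effect as `export X=value:$X`, minus the trailing ':' when X is empty.
    return f"{value}:{existing}" if existing else value


def corex_env(base=None) -> dict:
    """Copy of `base` (default: os.environ) with the CoreX paths prepended."""
    env = dict(os.environ if base is None else base)
    env["PYTHONPATH"] = _prepend(f"{COREX}/lib64/python3/dist-packages", env.get("PYTHONPATH", ""))
    env["LD_LIBRARY_PATH"] = _prepend(f"{COREX}/lib64", env.get("LD_LIBRARY_PATH", ""))
    path_dirs = [
        f"{COREX}/bin", "/usr/local/sbin", "/usr/local/bin", "/usr/sbin",
        "/usr/bin", "/sbin", "/bin", f"{COREX}/lib64/python3/dist-packages/bin",
    ]
    env["PATH"] = _prepend(":".join(path_dirs), env.get("PATH", ""))
    return env
```

The result could then be passed as `env=corex_env()` to `subprocess.run`.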
@@ -0,0 +1 @@
+#loguru

base/benchmarks/interconnect-P2P_interserver/main.py

+11-4
@@ -42,6 +42,12 @@ def main(config, case_config, rank, world_size, local_rank):
     set_ieee_float32(config.vendor)
     if rank == 0:
         print("finish initialization")
+
+    if "iluvatar" in config.vendor:
+        torch.cuda.set_device(local_rank)
+        rank_dst = 16
+    else:
+        rank_dst = 8
 
     Melements = case_config.Melements
     torchsize = (Melements, 1024, 1024)
@@ -54,8 +60,8 @@ def main(config, case_config, rank, world_size, local_rank):
 
     for _ in range(case_config.WARMUP):
         if rank == 0:
-            dist.send(tensor, dst=8)
-        elif rank == 8:
+            dist.send(tensor, dst=rank_dst)
+        elif rank == rank_dst:
             dist.recv(tensor, src=0)
 
     host_device_sync(config.vendor)
@@ -64,9 +70,10 @@ def main(config, case_config, rank, world_size, local_rank):
 
     for _ in range(case_config.ITERS):
         if rank == 0:
-            dist.send(tensor, dst=8)
-        elif rank == 8:
+            dist.send(tensor, dst=rank_dst)
+        elif rank == rank_dst:
             dist.recv(tensor, src=0)
+
     host_device_sync(config.vendor)
     multi_device_sync(config.vendor)
     end_time = time.perf_counter()
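The change replaces the hard-coded receiver rank 8 with `rank_dst`, which is 16 when `config.vendor` contains `"iluvatar"`, and also pins each Iluvatar process to its `local_rank` via `torch.cuda.set_device`. The value 16 suggests that each Iluvatar server contributes 16 ranks, so that rank 16 is the first rank on the second server; the commit itself does not say why. The sketch below reproduces only the pairing logic, without `torch.distributed`, plus one plausible bandwidth formula. That formula is an assumption, because the lines after `end_time` are not part of this diff, and all function names are hypothetical.

```python
def select_rank_dst(vendor: str) -> int:
    """Receiver rank, mirroring the vendor branch added to main.py."""
    return 16 if "iluvatar" in vendor else 8


def p2p_role(rank: int, rank_dst: int) -> str:
    """What a global rank does inside the WARMUP and ITERS loops."""
    if rank == 0:
        return "send"  # dist.send(tensor, dst=rank_dst)
    if rank == rank_dst:
        return "recv"  # dist.recv(tensor, src=0)
    return "idle"  # other ranks skip the send/recv branch


def unidirectional_gbps(bytes_per_iter: int, iters: int, elapsed_s: float) -> float:
    """Assumed metric: bytes sent in the timed loop / elapsed seconds, in GB/s (1e9 B)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_per_iter * iters / elapsed_s / 1e9


dst = select_rank_dst("iluvatar")
print([p2p_role(r, dst) for r in (0, 8, 16)])  # ['send', 'idle', 'recv']
```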
@@ -0,0 +1,43 @@
+# Evaluated AI chip information
+
+* Vendor: ILUVATAR
+
+## Server 1
+
+- Product name: BI150
+- Product model: BI150
+- TDP: 350W
+
+# Server configuration
+
+* Number of servers: 1
+
+## Server 1
+
+* Cards used per server: 2
+* Server model: R5300 G5
+* OS version: Ubuntu 20.04.6 LTS
+* OS kernel: linux5.4.0-148-generic
+* CPU: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
+* Docker version: 20.10.25
+* Memory: 512GiB
+
+# Evaluation results
+
+## Core evaluation results
+
+| Item | Intra-server P2P interconnect bandwidth, measured (2-card avg) | Intra-server P2P interconnect bandwidth, reference (2-card avg) | Measured/reference ratio (2-card avg) |
+| ---- | -------------- | -------------- | ------------ |
+| Result | / | / | / |
+
+## Power monitoring results
+
+| Monitored item | System avg power | System max power | System power std dev | Server TDP | Per-card avg power (2-card avg) | Per-card max power (2-card max) | Per-card power std dev (2-card max) | Per-card TDP |
+| ---- | ------- | ------- | ------- | ----- | ------------ | ------------ | ------------- | ----- |
+| Result | 2090.0W | 2090.0W | 0.0W | / | 90.22W | 92.0W | 0.98W | 350W |
+
+## Other key monitoring results
+
+| Monitored item | System avg CPU usage | System avg memory usage | Per-card avg temperature (2-card avg) | Per-card avg device memory usage (2-card avg) |
+| ---- | --------- | -------- | ------------ | -------------- |
+| Result | 99.866% | 4.067% | 33.31°C | 1.238% |
@@ -0,0 +1,4 @@
+Melements: 1024
+WARMUP: 100
+ITERS: 100000
+DIST_BACKEND: "nccl"
@@ -0,0 +1,3 @@
+export PYTHONPATH=/usr/local/corex/lib64/python3/dist-packages:$PYTHONPATH
+export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH
+export PATH=/usr/local/corex/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/corex/lib64/python3/dist-packages/bin:$PATH
@@ -0,0 +1 @@
+#loguru

operation/benchmarks/abs/iluvatar/BI150/README.md

+1-1
@@ -21,7 +21,7 @@
 * Inter-server direct AI chip interconnect spec and bandwidth: not applicable (this benchmark does not involve direct inter-server chip links)
 
 # Operator library version
-FlagGems: contact [email protected] to obtain the version (FlagGems-0710_pointwise_use_tid)
+FlagGems: contact [email protected] to obtain the version (latest adapted FlagGems version)
 
 # Evaluation results
 
operation/benchmarks/abs/iluvatar/BI150/case_config.yaml

-1
@@ -1,5 +1,4 @@
 Melements: 512
-SPECTFLOPS: 24.576
 WARMUP: 100
 ITERS: 50000
 KERNELWARMUP: 10

operation/benchmarks/add/iluvatar/BI150/README.md

+1-1
@@ -21,7 +21,7 @@
 * Inter-server direct AI chip interconnect spec and bandwidth: not applicable (this benchmark does not involve direct inter-server chip links)
 
 # Operator library version
-FlagGems: contact [email protected] to obtain the version (FlagGems-0710_pointwise_use_tid)
+FlagGems: contact [email protected] to obtain the version (latest adapted FlagGems version)
 
 # Evaluation results
 
operation/benchmarks/add/iluvatar/BI150/case_config.yaml

-1
@@ -1,5 +1,4 @@
 Melements: 512
-SPECTFLOPS: 49.152
 WARMUP: 100
 ITERS: 50000
 KERNELWARMUP: 10
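This commit drops `SPECTFLOPS` from the `abs` and `add` case configs (24.576 and 49.152 respectively); where the peak value is read from afterwards is not visible here. Whatever its source, the FU columns in the README tables relate achieved TFLOPS to that peak, presumably as a plain ratio. A minimal sketch under that assumption, with a hypothetical helper name and a made-up achieved value of 12.288 TFLOPS:

```python
def flops_utilization(achieved_tflops: float, spec_tflops: float) -> float:
    """FU in percent: achieved TFLOPS over the chip's peak TFLOPS (assumed definition)."""
    if spec_tflops <= 0:
        raise ValueError("spec_tflops must be positive")
    return 100.0 * achieved_tflops / spec_tflops


# The two peaks removed in this commit, applied to the same hypothetical 12.288 TFLOPS.
for op, spec in (("abs", 24.576), ("add", 49.152)):
    print(op, round(flops_utilization(12.288, spec), 2))  # abs 50.0, then add 25.0
```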
@@ -0,0 +1,54 @@
+# Evaluated AI chip information
+
+* Vendor: ILUVATAR
+
+* Product name: BI150
+* Product model: BI150
+* TDP: W
+
+# Server configuration
+
+* Number of servers: 1
+
+
+* Cards used per server: 1
+* Server model:
+* OS version: Ubuntu 20.04.6 LTS
+* OS kernel: linux5.4.0-148-generic
+* CPU:
+* Docker version: 20.10.25
+* Memory:
+* Inter-server direct AI chip interconnect spec and bandwidth: not applicable (this benchmark does not involve direct inter-server chip links)
+
+# Operator library version
+FlagGems: contact [email protected] to obtain the version (latest adapted FlagGems version)
+
+# Evaluation results
+
+## Core evaluation results
+
+| Item | correctness | TFLOPS (CPU wall clock) | TFLOPS (kernel clock) | FU (FLOPS Utilization), cputime | FU, kerneltime |
+| ---- | -------------- | -------------- | ------------ | ------ | ----- |
+| flaggems | True | 14.98TFLOPS | 14.92TFLOPS | 31.22% | 31.09% |
+| nativetorch | True | 18.17TFLOPS | 18.04TFLOPS | 37.85% | 37.59% |
+
+## Other evaluation results
+
+| Item | cputime | kerneltime | cputime throughput | kerneltime throughput | Latency without warmup | Latency after warmup |
+| ---- | -------------- | -------------- | ------------ | ------------ | -------------- | -------------- |
+| flaggems | 4587.86us | 4606.8us | 217.97op/s | 217.07op/s | 58589429.32us | 5365.65us |
+| nativetorch | 3783.46us | 3810.43us | 264.31op/s | 262.44op/s | 94735.44us | 4204.33us |
+
+## Power monitoring results
+
+| Monitored item | System avg power | System max power | System power std dev | Server TDP | Per-card avg power | Per-card max power | Per-card power std dev | Per-card TDP |
+| ---- | ------- | ------- | ------- | ----- | ------------ | ------------ | ------------- | ----- |
+| nativetorch result | 2093.8W | 2204.0W | 30.4W | / | 203.73W | 204.0W | 0.44W | 350W |
+| flaggems result | 2090.0W | 2109.0W | 6.94W | / | 182.0W | 186.0W | 11.42W | 350W |
+
+## Other key monitoring results
+
+| Monitored item | System avg CPU usage | System avg memory usage | Per-card avg temperature | Per-card max device memory usage |
+| ---- | --------- | -------- | ------------ | -------------- |
+| nativetorch result | 99.888% | 1.879% | 56.75°C | 2.24% |
+| flaggems result | 99.875% | 1.882% | 52.63°C | 2.631% |
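The power tables report a mean, a maximum and a standard deviation over sampled power readings. The READMEs do not state the sampling interval or whether the population or sample standard deviation is used; the sketch below assumes population standard deviation, and `summarize_power` is a hypothetical helper, not the benchmark's own monitoring code.

```python
from statistics import fmean, pstdev
from typing import Sequence


def summarize_power(samples_w: Sequence[float]) -> dict:
    """Mean, max and population std dev of power readings in watts."""
    if not samples_w:
        raise ValueError("need at least one sample")
    return {
        "avg_w": round(fmean(samples_w), 2),
        "max_w": round(max(samples_w), 2),
        "std_w": round(pstdev(samples_w), 2),  # assumption: population std dev
    }


print(summarize_power([200.0, 204.0, 202.0]))  # {'avg_w': 202.0, 'max_w': 204.0, 'std_w': 1.63}
```

With a single reading, the mean equals the maximum and the standard deviation is 0.0.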
@@ -0,0 +1,7 @@
+M: 2048
+N: 4096
+K: 4096
+WARMUP: 100
+ITERS: 20000
+KERNELWARMUP: 10
+KERNELITERS: 1000
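With `M: 2048`, `N: 4096` and `K: 4096`, the flaggems TFLOPS and throughput figures in the preceding README are consistent with counting 2·M·N·K FLOPs per call (one multiply and one add per multiply-accumulate) and dividing by the measured time. The sketch below reproduces the flaggems row. Treating this operator as an `(M×K)·(K×N)` matrix multiply is an inference from the config keys and the numbers, not something the commit states; the helper names are hypothetical.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """2*M*N*K: one multiply and one add per multiply-accumulate (assumed op)."""
    return 2 * m * n * k


def tflops(flops: int, time_us: float) -> float:
    """Achieved TFLOPS for one call that took time_us microseconds."""
    return flops / (time_us * 1e-6) / 1e12


def ops_per_second(time_us: float) -> float:
    """Calls per second for a per-call time in microseconds."""
    return 1e6 / time_us


flops = matmul_flops(2048, 4096, 4096)
print(round(tflops(flops, 4587.86), 2))  # 14.98 (flaggems, cputime)
print(round(tflops(flops, 4606.8), 2))  # 14.92 (flaggems, kerneltime)
print(round(ops_per_second(4587.86), 2))  # 217.97 (flaggems, cputime throughput)
```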
@@ -0,0 +1,3 @@
+export PYTHONPATH=/usr/local/corex/lib64/python3/dist-packages:$PYTHONPATH
+export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH
+export PATH=/usr/local/corex/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/corex/lib64/python3/dist-packages/bin:$PATH
@@ -0,0 +1 @@
+#loguru
@@ -0,0 +1,54 @@
+# Evaluated AI chip information
+
+* Vendor: ILUVATAR
+
+* Product name: BI150
+* Product model: BI150
+* TDP: W
+
+# Server configuration
+
+* Number of servers: 1
+
+
+* Cards used per server: 1
+* Server model:
+* OS version: Ubuntu 20.04.6 LTS
+* OS kernel: linux5.4.0-148-generic
+* CPU:
+* Docker version: 20.10.25
+* Memory:
+* Inter-server direct AI chip interconnect spec and bandwidth: not applicable (this benchmark does not involve direct inter-server chip links)
+
+# Operator library version
+FlagGems: contact [email protected] to obtain the version (latest adapted FlagGems version)
+
+# Evaluation results
+
+## Core evaluation results
+
+| Item | correctness | TFLOPS (CPU wall clock) | TFLOPS (kernel clock) | FU (FLOPS Utilization), cputime | FU, kerneltime |
+| ---- | -------------- | -------------- | ------------ | ------ | ----- |
+| flaggems | True | 0.04TFLOPS | 0.04TFLOPS | 0.09% | 0.09% |
+| nativetorch | True | 0.03TFLOPS | 0.03TFLOPS | 0.06% | 0.06% |
+
+## Other evaluation results
+
+| Item | cputime | kerneltime | cputime throughput | kerneltime throughput | Latency without warmup | Latency after warmup |
+| ---- | -------------- | -------------- | ------------ | ------------ | -------------- | -------------- |
+| flaggems | 12073.11us | 12109.31us | 82.83op/s | 82.58op/s | 2021354.35us | 13004.15us |
+| nativetorch | 19045.0us | 19065.13us | 52.51op/s | 52.45op/s | 55525.18us | 20219.84us |
+
+## Power monitoring results
+
+| Monitored item | System avg power | System max power | System power std dev | Server TDP | Per-card avg power | Per-card max power | Per-card power std dev | Per-card TDP |
+| ---- | ------- | ------- | ------- | ----- | ------------ | ------------ | ------------- | ----- |
+| nativetorch result | 2104.78W | 2109.0W | 7.9W | / | 118.95W | 120.0W | 0.73W | 350W |
+| flaggems result | 2121.67W | 2128.0W | 14.16W | / | 124.87W | 125.0W | 0.34W | 350W |
+
+## Other key monitoring results
+
+| Monitored item | System avg CPU usage | System avg memory usage | Per-card avg temperature | Per-card max device memory usage |
+| ---- | --------- | -------- | ------------ | -------------- |
+| nativetorch result | 99.901% | 1.789% | 35.89°C | 13.239% |
+| flaggems result | 99.86% | 1.795% | 36.61°C | 13.239% |
@@ -0,0 +1,5 @@
+Melements: 512
+WARMUP: 100
+ITERS: 50000
+KERNELWARMUP: 10
+KERNELITERS: 1000
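Taking `Melements: 512` together with the README results that precede it (assuming both belong to the same operator, since this file's path is not shown), the reported TFLOPS values line up with one FLOP per element over `512 × 1024 × 1024` elements. The sketch below checks that and also computes the flaggems speedup over nativetorch from the cputime column. The one-FLOP-per-element count is an assumption, and the helper names are hypothetical.

```python
def pointwise_tflops(melements: int, time_us: float, flops_per_element: int = 1) -> float:
    """TFLOPS for an elementwise op over melements * 2**20 elements (FLOP count assumed)."""
    elements = melements * 1024 * 1024
    return elements * flops_per_element / (time_us * 1e-6) / 1e12


def speedup(baseline_us: float, candidate_us: float) -> float:
    """How many times faster the candidate is than the baseline (>1 means faster)."""
    return baseline_us / candidate_us


print(round(pointwise_tflops(512, 12073.11), 2))  # 0.04 (flaggems, cputime)
print(round(pointwise_tflops(512, 19045.0), 2))  # 0.03 (nativetorch, cputime)
print(round(speedup(19045.0, 12073.11), 2))  # 1.58 (flaggems vs nativetorch)
```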
@@ -0,0 +1,3 @@
+export PYTHONPATH=/usr/local/corex/lib64/python3/dist-packages:$PYTHONPATH
+export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH
+export PATH=/usr/local/corex/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/corex/lib64/python3/dist-packages/bin:$PATH
@@ -0,0 +1 @@
+#loguru
@@ -0,0 +1,54 @@
+# Evaluated AI chip information
+
+* Vendor: ILUVATAR
+
+* Product name: BI150
+* Product model: BI150
+* TDP: W
+
+# Server configuration
+
+* Number of servers: 1
+
+
+* Cards used per server: 1
+* Server model:
+* OS version: Ubuntu 20.04.6 LTS
+* OS kernel: linux5.4.0-148-generic
+* CPU:
+* Docker version: 20.10.25
+* Memory:
+* Inter-server direct AI chip interconnect spec and bandwidth: not applicable (this benchmark does not involve direct inter-server chip links)
+
+# Operator library version
+FlagGems: contact [email protected] to obtain the version (latest adapted FlagGems version)
+
+# Evaluation results
+
+## Core evaluation results
+
+| Item | correctness | TFLOPS (CPU wall clock) | TFLOPS (kernel clock) | FU (FLOPS Utilization), cputime | FU, kerneltime |
+| ---- | -------------- | -------------- | ------------ | ------ | ----- |
+| flaggems | True | 0.17TFLOPS | 0.17TFLOPS | 0.35% | 0.35% |
+| nativetorch | True | 0.09TFLOPS | 0.09TFLOPS | 0.19% | 0.19% |
+
+## Other evaluation results
+
+| Item | cputime | kerneltime | cputime throughput | kerneltime throughput | Latency without warmup | Latency after warmup |
+| ---- | -------------- | -------------- | ------------ | ------------ | -------------- | -------------- |
+| flaggems | 19897.5us | 19933.18us | 50.26op/s | 50.17op/s | 3148718.55us | 20645.51us |
+| nativetorch | 35948.85us | 35989.24us | 27.82op/s | 27.79op/s | 80880.86us | 36463.62us |
+
+## Power monitoring results
+
+| Monitored item | System avg power | System max power | System power std dev | Server TDP | Per-card avg power | Per-card max power | Per-card power std dev | Per-card TDP |
+| ---- | ------- | ------- | ------- | ----- | ------------ | ------------ | ------------- | ----- |
+| nativetorch result | 2162.44W | 2166.0W | 13.8W | / | 143.8W | 144.0W | 1.84W | 350W |
+| flaggems result | 2194.5W | 2204.0W | 28.5W | / | 178.07W | 179.0W | 0.26W | 350W |
+
+## Other key monitoring results
+
+| Monitored item | System avg CPU usage | System avg memory usage | Per-card avg temperature | Per-card max device memory usage |
+| ---- | --------- | -------- | ------------ | -------------- |
+| nativetorch result | 98.748% | 1.819% | 43.9°C | 39.801% |
+| flaggems result | 98.196% | 1.856% | 49.98°C | 40.552% |
@@ -0,0 +1,6 @@
+M: 512
+N: 1024
+WARMUP: 100
+ITERS: 50000
+KERNELWARMUP: 10
+KERNELITERS: 1000
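The configuration hunks in this commit (such as the `case_config.yaml` files shown for `abs` and `add`) are flat `key: value` maps of integers, floats and quoted strings. The benchmarks presumably load them with a full YAML parser; purely to illustrate the shape of the data, here is a hypothetical stdlib-only reader for that flat subset. It does not handle nesting, lists, or `#` inside quoted values.

```python
def parse_flat_config(text: str) -> dict:
    """Parse flat `key: value` lines into a dict of int, float or str values."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            config[key] = value[1:-1]  # quoted string, e.g. DIST_BACKEND: "nccl"
            continue
        for cast in (int, float):
            try:
                config[key] = cast(value)
                break
            except ValueError:
                pass
        else:
            config[key] = value  # fall back to the raw string
    return config


print(parse_flat_config("M: 512\nN: 1024\nWARMUP: 100\n"))  # {'M': 512, 'N': 1024, 'WARMUP': 100}
```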
@@ -0,0 +1,3 @@
+export PYTHONPATH=/usr/local/corex/lib64/python3/dist-packages:$PYTHONPATH
+export LD_LIBRARY_PATH=/usr/local/corex/lib64:$LD_LIBRARY_PATH
+export PATH=/usr/local/corex/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/corex/lib64/python3/dist-packages/bin:$PATH
@@ -0,0 +1 @@
+#loguru
