When running GISel on the given input IR, we end up generating essentially a one-to-one mapping of the input IR, even though most of the computation is just duplicated.
SDISel does a much better job: it performs instcombine-like optimizations that simplify the IR and eventually expose the CSE opportunity.
This is not that surprising given that GISel has historically taken a garbage-in, garbage-out approach, but it may make sense to strengthen GISel combines for optimized builds.
Note: I observed this with AMDGPU but I suspect it affects all backends.
To Reproduce
Download the attached IR or copy paste the snippet below and run
define void @foo(<1 x float> %in, ptr %out, ptr %out2) {
  %t1174 = insertvalue [4 x <1 x float>] zeroinitializer, <1 x float> %in, 0
  %t1175 = insertvalue [4 x <1 x float>] %t1174, <1 x float> %in, 1
  %t1176 = insertvalue [4 x <1 x float>] %t1175, <1 x float> %in, 2
  %t1177 = insertvalue [4 x <1 x float>] %t1176, <1 x float> %in, 3
  %t1178 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] zeroinitializer, [4 x <1 x float>] %t1177, 0, 0, 0, 0
  %t1179 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1178, [4 x <1 x float>] %t1177, 0, 0, 1, 0
  %t1180 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1179, [4 x <1 x float>] %t1177, 0, 0, 2, 0
  %t1181 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1180, [4 x <1 x float>] %t1177, 0, 0, 3, 0
  %t1182 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1181, [4 x <1 x float>] %t1177, 1, 0, 0, 0
  %t1183 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1182, [4 x <1 x float>] %t1177, 1, 0, 1, 0
  %t1184 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1183, [4 x <1 x float>] %t1177, 1, 0, 2, 0
  %t1185 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1184, [4 x <1 x float>] %t1177, 1, 0, 3, 0
  %t1186 = extractvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1178, 0, 0, 0, 0, 0
  %t1187 = fdiv <1 x float> splat (float 1.000000e+00), %t1186
  %t1188 = extractvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1178, 0, 0, 0, 0, 1
  %t1189 = fdiv <1 x float> splat (float 1.000000e+00), %t1188
  store <1 x float> %t1187, ptr %out
  store <1 x float> %t1189, ptr %out2
  ret void
}
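Tracing the snippet by hand, both extractvalue results resolve to %in: %t1178 holds %t1177 at indices 0, 0, 0, 0, and every element of %t1177 is %in. Under that reading, the two fdiv instructions compute the same value, and the function should reduce to something like the sketch below. This is a hand-derived simplification, not compiler output, and the value name %rcp is made up for illustration.

```llvm
; Hand-simplified sketch of @foo (assumption: not produced by any pass).
define void @foo(<1 x float> %in, ptr %out, ptr %out2) {
  ; Both extractvalue results fold to %in, so a single reciprocal suffices.
  %rcp = fdiv <1 x float> splat (float 1.000000e+00), %in
  store <1 x float> %rcp, ptr %out
  store <1 x float> %rcp, ptr %out2
  ret void
}
```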
Result
GISel ends up generating two fdiv instructions whereas SDISel is able to CSE the whole thing and produces just one.
GISel:
SDISel:
IR Snippet
Note
SDISel is essentially able to do the equivalent of:
Which yields:
repro.ll.txt