[GISel] Missing combines to expose common subexpressions and eliminate them #122905

Closed
@qcolombet

Description

When running GISel on the input IR below, the generated code is essentially a one-to-one mapping of the input IR, even though most of the computation is duplicated.
SDISel does a much better job: it performs instcombine-like optimizations that simplify the IR and eventually expose the CSE opportunity.

This is not that surprising given that GISel has historically taken a garbage-in, garbage-out approach, but it may make sense to strengthen the GISel combines for optimized builds.

Note: I observed this with AMDGPU, but I suspect it affects all backends.

To Reproduce

Download the attached IR, or copy and paste the snippet below, then run:

llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-hmcsa -global-isel=<0|1> repro.ll -o <sd|g>isel.s

Result

GISel ends up emitting the fdiv twice (note the duplicated v_div_fmas_f32/v_div_fixup_f32 pair at the end of its output), whereas SDISel is able to CSE the whole thing and produces just one.

GISel:

        v_div_scale_f32 v1, s[0:1], v0, v0, 1.0
        v_mov_b32_e32 v7, v2
        v_mov_b32_e32 v2, v3
        v_mov_b32_e32 v3, v4
        v_rcp_f32_e32 v4, v1
        v_div_scale_f32 v5, vcc, 1.0, v0, 1.0
        v_fma_f32 v8, -v1, v4, 1.0
        v_fmac_f32_e32 v4, v8, v4
        v_mul_f32_e32 v8, v5, v4
        v_fma_f32 v9, -v1, v8, v5
        v_fmac_f32_e32 v8, v9, v4
        v_fma_f32 v1, -v1, v8, v5
        v_div_fmas_f32 v5, v1, v4, v8
        v_div_fixup_f32 v5, v5, v0, 1.0
        v_div_fmas_f32 v1, v1, v4, v8
        v_div_fixup_f32 v0, v1, v0, 1.0

SDISel:

        v_div_scale_f32 v1, s[0:1], v0, v0, 1.0
        v_rcp_f32_e32 v6, v1
        s_nop 0
        v_fma_f32 v7, -v1, v6, 1.0
        v_fmac_f32_e32 v6, v7, v6
        v_div_scale_f32 v7, vcc, 1.0, v0, 1.0
        v_mul_f32_e32 v8, v7, v6
        v_fma_f32 v9, -v1, v8, v7
        v_fmac_f32_e32 v8, v9, v6
        v_fma_f32 v1, -v1, v8, v7
        v_div_fmas_f32 v1, v1, v6, v8
        v_div_fixup_f32 v0, v1, v0, 1.0

IR Snippet

define void @foo(<1 x float> %in, ptr %out, ptr %out2) {
  %t1174 = insertvalue [4 x <1 x float>] zeroinitializer, <1 x float> %in, 0
  %t1175 = insertvalue [4 x <1 x float>] %t1174, <1 x float> %in, 1
  %t1176 = insertvalue [4 x <1 x float>] %t1175, <1 x float> %in, 2
  %t1177 = insertvalue [4 x <1 x float>] %t1176, <1 x float> %in, 3
  %t1178 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] zeroinitializer, [4 x <1 x float>] %t1177, 0, 0, 0, 0
  %t1179 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1178, [4 x <1 x float>] %t1177, 0, 0, 1, 0
  %t1180 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1179, [4 x <1 x float>] %t1177, 0, 0, 2, 0
  %t1181 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1180, [4 x <1 x float>] %t1177, 0, 0, 3, 0
  %t1182 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1181, [4 x <1 x float>] %t1177, 1, 0, 0, 0
  %t1183 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1182, [4 x <1 x float>] %t1177, 1, 0, 1, 0
  %t1184 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1183, [4 x <1 x float>] %t1177, 1, 0, 2, 0
  %t1185 = insertvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1184, [4 x <1 x float>] %t1177, 1, 0, 3, 0
  %t1186 = extractvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1178, 0, 0, 0, 0, 0
  %t1187 = fdiv <1 x float> splat (float 1.000000e+00), %t1186
  %t1188 = extractvalue [2 x [1 x [4 x [1 x [4 x <1 x float>]]]]] %t1178, 0, 0, 0, 0, 1
  %t1189 = fdiv <1 x float> splat (float 1.000000e+00), %t1188

  store <1 x float> %t1187, ptr %out
  store <1 x float> %t1189, ptr %out2
  ret void
}

Note

SDISel is essentially able to do the equivalent of:

opt -passes=instcombine,early-cse -S repro.ll -o -

Which yields:

define void @foo(<1 x float> %in, ptr %out, ptr %out2) {
  %t1187 = fdiv <1 x float> splat (float 1.000000e+00), %in
  store <1 x float> %t1187, ptr %out, align 4
  store <1 x float> %t1187, ptr %out2, align 4
  ret void
}
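
To illustrate why these two steps are enough, here is a toy Python model of the reproducer. It is only a sketch and not GlobalISel code: the `Insert` class and the `fold_extract` and `cse` helpers are hypothetical names invented for this example. It forwards each extractvalue through the insertvalue chain to the value that was inserted, then value-numbers the pure instructions so the second fdiv is recognized as a duplicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Toy stand-in for 'insertvalue agg, val, path' (hypothetical model)."""
    agg: object   # aggregate being updated: another Insert or "zeroinit"
    val: object   # inserted value: an Insert (sub-aggregate) or a name like "%in"
    path: tuple   # index path of the insertion


def fold_extract(agg, path):
    """Try to forward 'extractvalue agg, path' to an already-known value."""
    while isinstance(agg, Insert):
        n = len(agg.path)
        if path[:n] == agg.path:
            # The extract reads inside the inserted value: look into it.
            agg, path = agg.val, path[n:]
            if not path:
                return agg
        elif agg.path[:len(path)] == path:
            # The insert only partly overwrites what we extract: give up.
            break
        else:
            # Disjoint paths: this insert does not affect the extracted element.
            agg = agg.agg
    return ("extractvalue", agg, path)


PURE_OPS = {"fdiv"}  # only side-effect-free ops may be CSE'd in this toy model


def cse(instrs):
    """Drop pure instructions that repeat an earlier (opcode, operands) pair."""
    seen, rename, out = {}, {}, []
    for dst, op, operands in instrs:
        operands = tuple(rename.get(o, o) for o in operands)
        key = (op, operands)
        if op in PURE_OPS and key in seen:
            rename[dst] = seen[key]  # later users read the earlier result
            continue
        if op in PURE_OPS:
            seen[key] = dst
        out.append((dst, op, operands))
    return out


# Rebuild the relevant part of @foo: %t1177 holds %in in all four slots,
# and %t1178 places %t1177 at index path 0, 0, 0, 0.
t1177 = "zeroinit"
for i in range(4):
    t1177 = Insert(t1177, "%in", (i,))
t1178 = Insert("zeroinit", t1177, (0, 0, 0, 0))

t1186 = fold_extract(t1178, (0, 0, 0, 0, 0))
t1188 = fold_extract(t1178, (0, 0, 0, 0, 1))

program = [
    ("%t1187", "fdiv", ("1.0", t1186)),
    ("%t1189", "fdiv", ("1.0", t1188)),
    (None, "store", ("%t1187", "%out")),
    (None, "store", ("%t1189", "%out2")),
]
optimized = cse(program)
for instr in optimized:
    print(instr)  # one fdiv remains; both stores use %t1187
```

In this model both extracts fold to %in, so the two fdivs become identical and the second one is dropped. A real GISel combine would have to do the equivalent on generic MIR after the IRTranslator has lowered the aggregates, which this sketch does not attempt to model.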

repro.ll.txt
