Heuristic mapper produce worst result when size of CGRA increase #16

KelvinChung2000 · 2024-02-27T14:14:03Z

I am trying to map the following kernel.

#include <string.h>
#include <stdio.h>


#define SIZE  20
int A[SIZE], B[SIZE], C[SIZE];

void external_fun() {C[0] = 7;}

__attribute__((noinline))
void array_cond(){

   int x;
   for (int i=0;i<SIZE; i++){
	if (A[i] > B[i]){
		C[i] = A[i] - B[i];
		if (C[i] < 0)
			C[i] = 0;
		else
			C[i] = A[i] * B[i];
	}
	else{
		C[i] = A[i] + B[i];
		if (C[i] > 10)
			C[i] = 10;
		else
			C[i] = A[i] - B[i];
	}
   }
}

int main(){

int i,j;

for (i=0;i<SIZE; i++){
      A[i] = i * 2 + 5;
      B[i] = i * 3 + 8;
      C[i] = 0;
    }
    
array_cond();

for (i=0;i<SIZE; i++) printf("%d\n", C[i]);
    
return 0;

}

When the CGRA size is 4x4 the mapper produces a mapping with II=13. When the size is 8x8, the mapped II=19.

This is the configuration used

{
  "kernel"                : "array_cond",
  "targetFunction"        : false,
  "targetNested"          : false,
  "targetLoopsID"         : [0],
  "doCGRAMapping"         : true,
  "row"                   : 8,
  "column"                : 8,
  "precisionAware"        : false,
  "heterogeneity"         : false,
  "isTrimmedDemo"         : true,
  "heuristicMapping"      : true,
  "parameterizableCGRA"   : false,
  "diagonalVectorization" : false,
  "bypassConstraint"      : 4,
  "isStaticElasticCGRA"   : false,
  "ctrlMemConstraint"     : 10,
  "regConstraint"         : 8,
  "optLatency"            : {
                              "load" : 2,
                              "store": 2
                            },
  "optPipelined"          : ["load", "store"],
  "additionalFunc"        : {
                              "load" : [],
                              "store": []
                            }
}

The text was updated successfully, but these errors were encountered:

tancheng · 2024-02-28T05:17:57Z

Hi Kelvin, you can remove optLatency, optPipelined, and additionalFunc from the param.json, they are just used for evaluating specialized FUs.

Unfortunately, we still get II=12 on 4x4, and we may get worse mapping on 8x8 (due to the large exploration space for mapping). The reason is that we support control flows but didn't well optimize it. See the generated DFG as follows, the DFG looks fine/reasonable to me. Due to the if/else the kernel contains, the recurrence dependence cycle length is 12, which makes the mapping cannot achieve II less than 12. If you look into the generated kernel.ll (by modifying the dot.sh), you can figure out why there are so many blue arrows connecting br nodes with other nodes (let me know if you need some explanations).

So in conclusion, the control flows are not well optimized, which leads to "bad" performance. This can be potentially improved by manipulating the DFG with appropriate HW support.

KelvinChung2000 closed this as completed Feb 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Heuristic mapper produce worst result when size of CGRA increase #16

Heuristic mapper produce worst result when size of CGRA increase #16

KelvinChung2000 commented Feb 27, 2024

tancheng commented Feb 28, 2024

Heuristic mapper produce worst result when size of CGRA increase #16

Heuristic mapper produce worst result when size of CGRA increase #16

Comments

KelvinChung2000 commented Feb 27, 2024

tancheng commented Feb 28, 2024