Add kernel build flag for prioritizing speed or size (#2408)
Adds a build flag that can be used by any kernel to provide a different implementation depending on the use case.
Adds a first use case for CMSIS-NN transpose conv.
The background for this PR is in #2345
BUG=none
By default, CMSIS-NN is built from code that is downloaded to the TFLM tree.
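For example, a build that relies only on the downloaded CMSIS-NN needs no extra path variables; the target and architecture below are just placeholders reused from the commands further down:

```
# Default build: CMSIS-NN sources are downloaded automatically.
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m55 OPTIMIZED_KERNEL_DIR=cmsis_nn microlite
```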
It is also possible to build CMSIS-NN from an external path by specifying
CMSIS_PATH=<../path> and CMSIS_NN_PATH=<../path>. Note that both CMSIS_PATH and CMSIS_NN_PATH are needed,
since CMSIS-NN has a dependency on CMSIS-Core. As a third option, CMSIS-NN can be provided manually as an external library.
The examples below illustrate this.
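As a sketch, a build that points at an external CMSIS/CMSIS-NN checkout could look like this; the two paths are placeholders, not paths from this repository:

```
# Placeholder paths; adjust to your own CMSIS and CMSIS-NN checkouts.
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m55 OPTIMIZED_KERNEL_DIR=cmsis_nn CMSIS_PATH=/path/to/cmsis CMSIS_NN_PATH=/path/to/cmsis-nn microlite
```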
## Example - FVP based on Arm Corstone-300 software.

In this example, the kernel conv unit test is built. For more information about
this specific target, check out the [Corstone-300 readme](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/cortex_m_corstone_300/README.md).
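A representative invocation for this target could look like the sketch below; it assumes the target name matches its directory (cortex_m_corstone_300) and that the conv unit test is exposed as the make goal test_kernel_conv_test, so check the linked readme for the exact command:

```
# Assumed target and test goal names; verify against the Corstone-300 readme.
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_corstone_300 TARGET_ARCH=cortex-m55 OPTIMIZED_KERNEL_DIR=cmsis_nn test_kernel_conv_test
```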
@@ -39,3 +41,22 @@ external CMSIS-NN library as different compiler options may have been used.
Also note that if CMSIS_NN_LIBS is specified but not CMSIS_PATH and/or CMSIS_NN_PATH, headers and
system/startup code from the default downloaded path of CMSIS would be used.
So CMSIS_NN_LIBS, CMSIS_NN_PATH and CMSIS_PATH should have the same base path; if not, there will be a build error.
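As a sketch, a consistent invocation with a prebuilt library could look like the following; the paths and the library filename are placeholders that share the same base path, as required above:

```
# Placeholder paths and library name; all three variables share one base path.
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m55 OPTIMIZED_KERNEL_DIR=cmsis_nn CMSIS_PATH=/path/to/cmsis CMSIS_NN_PATH=/path/to/cmsis/CMSIS-NN CMSIS_NN_LIBS=/path/to/cmsis/CMSIS-NN/libcmsis-nn.a microlite
```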
# Build for speed or size
It is possible to build for speed or size. The size option may be required for a large model on an embedded system with limited memory. Where applicable, building for size would result in higher latency paired with a smaller scratch buffer, whereas building for speed would result in lower latency with a larger scratch buffer. Currently only transpose conv supports this. See examples below.
## Example - building a static library with CMSIS-NN optimized kernels
More info on the target used in this example: https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/cortex_m_generic/README.md
Building for speed (default):
Note that speed is the default, so the same build is produced if OPTIMIZE_KERNELS_FOR is left out entirely.
```
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m55 OPTIMIZED_KERNEL_DIR=cmsis_nn OPTIMIZE_KERNELS_FOR=KERNELS_OPTIMIZED_FOR_SPEED microlite
```
Building for size:
```
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m55 OPTIMIZED_KERNEL_DIR=cmsis_nn OPTIMIZE_KERNELS_FOR=KERNELS_OPTIMIZED_FOR_SIZE microlite
```