Provide a way for kernels to request an optional scratch buffer size #2345

mansnils · 2023-12-07T09:38:13Z

A more optimized implementation may require more scratch buffer usage. This enables such an operator to have a fallback if tensor arena size is not enough,

A more optimized implementation may require more scratch buffer usage. This enables such an operator to have a fallback if tensor arena size is not enough, Change-Id: I6ccf72f3caaf2d6ff2d6cdddbf3bb62a26f31915

rascani · 2023-12-07T21:10:48Z

Just a couple initial thoughts:

From a user's perspective, determining the right size for the arena becomes a little complicated now. How do we communicate to the user that "if you provide X amount of memory, you'll improve latency"?
On the implementation, we should probably attach the actual scratch buffer size to the GetScratchBuffer call rather than flagging it in the node to use the fallback. Each node can have multiple scratch buffers, so we'll need to associate those with the scratch handles.

mansnils · 2023-12-08T19:52:41Z

Just a couple initial thoughts:

1. From a user's perspective, determining the right size for the arena becomes a little complicated now. How do we communicate to the user that "if you provide X amount of memory, you'll improve latency"?

2. On the implementation, we should probably attach the actual scratch buffer size to the `GetScratchBuffer` call rather than flagging it in the node to use the fallback. Each node can have multiple scratch buffers, so we'll need to associate those with the scratch handles.

Thanks @rascani
1.It's true that one might be more interested in a smaller size than more speed and vice versa.
If the same type of kernel (reference or OPTIMIZED_KERNEL=xxxx) would offer different scratch buffer sizes, then perhaps it would makes sense to have a command line option. Just as an example OPTIMIZE_KERNELS_FOR=size/speed/both.
Let's say then that selecting OPTIMIZE_KERNELS_FOR=speed would fail immediately if biggest scratch request was too big, and for OPTIMIZE_KERNELS_FOR=size kernels should only request its smallest scratch size.
OPTIMIZE_KERNELS_FOR=size_and_speed could behave like in this proposal. Or something along those lines..?
2.That would make sense yes and make it more usable.

rascani · 2023-12-12T00:12:23Z

@mansnils - I think I like the idea of a build flag, but I think that might beg the question: how useful is the "both" option? The code logic becomes very simple with a build flag: if optimizing for size, scratch buffer is X, else Y. The "both" option brings more complex logic and I'm not sure I see when it would be useful to the user. Maybe it makes sense for multi-tenant situations where multiple models are sharing the same arena space (thus the arena size is the max of the model set)?

mansnils · 2023-12-13T13:42:25Z

@rascani - perhaps the added complexity that comes with the "both" option can not be justified. Considering that the simpler build option proposal kind of solves the original problem, the "both" option can be seen more as nice to have. I am happy to update the draft PR with the new proposal.

…er size" This reverts commit eacae53.

mansnils · 2023-12-18T14:44:54Z

To give an example of a multi-tenant use case, it could be running a test suite with multiple models for a given target.
The simpler build flag proposal could still solve it by splitting the test suite in two and building two binaries.

github-actions · 2024-01-28T10:05:43Z

"This PR is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

mansnils · 2024-01-30T13:27:53Z

Closing in favor of #2408

Adds a build flag that can be used by any kernel to provide a different implementation depending on use case. Adds a first use case for cmsis-nn transpose conv. The background for this PR is in #2345 BUG=none

Adds a build flag that can be used by any kernel to provide a different implementation depending on use case. Adds a first use case for cmsis-nn transpose conv. The background for this PR is in tensorflow/tflite-micro#2345 BUG=none

Provide a way for kernels to request an optional scratch buffer size

eacae53

A more optimized implementation may require more scratch buffer usage. This enables such an operator to have a fallback if tensor arena size is not enough, Change-Id: I6ccf72f3caaf2d6ff2d6cdddbf3bb62a26f31915

mansnils added the ci:run label Dec 7, 2023

TFLM-bot removed the ci:run label Dec 7, 2023

mansnils added 2 commits December 15, 2023 09:06

Revert "Provide a way for kernels to request an optional scratch buff…

e823292

…er size" This reverts commit eacae53.

Add build flag for kernels to prioritize speed or size

5c98aea

github-actions bot added the Stale label Jan 28, 2024

mansnils mentioned this pull request Jan 30, 2024

Add kernel build flag for prioritizing speed or size #2408

Merged

mansnils closed this Jan 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Provide a way for kernels to request an optional scratch buffer size #2345

Provide a way for kernels to request an optional scratch buffer size #2345

Uh oh!

mansnils commented Dec 7, 2023

Uh oh!

rascani commented Dec 7, 2023

Uh oh!

mansnils commented Dec 8, 2023

Uh oh!

rascani commented Dec 12, 2023

Uh oh!

mansnils commented Dec 13, 2023

Uh oh!

mansnils commented Dec 18, 2023

Uh oh!

github-actions bot commented Jan 28, 2024

Uh oh!

mansnils commented Jan 30, 2024

Uh oh!

Uh oh!

Provide a way for kernels to request an optional scratch buffer size #2345

Provide a way for kernels to request an optional scratch buffer size #2345

Uh oh!

Conversation

mansnils commented Dec 7, 2023

Uh oh!

rascani commented Dec 7, 2023

Uh oh!

mansnils commented Dec 8, 2023

Uh oh!

rascani commented Dec 12, 2023

Uh oh!

mansnils commented Dec 13, 2023

Uh oh!

mansnils commented Dec 18, 2023

Uh oh!

github-actions bot commented Jan 28, 2024

Uh oh!

mansnils commented Jan 30, 2024

Uh oh!

Uh oh!