Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way for kernels to request an optional scratch buffer size #2345

Closed
wants to merge 3 commits into from

Conversation

mansnils
Copy link
Contributor

@mansnils mansnils commented Dec 7, 2023

A more optimized implementation may require more scratch buffer usage. This enables such an operator to have a fallback if tensor arena size is not enough,

A more optimized implementation may require more scratch buffer usage.
This enables such an operator to have a fallback if tensor arena size is
not enough,

Change-Id: I6ccf72f3caaf2d6ff2d6cdddbf3bb62a26f31915
@mansnils mansnils added the ci:run label Dec 7, 2023
@TFLM-bot TFLM-bot removed the ci:run label Dec 7, 2023
@rascani
Copy link
Contributor

rascani commented Dec 7, 2023

Just a couple initial thoughts:

  1. From a user's perspective, determining the right size for the arena becomes a little complicated now. How do we communicate to the user that "if you provide X amount of memory, you'll improve latency"?
  2. On the implementation, we should probably attach the actual scratch buffer size to the GetScratchBuffer call rather than flagging it in the node to use the fallback. Each node can have multiple scratch buffers, so we'll need to associate those with the scratch handles.

@mansnils
Copy link
Contributor Author

mansnils commented Dec 8, 2023

Just a couple initial thoughts:

1. From a user's perspective, determining the right size for the arena becomes a little complicated now. How do we communicate to the user that "if you provide X amount of memory, you'll improve latency"?

2. On the implementation, we should probably attach the actual scratch buffer size to the `GetScratchBuffer` call rather than flagging it in the node to use the fallback. Each node can have multiple scratch buffers, so we'll need to associate those with the scratch handles.

Thanks @rascani
1.It's true that one might be more interested in a smaller size than more speed and vice versa.
If the same type of kernel (reference or OPTIMIZED_KERNEL=xxxx) would offer different scratch buffer sizes, then perhaps it would makes sense to have a command line option. Just as an example OPTIMIZE_KERNELS_FOR=size/speed/both.
Let's say then that selecting OPTIMIZE_KERNELS_FOR=speed would fail immediately if biggest scratch request was too big, and for OPTIMIZE_KERNELS_FOR=size kernels should only request its smallest scratch size.
OPTIMIZE_KERNELS_FOR=size_and_speed could behave like in this proposal. Or something along those lines..?
2.That would make sense yes and make it more usable.

@rascani
Copy link
Contributor

rascani commented Dec 12, 2023

@mansnils - I think I like the idea of a build flag, but I think that might beg the question: how useful is the "both" option? The code logic becomes very simple with a build flag: if optimizing for size, scratch buffer is X, else Y. The "both" option brings more complex logic and I'm not sure I see when it would be useful to the user. Maybe it makes sense for multi-tenant situations where multiple models are sharing the same arena space (thus the arena size is the max of the model set)?

@mansnils
Copy link
Contributor Author

@rascani - perhaps the added complexity that comes with the "both" option can not be justified. Considering that the simpler build option proposal kind of solves the original problem, the "both" option can be seen more as nice to have. I am happy to update the draft PR with the new proposal.

@mansnils
Copy link
Contributor Author

To give an example of a multi-tenant use case, it could be running a test suite with multiple models for a given target.
The simpler build flag proposal could still solve it by splitting the test suite in two and building two binaries.

Copy link
Contributor

"This PR is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

@mansnils
Copy link
Contributor Author

Closing in favor of #2408

@mansnils mansnils closed this Jan 30, 2024
mergify bot pushed a commit that referenced this pull request Feb 23, 2024
Adds a build flag that can be used by any kernel to provide a different implementation depending on use case.
Adds a first use case for cmsis-nn transpose conv.

The background for this PR is in #2345

BUG=none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants