Speed up test execution for non-cached tests #181

gladyshcodes · 2024-12-26T23:58:18Z

What

Speed up test execution by finding ways to address the issues outlined below.

Why

While working on #179, I found that taking screenshots is probably the most time-consuming part of a test run in --no-cache mode. Sometimes screenshots are taken several times when there's no need for it. Also, the delay before taking a screenshot is about a second.
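One way to avoid re-sending screenshots that haven't changed is to hash each capture and skip it when it matches the previous one. A minimal TypeScript sketch, assuming screenshots arrive as a Node `Buffer`; `ScreenshotDeduper` is a hypothetical name, not something in the codebase:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: remembers the hash of the last screenshot sent to the
// model and reports whether a new capture differs from it.
class ScreenshotDeduper {
  private lastHash: string | null = null;

  // Returns true when the screenshot differs from the previous one and
  // should be sent; false when it is byte-for-byte identical.
  shouldSend(png: Buffer): boolean {
    const hash = createHash("sha256").update(png).digest("hex");
    if (hash === this.lastHash) {
      return false;
    }
    this.lastHash = hash;
    return true;
  }
}

const deduper = new ScreenshotDeduper();
console.log(deduper.shouldSend(Buffer.from("frame-1"))); // true
console.log(deduper.shouldSend(Buffer.from("frame-1"))); // false
console.log(deduper.shouldSend(Buffer.from("frame-2"))); // true
```

This only catches byte-identical frames; a blinking cursor or an animation changes the bytes, so a perceptual comparison would be needed to go further.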


m2rads · 2024-12-27T08:17:41Z

That is a good observation. Here is an outline of my plan to speed it up:

- Keeping screenshots in memory instead of saving them to a folder, making more efficient use of space.
- Taking screenshots automatically after executing an action. (Currently we wait for the AI to instruct us when to take a screenshot.)
- Performing multiple actions per screenshot whenever possible. (Not sure if this is possible with the Computer Use API yet.)
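The first two points can be sketched together: wrap each action so a screenshot is captured right after it and kept in memory as base64 instead of being written to disk. A minimal sketch, assuming a hypothetical `Screen` interface with a `screenshot(): Promise<Buffer>` method (Playwright's `page.screenshot()` returns a `Buffer` and skips writing a file when no `path` is given); `ScreenshotStore` and `runAction` are illustrative names:

```typescript
// Hypothetical minimal interface: anything that can capture the current
// screen as PNG bytes.
interface Screen {
  screenshot(): Promise<Buffer>;
}

// Keeps recent screenshots in memory as base64 strings instead of files.
// Only the newest `limit` entries are retained to bound memory use.
class ScreenshotStore {
  private readonly items: string[] = [];
  private readonly limit: number;

  constructor(limit = 5) {
    this.limit = limit;
  }

  add(png: Buffer): string {
    const b64 = png.toString("base64");
    this.items.push(b64);
    if (this.items.length > this.limit) {
      this.items.shift(); // drop the oldest screenshot
    }
    return b64;
  }

  latest(): string | undefined {
    return this.items[this.items.length - 1];
  }

  get size(): number {
    return this.items.length;
  }
}

// Runs an action and immediately captures a screenshot, so the model gets
// fresh visual state without spending a separate step asking for it.
async function runAction(
  screen: Screen,
  store: ScreenshotStore,
  action: () => Promise<void>,
): Promise<string> {
  await action();
  return store.add(await screen.screenshot());
}

// Usage with a fake screen that returns a different frame each time.
let frame = 0;
const fakeScreen: Screen = {
  screenshot: async () => Buffer.from(`frame-${++frame}`),
};
const store = new ScreenshotStore(2);

async function demo(): Promise<void> {
  await runAction(fakeScreen, store, async () => { /* click */ });
  await runAction(fakeScreen, store, async () => { /* type */ });
  await runAction(fakeScreen, store, async () => { /* scroll */ });
  console.log(store.size); // 2
  console.log(Buffer.from(store.latest()!, "base64").toString()); // frame-3
}
demo();
```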

I think these simple changes should speed up AI execution by at least 2x.

@slavingia, @gladyshcodes Wdyt?

slavingia · 2024-12-27T23:43:52Z

Makes sense. I pinged Anthropic to see if they'd support multiple actions in one step.

gladyshcodes · 2024-12-29T19:26:44Z

> Makes sense. I pinged Anthropic to see if they'd support multiple actions in one step.

Have you heard back from Anthropic yet?

slavingia · 2024-12-30T17:41:11Z

Not yet, will bump

Shawns2759 · 2025-01-01T20:56:25Z

The executions are already pretty expensive. Do we have ways to cut down on cost as well as speed up executions?

slavingia · 2025-01-01T20:57:36Z

We should probably tackle #187 first, to see that, and then evaluate. Anything that caches computer use should help.

gladyshcodes · 2025-01-01T21:54:21Z

With this, we can debug things faster and measure results.

We recently introduced caching (#179), which made test execution about 6 times faster. I have several more ideas in mind:

- Running tests in parallel, similar to how Jest does it. This could dramatically cut execution time.
- Running a 'pre-validation' phase: an initial LLM request that evaluates the test suite and answers questions like "Do we need to launch Chromium?" and "Which tests can be run in parallel?", letting the LLM decide the most efficient execution order itself.
  - For example, most of the new Bash tool API tests we're rolling out soon (Bash tool #233) can be run in parallel.
  - The initial LLM request may increase the input token count, and therefore cost, but there may be a way around it: for example, leveraging LLM memory by sending all tests at once and then referencing them by name to start execution.
- Batching multiple requests into a single LLM call could also help.
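For the 'pre-validation' idea, the runner needs the LLM's answer in a shape it can act on, and it should validate that answer rather than trust it. A minimal sketch, assuming the model is asked to reply with JSON of a hypothetical shape `{ needsBrowser, parallel, sequential }` listing test names; the field names and the `parsePlan` helper are illustrative, not an existing API:

```typescript
// Hypothetical shape of the pre-validation answer from the LLM.
interface ExecutionPlan {
  needsBrowser: boolean; // e.g. false for API-only Bash tool tests
  parallel: string[];    // tests that may run concurrently
  sequential: string[];  // tests that must run in order
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

// Parses the model's JSON reply and checks it against the known test names.
// Unknown or duplicated names are rejected; any test the model forgot is
// appended to the sequential list so nothing is silently skipped.
function parsePlan(reply: string, knownTests: string[]): ExecutionPlan {
  const data: unknown = JSON.parse(reply);
  if (typeof data !== "object" || data === null) {
    throw new Error("Malformed execution plan");
  }
  const { needsBrowser, parallel, sequential } = data as Partial<ExecutionPlan>;
  if (typeof needsBrowser !== "boolean" ||
      !isStringArray(parallel) || !isStringArray(sequential)) {
    throw new Error("Malformed execution plan");
  }

  const known = new Set(knownTests);
  const planned = [...parallel, ...sequential];
  for (const name of planned) {
    if (!known.has(name)) throw new Error(`Unknown test in plan: ${name}`);
  }
  const plannedSet = new Set(planned);
  if (plannedSet.size !== planned.length) {
    throw new Error("A test is listed more than once");
  }
  const missing = knownTests.filter((t) => !plannedSet.has(t));

  return { needsBrowser, parallel, sequential: [...sequential, ...missing] };
}

const plan = parsePlan(
  '{"needsBrowser": false, "parallel": ["api-a", "api-b"], "sequential": []}',
  ["api-a", "api-b", "api-c"],
);
console.log(plan.sequential); // [ 'api-c' ]
```

Falling back to sequential execution for anything the model omits keeps a bad answer from costing correctness, only speed.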

Hopefully LLM provider prices will decrease over time (similar to how the price of GFLOPS or disk space has dropped), making this tool more affordable for everyone.

slavingia · 2025-01-01T22:01:11Z

> Batching multiple requests into a single LLM call could also help.

This would be huge, and it will eventually happen.

> Running tests in parallel

This seems like relatively low-hanging fruit to explore. In theory, a server could run one browser for every test that needs to be run (keeping chaining/caching in mind), and the entire test suite would take only as long as the slowest chain of tests.
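The "slowest chain" reasoning can be made concrete: tests inside a chain run in order, because later tests depend on state from earlier ones, while independent chains run side by side, each of which would get its own browser. A minimal sketch, assuming chains are already known as arrays of async test functions; `runChains`, `TestFn`, and `maxParallel` are hypothetical names, and the cap exists because rate limits and machine resources bound how many browsers can run at once:

```typescript
type TestFn = () => Promise<void>;

// Runs each chain's tests strictly in order, and up to `maxParallel` chains at
// once. With enough slots, total time is roughly that of the slowest chain.
async function runChains(chains: TestFn[][], maxParallel: number): Promise<void> {
  let next = 0; // index of the next chain to start

  async function worker(): Promise<void> {
    while (next < chains.length) {
      // Reading and incrementing `next` happens synchronously, so two
      // workers never pick up the same chain.
      const chain = chains[next++];
      for (const test of chain) {
        await test(); // later tests in a chain may rely on earlier ones
      }
    }
  }

  const slots = Math.max(1, Math.min(maxParallel, chains.length));
  await Promise.all(Array.from({ length: slots }, () => worker()));
}

// Usage: three chains taking roughly 300 ms, 100 ms and 200 ms.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
const chains: TestFn[][] = [
  [() => sleep(100), () => sleep(200)],
  [() => sleep(100)],
  [() => sleep(200)],
];

const start = Date.now();
runChains(chains, 3).then(() => {
  // Expect roughly 300 ms (the slowest chain), not the 600 ms sequential total.
  console.log(`elapsed ~${Date.now() - start} ms`);
});
```

If a test throws, `Promise.all` rejects immediately while the other workers keep going; a real runner would likely collect per-test results instead.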

PedroAVJ · 2025-01-04T00:34:15Z

Wouldn't running tests in parallel run into rate limit issues and, in turn, nullify the speed gains? I suppose it depends partly on the API key tier, but when I ran the original Claude computer use demo, I was constantly rate limited.

slavingia · 2025-01-04T00:43:56Z

> Wouldn't running tests in parallel run into rate limit issues and, in turn, nullify the speed gains? I suppose it depends partly on the API key tier, but when I ran the original Claude computer use demo, I was constantly rate limited.

Things may have changed, but overall you're right that it'll be a bottleneck. I'll bring it up with them!
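Rate limiting can be softened, though not removed, by retrying rate-limited calls with exponential backoff and jitter, alongside a cap on how many tests run in parallel. A minimal sketch, assuming the API client throws an error carrying an HTTP `status` of 429 when rate limited; `withBackoff`, `isRateLimited`, and the default delays are illustrative assumptions rather than any SDK's API:

```typescript
// Assumption: rate-limit failures surface as errors with `status === 429`.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { status?: unknown }).status === 429;
}

const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Retries `call` when it is rate limited, doubling the wait each time
// (baseMs, 2*baseMs, 4*baseMs, ...) plus random jitter so parallel workers
// do not all retry at the same moment. Other errors are rethrown immediately.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await delay(baseMs * 2 ** attempt + Math.random() * baseMs);
    }
  }
}

// Usage: a fake call that is rate limited twice, then succeeds.
let calls = 0;
withBackoff(async () => {
  calls++;
  if (calls < 3) throw Object.assign(new Error("rate limited"), { status: 429 });
  return "ok";
}, 5, 10).then((result) => console.log(result, calls)); // ok 3
```

Backoff keeps runs from failing outright, but if the account's limit is the true ceiling, parallelism beyond it mostly turns into waiting.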

slavingia changed the title ~~[SUGGESTION] Performance bottleneck for non-cached tests~~ Performance bottleneck for non-cached tests Dec 30, 2024

slavingia changed the title ~~Performance bottleneck for non-cached tests~~ Speed up test execution for non-cached tests Jan 2, 2025

rmarescu added this to Shortest Jan 19, 2025

rmarescu moved this to For discussion in Shortest Jan 19, 2025
