
Add Record/Replay functionality for offline processing (Issue #2759) #2760


Open
wants to merge 4 commits into main

Conversation

devin-ai-integration[bot]
Contributor

Record/Replay Functionality for Offline Processing

Description

This PR implements the Record/Replay functionality requested in issue #2759. This feature allows users to:

  1. Record a CrewAI run with all LLM responses using crewai run --record
  2. Replay the run later without making any network calls using crewai run --replay

Benefits

  • Faster iteration during development
  • Ability to work offline without network connectivity
  • Predictable results for testing and debugging
  • Lower costs by not using tokens during development iterations

Implementation Details

  • Added a SQLite-based storage for caching LLM responses (a rough sketch follows this list)
  • Modified the LLM class to intercept and cache responses during recording
  • Added CLI flags for record and replay modes
  • Added tests to verify the functionality
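
For reference, here is a minimal sketch of what the SQLite-backed cache could look like. The table name, column names, add/get method names, and the SHA-256 hash over (model, messages) are illustrative assumptions, not necessarily the exact shape of the code in this PR.

import hashlib
import json
import sqlite3


class LLMResponseCacheStorage:
    """Stores LLM responses keyed by a hash of the (model, messages) request."""

    def __init__(self, db_path: str = "llm_response_cache.db"):
        self.db_path = db_path
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS llm_response_cache (
                    request_hash TEXT PRIMARY KEY,
                    response TEXT NOT NULL,
                    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
                )
                """
            )

    def _hash_request(self, model: str, messages: list) -> str:
        # Serialize deterministically so identical requests map to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def add(self, model: str, messages: list, response: str) -> None:
        # Called while recording: upsert the response for this request.
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                "INSERT OR REPLACE INTO llm_response_cache (request_hash, response) VALUES (?, ?)",
                (self._hash_request(model, messages), response),
            )

    def get(self, model: str, messages: list):
        # Called while replaying: return the cached response, or None on a miss.
        with sqlite3.connect(self.db_path) as conn:
            row = conn.execute(
                "SELECT response FROM llm_response_cache WHERE request_hash = ?",
                (self._hash_request(model, messages),),
            ).fetchone()
        return row[0] if row else None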

Testing

  • Added unit tests for the LLM response cache
  • Added integration tests for the record/replay functionality
  • Manually tested the CLI commands

Usage Examples

from crewai import Crew

# Record a run (agent and task are defined elsewhere)
crew = Crew(agents=[agent], tasks=[task])
crew.record_mode = True
crew.kickoff()

# Replay a run
crew = Crew(agents=[agent], tasks=[task])
crew.replay_mode = True
crew.kickoff()

Or via CLI:

# Record a run
crewai run --record

# Replay a run
crewai run --replay

Fixes #2759

Link to Devin run: https://app.devin.ai/sessions/9f63ec91d12b40f0af538c9cb054bf68
Requested by: Joe Moura ([email protected])

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Record/Replay Functionality Addition

Overview

This pull request adds Record/Replay functionality for LLM responses, enabling offline processing. Caching and replaying LLM responses streamlines testing and development workflows.

Key Findings

CLI Implementation (cli.py)

  • Positive: The implementation of new flags for record and replay modes is clean and well-structured.
  • Improvement Suggestion: Validate the mutually exclusive flags at the command entry point so --record and --replay cannot be used together:
    @crewai.command()
    @click.option("--record", is_flag=True, help="Record LLM responses for later replay")
    @click.option("--replay", is_flag=True, help="Replay from recorded LLM responses")
    def run(record: bool = False, replay: bool = False):
        if record and replay:
            raise click.UsageError("Cannot use --record and --replay simultaneously")
        run_crew(record=record, replay=replay)

Cache Storage Implementation (llm_response_cache_storage.py)

  • Positive: Adequate handling of SQLite connections and secure hashing for requests is implemented, along with proper error handling.
  • Improvement Suggestions:
    1. Connection Pooling: Enhance performance by implementing connection pooling:
    import sqlite3
    import threading

    class LLMResponseCacheStorage:
        def __init__(self, db_path: str):
            self.db_path = db_path
            # One connection per thread: sqlite3 connections must not be shared across threads.
            self._connection_pool = {}

        def _get_connection(self):
            thread_id = threading.get_ident()
            if thread_id not in self._connection_pool:
                self._connection_pool[thread_id] = sqlite3.connect(self.db_path)
            return self._connection_pool[thread_id]
    2. Cache Expiration: Integrate a mechanism to remove expired responses to keep the cache relevant:
    def cleanup_expired_cache(self, max_age_days: int = 7):
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            # A ? placeholder cannot be used inside a string literal, so bind the
            # whole datetime modifier (e.g. "-7 days") as a single parameter.
            cursor.execute(
                """
                DELETE FROM llm_response_cache
                WHERE timestamp < datetime('now', ?)
                """,
                (f"-{int(max_age_days)} days",),
            )
            conn.commit()

LLM Integration (llm.py)

  • Issues Identified:
    1. Potential memory leaks if the cache handler is referenced improperly.
    2. Inadequate error handling during cache operations, which may lead to silent failures.
  • Suggested Improvements: Enhance cache handling logic with robust error management:
    def call(self, messages: List[Dict[str, Any]], ...) -> str:
        try:
            if self._response_cache_handler and self._response_cache_handler.is_replaying():
                cached_response = self._response_cache_handler.get_cached_response(self.model, messages)
                if cached_response:
                    return cached_response
            response = self._make_llm_call(messages, tools)
            if self._response_cache_handler and self._response_cache_handler.is_recording():
                self._response_cache_handler.cache_response(self.model, messages, response)
            return response
        except Exception as e:
            logger.error(f"LLM call failed: {e}")
            raise

Testing Coverage

  • Positive: The test cases cover both recording and replay functionalities comprehensively.
  • Improvement Suggestion: Expand test cases to cover edge scenarios including error handling and concurrent cache access:
    import sqlite3
    from threading import Thread
    from unittest.mock import MagicMock

    import pytest

    def test_llm_cache_errors():
        handler = LLMResponseCacheHandler()
        # Swap the real storage for a mock so a DB error can be forced.
        handler.storage = MagicMock()
        handler.storage.add.side_effect = sqlite3.Error("Mock DB error")
        with pytest.raises(sqlite3.Error):
            handler.cache_response("model", [], "response")

    def test_concurrent_cache_access():
        handler = LLMResponseCacheHandler()

        def concurrent_access():
            for _ in range(100):
                handler.cache_response("model", [], "response")

        threads = [Thread(target=concurrent_access) for _ in range(10)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

Historical Context and Lessons Learned

  1. Security: SQL injection protection through parameterized queries is properly implemented. Validation of user-supplied inputs is still needed.
  2. Performance: Introducing cache size limits and eviction policies can prevent overhead from an ever-growing cache (a rough sketch follows this list).
  3. Documentation: Comprehensive documentation is needed to explain cache file management and provide clear usage examples.
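
As a rough illustration of the size-limit idea in point 2, the sketch below evicts the oldest rows once the cache grows past a cap. The table and column names and the default cap are assumptions, not part of the PR as submitted.

    def enforce_cache_size_limit(self, max_entries: int = 10_000) -> None:
        # Keep only the newest max_entries rows, evicting the oldest first.
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                """
                DELETE FROM llm_response_cache
                WHERE request_hash NOT IN (
                    SELECT request_hash
                    FROM llm_response_cache
                    ORDER BY timestamp DESC
                    LIMIT ?
                )
                """,
                (max_entries,),
            )
            conn.commit()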

Conclusion

The implementation of the Record/Replay functionality is well-structured with a solid foundation. Addressing the identified suggestions around error handling, cache management, and testing can greatly enhance the robustness and effectiveness of this feature, making it more suitable for production use.

Action Items

  1. Implement cache expiration and pooling.
  2. Enhance error handling across the caching mechanism.
  3. Document cache management procedures and provide usage examples.
  4. Conduct tests for concurrent access scenarios.

Overall, the new feature is a valuable addition to improve the development workflow for LLM applications. With these improvements, the feature will be more resilient and efficient.

@mikhail

mikhail commented May 7, 2025

Thanks @joaomdmoura for such a speedy PR!

From a high-level review, this accomplishes what the feature request #2759 described. One thing I find non-obvious is the on-disk storage. Does sqlite need to be specified and configured? What will happen if --record is specified but sqlite is not configured?

Additional thoughts:

  • What's the behavior when --replay is specified and most requests are found in the recording, but one specific request has a cache miss?
  • Can this be extended to be used in tests?
  • Can this be leveraged inside Python code? Meaning one agent has neverCache=True or something?

As I'm writing this, I'm wondering if "replay" should be more complex than a boolean. For example, I want distinct scenarios (a rough sketch follows the list):

  1. Replay = required: when I'm running tests I want to replay or fail if the recording is not found.
  2. Replay = opportunistic: this is basically cache. If identical task has been performed and result recorded then use it, otherwise do your LLM thing.
  3. Replay = disabled: ignore previous recordings
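
To make those three scenarios concrete, here is a rough sketch; the ReplayMode enum and the replay_mode attribute are purely hypothetical, not part of this PR:

from enum import Enum


class ReplayMode(Enum):
    REQUIRED = "required"            # fail if a request is missing from the recording
    OPPORTUNISTIC = "opportunistic"  # use the recording when present, otherwise call the LLM
    DISABLED = "disabled"            # ignore previous recordings entirely


# Hypothetical usage:
# crew.replay_mode = ReplayMode.REQUIRED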

Additionally, the word choice "replay" will conflict with the existing "replay" feature for re-running a specific task.
