
Yorkie Intelligence Python migration #446

Open · wants to merge 16 commits into base: main
Conversation

@sihyeong671 (Contributor) commented Feb 2, 2025

What this PR does / why we need it:
This PR migrates Yorkie Intelligence to Python, using Poetry and pyenv to provide a standardized environment for development and dependency management.

The work is currently in progress (WIP), with API development underway.
Which issue(s) this PR fixes:

Fixes #430

Checklist:

  • Added relevant tests or not required
  • Didn't break anything

Summary by CodeRabbit

  • New Features

    • Introduced AI-powered endpoints that generate GitHub issues, pull requests, and documentation with real-time streaming responses.
    • Improved API routing to streamline user interactions.
  • Documentation

    • Added a comprehensive guide for setting up the development environment and running the application, including installation instructions and testing commands.
  • Chores

    • Enhanced development workflows with refined ignore rules and automated code linting, ensuring a more reliable build process.
    • Added environment variables for model configuration to facilitate deployment.

coderabbitai bot (Contributor) commented Feb 2, 2025

Walkthrough

The pull request introduces new project configuration and API functionality for the Yorkie Intelligence project. It adds comprehensive configuration files (e.g., .gitignore, .pre-commit-config.yaml, README.md, pyproject.toml, and environment variables), establishes FastAPI endpoints for issues, pull requests, and document writing with real-time streaming responses, and integrates LangChain-based prompt templates. Additionally, language model initialization and settings management are implemented alongside new asynchronous tests for streaming responses.

Changes

File(s) Change Summary
yorkie-intelligence/.gitignore, yorkie-intelligence/.pre-commit-config.yaml, yorkie-intelligence/README.md, yorkie-intelligence/pyproject.toml, yorkie-intelligence/src/.env.deployment Added project configuration files covering ignore patterns, pre-commit hooks, documentation, dependency management, and environment-specific variables for model settings.
yorkie-intelligence/src/main.py, yorkie-intelligence/src/api/__init__.py, yorkie-intelligence/src/api/issue/{config.py, views.py, __init__.py}, yorkie-intelligence/src/api/pr/{config.py, views.py, __init__.py}, yorkie-intelligence/src/api/write_document/{config.py, views.py, __init__.py} Introduced the FastAPI application setup with new routers and endpoints (for issues, pull requests, and document writing) that leverage prompt templates and stream responses.
yorkie-intelligence/src/common/llms.py, yorkie-intelligence/src/common/utils.py Implemented a language model factory and settings configuration for dynamic model initialization using LangChain wrappers.
yorkie-intelligence/test/test_stream_response.py Added asynchronous tests to validate streaming responses for the new API endpoints.
yorkie-intelligence/src/api/issue/models.py, yorkie-intelligence/src/api/pr/models.py, yorkie-intelligence/src/api/write_document/models.py Introduced new Pydantic models for structured data validation in issue, pull request, and document APIs.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant C as Client
    participant A as FastAPI App
    participant L as Language Model
    C->>A: POST request to endpoint (e.g. /intelligence/pr/create)
    A->>L: Initialize processing chain with prompt template
    L-->>A: Stream response chunks
    A->>C: Deliver StreamingResponse (text/event-stream)
```
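The streaming flow above can be sketched as a plain async generator, independent of FastAPI. The `fake_llm_stream` stand-in below is illustrative only (it replaces the PR's `chain.astream(query)`); the `data: ...\n\n` framing matches the `text/event-stream` format used by the endpoints.

```python
import asyncio


async def fake_llm_stream(query: str):
    # Stand-in for chain.astream(query): yields chunks as the model produces them.
    for chunk in ["Hello", ", ", query]:
        await asyncio.sleep(0)  # simulate asynchronous chunk arrival
        yield chunk


async def event_stream(query: str):
    # Frame each chunk the way the endpoints do for text/event-stream.
    async for chunk in fake_llm_stream(query):
        yield f"data: {chunk}\n\n"


async def collect(query: str):
    return [c async for c in event_stream(query)]


print(asyncio.run(collect("world")))
# → ['data: Hello\n\n', 'data: , \n\n', 'data: world\n\n']
```

In the real endpoint, `event_stream()` is passed to `StreamingResponse(..., media_type="text/event-stream")` instead of being collected into a list.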

Possibly related PRs

  • Introduce MAINTAINING.md and CHANGELOG.md files #296: related because both PRs add a .gitignore file aimed at improving project management and version control practices.

Poem

I'm a rabbit, hopping free,
Through lines of code so lively and spree.
Streaming responses like a magic dance,
New endpoints giving our app a chance.
With prompts and models in playful delight,
I celebrate these changes with a bunny's light.
Hop on, let's code into the night!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f8bb7d and 2e3701f.

📒 Files selected for processing (4)
  • yorkie-intelligence/.gitignore (1 hunks)
  • yorkie-intelligence/.pre-commit-config.yaml (1 hunks)
  • yorkie-intelligence/pyproject.toml (1 hunks)
  • yorkie-intelligence/src/.env.deployment (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • yorkie-intelligence/.pre-commit-config.yaml
  • yorkie-intelligence/.gitignore
  • yorkie-intelligence/src/.env.deployment
  • yorkie-intelligence/pyproject.toml


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (15)
yorkie-intelligence/README.md (3)

1-2: Header Clarity and Consistency

The file header indicates a work-in-progress state correctly. Consider adding a space before the "(WIP)" for better readability (e.g., "Yorkie Intelligence (WIP)").


3-6: Improve Instruction Grammar and Capitalization

The setup instructions on lines 4-5 would benefit from improved grammar and consistent capitalization. For instance, "install python 3.10.* version(recommend using [pyenv]...)" can be refined for clarity and correctness.

-install python 3.10.* version(recommend using [pyenv](https://github.com/pyenv/pyenv))<br/>
-install [poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)<br/>
+Install Python 3.10.* (recommended to use [pyenv](https://github.com/pyenv/pyenv))<br/>
+Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)<br/>
🧰 Tools
🪛 LanguageTool

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)


26-26: Link Formatting for OpenAPI Documentation

The OpenAPI documentation URL on line 26 is presented as a bare URL. For improved readability and compliance with markdown lint guidelines, consider formatting this as a clickable link.

-you can see openapi in http://localhost:8000
+You can access the OpenAPI documentation at [http://localhost:8000](http://localhost:8000).
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

26-26: Bare URL used
null

(MD034, no-bare-urls)

yorkie-intelligence/.pre-commit-config.yaml (2)

6-9: Fix YAML Indentation for Hooks List Items.

YAMLlint reports that the indentation for the hooks list items is off (expected 2 spaces but found 4). To align with common pre-commit configuration style and YAMLlint expectations, please adjust the indentation of the hooks section and its child elements. For example, the section should be reindented as follows:

-  hooks:
-    # Run the linter.
-    - id: ruff
-    # Run the formatter.
-    - id: ruff-format
+  hooks:
+  # Run the linter.
+  - id: ruff
+  # Run the formatter.
+  - id: ruff-format
🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 7-7: wrong indentation: expected 2 but found 4

(indentation)


[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)


9-9: Add Newline at End of File.

The file is missing a newline at the end, which is causing a YAMLlint error. Please add a newline at the end of the file to conform with best practices and linting requirements.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

yorkie-intelligence/.gitignore (1)

137-163: Environment Managers and Lock Files Settings

This section (lines 137–163) includes configurations for various Python environment tools:

  • pyenv: Ignoring the .python-version is typical for libraries but may be reconsidered for applications.
  • pipenv: Pipfile.lock is commented out (per common pipenv recommendations).
  • poetry: The comments advise that for applications it is generally recommended to commit poetry.lock for reproducible builds, yet the file explicitly ignores it.
  • pdm: The guidance for pdm is clear and follows standard practices.

If this repository represents an application (as suggested by the migration objective) rather than a library, you might consider removing the exclusion for poetry.lock so that the lock file is tracked for dependency reproducibility.

For example, you could revise the poetry section diff as follows:

-poetry.lock
+# Remove the exclusion for poetry.lock if committing it for reproducible dependency resolution.
yorkie-intelligence/src/common/utils.py (3)

5-5: Fix typo in app_name.

There's a typo in the app name: "Intellignce" should be "Intelligence".

-    app_name: str = "Yorkie Intellignce"
+    app_name: str = "Yorkie Intelligence"

12-13: Make environment file path configurable.

The environment file path is hardcoded to "src/.env". Consider making it configurable to support different environments (development, testing, production).

-    model_config = SettingsConfigDict(env_file="src/.env")
+    model_config = SettingsConfigDict(env_file="${ENV_FILE:src/.env}")
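One caveat on the suggested string above: pydantic-settings does not expand shell-style `${VAR:default}` placeholders in `env_file`, so that value would be taken literally. A safer approach is to resolve the path in Python; the `ENV_FILE` variable name and `pick_env_file` helper below are illustrative assumptions, not part of the PR.

```python
import os


def pick_env_file(environ=None):
    """Return the settings file path from an assumed ENV_FILE variable,
    falling back to the PR's hardcoded default of src/.env."""
    environ = os.environ if environ is None else environ
    return environ.get("ENV_FILE", "src/.env")


# e.g. model_config = SettingsConfigDict(env_file=pick_env_file())
print(pick_env_file({}))                          # → src/.env
print(pick_env_file({"ENV_FILE": "src/.env.test"}))  # → src/.env.test
```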

16-18: Resolve TODO comment about caching.

The TODO comment asks about the difference between basic definition and using lru_cache. Let me help clarify this:

Using lru_cache with settings can improve performance by caching the settings instance, preventing repeated environment variable lookups. However, it's only beneficial if the settings are accessed frequently and don't change during runtime.

Would you like me to help implement the lru_cache pattern for the settings?
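A minimal illustration of the cached-settings pattern discussed above. The `Settings` class here is a stand-in for the pydantic-settings class in the PR; the construction counter only exists to make the caching visible.

```python
from functools import lru_cache


class Settings:
    """Stand-in for the pydantic-settings class; counts constructions."""

    constructed = 0

    def __init__(self):
        Settings.constructed += 1  # incremented once per real instantiation


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # The first call builds Settings (which, in the real class, reads env
    # vars and the env file); subsequent calls return the cached instance.
    return Settings()


a = get_settings()
b = get_settings()
print(a is b, Settings.constructed)  # → True 1
```

Without `lru_cache`, each `get_settings()` call would construct a fresh instance and re-read the environment, which is the overhead the comment refers to.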

yorkie-intelligence/src/api/pr/views.py (1)

16-22: Improve error handling in event stream.

The current error handling is too broad and doesn't provide specific error information.

     async def event_stream():
         try:
             async for chunk in chain.astream(query):
                 yield f"data: {chunk}\n\n"
-        except Exception as e:
-            yield f"data: {str(e)}\n\n"
+        except ValueError as e:
+            yield f"data: Error: Invalid input - {str(e)}\n\n"
+        except ConnectionError as e:
+            yield f"data: Error: Language model service unavailable - {str(e)}\n\n"
+        except Exception as e:
+            yield f"data: Error: Unexpected error occurred - {str(e)}\n\n"
yorkie-intelligence/src/api/issue/views.py (1)

16-22: Enhance error handling with specific error types.

The current error handling catches all exceptions generically. Consider catching specific exceptions to provide more meaningful error messages.

Apply this diff to improve error handling:

     async def event_stream():
         try:
             async for chunk in chain.astream(query):
                 yield f"data: {chunk}\n\n"
-        except Exception as e:
-            yield f"data: {str(e)}\n\n"
+        except ValueError as e:
+            yield f"data: Invalid input: {str(e)}\n\n"
+        except RuntimeError as e:
+            yield f"data: Processing error: {str(e)}\n\n"
+        except Exception as e:
+            yield f"data: Unexpected error: {str(e)}\n\n"
yorkie-intelligence/src/api/pr/config.py (2)

105-107: Improve template string readability using multi-line strings.

The current string concatenation makes the prefix hard to read and maintain.

Apply this diff to improve readability:

-    prefix="I want you to act as a GitHub PR Writer for me. I'll provide you with brief notes about GitHub PR, and you just need to write the PR. "
-    "Please ensure that you follow the template used in example provided. Do not provide the example as it is. Please write your responses in English. "
-    "If there is insufficient information to create the PR, request additional information",
+    prefix="""
+        I want you to act as a GitHub PR Writer for me. I'll provide you with brief notes about GitHub PR, and you just need to write the PR.
+        Please ensure that you follow the template used in example provided. Do not provide the example as it is. Please write your responses in English.
+        If there is insufficient information to create the PR, request additional information
+    """.strip(),

7-100: Consider adding more diverse PR examples.

The current examples are good but could benefit from including scenarios like:

  • Security fixes
  • Documentation updates
  • Performance improvements
yorkie-intelligence/src/api/issue/config.py (2)

77-79: Improve template string readability for consistency with PR config.

Similar to the PR config, the string concatenation here could be improved for better readability.

Apply this diff to improve readability:

-    prefix="I want you to act as a GitHub Issue writer. I will provide brief information about the GitHub issue I want to create, and you should write the GitHub issue. "
-    "The types of issues you can write are bug 🐞 or enhancement 🌟. Please ensure that you follow the template used in each type of issue example provided. Do not provide the example as it is. Please write your responses in English. "
-    "If there is insufficient information to create the issue, request additional information.",
+    prefix="""
+        I want you to act as a GitHub Issue writer. I will provide brief information about the GitHub issue I want to create, and you should write the GitHub issue.
+        The types of issues you can write are bug 🐞 or enhancement 🌟. Please ensure that you follow the template used in each type of issue example provided. Do not provide the example as it is. Please write your responses in English.
+        If there is insufficient information to create the issue, request additional information.
+    """.strip(),

3-5: Consider extracting common template logic to reduce duplication.

The template structure is very similar between PR and issue configs. Consider creating a shared utility for template creation.

Create a new utility file src/common/templates.py:

from typing import List, Dict
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate

def create_template(template_str: str, examples: List[Dict], prefix: str, suffix: str) -> FewShotPromptTemplate:
    example_prompt = PromptTemplate.from_template(template_str)
    return FewShotPromptTemplate(
        example_prompt=example_prompt,
        examples=examples,
        prefix=prefix,
        suffix=suffix,
        input_variables=["content"],
    )

Then use it in both config files to reduce duplication.

Also applies to: 74-82

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62deec8 and 0952f4d.

📒 Files selected for processing (13)
  • yorkie-intelligence/.gitignore (1 hunks)
  • yorkie-intelligence/.pre-commit-config.yaml (1 hunks)
  • yorkie-intelligence/README.md (1 hunks)
  • yorkie-intelligence/pyproject.toml (1 hunks)
  • yorkie-intelligence/src/.env.sample (1 hunks)
  • yorkie-intelligence/src/api/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/issue/config.py (1 hunks)
  • yorkie-intelligence/src/api/issue/views.py (1 hunks)
  • yorkie-intelligence/src/api/pr/config.py (1 hunks)
  • yorkie-intelligence/src/api/pr/views.py (1 hunks)
  • yorkie-intelligence/src/common/llms.py (1 hunks)
  • yorkie-intelligence/src/common/utils.py (1 hunks)
  • yorkie-intelligence/src/main.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • yorkie-intelligence/src/api/__init__.py
  • yorkie-intelligence/src/.env.sample
  • yorkie-intelligence/pyproject.toml
🧰 Additional context used
🪛 YAMLlint (1.35.1)
yorkie-intelligence/.pre-commit-config.yaml

[warning] 7-7: wrong indentation: expected 2 but found 4

(indentation)


[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

🪛 LanguageTool
yorkie-intelligence/README.md

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)

🪛 markdownlint-cli2 (0.17.2)
yorkie-intelligence/README.md

26-26: Bare URL used
null

(MD034, no-bare-urls)

🪛 Ruff (0.8.2)
yorkie-intelligence/src/api/issue/views.py

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

yorkie-intelligence/src/api/pr/views.py

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🔇 Additional comments (10)
yorkie-intelligence/README.md (4)

7-8: Development Section Acknowledgment

The "### dev" section is clear and uses a standard header for development instructions.


9-11: Dev Environment Command Block

The code block for development (poetry install --no-root) is clear and correctly formatted.


13-18: Production Environment Command Block

The "### prod" section and its associated command block (poetry install --without dev) are properly formatted and straightforward.


19-20: Section Header for 'How To Start'

The "## How To Start" section header is clear and appropriately introduces the commands to start the application.

yorkie-intelligence/.gitignore (5)

1-3: Autogenerated Header is Clear and Informative

The header clearly credits the Toptal gitignore generator and documents the templates used—this is useful context for future maintenance.


4-18: Linux Section is Standard and Comprehensive

The Linux block (lines 4–18) correctly ignores common temporary files (such as backup files ending with ~), transient files from fuse, KDE preferences, and Linux trash folders. This helps maintain a clean repository.


19-47: macOS and Remote Share Files Coverage

The macOS section (lines 19–47) properly lists system files like .DS_Store, Apple-specific files, and even directories potentially created on AFP shares. This ensures that macOS system artifacts do not clutter version control.


52-136: Python Build Artifacts and Testing Reports Inclusion

The block covering Python build artifacts (lines 52–136) is well crafted. It includes rules for ignoring:

  • Byte-compiled files (__pycache__/, *.py[cod], etc.),
  • C extensions (*.so),
  • Packaging directories (e.g., build/, dist/, etc.),
  • PyInstaller artifacts, installer logs, coverage reports, translation files, and even framework-specific files for Django, Flask, Scrapy, and Sphinx docs.

This comprehensive list covers most common unwanted files for a Python project.


164-243: Additional Tools, IDE, and Editor Configurations

The remaining sections (lines 164–243) cover a wide array of tools and configurations:

  • PEP 582 support,
  • Celery scheduling files,
  • SageMath, various environment directories,
  • IDE-specific configurations for Spyder, Rope, PyCharm, and Visual Studio Code (including local history and extension build files),
  • Linters and type checkers (mypy, Pyre, pytype, ruff, LSP config).

These entries are standard and help ensure that temporary, build-specific, or environment-specific files do not get committed.

yorkie-intelligence/src/main.py (1)

10-16: Restrict CORS settings for production.

The current CORS configuration is too permissive, allowing all origins (*), methods, and headers. This might pose security risks in a production environment.

Consider restricting these settings based on your deployment requirements. For example:

 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
-    allow_methods=["*"],
-    allow_headers=["*"],
+    allow_origins=["https://your-frontend-domain.com"],
+    allow_methods=["GET", "POST"],
+    allow_headers=["Authorization", "Content-Type"],
     allow_credentials=True,
 )
✅ Verification successful

CORS settings must be restricted for a secure production setup.

The current configuration in yorkie-intelligence/src/main.py (lines 10-16) allows all origins, methods, and headers, which is suitable only for development. For production, it's recommended to limit these settings to trusted domains and permitted HTTP methods/headers.

  • File under review: yorkie-intelligence/src/main.py
  • Issue: Permissive CORS configuration (allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])
  • Recommendation: Restrict these settings based on your deployment requirements (e.g., specific domains, required methods, and headers).

Comment on lines 21 to 25
```sh
git clone https://github.com/yorkie-team/codepair.git
cd yorkie-installigence
poetry run uvicorn src.main:app --reload
```

⚠️ Potential issue

Correct Repository Navigation Command

Within the startup instructions, the command on line 23 (cd yorkie-installigence) appears to be a typo given the repository name. It should likely be cd yorkie-intelligence to match the project name.

-cd yorkie-installigence
+cd yorkie-intelligence

Comment on lines 7 to 20
def get_model():
if SETTINGS.model_type == "ollama":
llm = ChatOllama(model=SETTINGS.model_name, temperature=0)
elif SETTINGS.model_type == "openai":
llm = ChatOpenAI(
model=SETTINGS.model_name, api_key=SETTINGS.api_key, temperature=0
)
else:
raise ValueError("Invalid model type")

# TODO
# support more model type

return llm

🛠️ Refactor suggestion

Add type hints and improve error handling.

The function would benefit from type hints and better error handling for configuration issues.

-def get_model():
+from typing import Literal
+
+def get_model() -> ChatOllama | ChatOpenAI:
+    """Get a language model instance based on configuration.
+
+    Returns:
+        ChatOllama | ChatOpenAI: Configured language model instance.
+
+    Raises:
+        ValueError: If model_type is invalid or required configuration is missing.
+    """
+    model_type = SETTINGS.model_type.lower()
+    if not SETTINGS.model_name:
+        raise ValueError("model_name must be configured")
+
     if SETTINGS.model_type == "ollama":
+        if not SETTINGS.ollama_url:
+            raise ValueError("ollama_url must be configured for Ollama models")
         llm = ChatOllama(model=SETTINGS.model_name, temperature=0)
     elif SETTINGS.model_type == "openai":
+        if not SETTINGS.api_key:
+            raise ValueError("api_key must be configured for OpenAI models")
         llm = ChatOpenAI(
             model=SETTINGS.model_name, api_key=SETTINGS.api_key, temperature=0
         )
     else:
-        raise ValueError("Invalid model type")
+        raise ValueError(f"Invalid model_type: {SETTINGS.model_type}. Supported types: ollama, openai")

     return llm

Comment on lines 12 to 14
@router.get("/streaming_pr")
async def get_pr(query: str, llm=Depends(get_model)):
chain = issue_template_prompt | llm | StrOutputParser()

🛠️ Refactor suggestion

Add input validation and fix Depends usage.

The endpoint needs input validation and should follow FastAPI's best practices for dependency injection.

+from typing import Annotated
+from pydantic import BaseModel, Field
+
+class PRQuery(BaseModel):
+    query: str = Field(..., min_length=1, description="The query to generate PR from")
+
 @router.get("/streaming_pr")
-async def get_pr(query: str, llm=Depends(get_model)):
+async def get_pr(
+    query: Annotated[str, Field(min_length=1)],
+    llm: Annotated[ChatOllama | ChatOpenAI, Depends(get_model)]
+):
🧰 Tools
🪛 Ruff (0.8.2)

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)



@router.get("/streaming_issue")
async def get_issue(query: str, llm=Depends(get_model)):

🛠️ Refactor suggestion

Refactor dependency injection to follow best practices.

The current implementation of Depends(get_model) in the parameter default violates best practices (B008). This could lead to unexpected behavior if the dependency is evaluated at import time.

Apply this diff to fix the dependency injection:

-async def get_issue(query: str, llm=Depends(get_model)):
+async def get_issue(query: str, llm: Any = Depends(get_model)):
🧰 Tools
🪛 Ruff (0.8.2)

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)
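To make the B008 finding concrete, here is a stdlib-only sketch of why function calls in argument defaults are evaluated once, at definition time, rather than per call (the names are illustrative, not from the PR). Note that FastAPI special-cases `Depends` in defaults, so the warning here is stylistic; the `Annotated` form suggested above avoids it entirely.

```python
call_count = 0

def make_model():
    # Stand-in for an expensive dependency factory like get_model().
    global call_count
    call_count += 1
    return f"model-{call_count}"

# Anti-pattern flagged by B008: the default is computed once, when the
# function is *defined*, and then shared by every call.
def handler_bad(llm=make_model()):
    return llm

# Preferred for plain Python: resolve the dependency inside the call
# (FastAPI's Depends performs the per-request equivalent of this).
def handler_good(llm=None):
    if llm is None:
        llm = make_model()
    return llm

print(handler_bad(), handler_bad())    # model-1 model-1 (shared)
print(handler_good(), handler_good())  # model-2 model-3 (fresh each call)
```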

@sihyeong671 sihyeong671 requested a review from devleejb February 3, 2025 00:12
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
yorkie-intelligence/src/api/pr/views.py (1)

12-13: 🛠️ Refactor suggestion

Add input validation and fix Depends usage.

The endpoint needs input validation and should follow FastAPI's best practices for dependency injection.

+from typing import Annotated
+from pydantic import BaseModel, Field
+
+class PRQuery(BaseModel):
+    query: str = Field(..., min_length=1, description="The query to generate PR from")
+
 @router.get("/create")
-async def make_pr(query: str, llm=Depends(get_model)):
+async def make_pr(
+    query: Annotated[str, Field(min_length=1)],
+    llm: Annotated[ChatOllama | ChatOpenAI, Depends(get_model)]
+):
🧰 Tools
🪛 Ruff (0.8.2)

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

yorkie-intelligence/src/api/issue/views.py (1)

12-13: 🛠️ Refactor suggestion

Refactor dependency injection to follow best practices.

The current implementation of Depends(get_model) in the parameter default violates best practices (B008). This could lead to unexpected behavior if the dependency is evaluated at import time.

Apply this diff to fix the dependency injection:

-async def make_issue(query: str, llm=Depends(get_model)):
+async def make_issue(query: str, llm: Any = Depends(get_model)):

Don't forget to add the following import:

from typing import Any
🧰 Tools
🪛 Ruff (0.8.2)

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🧹 Nitpick comments (6)
yorkie-intelligence/README.md (3)

1-1: Title Formatting Improvement

Consider adding a space before the "(WIP)" to improve readability. For example, change
# Yorkie Intelligence(WIP)
to
# Yorkie Intelligence (WIP)


4-5: Clarify Installation Steps

The installation instructions could be rephrased to start with a capital letter and clarify the Python version requirement. For instance, you might update them as follows:

-install python 3.10.* version(recommend using [pyenv](https://github.com/pyenv/pyenv))<br/>
-install [poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)<br/>
+Install Python 3.10.x (recommended to use [pyenv](https://github.com/pyenv/pyenv))<br/>
+Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)<br/>

27-27: Improve OpenAPI Access Instruction

The instruction on line 27 can be improved for clarity and to comply with markdown best practices by replacing the preposition and the bare URL. For example:

-you can see openapi in http://localhost:8000/docs
+You can see the OpenAPI documentation at [http://localhost:8000/docs](http://localhost:8000/docs)
🧰 Tools
🪛 LanguageTool

[uncategorized] ~27-~27: The preposition “at” seems more likely in this position than the preposition “in”.
Context: ...n:app --reload ``` you can see openapi in http://localhost:8000/docs

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION_IN_AT)

🪛 markdownlint-cli2 (0.17.2)

27-27: Bare URL used
null

(MD034, no-bare-urls)

yorkie-intelligence/src/api/pr/views.py (2)

1-7: Add type hints for language models.

Import the LLM types to improve type safety and IDE support.

 from fastapi import APIRouter, Depends
 from fastapi.responses import StreamingResponse
 from langchain_core.output_parsers import StrOutputParser
+from langchain_community.chat_models import ChatOllama
+from langchain_openai import ChatOpenAI
 
 from src.common.llms import get_model
 from .config import issue_template_prompt

14-23: Enhance error handling with structured responses.

The current error handling could be improved to provide more detailed error information to clients.

     async def event_stream():
         try:
             async for chunk in chain.astream(query):
                 yield f"data: {chunk}\n\n"
         except Exception as e:
-            yield f"data: {str(e)}\n\n"
+            error_response = {
+                "error": {
+                    "type": e.__class__.__name__,
+                    "message": str(e)
+                }
+            }
+            import json
+            yield f"data: {json.dumps(error_response)}\n\n"
yorkie-intelligence/src/api/issue/views.py (1)

14-23: Enhance error handling in event stream.

While the basic error handling is in place, consider these improvements:

  1. Log the error for debugging
  2. Return a structured error response

Apply this diff to enhance error handling:

 async def event_stream():
     try:
         async for chunk in chain.astream(query):
             yield f"data: {chunk}\n\n"
     except Exception as e:
+        import logging
+        logging.error(f"Error in event stream: {str(e)}")
+        error_response = {
+            "error": "An error occurred during processing",
+            "detail": str(e)
+        }
-        yield f"data: {str(e)}\n\n"
+        yield f"data: {error_response}\n\n"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0952f4d and 3957af0.

📒 Files selected for processing (6)
  • yorkie-intelligence/README.md (1 hunks)
  • yorkie-intelligence/src/api/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/issue/views.py (1 hunks)
  • yorkie-intelligence/src/api/pr/views.py (1 hunks)
  • yorkie-intelligence/src/common/utils.py (1 hunks)
  • yorkie-intelligence/src/main.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • yorkie-intelligence/src/main.py
  • yorkie-intelligence/src/common/utils.py
🧰 Additional context used
🪛 Ruff (0.8.2)
yorkie-intelligence/src/api/issue/views.py

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

yorkie-intelligence/src/api/pr/views.py

13-13: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🪛 LanguageTool
yorkie-intelligence/README.md

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)


[uncategorized] ~27-~27: The preposition “at” seems more likely in this position than the preposition “in”.
Context: ...n:app --reload ``` you can see openapi in http://localhost:8000/docs

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION_IN_AT)

🪛 markdownlint-cli2 (0.17.2)
yorkie-intelligence/README.md

27-27: Bare URL used
null

(MD034, no-bare-urls)

🔇 Additional comments (6)
yorkie-intelligence/README.md (1)

22-25: Correct Directory Navigation Command

There is a typo in the repository directory name on line 23. The command
cd yorkie-intalligence
should be corrected to
cd yorkie-intelligence
to accurately reflect the project name and ensure proper navigation.

-cd yorkie-intalligence
+cd yorkie-intelligence
yorkie-intelligence/src/api/__init__.py (3)

1-4: LGTM!

The imports are well-organized and include all necessary dependencies for the router setup.


6-6: LGTM!

The router is correctly initialized using FastAPI's APIRouter.


8-9: LGTM!

The sub-routers are correctly included with appropriate RESTful API prefixes.

yorkie-intelligence/src/api/issue/views.py (2)

1-6: LGTM!

The imports are well-organized and include all necessary dependencies for the issue creation endpoint.


9-9: LGTM!

The router is correctly initialized using FastAPI's APIRouter.

@sihyeong671 sihyeong671 self-assigned this Feb 6, 2025
- add pytest-asyncio for llm test
- add simple llm response test
- change api response
- fix api request
- add .gitignore rule(F401)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
yorkie-intelligence/test/test_stream_response.py (1)

26-40: Fix incorrect endpoint and remove code duplication.

Critical issues found:

  1. The function name test_stream_issue suggests testing issue creation, but it's testing the PR creation endpoint
  2. The code is duplicated from test_stream_pr

Apply these changes:

  1. Update the endpoint to test issue creation:
-            "POST", "/intelligence/pr/create", json={"query": "hi"}
+            "POST", "/intelligence/issue/create", json={"query": "hi"}
  1. Consider extracting common test logic to avoid duplication:
```python
@pytest.fixture
async def api_client():
    async with AsyncClient(
        transport=ASGITransport(app=app), base_url="http://127.0.0.1:8000"
    ) as client:
        yield client

async def test_stream_endpoint(api_client, endpoint: str, query: str):
    async with api_client.stream(
        "POST", endpoint, json={"query": query}
    ) as response:
        assert response.status_code == 200
        data = []
        async for line in response.aiter_lines():
            data.append(line)
        return data

@pytest.mark.asyncio
async def test_stream_pr(api_client):
    await test_stream_endpoint(api_client, "/intelligence/pr/create", "test PR query")

@pytest.mark.asyncio
async def test_stream_issue(api_client):
    await test_stream_endpoint(api_client, "/intelligence/issue/create", "test issue query")
```
♻️ Duplicate comments (1)
yorkie-intelligence/src/api/issue/views.py (1)

12-13: ⚠️ Potential issue

Fix dependency injection and add type hints.

The dependency injection issue from the previous review still needs to be addressed.

Apply this diff to fix the dependency injection and add proper type hints:

-async def make_issue(query: str = Body(embed=True), llm=Depends(get_model)):
+async def make_issue(
+    query: str = Body(embed=True),
+    llm: Any = Depends(get_model)
+):

Don't forget to add the following import:

+from typing import Any
🧹 Nitpick comments (1)
yorkie-intelligence/test/test_stream_response.py (1)

7-8: Translate and implement the TODO comment.

The TODO comment in Korean reduces code accessibility. Consider:

  1. Translating it to English
  2. Implementing the model configuration for tests
  3. Adding it to the test configuration file instead of inline comments

Apply this diff:

-# TODO
-# test시 smollm2:135m 모델 사용하도록 변경
+# TODO: Configure tests to use smollm2:135m model
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3957af0 and 47be6a4.

📒 Files selected for processing (9)
  • yorkie-intelligence/pyproject.toml (1 hunks)
  • yorkie-intelligence/src/api/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/issue/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/issue/views.py (1 hunks)
  • yorkie-intelligence/src/api/pr/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/pr/views.py (1 hunks)
  • yorkie-intelligence/src/common/llms.py (1 hunks)
  • yorkie-intelligence/src/common/utils.py (1 hunks)
  • yorkie-intelligence/test/test_stream_response.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • yorkie-intelligence/src/api/issue/__init__.py
  • yorkie-intelligence/src/api/pr/__init__.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • yorkie-intelligence/src/api/pr/views.py
  • yorkie-intelligence/src/api/__init__.py
  • yorkie-intelligence/src/common/utils.py
  • yorkie-intelligence/src/common/llms.py
  • yorkie-intelligence/pyproject.toml
🔇 Additional comments (2)
yorkie-intelligence/test/test_stream_response.py (1)

1-4: LGTM!

The imports are well-organized and appropriate for async testing with FastAPI.

yorkie-intelligence/src/api/issue/views.py (1)

1-11: LGTM! Well-organized imports and router setup.

The imports are properly organized following PEP 8 guidelines, and the router setup is clean.

Comment on lines 11 to 24:

```python
@pytest.mark.asyncio
async def test_stream_pr():
    async with AsyncClient(
        transport=ASGITransport(app=app), base_url="http://127.0.0.1:8000"
    ) as client:
        async with client.stream(
            "POST", "/intelligence/pr/create", json={"query": "hi"}
        ) as response:
            assert response.status_code == 200  # check the response status

            # data = []
            # async for line in response.aiter_lines():
            #     data.append(line)
```

Contributor

🛠️ Refactor suggestion

Enhance test coverage and data validation.

The test has several areas for improvement:

  1. It only verifies the response status code without validating the response data
  2. Uses hardcoded test data
  3. Doesn't test error cases
  4. Contains commented-out code that should be implemented

Consider implementing these improvements:

```python
@pytest.mark.asyncio
@pytest.mark.parametrize("query,expected_status", [
    ("valid query", 200),
    ("", 400),
    ("invalid_query" * 1000, 400)  # Test query length limits
])
async def test_stream_pr(query: str, expected_status: int):
    async with AsyncClient(
        transport=ASGITransport(app=app), base_url="http://127.0.0.1:8000"
    ) as client:
        async with client.stream(
            "POST", "/intelligence/pr/create", json={"query": query}
        ) as response:
            assert response.status_code == expected_status

            if response.status_code == 200:
                data = []
                async for line in response.aiter_lines():
                    data.append(line)
                    assert isinstance(line, str)  # Validate response format
                assert len(data) > 0  # Ensure we received some data
```

- add doc write test
- add document writing function
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (1)
yorkie-intelligence/test/test_stream_response.py (1)

42-56: Implement response data validation.

The test should validate the streamed response data. Consider:

  1. Implementing the commented-out code for response validation
  2. Adding assertions to verify the response format and content
  3. Testing with different query inputs

Apply this diff to implement response validation:

         async with client.stream(
             "POST", "/intelligence/doc/create", json={"query": "hi"}
         ) as response:
             assert response.status_code == 200  # check the response status
+            data = []
+            async for line in response.aiter_lines():
+                data.append(line)
+                assert isinstance(line, str)
+            assert len(data) > 0
🧹 Nitpick comments (3)
yorkie-intelligence/src/api/write_document/config.py (1)

76-107: Minor consistency nitpick regarding the punctuation and quotes in the prefix.
You might consider making the usage of quotes around “Yorkie Intelligence” more uniform to prevent any confusion or unintended formatting in multiline strings.

Example potential update:

-    prefix="""You are an AI documentation assistant named "Yorkie Intelligence.
+    prefix="""You are an AI documentation assistant named "Yorkie Intelligence."
yorkie-intelligence/src/api/__init__.py (1)

7-7: Add type hints for better code maintainability.

Consider adding type hints to improve code maintainability and IDE support.

Apply this diff:

-router = APIRouter()
+router: APIRouter = APIRouter()
yorkie-intelligence/test/test_stream_response.py (1)

7-8: Translate TODO comment to English and track it.

The TODO comment in Korean suggests using the smollm2:135m model for testing. Consider:

  1. Translating the comment to English for consistency.
  2. Creating an issue to track this requirement.

Apply this diff to translate the comment:

-# TODO
-# test시 smollm2:135m 모델 사용하도록 변경
+# TODO: Update tests to use smollm2:135m model

Would you like me to create an issue to track this TODO?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47be6a4 and fbb0370.

📒 Files selected for processing (5)
  • yorkie-intelligence/src/api/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/__init__.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/config.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/views.py (1 hunks)
  • yorkie-intelligence/test/test_stream_response.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • yorkie-intelligence/src/api/write_document/__init__.py
🔇 Additional comments (6)
yorkie-intelligence/src/api/write_document/config.py (3)

1-2: No issues found in the imports.


3-4: Definition of 'example_prompt' looks good.


5-74: Examples provide clear and comprehensive subscription details.
Any potential modifications are optional, but please confirm that the large Markdown content is correctly parsed and displayed as intended in downstream usage.

yorkie-intelligence/src/api/__init__.py (2)

7-11: LGTM! Clean router organization.

The router configuration follows FastAPI best practices with clear prefix naming and logical separation of concerns.


9-11: 🏁 Script executed:

```bash
#!/bin/bash
# Search for API documentation or OpenAPI specs
fd -e md -e yaml -e json . | xargs rg -l "intelligence.*(?:pr|issue|doc)"
```

Length of output: 75

yorkie-intelligence/test/test_stream_response.py (1)

11-24: Enhance test coverage and data validation.

The test has several areas for improvement that were previously identified:

  1. Only verifies response status without validating response data
  2. Uses hardcoded test data
  3. Doesn't test error cases
  4. Contains commented-out code that should be implemented

Comment on lines +1 to +5:

```python
from fastapi import APIRouter

from src.api.pr import router as pr_router
from src.api.issue import router as issue_router
from src.api.write_document import router as doc_router
```
Contributor

🛠️ Refactor suggestion

Use relative imports for internal modules.

The current absolute imports starting with 'src' may cause issues when the package is installed. Consider using relative imports for internal modules.

Apply this diff to improve the imports:

 from fastapi import APIRouter
 
-from src.api.pr import router as pr_router
-from src.api.issue import router as issue_router
-from src.api.write_document import router as doc_router
+from .pr import router as pr_router
+from .issue import router as issue_router
+from .write_document import router as doc_router

Comment on lines 16 to 21:

```python
    async def event_stream():
        try:
            async for chunk in chain.astream(query):
                yield chunk
        except Exception as e:
            yield f"\n\n{str(e)}\n\n"
```
Contributor

🛠️ Refactor suggestion

Enhance error handling and input validation.

The current error handling is basic and could be improved. Consider:

  1. Validating the input query length and format.
  2. Structuring error responses consistently with success responses.
  3. Adding specific error handling for different types of exceptions.

Apply this diff to improve error handling:

 async def event_stream():
+    if not query or len(query.strip()) == 0:
+        yield "Error: Query cannot be empty"
+        return
+
     try:
         async for chunk in chain.astream(query):
             yield chunk
-    except Exception as e:
-        yield f"\n\n{str(e)}\n\n"
+    except ValueError as e:
+        yield f"Error: Invalid input - {str(e)}"
+    except Exception as e:
+        yield f"Error: An unexpected error occurred - {str(e)}"
+        # Log the error for debugging
+        logger.error(f"Unexpected error in write_document: {str(e)}")
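Stripped of FastAPI and LangChain, the `event_stream` pattern under discussion is an async generator that relays chunks from another async iterator and converts a mid-stream exception into a final in-band message. A self-contained sketch (the failing `fake_chain` source is simulated, not the PR's real chain):

```python
import asyncio

async def fake_chain(query: str):
    # Stand-in for chain.astream(query): yields a few chunks, then fails.
    for chunk in ("Hello", ", ", "world"):
        yield chunk
    raise RuntimeError("model connection lost")

async def event_stream(query: str):
    try:
        async for chunk in fake_chain(query):
            yield chunk
    except Exception as e:
        # Once streaming has started, the HTTP status is already sent,
        # so the error can only be reported in-band.
        yield f"\n\n{e}\n\n"

async def collect():
    return [chunk async for chunk in event_stream("hi")]

chunks = asyncio.run(collect())
print(chunks)  # ['Hello', ', ', 'world', '\n\nmodel connection lost\n\n']
```

This is also why the tests in this PR should read the stream via `response.aiter_lines()` and assert on content: a 200 status alone cannot distinguish success from an in-band error message.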

Comment on lines 31 to 33:

```python
        async with client.stream(
            "POST", "/intelligence/pr/create", json={"query": "hi"}
        ) as response:
```
Contributor

⚠️ Potential issue

Fix incorrect endpoint URL.

The test is using the PR endpoint (/intelligence/pr/create) instead of the issue endpoint.

Apply this diff to fix the endpoint URL:

         async with client.stream(
-            "POST", "/intelligence/pr/create", json={"query": "hi"}
+            "POST", "/intelligence/issue/create", json={"query": "hi"}
         ) as response:

@devleejb devleejb left a comment

Thank you for your contribution! I've left a few comments.

Comment on lines +9 to +11:

```python
router.include_router(pr_router, prefix="/pr")
router.include_router(issue_router, prefix="/issue")
router.include_router(doc_router, prefix="/doc")
```
Member

How about integrating these APIs into a single file using a path parameter?

Since all actions appear to be the same, with only the prompt differing, integrating them could improve scalability. What do you think?

Contributor Author

Right now, only a few-shot prompt is applied to all of these, but I think that in the future each function could be implemented in a different way to improve model performance (e.g., referencing a vector DB, or using a LangGraph agent). So, IMO, if we have a plan to change the method later, separating the APIs is the better choice.

Member

Q) Also, can we make a mock LLM to improve testability?

Contributor Author @sihyeong671 sihyeong671 Feb 10, 2025

I found this (FakeLLM). I'll apply it for testing; it might not be the optimal solution, but I'll try it and share my experience.

@devleejb devleejb requested review from blurfx and krapie February 9, 2025 14:04
- remove empty files
- change config naming rule
- fix url path
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
yorkie-intelligence/README.md (1)

22-26: Correct Repository Navigation Command
The command cd yorkie-intalligence appears to contain a typo. It should likely be updated to:

-cd yorkie-intalligence
+cd yorkie-intelligence

This correction ensures consistency with the repository name.

yorkie-intelligence/src/api/write_document/views.py (1)

16-22: Enhance error handling and input validation.

The current implementation needs improved error handling and input validation.

 async def event_stream():
+    # Validate input
+    if not query or len(query.strip()) == 0:
+        yield "Error: Query cannot be empty"
+        return
+
+    # Add logging
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.info(f"Processing document writing request with query: {query[:50]}...")
+
     try:
         async for chunk in chain.astream(query):
             yield chunk
     except Exception as e:
-        yield f"\n\n{str(e)}\n\n"
+        error_msg = f"Error during document generation: {str(e)}"
+        logger.error(error_msg)
+        yield error_msg
yorkie-intelligence/src/api/issue/views.py (2)

13-13: Fix dependency injection to follow best practices.

The current implementation of Depends(get_model) in the parameter default violates best practices (B008). This could lead to unexpected behavior if the dependency is evaluated at import time.

Apply this diff to fix the dependency injection:

-async def make_issue(query: str = Body(embed=True), llm=Depends(get_model)):
+async def make_issue(query: str = Body(embed=True), llm: Any = Depends(get_model)):

14-23: Improve error handling and add type hints for event streaming.

The error handling could be more specific and structured, and the error messages should follow SSE format.

Apply this diff to improve the implementation:

+from typing import AsyncGenerator
+
 async def make_issue(query: str = Body(embed=True), llm: Any = Depends(get_model)):
     chain = issue_template_prompt | llm | StrOutputParser()
 
-    async def event_stream():
+    async def event_stream() -> AsyncGenerator[str, None]:
         try:
             async for chunk in chain.astream(query):
-                yield chunk
-        except Exception as e:
-            yield f"\n\n{str(e)}\n\n"
+                yield f"data: {chunk}\n\n"
+        except Exception as e:
+            error_message = {
+                "error": str(e),
+                "type": e.__class__.__name__
+            }
+            yield f"event: error\ndata: {str(error_message)}\n\n"
🧹 Nitpick comments (10)
yorkie-intelligence/README.md (2)

3-5: Improve Clarity and Grammar in Setup Instructions
The instructions in these lines would read more clearly with improved phrasing. For example, consider rewording:

  • From:
    install python 3.10.* version(recommend using [pyenv](https://github.com/pyenv/pyenv))<br/>
    To:
    Install Python 3.10.* (recommended using [pyenv](https://github.com/pyenv/pyenv))<br/>
  • And similarly update the poetry installation line for consistency.
🧰 Tools
🪛 LanguageTool

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)


28-28: Format OpenAPI Documentation URL
Consider formatting the URL as a Markdown link to improve readability. For example, replace the bare URL with:

[OpenAPI docs](http://localhost:8000/docs)

Also, capitalizing the sentence ("You can see…") would enhance clarity.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

28-28: Bare URL used
null

(MD034, no-bare-urls)

yorkie-intelligence/src/api/write_document/views.py (3)

8-11: Reduce consecutive blank lines.

Multiple consecutive blank lines can be reduced to a single blank line to maintain consistent spacing.

 from .config import document_writing_template_prompt
 
-
 router = APIRouter()
 

12-14: Add endpoint documentation and type hints.

The endpoint would benefit from:

  1. OpenAPI documentation using docstring
  2. Type hints for better code maintainability
 @router.post("/")
 async def write_document(query: str = Body(embed=True), llm=Depends(get_model)):
+    """
+    Generate document content based on the provided query.
+    
+    Args:
+        query (str): The input query for document generation
+        llm: The language model instance
+    
+    Returns:
+        StreamingResponse: A stream of generated document content
+    """
+    from langchain.chains import LLMChain
+    
     chain = document_writing_template_prompt | llm | StrOutputParser()

23-24: Add response headers for SSE.

For Server-Sent Events (SSE), additional headers should be set to ensure proper client handling.

-    return StreamingResponse(event_stream(), media_type="text/event-stream")
+    response = StreamingResponse(event_stream(), media_type="text/event-stream")
+    response.headers["Cache-Control"] = "no-cache"
+    response.headers["Connection"] = "keep-alive"
+    return response
yorkie-intelligence/src/api/issue/views.py (1)

8-11: Remove excessive blank lines.

There are multiple consecutive blank lines that can be reduced to a single blank line for better code organization.

Apply this diff to fix the spacing:

 from .config import issue_template_prompt

-

 router = APIRouter()

-
yorkie-intelligence/src/common/utils.py (4)

5-5: Fix typo in APP_NAME.

There's a typo in the application name: "Intellignce" should be "Intelligence".

-    APP_NAME: str = "Yorkie Intellignce"
+    APP_NAME: str = "Yorkie Intelligence"

8-9: Consider providing default values for required fields.

MODEL_TYPE and MODEL_NAME are required fields without defaults. This might cause initialization failures if not properly configured. Consider either:

  1. Providing sensible defaults
  2. Adding validation to ensure these values are set correctly
-    MODEL_TYPE: str
-    MODEL_NAME: str
+    MODEL_TYPE: str = "ollama"  # Default to ollama as shown in .env.deployment
+    MODEL_NAME: str = "smollm2:135m"  # Default model

7-7: Make OLLAMA_URL more configurable.

The current OLLAMA_URL default assumes a localhost setup, which might not work in production or different environments. Consider using a more flexible default or environment-specific configuration.

-    OLLAMA_URL: str = "localhost:11434"
+    OLLAMA_URL: str = "http://localhost:11434"  # Add protocol

12-12: Make env file path more flexible.

The hardcoded path "src/.env.deployment" might cause issues in different environments or when running tests. Consider:

  1. Making it configurable
  2. Supporting multiple environment files
  3. Using relative paths
-    model_config = SettingsConfigDict(env_file="src/.env.deployment")
+    model_config = SettingsConfigDict(
+        env_file=(".env.deployment", "src/.env.deployment"),
+    )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbb0370 and 425377b.

📒 Files selected for processing (8)
  • yorkie-intelligence/README.md (1 hunks)
  • yorkie-intelligence/src/.env.deployment (1 hunks)
  • yorkie-intelligence/src/api/issue/views.py (1 hunks)
  • yorkie-intelligence/src/api/pr/views.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/views.py (1 hunks)
  • yorkie-intelligence/src/common/llms.py (1 hunks)
  • yorkie-intelligence/src/common/utils.py (1 hunks)
  • yorkie-intelligence/test/test_stream_response.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • yorkie-intelligence/src/common/llms.py
  • yorkie-intelligence/src/api/pr/views.py
  • yorkie-intelligence/test/test_stream_response.py
🧰 Additional context used
🪛 LanguageTool
yorkie-intelligence/README.md

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)

🪛 markdownlint-cli2 (0.17.2)
yorkie-intelligence/README.md

28-28: Bare URL used
null

(MD034, no-bare-urls)

🔇 Additional comments (3)
yorkie-intelligence/README.md (3)

1-1: Header Indicates WIP Status
The project title clearly states it's a work in progress, which is helpful.


10-12: Dev Installation Command Reviewed
The dev installation command poetry install --no-root is clear and conforms to standard usage.


14-18: Production Installation Command Reviewed
The production command poetry install --without dev is properly specified for production use.

@krapie
Member

krapie commented Feb 11, 2025

I will try to also review this PR, but it will take some time 😅

- add how to run test
- fix test code with fakellm
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
yorkie-intelligence/README.md (1)

24-24: ⚠️ Potential issue

Fix the directory name typo.

The directory name contains a typo: "yorkie-intalligence" should be "yorkie-intelligence".

-cd yorkie-intalligence
+cd yorkie-intelligence
🧹 Nitpick comments (7)
yorkie-intelligence/README.md (6)

1-2: Add a project description.

Consider adding a brief description of what Yorkie Intelligence is, its purpose, and its key features to help users understand the project better.


3-6: Improve the setting section formatting and clarity.

The setting section could be improved for better readability:

 ## Setting
-install python 3.10.* version(recommend using [pyenv](https://github.com/pyenv/pyenv))<br/>
-install [poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)<br/>
+### Prerequisites
+
+1. Install Python 3.10.* (we recommend using [pyenv](https://github.com/pyenv/pyenv))
+2. Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)
🧰 Tools
🪛 LanguageTool

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)


8-18: Add context for environment setup.

Consider explaining the differences between dev and prod environments and when to use each:

 ### dev
+Development environment includes all dependencies needed for development, testing, and debugging.
 
 ```sh
 poetry install --no-root 
 ```
 
 ### prod
+Production environment includes only the dependencies required to run the application.
 
 ```sh
 poetry install --without dev
 ```

`28-28`: **Format the OpenAPI URL properly.**

The OpenAPI URL should be properly formatted as a Markdown link:

```diff
-you can see openapi in http://localhost:8000/docs
+You can access the OpenAPI documentation at [http://localhost:8000/docs](http://localhost:8000/docs)
```
🧰 Tools
🪛 LanguageTool

[uncategorized] ~28-~28: The preposition “at” seems more likely in this position than the preposition “in”.
Context: ...n:app --reload ``` you can see openapi in http://localhost:8000/docs ## How To R...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION_IN_AT)

🪛 markdownlint-cli2 (0.17.2)

28-28: Bare URL used
null

(MD034, no-bare-urls)


30-34: Enhance test documentation.

Consider adding more details about the test suite:

 ## How To Run TestCode
+
+Our test suite includes:
+- Unit tests
+- Integration tests
+- API endpoint tests
 
 ```sh
 poetry run python -m pytest
 ```
+
+To run specific test files or test cases, you can use:
+```sh
+poetry run python -m pytest path/to/test_file.py -k "test_name"
+```


---

`36-37`: **Add basic Ollama setup steps.**

Consider adding basic setup steps before referring to the detailed documentation:

```diff
 ## Ollama SetUp
-If you want to more detail, visit [here!](https://hub.docker.com/r/ollama/ollama)
+
+Quick start:
+```sh
+docker pull ollama/ollama
+docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
+```
+
+For more detailed setup instructions, visit the [Ollama Docker Hub page](https://hub.docker.com/r/ollama/ollama).
```
yorkie-intelligence/test/test_stream_response.py (1)

10-18: Enhance the client fixture for better reusability and maintainability.

While the fixture correctly implements dependency override and client setup, consider these improvements:

+# Constants
+BASE_URL = "http://127.0.0.1:8000"
+FAKE_RESPONSES = [
+    "hello, I'm a Yorkie Intelligence ChatBot, How Can I help you?",
+    "Here's your pull request summary...",
+    "I've analyzed your issue...",
+    "I've prepared a document..."
+]

 @pytest_asyncio.fixture()
 async def client():
-    fake_responses = ["hello, I'm a Yorkie Intelligence ChatBot, How Can I help you?"]
-    fake_llm = FakeStreamingListLLM(responses=fake_responses)
+    fake_llm = FakeStreamingListLLM(responses=FAKE_RESPONSES)
     app.dependency_overrides[get_model] = lambda: fake_llm
     async with AsyncClient(
-        transport=ASGITransport(app=app), base_url="http://127.0.0.1:8000"
+        transport=ASGITransport(app=app), base_url=BASE_URL
     ) as client:
         yield client
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 304fafb and e824f3d.

📒 Files selected for processing (2)
  • yorkie-intelligence/README.md (1 hunks)
  • yorkie-intelligence/test/test_stream_response.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
yorkie-intelligence/README.md

[grammar] ~3-~3: There seems to be a noun/verb agreement error. Did you mean “installs” or “installed”?
Context: # Yorkie Intelligence(WIP) ## Setting install python 3.10.* version(recommend using [...

(SINGULAR_NOUN_VERB_AGREEMENT)


[uncategorized] ~28-~28: The preposition “at” seems more likely in this position than the preposition “in”.
Context: ...n:app --reload ``` you can see openapi in http://localhost:8000/docs ## How To R...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION_IN_AT)

🪛 markdownlint-cli2 (0.17.2)
yorkie-intelligence/README.md

28-28: Bare URL used
null

(MD034, no-bare-urls)

🔇 Additional comments (1)
yorkie-intelligence/test/test_stream_response.py (1)

1-8: LGTM! Good choice of testing tools.

The imports are well-organized and the choice of testing tools (pytest_asyncio, httpx, FakeStreamingListLLM) is appropriate for testing async streaming endpoints.

Comment on lines 21 to 48
@pytest.mark.asyncio()
async def test_stream_pr(client):
async with client.stream(
"POST", "/intelligence/pr/", json={"query": "hi"}
) as response:
assert response.status_code == 200


@pytest.mark.asyncio
async def test_stream_issue():
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://127.0.0.1:8000"
) as client:
async with client.stream(
"POST", "/intelligence/issue/", json={"query": "hi"}
) as response:
assert response.status_code == 200


@pytest.mark.asyncio
async def test_stream_write_doc():
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://127.0.0.1:8000"
) as client:
async with client.stream(
"POST", "/intelligence/doc/", json={"query": "hi"}
) as response:
assert response.status_code == 200
Contributor

🛠️ Refactor suggestion

Improve test coverage with parameterized tests and response validation.

The current tests only verify the response status code. They should also:

  1. Test different query scenarios
  2. Validate response data format
  3. Test error cases
  4. Verify streaming behavior

Here's a suggested implementation:

import pytest
from typing import AsyncGenerator

TEST_CASES = [
    pytest.param("valid query", 200, id="valid_query"),
    pytest.param("", 400, id="empty_query"),
    pytest.param("invalid_query" * 1000, 400, id="query_too_long"),
]

async def validate_stream_response(response: AsyncGenerator) -> None:
    data = []
    async for line in response.aiter_lines():
        data.append(line)
        assert isinstance(line, str)  # Validate response format
    assert len(data) > 0  # Ensure we received some data

@pytest.mark.asyncio
@pytest.mark.parametrize("query,expected_status", TEST_CASES)
async def test_stream_pr(client, query: str, expected_status: int):
    async with client.stream(
        "POST", "/intelligence/pr/", json={"query": query}
    ) as response:
        assert response.status_code == expected_status
        if response.status_code == 200:
            await validate_stream_response(response)

@pytest.mark.asyncio
@pytest.mark.parametrize("query,expected_status", TEST_CASES)
async def test_stream_issue(client, query: str, expected_status: int):
    async with client.stream(
        "POST", "/intelligence/issue/", json={"query": query}
    ) as response:
        assert response.status_code == expected_status
        if response.status_code == 200:
            await validate_stream_response(response)

@pytest.mark.asyncio
@pytest.mark.parametrize("query,expected_status", TEST_CASES)
async def test_stream_write_doc(client, query: str, expected_status: int):
    async with client.stream(
        "POST", "/intelligence/doc/", json={"query": query}
    ) as response:
        assert response.status_code == expected_status
        if response.status_code == 200:
            await validate_stream_response(response)

- add models file and change query parameter type
- make in-memory chat history store
- modify apis to allow use of chat history
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
yorkie-intelligence/src/api/write_document/views.py (1)

34-42: 🛠️ Refactor suggestion

Improve error handling in event_stream function.

The current error handling is quite basic. Consider adding more specific error handling with proper error categorization and logging.

async def event_stream():
    yield f"{session_id}\n"
    try:
        async for chunk in chain.astream(
            content, config={"session_id": session_id}
        ):
            yield chunk
-    except Exception as e:
-        yield f"\n\n{str(e)}\n\n"
+    except ValueError as e:
+        yield f"\n\nInput Error: {str(e)}\n\n"
+    except ConnectionError as e:
+        yield f"\n\nConnection Error: Unable to connect to language model service\n\n"
+    except Exception as e:
+        # Add logging here
+        yield f"\n\nUnexpected Error: An error occurred while processing your request\n\n"
yorkie-intelligence/src/api/issue/views.py (2)

19-19: 🛠️ Refactor suggestion

Add type annotation to the dependency injection parameter.

The parameter llm lacks type annotation, which violates best practices (B008) for dependency injection in FastAPI. This could lead to unexpected behavior at import time.

Apply this diff to fix the dependency injection:

-async def make_issue(query: Query, llm=Depends(get_model)):
+async def make_issue(query: Query, llm: Any = Depends(get_model)):

You'll also need to add the import:

+from typing import Any
 from fastapi import APIRouter, Depends, Body

36-44: 🛠️ Refactor suggestion

Improve error handling and add type hints for event streaming.

The current error handling is basic and doesn't follow the Server-Sent Events (SSE) format. Proper SSE formatting includes prefixing messages with data: and ending with double newlines.

Apply this diff to improve the implementation:

+from typing import AsyncGenerator
+
-async def event_stream():
+async def event_stream() -> AsyncGenerator[str, None]:
    try:
        async for chunk in chain.astream(
            content, config={"session_id": session_id}
        ):
-            yield chunk
+            yield f"data: {chunk}\n\n"
    except Exception as e:
-        yield f"\n\n{str(e)}\n\n"
+        error_message = {
+            "error": str(e),
+            "type": e.__class__.__name__
+        }
+        yield f"event: error\ndata: {json.dumps(error_message)}\n\n"

Don't forget to add the import for json:

+import json
 from fastapi import APIRouter, Depends, Body
🧹 Nitpick comments (15)
yorkie-intelligence/src/api/pr/config.py (4)

12-105: Consider extracting examples to a separate configuration file.

The examples are well-structured and provide good templates for PR creation. However, hardcoding such large data structures directly in the code reduces maintainability. Consider moving these examples to a separate JSON or YAML configuration file.

- examples = [
-    {
-        "title": "Error with `qemu` when launching `yorkie` project using `yorkieteam/yorkie` image on Apple M1",
-        "content": """<!--  Thanks for sending a pull request! -->
-        
-#### What this PR does / why we need it?
-
-Change "documents" to "document" in DocEvent 
-
-#### Any background context you want to provide?
-
-There were previous proto changes in the js-sdk, so I have set the target branch to "concurrent-case-handling."
-
-#### What are the relevant tickets?
-<!--
-*Automatically closes linked issue when PR is merged.
-Usage: \`Fixes #<issue number>\`, or \`Fixes (paste link of issue)\`.
--->
-Related https://github.com/yorkie-team/yorkie/issues/612
-
-### Checklist
-- [x] Added relevant tests or not required
-- [x] Didn't break anything
-        """,
-    },
-    # ... other examples
- ]
+ from .example_data import examples  # Import from a separate module

And create a new file yorkie-intelligence/src/api/pr/example_data.py with the examples.


16-35: Fix inconsistent indentation in example PR content.

There is inconsistent indentation in the first example's content, particularly with extra spaces after the triple quotes. This might affect how Markdown is rendered.

        "content": """<!--  Thanks for sending a pull request! -->
-        
+
#### What this PR does / why we need it?

107-115: Consider breaking the long prefix string for better readability.

The prefix text is quite long and spans multiple lines. Consider using Python's triple quotes for multi-line strings to improve readability.

- prefix="I want you to act as a GitHub PR Writer for me. I'll provide you with brief notes about GitHub PR, and you just need to write the PR. "
-    "Please ensure that you follow the template used in example provided. Do not provide the example as it is. Please write your responses in English. "
-    "If there is insufficient information to create the PR, request additional information",
+ prefix="""I want you to act as a GitHub PR Writer for me. I'll provide you with brief notes about GitHub PR, and you just need to write the PR.
+    Please ensure that you follow the template used in example provided. Do not provide the example as it is. Please write your responses in English.
+    If there is insufficient information to create the PR, request additional information""",

1-127: Add docstrings to improve code documentation.

Consider adding docstrings to explain the purpose of each template and how they're intended to be used. This will help future developers understand the design intentions.

+ """
+ Configuration module for PR-related prompt templates using LangChain.
+ 
+ This module defines prompt templates for generating GitHub pull requests
+ and handling chat interactions within a Markdown editor context.
+ """
from langchain_core.prompts import (
    PromptTemplate,
    FewShotPromptTemplate,
    ChatPromptTemplate,
    MessagesPlaceholder,
)

+ """Template for formatting individual PR examples with title and content."""
example_prompt = PromptTemplate.from_template(
    "[Example]\n## Title\n{title}\n## Content\n{content}"
)

+ """Collection of example PRs used for few-shot learning."""
examples = [
    # ...
]

+ """
+ Few-shot prompt template for generating GitHub PRs.
+ 
+ Uses example PRs to guide the model in creating new PRs based on brief user input.
+ If insufficient information is provided, the prompt instructs to request more details.
+ """
pr_template_prompt = FewShotPromptTemplate(
    # ...
)

+ """
+ Chat template for AI assistance in a Markdown editor context.
+
+ Designed to maintain conversation context and respond to the latest user query.
+ """
chat_template = ChatPromptTemplate.from_messages(
    # ...
)
yorkie-intelligence/src/api/write_document/models.py (2)

1-7: Consider adding docstrings to improve code documentation.

Adding docstrings to describe the purpose of the class and its fields would improve code documentation and make it easier for other developers to understand the model's role in the application.

from pydantic import BaseModel


class Query(BaseModel):
+    """
+    Model representing a document write query.
+    
+    Attributes:
+        document_id: Unique identifier for the document
+        session_id: Optional session identifier for conversation tracking
+        content: The content to be written in the document
+    """
    document_id: str
    session_id: str | None = None
    content: str

1-7: Consider creating a common base model instead of duplicating across modules.

I notice that identical Query models are defined in multiple places (issue/models.py, pr/models.py, and here). Consider creating a shared base model in a common location to avoid duplication.

You could create a common model in src/common/models.py:

# src/common/models.py
from pydantic import BaseModel

class BaseQuery(BaseModel):
    """Base query model with common fields."""
    document_id: str
    session_id: str | None = None
    content: str

Then import and use or extend it in each module:

from src.common.models import BaseQuery

# Either use directly
Query = BaseQuery

# Or extend if needed
class Query(BaseQuery):
    # Add additional fields if required
    pass
yorkie-intelligence/src/api/write_document/views.py (3)

18-25: Add validation for content beyond Pydantic.

While Pydantic ensures that content exists, it doesn't validate its quality. Consider adding validation for empty or very short content.

async def write_document(query: Query, llm=Depends(get_model)):
    content = {"content": query.content}
+   
+    # Validate content quality
+    if not query.content.strip():
+        return StreamingResponse(
+            iter([f"{generate_session_id()}\n", "Error: Content cannot be empty"]),
+            media_type="text/event-stream"
+        )
+    if len(query.content.strip()) < 5:  # Adjust minimum length as needed
+        return StreamingResponse(
+            iter([f"{generate_session_id()}\n", "Error: Content is too short"]),
+            media_type="text/event-stream"
+        )

    if query.session_id is None:
        session_id = generate_session_id()
        _chain = document_writing_template_prompt | llm | StrOutputParser()
    else:
        session_id = query.session_id
        _chain = chat_template | llm | StrOutputParser()

1-44: Add logging throughout the endpoint.

Consider adding logging to track the flow of requests, especially for errors. This will help with debugging and monitoring in production.

from typing import Annotated
from fastapi import APIRouter, Depends, Body
from fastapi.responses import StreamingResponse
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory
+import logging

from src.common.llms import get_model
from src.common.utils import get_by_session_id, generate_session_id
from .models import Query
from .config import document_writing_template_prompt, chat_template

+# Set up logging
+logger = logging.getLogger(__name__)

router = APIRouter()


@router.post("/")
async def write_document(query: Query, llm=Depends(get_model)):
+    logger.info(f"Received document write request for document ID: {query.document_id}")
    content = {"content": query.content}

    if query.session_id is None:
        session_id = generate_session_id()
+        logger.info(f"Created new session: {session_id}")
        _chain = document_writing_template_prompt | llm | StrOutputParser()
    else:
        session_id = query.session_id
+        logger.info(f"Using existing session: {session_id}")
        _chain = chat_template | llm | StrOutputParser()

    chain = RunnableWithMessageHistory(
        _chain,
        get_by_session_id,
        input_messages_key="content",
        history_messages_key="chat_history",
    )

    async def event_stream():
        yield f"{session_id}\n"
        try:
+            logger.debug(f"Starting stream for session: {session_id}")
            async for chunk in chain.astream(
                content, config={"session_id": session_id}
            ):
                yield chunk
        except Exception as e:
+            logger.error(f"Error in stream for session {session_id}: {str(e)}")
            yield f"\n\n{str(e)}\n\n"

+    logger.info(f"Returning streaming response for session: {session_id}")
    return StreamingResponse(event_stream(), media_type="text/event-stream")

16-44: Add documentation for the API endpoint.

Adding API documentation will help users understand how to use this endpoint. FastAPI uses these docstrings to generate OpenAPI documentation.

@router.post("/")
async def write_document(query: Query, llm=Depends(get_model)):
+    """
+    Create or continue a document writing session.
+    
+    This endpoint accepts document content and processes it using a language model.
+    If no session_id is provided, a new session is created and the document writing
+    template is used. If a session_id is provided, the conversation continues using
+    the chat template.
+    
+    The response is streamed as events, with the first line containing the session_id.
+    
+    Args:
+        query: The Query object containing document_id, optional session_id, and content
+        llm: Language model obtained through dependency injection
+        
+    Returns:
+        StreamingResponse: A streaming response containing the generated text
+    """
    content = {"content": query.content}

    # Rest of the function...
yorkie-intelligence/src/api/issue/views.py (1)

45-45: Return a structured response object with session ID.

The API currently doesn't return the session ID to the client, which would be helpful for continuing the conversation in subsequent requests.

Apply this diff to include the session ID in the response headers:

-    return StreamingResponse(event_stream(), media_type="text/event-stream")
+    response = StreamingResponse(event_stream(), media_type="text/event-stream")
+    response.headers["X-Session-ID"] = session_id
+    return response
yorkie-intelligence/src/common/utils.py (4)

10-10: Fix typo in application name.

There's a spelling error in the application name.

-    APP_NAME: str = "Yorkie Intellignce"
+    APP_NAME: str = "Yorkie Intelligence"

39-41: Fix typo in TODO comment and clarify TTL cache implementation.

There's a spelling error in the TODO comment, and it's unclear how the TTLCache is being used, as it's not directly integrated with the InMemoryHistory class.

Apply these changes:

-# TODO
-# chane this to TTLCache
-store = TTLCache(maxsize=100, ttl=60)
+# TODO: Refine TTL parameters based on usage patterns
+# Cache for storing conversation histories with 60-second expiration
+store = TTLCache(maxsize=100, ttl=60)

Consider making the TTL and maxsize configurable via settings:

-store = TTLCache(maxsize=100, ttl=60)
+store = TTLCache(maxsize=SETTINGS.CACHE_MAX_SIZE, ttl=SETTINGS.CACHE_TTL_SECONDS)

And add these settings to the Settings class:

 class Settings(BaseSettings):
     APP_NAME: str = "Yorkie Intelligence"
     GO_BACKEND_URL: str = ""
     OLLAMA_URL: str = "localhost:11434"
     MODEL_TYPE: str
     MODEL_NAME: str
     API_KEY: str | None = None
+    CACHE_MAX_SIZE: int = 100
+    CACHE_TTL_SECONDS: int = 60
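The maxsize/TTL semantics referenced above can be mimicked with the standard library alone; the following is a rough sketch for illustration only (`cachetools.TTLCache`, which the PR uses, additionally tracks recency for eviction):

```python
import time


class TinyTTLCache:
    """Minimal dict-like cache: entries expire after `ttl` seconds and the
    entry closest to expiry is dropped once `maxsize` is reached."""

    def __init__(self, maxsize: int, ttl: float):
        self.maxsize = maxsize
        self.ttl = ttl
        self._data = {}  # key -> (expiry_time, value)

    def _evict(self):
        now = time.monotonic()
        for key in [k for k, (exp, _) in self._data.items() if exp <= now]:
            del self._data[key]

    def __setitem__(self, key, value):
        self._evict()
        if key not in self._data and len(self._data) >= self.maxsize:
            oldest = min(self._data, key=lambda k: self._data[k][0])
            del self._data[oldest]
        self._data[key] = (time.monotonic() + self.ttl, value)

    def __contains__(self, key):
        self._evict()
        return key in self._data


store = TinyTTLCache(maxsize=2, ttl=0.05)
store["a"] = "history-a"
store["b"] = "history-b"
store["c"] = "history-c"          # maxsize hit: oldest entry "a" is dropped
snapshot = ("a" in store, "b" in store, "c" in store)
time.sleep(0.06)                  # past the TTL: the remaining entries expire too
```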

17-17: Consider making the env file path configurable.

Hardcoding the environment file path makes it less flexible for different deployment scenarios.

-    model_config = SettingsConfigDict(env_file="src/.env.deployment")
+    model_config = SettingsConfigDict(
+        env_file=".env.deployment", 
+        env_file_encoding="utf-8",
+        env_nested_delimiter="__"
+    )

This allows for more flexibility when deployed in different environments, as the default path can be overridden by environment variables.


30-30: Add type validation for messages.

The messages field should ensure that only valid message types are added.

-    messages: list[BaseMessage] = Field(default_factory=list)
+    messages: list[BaseMessage] = Field(
+        default_factory=list,
+        description="List of conversation messages"
+    )
yorkie-intelligence/src/api/write_document/config.py (1)

8-8: Consider adding more detail to the example prompt template.

The example prompt is quite minimal compared to the detailed examples. Adding more structure could help guide the model better.

-example_prompt = PromptTemplate.from_template("[Example]\n## Output\n{output}")
+example_prompt = PromptTemplate.from_template("[Example]\n## Title\n{title}\n## Output\n{output}")

If you implement this change, you'll need to add a "title" field to each example in the examples list.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e824f3d and 390d500.

📒 Files selected for processing (11)
  • yorkie-intelligence/pyproject.toml (1 hunks)
  • yorkie-intelligence/src/api/issue/config.py (1 hunks)
  • yorkie-intelligence/src/api/issue/models.py (1 hunks)
  • yorkie-intelligence/src/api/issue/views.py (1 hunks)
  • yorkie-intelligence/src/api/pr/config.py (1 hunks)
  • yorkie-intelligence/src/api/pr/models.py (1 hunks)
  • yorkie-intelligence/src/api/pr/views.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/config.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/models.py (1 hunks)
  • yorkie-intelligence/src/api/write_document/views.py (1 hunks)
  • yorkie-intelligence/src/common/utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • yorkie-intelligence/src/api/pr/views.py
  • yorkie-intelligence/pyproject.toml
🔇 Additional comments (22)
yorkie-intelligence/src/api/pr/config.py (3)

1-6: Imports look good.

The imports are clean and correctly specify only the necessary classes from langchain_core.prompts.


8-10: Simple and effective template structure.

The example_prompt is well defined with a clear structure for formatting PR examples with title and content sections.


117-126: Well-structured chat template.

The chat template is appropriately structured with a system message, history placeholder, and human message, which is perfect for a conversational AI assistant in a Markdown editor context.
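The system / history / latest-message layout noted here can be sketched without LangChain (an illustration only; the PR builds the real template with `ChatPromptTemplate.from_messages` and `MessagesPlaceholder`):

```python
def build_messages(
    system: str, history: list[tuple[str, str]], human: str
) -> list[tuple[str, str]]:
    # Fixed system prompt, then the prior turns, then the newest user message.
    return [("system", system)] + history + [("human", human)]


msgs = build_messages(
    "You are an AI assistant in a Markdown editor.",
    [("human", "draft a PR"), ("ai", "Sure, here is a draft...")],
    "tighten the title",
)
```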

yorkie-intelligence/src/api/write_document/models.py (1)

4-7: Model definition looks good.

The Query class is properly defined with appropriate type annotations for all fields. Using Pydantic's BaseModel is a good practice for input validation in FastAPI applications.

yorkie-intelligence/src/api/pr/models.py (3)

4-7: Model definition looks good.

The Query class is properly defined with appropriate type annotations for all fields. Using Pydantic's BaseModel is a good practice for input validation in FastAPI applications.


1-7: Consider adding docstrings to improve code documentation.

Adding docstrings to describe the purpose of the class and its fields would improve code documentation and make it easier for other developers to understand the model's role in the application.

from pydantic import BaseModel


class Query(BaseModel):
+    """
+    Model representing a pull request creation query.
+    
+    Attributes:
+        document_id: Unique identifier for the document
+        session_id: Optional session identifier for conversation tracking
+        content: The content for the pull request
+    """
    document_id: str
    session_id: str | None = None
    content: str

1-7: Code duplication across model files.

This model definition is duplicated across multiple files (issue/models.py, write_document/models.py, and here). Consider creating a common base model to reduce duplication.

yorkie-intelligence/src/api/issue/models.py (3)

4-7: Model definition looks good.

The Query class is properly defined with appropriate type annotations for all fields. Using Pydantic's BaseModel is a good practice for input validation in FastAPI applications.


1-7: Consider adding docstrings to improve code documentation.

Adding docstrings to describe the purpose of the class and its fields would improve code documentation and make it easier for other developers to understand the model's role in the application.

from pydantic import BaseModel


class Query(BaseModel):
+    """
+    Model representing an issue creation query.
+    
+    Attributes:
+        document_id: Unique identifier for the document
+        session_id: Optional session identifier for conversation tracking
+        content: The content for the issue
+    """
    document_id: str
    session_id: str | None = None
    content: str

1-7: Code duplication across model files.

This model definition is duplicated across multiple files (pr/models.py, write_document/models.py, and here). Consider creating a common base model to reduce duplication.

yorkie-intelligence/src/api/write_document/views.py (3)

16-17: API endpoint definition looks good.

The POST endpoint is properly defined using FastAPI's router with appropriate parameter annotations. Using the Query model from your models file for request validation is a good practice.


20-25: Conditional chain selection based on session is well implemented.

The logic for selecting different templates based on session existence is clear and well-structured. This approach allows for different conversation flows based on whether it's a new or existing session.


27-32: LangChain message history integration is well implemented.

Using RunnableWithMessageHistory for maintaining conversation history is a good approach. The configuration with appropriate input and history message keys is correct.
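A framework-free sketch of the session-keyed history pattern that `RunnableWithMessageHistory` provides (illustration only; `run_with_history` is a hypothetical stand-in for the real chain):

```python
from collections import defaultdict

# Per-session message store, analogous to get_by_session_id in the PR.
histories: dict[str, list[tuple[str, str]]] = defaultdict(list)


def run_with_history(session_id: str, content: str) -> str:
    history = histories[session_id]
    # A real chain would feed `history` and `content` to the model;
    # here we just echo how many turns this session has seen.
    reply = f"turn {len(history) // 2 + 1}: {content}"
    history.append(("human", content))
    history.append(("ai", reply))
    return reply


first = run_with_history("s1", "hello")
second = run_with_history("s1", "again")
other = run_with_history("s2", "hi")
```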

yorkie-intelligence/src/api/issue/views.py (1)

29-34:

❓ Verification inconclusive

Add error handling for RunnableWithMessageHistory initialization.

The current implementation doesn't handle potential errors during chain initialization, which could occur if there are issues with the message history store.

Consider adding a try-except block around this initialization:

-    chain = RunnableWithMessageHistory(
-        _chain,
-        get_by_session_id,
-        input_messages_key="content",
-        history_messages_key="chat_history",
-    )
+    try:
+        chain = RunnableWithMessageHistory(
+            _chain,
+            get_by_session_id,
+            input_messages_key="content",
+            history_messages_key="chat_history",
+        )
+    except Exception as e:
+        return StreamingResponse(
+            iter([f"data: Error initializing chat history: {str(e)}\n\n"]),
+            media_type="text/event-stream"
+        )

Let's verify how RunnableWithMessageHistory is typically used and if it throws exceptions:


🏁 Script executed:

#!/bin/bash
# Look for examples of RunnableWithMessageHistory error handling in the codebase
rg -A 5 -B 5 "RunnableWithMessageHistory.*try" --type py

Length of output: 58


Action Required: Verify Exception Handling for RunnableWithMessageHistory Initialization

After searching the repository for existing patterns of exception handling around RunnableWithMessageHistory, no similar try-except blocks were found. This suggests that error handling for this initialization might not be uniformly applied elsewhere. However, considering that issues with the message history store could potentially lead to runtime errors, it's important to manually verify whether exceptions may occur during initialization and if a try-except block is appropriate.

Please review the following changes in your local environment to ensure they fit your error-handling strategy:

  • Confirm that RunnableWithMessageHistory may indeed raise exceptions during initialization.
  • Verify that wrapping the initialization in a try-except block does not inadvertently obscure underlying issues.
  • Ensure that returning a StreamingResponse with the error message is consistent with the project’s expected error propagation framework.

Here’s the suggested diff:

-    chain = RunnableWithMessageHistory(
-        _chain,
-        get_by_session_id,
-        input_messages_key="content",
-        history_messages_key="chat_history",
-    )
+    try:
+        chain = RunnableWithMessageHistory(
+            _chain,
+            get_by_session_id,
+            input_messages_key="content",
+            history_messages_key="chat_history",
+        )
+    except Exception as e:
+        return StreamingResponse(
+            iter([f"data: Error initializing chat history: {str(e)}\n\n"]),
+            media_type="text/event-stream"
+        )

Please manually verify this change and confirm that it handles runtime exceptions in accordance with your expectations.
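For reference, the "data: ...\n\n" strings in the diff follow the Server-Sent Events wire format: each event is one or more data-prefixed lines closed by a blank line. A minimal framing helper (hypothetical, not part of the PR) makes the format explicit:

```python
# Minimal SSE framing helper matching the "data: ...\n\n" shape used in
# the error branch above. In the app, frames like these would be yielded
# to FastAPI's StreamingResponse with media_type="text/event-stream".

def sse_frame(payload: str) -> str:
    # One "data:" line per payload line, then a blank-line terminator.
    return "".join(f"data: {line}\n" for line in payload.splitlines()) + "\n"


frames = [sse_frame(chunk) for chunk in ["hello", "multi\nline"]]
```

Note that multi-line payloads must be split across several data lines; embedding a raw newline inside one data line would end the event early on the client side.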

yorkie-intelligence/src/common/utils.py (2)

26-27:

❓ Verification inconclusive

Address the TODO comments for TTL cache implementation.

There are unresolved TODO comments about applying TTL cache, but a TTLCache is already implemented on line 41. This suggests inconsistency or incomplete implementation.


🏁 Script executed:

#!/bin/bash
# Check if InMemoryHistory already uses the TTLCache defined on line 41
rg -A 5 -B 5 "InMemoryHistory.*TTLCache" --type py

Length of output: 52


Action: Reconcile TTLCache Usage in InMemoryHistory

The grep search did not return any instance where InMemoryHistory leverages the TTLCache (as intended by the comment). Please do one of the following:

  • If TTLCache is meant to be integrated within InMemoryHistory, update the class to actually use it.
  • Otherwise, remove or update the TODO comment in yorkie-intelligence/src/common/utils.py (lines 26–27) to reflect the current implementation.
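If the TODO is resolved in favor of integration, one possible shape is a session store with per-entry expiry. The sketch below hand-rolls the TTL check so it is self-contained; the project's existing TTLCache (line 41) could serve the same role:

```python
# Sketch of a TTL-evicting session store: idle histories are dropped
# after ttl_seconds. Hand-rolled here for self-containment; a library
# cache such as cachetools.TTLCache could back this instead.
import time


class TTLSessionStore:
    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def get(self, session_id: str) -> list[str]:
        now = time.monotonic()
        entry = self._entries.get(session_id)
        if entry is None or now - entry[0] > self._ttl:
            # Expired or missing: start a fresh history.
            entry = (now, [])
        else:
            # Still live: refresh the timestamp on access.
            entry = (now, entry[1])
        self._entries[session_id] = entry
        return entry[1]


store = TTLSessionStore(ttl_seconds=0.05)
store.get("s1").append("hello")
alive = store.get("s1")      # within TTL: history retained
time.sleep(0.1)
expired = store.get("s1")    # past TTL: history reset
```

Whether access should refresh the TTL (as here) or expiry should be absolute is a design choice the TODO would need to settle.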

44-47:

❓ Verification inconclusive

Consider thread safety for the global store variable.

The get_by_session_id function modifies a global dictionary, which could lead to race conditions in a multi-threaded environment.

Verify if the code runs in a multi-threaded environment:

If multi-threading is used, consider using a thread-safe dictionary or adding locks:

+import threading
+
+# Thread lock for store access
+_store_lock = threading.Lock()
 
 def get_by_session_id(session_id: str) -> BaseChatMessageHistory:
-    if session_id not in store:
-        store[session_id] = InMemoryHistory()
-    return store[session_id]
+    with _store_lock:
+        if session_id not in store:
+            store[session_id] = InMemoryHistory()
+        return store[session_id]

🏁 Script executed:

#!/bin/bash
# Check if the application uses multi-threading
rg -A 2 -B 2 "ThreadPoolExecutor|ProcessPoolExecutor|threading|multiprocessing|concurrent.futures" --type py

Length of output: 110


Review Threading Concern Verification for Global Store in get_by_session_id

The search output shows no clear signs of multi-threading in the codebase (no references to ThreadPoolExecutor, threading, etc.). However, since an empty search result is not conclusive, please manually verify that no part of the application accesses store concurrently. If threading may be introduced later, consider applying a lock as suggested:

+import threading
+
+# Thread lock for store access
+_store_lock = threading.Lock()
 
 def get_by_session_id(session_id: str) -> BaseChatMessageHistory:
-    if session_id not in store:
-        store[session_id] = InMemoryHistory()
-    return store[session_id]
+    with _store_lock:
+        if session_id not in store:
+            store[session_id] = InMemoryHistory()
+        return store[session_id]

Please double-check the threading model usage across the codebase to confirm if this function may be accessed concurrently.

yorkie-intelligence/src/api/issue/config.py (3)

12-77: Excellent examples for few-shot learning.

The examples provided for GitHub issues are comprehensive, well-structured, and cover different issue types (bug, enhancement, common issue). They follow GitHub issue conventions and provide clear templates for the model to learn from.

The inclusion of real-world examples with proper markdown formatting will help the model generate more accurate and useful issue templates.


79-87: Well-structured prompt template with clear instructions.

The few-shot prompt template is well-designed with clear instructions for the model. It provides context on the task (GitHub issue writing), specifies the types of issues it can write, and instructs the model to follow the templates from the examples.

The instruction to request additional information when there's insufficient data is particularly valuable for ensuring quality issue creation.


89-98: Good chat template design for Markdown editor context.

The chat template is designed appropriately for a Markdown editor interface. It provides clear system instructions and properly handles the conversation history through the MessagesPlaceholder.

This implementation will allow for maintaining context in multi-turn conversations while keeping the model focused on the task of assisting with Markdown editing.
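The message order the few-shot template produces can be pictured with a plain stand-in: system instructions first, then alternating human/AI example turns, then the live request. This is a schematic only; the real config uses LangChain's prompt template classes, and the strings below are placeholders:

```python
# Schematic of the few-shot issue prompt assembly. Placeholder examples
# stand in for the real bug/enhancement/common-issue templates.
EXAMPLES = [
    {"input": "editor crashes on paste", "output": "## Bug report\n..."},
    {"input": "add dark mode", "output": "## Enhancement\n..."},
]

SYSTEM = (
    "You write GitHub issues (bug, enhancement, common issue). "
    "Follow the example templates; ask for more detail if input is thin."
)


def build_prompt(user_input: str) -> list[tuple[str, str]]:
    messages: list[tuple[str, str]] = [("system", SYSTEM)]
    for ex in EXAMPLES:
        # Each example becomes a worked human/AI turn pair.
        messages.append(("human", ex["input"]))
        messages.append(("ai", ex["output"]))
    messages.append(("human", user_input))
    return messages


prompt = build_prompt("scroll position jumps on save")
```

Placing the examples as full turns, rather than inlining them into the system message, is what lets the model imitate the response format directly.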

yorkie-intelligence/src/api/write_document/config.py (3)

10-79: Comprehensive documentation example with detailed formatting.

The example provided is detailed and illustrates proper documentation format for JavaScript code, including proper annotation of events, code examples, and alerts for important information.

The example will help the model generate similarly well-structured documentation responses.


81-112: Thorough AI assistant guidelines with clear boundaries.

The guidelines for the AI documentation assistant are comprehensive and clearly define expected behavior. They enforce neutrality, professionalism, and focus on documentation tasks while explicitly prohibiting certain types of content.

The instructions are specific about the assistant's name, language, and the format of responses, which will help ensure consistency in the interaction with users.


114-123: Chat template is consistent with other modules.

The chat template follows the same pattern as in the issue config, providing a consistent user experience across different features of the application.

Consistency in design patterns helps maintain a cohesive codebase and reduces cognitive load for developers working on the project.

@krapie krapie (Member) left a comment

Thank you for your contribution!
It seems like the PR is still WIP, so I will start with the small but common ones.

- change the pre-commit ruff version to 0.9.10
- add the autofix arg to ruff
- add a trailing newline at end of file
- change the .env.development API_KEY to an empty string
Development

Successfully merging this pull request may close these issues.

Migration of Backend from NestJS to Go