@luxannaxul

This PR is an updated version of PR #203, revised to align with the current project policies and coding standards.

The changes focus on improving MCP tool integration and consistency with the existing codebase. In particular, the implementation was adjusted to follow established conventions around tool registration, schema definitions, and centralized configuration.

As part of this update:

  • Tool names, descriptions, and user-facing text are now sourced from the appropriate shared modules.
  • MCP input schemas consistently use the existing typed schema definitions.
  • The overall structure of the MCP tools was aligned with the current architecture and style guidelines.

The behavior of existing tools remains unchanged, and the update is intended to be backward-compatible.

Feedback and suggestions are very welcome.

@gemini-code-assist
Contributor

Summary of Changes

Hello @luxannaxul, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Model Context Protocol (MCP) tools to bring them in line with project coding standards and improve maintainability. It introduces two new tools: update_repository, for incremental graph database updates, and semantic_search, for natural language-based code querying. These updates streamline tool integration and provide more flexible ways to interact with the codebase's knowledge graph.

Highlights

  • MCP Tool Refactoring: The existing MCP tools have been refactored to align with current project policies and coding standards, ensuring consistency in tool registration, schema definitions, and centralized configuration.
  • New UPDATE_REPOSITORY Tool: A new tool, update_repository, has been introduced to allow incremental updates to the Memgraph knowledge graph without clearing existing data.
  • New SEMANTIC_SEARCH Tool: A new semantic_search tool has been added, enabling natural language queries for functions based on their purpose, with conditional availability based on installed dependencies.
  • Improved Tool Descriptions and Schemas: Tool names, descriptions, and user-facing text are now sourced from shared modules, and MCP input schemas consistently use existing typed schema definitions.
  • Backward Compatibility: The changes are designed to be backward-compatible, ensuring existing tool behavior remains unchanged.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively introduces semantic search capabilities and a repository update tool, enhancing the MCP server's functionality. The implementation, particularly the handling of optional dependencies for semantic search, is well-structured. I've identified one critical issue related to a schema type mismatch that could cause runtime errors, along with a few medium-severity suggestions to improve code consistency, error handling, and remove redundancy. Addressing these points will further solidify this valuable feature addition.

cs.MCPParamName.TOP_K: MCPInputSchemaProperty(
    type=cs.MCPSchemaType.INTEGER,
    description=td.MCP_PARAM_TOP_K,
    default="5",
Contributor

critical

The TOP_K parameter is defined with type INTEGER, but its default value is provided as a string "5". This type mismatch can lead to schema validation errors or unexpected behavior when the tool is used with default parameters. The default value should be an integer to match the specified type.

Suggested change
- default="5",
+ default=5,

Author

class MCPInputSchemaProperty(TypedDict, total=False):
    type: str
    description: str
    default: str

Comment on lines +286 to +288
        except Exception as e:
            logger.error(lg.MCP_ERROR_UPDATING.format(error=e))
            return cs.MCP_UPDATE_ERROR.format(error=e)
Contributor

medium

Catching a broad Exception can obscure the underlying cause of an error and make debugging more difficult. It's better to catch more specific exceptions that you expect updater.run() to raise. This allows for more precise error logging and handling. If GraphUpdater can raise specific custom exceptions, they should be caught here.
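
A minimal sketch of that suggestion follows. It assumes a hypothetical GraphUpdateError and plain message strings, since the thread does not say which exceptions updater.run() actually raises; the helper run_update is likewise invented for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class GraphUpdateError(Exception):
    """Hypothetical domain error; the real updater may raise something else."""


def run_update(run) -> str:
    """Call run() and translate failures into a user-facing message."""
    try:
        run()
    except (GraphUpdateError, OSError) as e:
        # Expected failures: log the message and report it to the caller.
        logger.error("Error updating repository: %s", e)
        return f"Error updating repository: {e}"
    except Exception:
        # Unexpected failures: keep the traceback in the log so the cause is not hidden.
        logger.exception("Unexpected error updating repository")
        return "Error updating repository: unexpected failure"
    return "Repository updated"
```

The final catch-all is kept on purpose: a tool handler usually still has to return a message to the client, but logging the traceback separately keeps unexpected failures distinguishable from expected ones.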

            return te.ERROR_WRAPPER.format(message=e)

    async def semantic_search(self, natural_language_query: str, top_k: int = 5) -> str:
        if self._semantic_search_tool is None:
Contributor

medium

You've introduced a _semantic_search_available flag in the constructor, which is a great way to track the availability of this optional feature. For consistency and clarity, it would be better to use this flag here instead of checking if _semantic_search_tool is None.

Suggested change
- if self._semantic_search_tool is None:
+ if not self._semantic_search_available:

result = await self._semantic_search_tool.function(
query=natural_language_query, top_k=top_k
)
return str(result)
Contributor

medium

The _semantic_search_tool.function already returns a formatted string. The str() cast here is redundant and can be removed.

Suggested change
- return str(result)
+ return result

@greptile-apps
greptile-apps bot commented Jan 2, 2026

Greptile Summary

Added two new MCP tools to enhance repository management and code discovery capabilities.

Key Changes:

  • update_repository tool: Provides incremental repository updates without database wipe, complementing the existing index_repository tool which performs full rebuilds
  • semantic_search tool: Conditionally registered only when semantic dependencies (torch, transformers, qdrant_client) are installed, preventing tool exposure in environments lacking required packages
  • Centralized string management: Tool names, descriptions, parameters, log messages, and response strings consistently sourced from constants.py, logs.py, and tool_descriptions.py following project conventions
  • Formatting improvements: Multi-element tuples in constants.py split across lines for consistency with project style guidelines
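
The difference between the two repository tools described in the first bullet can be illustrated with a small in-memory model. This is only a sketch: FakeIngestor and the two free functions are hypothetical stand-ins and do not reproduce the project's GraphUpdater or MemgraphIngestor.

```python
class FakeIngestor:
    """Hypothetical in-memory stand-in for the graph database client."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()

    def clean_database(self) -> None:
        self.nodes.clear()

    def ingest(self, names: list[str]) -> None:
        self.nodes.update(names)


def index_repository(ingestor: FakeIngestor, parsed: list[str]) -> None:
    # Full rebuild: wipe first, then ingest everything that was parsed.
    ingestor.clean_database()
    ingestor.ingest(parsed)


def update_repository(ingestor: FakeIngestor, parsed: list[str]) -> None:
    # Incremental update: same ingestion step, but existing data is kept.
    ingestor.ingest(parsed)
```

In this model the only difference between the two paths is the clean_database() call, which matches the behavior the summary attributes to the real tools.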

Architecture:

The conditional registration pattern uses has_semantic_dependencies() at initialization to check for optional dependencies. When unavailable, the tool is not registered and a clear log message guides users to install via uv sync --extra semantic. The update_repository tool leverages existing GraphUpdater infrastructure but skips the destructive clean_database() call that index_repository performs.
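
The conditional registration described above can be sketched with the standard library's importlib.util.find_spec, which locates a module without importing it. The module names come from the summary; has_modules and build_tool_registry are hypothetical helpers, and the project's own has_semantic_dependencies() may be implemented differently.

```python
from importlib.util import find_spec

# Module names taken from the summary above; the helpers are illustrative only.
SEMANTIC_MODULES = ("torch", "transformers", "qdrant_client")


def has_modules(names: tuple[str, ...]) -> bool:
    """Return True only if every named top-level module can be located."""
    return all(find_spec(name) is not None for name in names)


def build_tool_registry(semantic_available: bool) -> dict[str, str]:
    """Register the always-on tools, adding semantic_search only when usable."""
    tools = {
        "index_repository": "full rebuild",
        "update_repository": "incremental update",
    }
    if semantic_available:
        tools["semantic_search"] = "natural-language function search"
    return tools


registry = build_tool_registry(has_modules(SEMANTIC_MODULES))
```

Checking once at initialization keeps the heavy imports out of environments that lack the extras, and clients never see a tool they cannot call.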

Minor Issue:

One type mismatch in the top_k parameter default value (string "5" instead of integer 5).

Confidence Score: 4/5

  • Safe to merge after fixing the minor type error in default parameter value
  • The PR follows established project patterns for tool registration, centralized string management, and optional dependency handling. The implementation is well-structured with proper error handling and logging. One minor type issue exists (a string default instead of an integer for the top_k parameter) that should be corrected before merge. The changes are backward-compatible and don't modify existing tool behavior.
  • codebase_rag/mcp/tools.py requires correction to the top_k default value (line 243)

Important Files Changed

Filename | Overview
codebase_rag/constants.py | Formatting changes for consistency - splits multi-element tuples to multiple lines following project style guidelines
codebase_rag/logs.py | Adds log message templates for new MCP tools (semantic_search, update_repository)
codebase_rag/tools/tool_descriptions.py | Adds descriptions for new MCP tools and parameters, following centralized string management pattern
codebase_rag/mcp/tools.py | Adds optional semantic_search tool and update_repository tool with conditional registration; minor logic issue in dictionary access

Sequence Diagram

sequenceDiagram
    participant Client as MCP Client
    participant Registry as MCPToolsRegistry
    participant DepCheck as has_semantic_dependencies()
    participant SemanticTool as create_semantic_search_tool()
    participant Updater as GraphUpdater
    participant Ingestor as MemgraphIngestor
    participant VectorStore as Qdrant/Embeddings

    Note over Registry: __init__ - Tool Registration

    Registry->>DepCheck: Check if semantic deps installed
    alt Semantic dependencies available
        DepCheck-->>Registry: True
        Registry->>SemanticTool: Import and create tool
        SemanticTool-->>Registry: semantic_search_tool
        Note over Registry: Register SEMANTIC_SEARCH tool
    else Dependencies not available
        DepCheck-->>Registry: False
        Note over Registry: Log warning, skip registration
    end

    Note over Registry: Always register UPDATE_REPOSITORY

    Note over Client,VectorStore: Tool Execution Flow

    alt semantic_search (if available)
        Client->>Registry: semantic_search(query, top_k)
        Registry->>SemanticTool: Execute async function
        SemanticTool->>VectorStore: Embed query & search
        VectorStore-->>SemanticTool: Similar node IDs + scores
        SemanticTool->>Ingestor: Query nodes by IDs
        Ingestor-->>SemanticTool: Node metadata
        SemanticTool-->>Registry: Formatted results string
        Registry-->>Client: Search results
    else semantic_search (unavailable)
        Client->>Registry: semantic_search(query, top_k)
        Registry-->>Client: Error: Install semantic extras
    end

    alt update_repository (new)
        Client->>Registry: update_repository()
        Registry->>Updater: GraphUpdater.run()
        Note over Updater: NO database wipe
        Updater->>Ingestor: Incremental updates
        Ingestor-->>Updater: Success
        Updater-->>Registry: Complete
        Registry-->>Client: Update success message
    end

    alt index_repository (existing)
        Client->>Registry: index_repository()
        Registry->>Ingestor: clean_database()
        Note over Ingestor: FULL database wipe
        Ingestor-->>Registry: Cleared
        Registry->>Updater: GraphUpdater.run()
        Updater->>Ingestor: Full rebuild
        Ingestor-->>Updater: Success
        Updater-->>Registry: Complete
        Registry-->>Client: Index success message
    end

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. codebase_rag/mcp/tools.py, lines 240-244

    syntax: default value should be integer not string

    The default field expects the actual default value, not a string representation. Since top_k has type INTEGER, the default should be 5 (int) not "5" (string).

4 files reviewed, 1 comment

@luxannaxul
Author

luxannaxul commented Jan 7, 2026

@vitali87
Please review this PR.
I don't really understand why there should be a type error with the default value, because my language server is fine with what I did and gives me the opposite type error when I change the value as suggested by the AI. The code seems to be OK, because using the new semantic search tool in the MCP server works.
