@shaohuzhang1
Contributor

feat: Knowledge write node chunk embedding

@f2c-ci-robot

f2c-ci-robot bot commented Nov 28, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot

f2c-ci-robot bot commented Nov 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

self.post_embedding(document_model_list, knowledge_id, workspace_id)

write_content_list = [{
"name": document.get("name"),
Contributor Author

The code appears functional but could be improved in a few areas. Here are some suggestions:

  1. Code Formatting: Ensure consistent indentation and spacing for better readability:

        ...
            }).refresh()
  2. Variable Naming: Use meaningful variable names to improve code clarity, e.g., write_content_list instead of write_content.

  3. Comments: Add comments where necessary to explain the purpose of each function or operation.

  4. Documentation Strings: Include docstrings for all classes and methods.

  5. Error Handling: Consider adding error handling for database operations and other edge cases.

Here is a revised version with these improvements:

@@ -20,6 +20,7 @@
 from common.utils.common import bulk_create_in_batches
 from knowledge.models import Document, KnowledgeType, Paragraph, File, FileSourceType, Problem, ProblemParagraphMapping
 from knowledge.serializers.common import ProblemParagraphObject, ProblemParagraphManage
+from knowledge.serializers.document import DocumentSerializers
 
 
 class ParagraphInstanceSerializer(serializers.Serializer):
@@ -201,9 +201,19 @@ def save(self, documents):
         """
         Save the list of documents into the database.

         :param documents: List of dictionaries containing document data.
         :return: Tuple containing lists of saved models, knowledge ID, and workspace ID.
         """
         if not isinstance(documents, list) or not documents:
             raise ValueError("Documents must be a non-empty list.")

         # Proceed with saving the documents...
@@ -228,7 +248,19 @@ def execute(self, documents, **kwargs) -> NodeResult:
     """

     # Execute the logic to process the documents

     document_model_list, knowledge_id, workspace_id = self.save(documents)

-    # Call a static method to perform additional operations like embedding generation
+    # Post-processing step: generate embeddings, update metadata, etc., after the documents have been created.
+    # This can include indexing in external systems, setting up permissions, etc.
+    self.post_embedding(document_model_list, knowledge_id, workspace_id)

     write_content_list = [
         {
             "name": document.get("name"),

These changes improve the code's readability, maintainability, and robustness without changing its behavior.
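
The input check and the error-handling suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the project's code: `validate_documents` and `run_post_step` are hypothetical helper names, and logging a failed post-processing step instead of re-raising is an assumption about the desired behavior, not something this PR specifies.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger(__name__)


def validate_documents(documents: Any) -> List[dict]:
    """Return `documents` unchanged if it is a non-empty list of dicts.

    Hypothetical helper: raises ValueError for any other input.
    """
    if not isinstance(documents, list) or not documents:
        raise ValueError("Documents must be a non-empty list.")
    if not all(isinstance(doc, dict) for doc in documents):
        raise ValueError("Each document must be a dictionary.")
    return documents


def run_post_step(step: Callable[[], None]) -> bool:
    """Run a post-processing step and report whether it succeeded.

    Assumption: a failure here (e.g. in embedding generation) should be
    logged and reported to the caller rather than abort the whole node.
    """
    try:
        step()
        return True
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("Post-processing step failed")
        return False
```

A caller could validate first and then wrap the embedding call, for example `run_post_step(lambda: node.post_embedding(models, knowledge_id, workspace_id))`, where `node`, `models`, `knowledge_id`, and `workspace_id` stand in for the values used in `execute`.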

@zhanweizhang7 zhanweizhang7 merged commit 35fe162 into v2 Nov 28, 2025
3 of 6 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@feat_embeding branch November 28, 2025 09:36
