upload file to sandbox #355
Changes from 6 commits
@@ -1,11 +1,14 @@
import base64
import mimetypes
import os
import re
import uuid
from io import BytesIO
from pathlib import Path
from typing import List, Optional, Tuple

from app.engine.index import IndexConfig, get_index
from app.engine.utils.file_helper import FileMetadata, save_file
How about moving save_file to file.py (this service)?
I also think the file helper and the file service overlap a bit, but the idea is to separate them out so they can be reused in both the engine code and the API code.
from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.readers.file.base import (
@@ -31,94 +34,141 @@ def get_llamaparse_parser():
def default_file_loaders_map():
    default_loaders = get_file_loaders_map()
    default_loaders[".txt"] = FlatReader
    default_loaders[".csv"] = FlatReader
leehuwuj marked this conversation as resolved.
    return default_loaders


class PrivateFileService:
    """
    To store the files uploaded by the user and add them to the index.
    """

    PRIVATE_STORE_PATH = "output/uploaded"

    @staticmethod
    def preprocess_base64_file(base64_content: str) -> Tuple[bytes, str | None]:
    def _preprocess_base64_file(base64_content: str) -> Tuple[bytes, str | None]:
        header, data = base64_content.split(",", 1)
        mime_type = header.split(";")[0].split(":", 1)[1]
        extension = mimetypes.guess_extension(mime_type)
        # File data as bytes
        return base64.b64decode(data), extension

    @staticmethod
    def store_and_parse_file(file_name, file_data, extension) -> List[Document]:
    def _store_file(file_name, file_data) -> FileMetadata:
        """
        Store the file to the private directory and return the file metadata
        """
        # Store file to the private directory
        os.makedirs(PrivateFileService.PRIVATE_STORE_PATH, exist_ok=True)
        file_path = Path(os.path.join(PrivateFileService.PRIVATE_STORE_PATH, file_name))

        # write file
        with open(file_path, "wb") as f:
            f.write(file_data)
        return save_file(file_data, file_path=str(file_path))

    @staticmethod
    def _load_file_to_documents(file_metadata: FileMetadata) -> List[Document]:
        """
        Load the file from the private directory and return the documents
        """
        extension = file_metadata.name.split(".")[-1]
Suggested change:
-extension = file_metadata.name.split(".")[-1]
+_, extension = os.path.splitext(file_metadata.name)
+extension = extension.lstrip(".")
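The two extension paths touched in this diff can be compared in isolation. This is a minimal standalone sketch, not the PR's code: preprocess_base64_file repeats the steps of _preprocess_base64_file outside the class, and extension_by_split and extension_by_splitext are hypothetical helper names wrapping the original line and the suggested replacement. It uses Optional[str] instead of str | None so it also runs on Python versions before 3.10.

```python
import base64
import mimetypes
import os
from typing import Optional, Tuple


def preprocess_base64_file(base64_content: str) -> Tuple[bytes, Optional[str]]:
    # Same steps as _preprocess_base64_file in the diff. A data URL has the shape
    # "data:<mime type>;base64,<payload>", so split once at the first comma.
    header, data = base64_content.split(",", 1)
    # "data:application/pdf;base64" -> "application/pdf"
    mime_type = header.split(";")[0].split(":", 1)[1]
    # Maps the MIME type to a dotted extension such as ".pdf"; None if unknown.
    extension = mimetypes.guess_extension(mime_type)
    return base64.b64decode(data), extension


def extension_by_split(name: str) -> str:
    # Original line (hypothetical wrapper): whatever follows the last ".",
    # or the whole name when there is no dot at all.
    return name.split(".")[-1]


def extension_by_splitext(name: str) -> str:
    # Suggested replacement (hypothetical wrapper): os.path.splitext returns ""
    # when there is no extension and treats a leading dot (".env") as part of the name.
    _, extension = os.path.splitext(name)
    return extension.lstrip(".")


if __name__ == "__main__":
    payload = "data:application/pdf;base64," + base64.b64encode(b"%PDF-1.4").decode()
    data, ext = preprocess_base64_file(payload)
    print(data, ext)
    for name in ["report.pdf", "archive.tar.gz", "README", ".env"]:
        print(name, repr(extension_by_split(name)), repr(extension_by_splitext(name)))
```

For a name with no dot, such as README, the original expression returns "README" as the extension, while the suggested version returns an empty string. For a dotfile such as .env, the original returns "env" and the suggested version again returns an empty string.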
leehuwuj marked this conversation as resolved.
Inconsistent return type in _add_file_to_llama_cloud_index method.
The method _add_file_to_llama_cloud_index is annotated to return None, but it returns a list containing the result of LLamaCloudFileService.add_file_to_pipeline. Ensure the return type matches the annotation or update the annotation if a return value is intended.
Consider updating the method to return None if the return value is not needed:
-return [
-    LLamaCloudFileService.add_file_to_pipeline(
-        project_id,
-        pipeline_id,
-        upload_file,
-        custom_metadata={},
-    )
-]
+LLamaCloudFileService.add_file_to_pipeline(
+    project_id,
+    pipeline_id,
+    upload_file,
+    custom_metadata={},
+)
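Either fix for the return-type mismatch can be sketched without a LlamaCloud account. This is a minimal illustration under stated assumptions: StubLlamaCloudFileService is a hypothetical stand-in that records calls and returns a made-up string id, and the two module-level functions are hypothetical simplifications of the _add_file_to_llama_cloud_index method, whose real parameters are not shown in this diff.

```python
from typing import Any, Dict, List, Optional


class StubLlamaCloudFileService:
    """Hypothetical stand-in for LLamaCloudFileService that records calls instead of uploading."""

    calls: List[tuple] = []

    @classmethod
    def add_file_to_pipeline(
        cls,
        project_id: str,
        pipeline_id: str,
        upload_file: Any,
        custom_metadata: Optional[Dict[str, Any]] = None,
    ) -> str:
        cls.calls.append((project_id, pipeline_id, upload_file, custom_metadata))
        return "doc-id"  # hypothetical return value; the real one is not shown in the PR


def add_file_to_llama_cloud_index(project_id: str, pipeline_id: str, upload_file: Any) -> None:
    # Option 1 (the suggestion): call only for the side effect, so the body
    # agrees with the "-> None" annotation.
    StubLlamaCloudFileService.add_file_to_pipeline(
        project_id,
        pipeline_id,
        upload_file,
        custom_metadata={},
    )


def add_file_to_llama_cloud_index_with_result(
    project_id: str, pipeline_id: str, upload_file: Any
) -> List[str]:
    # Option 2: keep returning the wrapped result, but make the annotation say so.
    return [
        StubLlamaCloudFileService.add_file_to_pipeline(
            project_id,
            pipeline_id,
            upload_file,
            custom_metadata={},
        )
    ]
```

The first function matches the suggestion. The second only makes sense if callers actually use the returned value, and the real return type of add_file_to_pipeline would need to be checked before settling on List[str].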
leehuwuj marked this conversation as resolved.
Ensure file_metadata.document_ids is consistently populated.
In the process_file method, when using LlamaCloudIndex, file_metadata.document_ids is not populated. If document_ids are needed later, consider updating the method to handle this case.
If document_ids are not applicable for LlamaCloudIndex, ensure that any downstream code handles file_metadata.document_ids being None or empty.
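One way for downstream code to tolerate a missing document_ids is to normalize it at the point of use. The sketch below assumes a simplified FileMetadata dataclass with only name and document_ids (the real class in app.engine.utils.file_helper likely has more fields), and document_ids_or_empty and summarize are hypothetical helpers, not part of the PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileMetadata:
    # Hypothetical minimal stand-in for app.engine.utils.file_helper.FileMetadata,
    # keeping only the two fields mentioned in the review.
    name: str
    document_ids: Optional[List[str]] = None


def document_ids_or_empty(file_metadata: FileMetadata) -> List[str]:
    # Treat "never populated" (None) and "populated but empty" ([]) the same way,
    # so callers can always iterate over the result.
    return list(file_metadata.document_ids or [])


def summarize(file_metadata: FileMetadata) -> str:
    ids = document_ids_or_empty(file_metadata)
    if not ids:
        # e.g. a file added to a LlamaCloudIndex, where process_file leaves document_ids unset
        return f"{file_metadata.name}: no document ids"
    return f"{file_metadata.name}: {len(ids)} document(s)"
```

Normalizing in one helper keeps the None check out of every caller; the alternative the review mentions is to have process_file always assign a list, even an empty one, for LlamaCloudIndex.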