Groundx doc pipeline #191
base: main
Walkthrough
Introduces PDF and image preview in the upload flow, restructures the UI with an active_tab state and segmented tabs, expands analysis displays and chat handling, and replaces granular progress tracking with a spinner-based polling loop. Adds display_pdf(file) and updates the processing, preview, and analysis sequences and styling.
Sequence Diagram(s)
sequenceDiagram
    autonumber
    participant U as User
    participant UI as App UI (app.py)
    participant GX as Ground X API
    participant Utils as groundx_utils.poll_until_complete
    U->>UI: Upload file
    alt application/pdf
        UI->>UI: display_pdf(file) via base64 iframe
    else image/*
        UI->>UI: Render image preview
    else docx/other
        UI->>UI: Show "preview after processing" notice
    end
    U->>UI: Click "Process"
    UI->>GX: Start processing (create document)
    UI->>Utils: poll_until_complete(process_id)
    activate Utils
    Utils->>GX: get_processing_status_by_id(...).ingest (poll)
    GX-->>Utils: status (processing|complete|error|cancelled)
    loop until terminal or timeout
        Utils->>GX: poll status
        GX-->>Utils: status
    end
    Utils-->>UI: completion or raise error
    deactivate Utils
    alt complete
        UI->>GX: Fetch X-Ray data
        UI->>UI: Switch active_tab -> analysis
        UI->>UI: Render Analysis tabs (JSON, Summary, File Summary, Extracted Text, Keywords)
        UI->>UI: Embedded document preview and sample content
    else error/cancelled/timeout
        UI->>UI: Show error message
    end
    U->>UI: Open Chat tab
    UI->>UI: prepare_chat_context(xray, prompt)
    UI->>UI: generate_chat_response(prompt, context)
    UI-->>U: Stream/Show response (chat history maintained)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
groundX-doc-pipeline/groundx_utils.py (1)
70-83: Add HTTP client timeouts when fetching X-Ray JSON
External GETs lack timeouts; this can hang the app indefinitely.
- response = requests.get(document.xray_url)
+ response = requests.get(document.xray_url, timeout=15)
- response = requests.get(doc.xray_url)
+ response = requests.get(doc.xray_url, timeout=15)
Also applies to: 92-105
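A minimal sketch of the timeout fix, assuming the X-Ray payload is JSON served at `document.xray_url`. The helper name `fetch_xray_json` and its injectable `get` parameter are hypothetical (the parameter exists only so the helper can be exercised without network access); `requests.get(url, timeout=...)` and `Response.raise_for_status()` are standard requests APIs.

```python
from typing import Any, Callable, Optional


def fetch_xray_json(
    url: str,
    timeout: float = 15.0,
    get: Optional[Callable[..., Any]] = None,
) -> Any:
    """Fetch X-Ray JSON with an explicit timeout (hypothetical helper).

    `get` defaults to requests.get; passing a stub lets tests avoid the network.
    """
    if get is None:
        import requests  # imported lazily so the helper can be tested without requests

        get = requests.get
    # requests applies no timeout by default, so an unresponsive server would block forever
    response = get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
    return response.json()
```

In the app this would cover both `requests.get(document.xray_url)` and `requests.get(doc.xray_url)`. requests also accepts a `(connect, read)` tuple for `timeout` if the two phases need different limits.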
🧹 Nitpick comments (7)
groundX-doc-pipeline/groundx_utils.py (3)
52-66: Use monotonic clock and configurable poll interval; keep UX spinner
Loop works, but timeouts should use time.monotonic and a poll_interval param to tune cadence. Also tolerate brief API hiccups without breaking the spinner.
-def poll_until_complete(gx: GroundX, process_id: str, timeout: int = 600) -> None:
+def poll_until_complete(gx: GroundX, process_id: str, timeout: int = 600, poll_interval: float = 3.0) -> None:
     """Monitor document processing status until completion"""
-    start_time = time.time()
+    start_time = time.monotonic()
     # Use a spinner container for better UX
     with st.spinner("Processing document..."):
         while True:
-            status = gx.documents.get_processing_status_by_id(process_id=process_id).ingest
+            try:
+                status = gx.documents.get_processing_status_by_id(process_id=process_id).ingest
+            except Exception as e:
+                # brief backoff on transient errors
+                if time.monotonic() - start_time > timeout:
+                    raise TimeoutError("Ground X ingest timed out.") from e
+                time.sleep(min(1.0, poll_interval))
+                continue
             if status.status in {"complete", "error", "cancelled"}:
                 break
-            if time.time() - start_time > timeout:
+            if time.monotonic() - start_time > timeout:
                 raise TimeoutError("Ground X ingest timed out.")
-            time.sleep(3)
+            time.sleep(poll_interval)
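The same loop can be sketched without Streamlit or the Ground X client, which makes the timeout and retry behaviour testable. Here `get_status` is any zero-argument callable returning the ingest status string (in the app, something like `lambda: gx.documents.get_processing_status_by_id(process_id=process_id).ingest.status`); `poll_until_terminal`, `IngestTimeoutError`, and the injectable `sleep`/`clock` parameters are hypothetical. Keeping the message inside the exception class also addresses the Ruff TRY003 hint on line 64. Like the suggestion above, it treats every exception from the status call as transient, which may be too broad; narrowing it to the client's network errors is left as an assumption.

```python
import time
from typing import Callable

TERMINAL_STATES = frozenset({"complete", "error", "cancelled"})


class IngestTimeoutError(TimeoutError):
    """Ingest did not reach a terminal state in time (message lives here, per TRY003)."""

    def __init__(self, timeout: float) -> None:
        super().__init__(f"Ground X ingest timed out after {timeout:g}s.")
        self.timeout = timeout


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    poll_interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal state; return that state."""
    deadline = clock() + timeout  # monotonic clock: immune to wall-clock jumps
    while True:
        try:
            status = get_status()
        except Exception as exc:  # assumed transient; retry until the deadline
            if clock() >= deadline:
                raise IngestTimeoutError(timeout) from exc
            sleep(min(1.0, poll_interval))  # short backoff after an error
            continue
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise IngestTimeoutError(timeout)
        sleep(poll_interval)
```

In the app, the call would sit inside the existing `st.spinner("Processing document...")` block, and the caller would branch on the returned `"complete"`, `"error"`, or `"cancelled"`.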
25-35: Fix return type hint for ensure_bucket (actual id is int)
Function returns bucket.bucket_id which appears to be an int. Align the annotation (or use int | str) to avoid downstream confusion.
-@st.cache_resource(show_spinner=False)
-def ensure_bucket(_gx: GroundX, name: str = "gx_demo") -> str:
+@st.cache_resource(show_spinner=False)
+def ensure_bucket(_gx: GroundX, name: str = "gx_demo") -> int:
36-51: Align bucket_id typing across helpers
ingest_document accepts Union[str,int] at runtime; reflect this in type hints for clarity.
-def ingest_document(gx: GroundX, bucket_id: str, path: Path, mime: str) -> str:
+from typing import Union
+
+def ingest_document(gx: GroundX, bucket_id: Union[str, int], path: Path, mime: str) -> str:
groundX-doc-pipeline/app.py (4)
820-838: Remove unused in_chat_mode flag
in_chat_mode is set but never read; dead state.
- # Ensure we stay in chat mode
- st.session_state.in_chat_mode = True
47-246: Reduce CSS duplication and risky negative margins
Large repeated blocks with aggressive overrides/negative margins make the layout brittle and harder to maintain. Consolidate shared button/column styles into a single CSS block and avoid overlapping z-index hacks unless needed.
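One way to act on this, sketched under assumptions: keep every shared rule in a single mapping and emit one `<style>` block. The selectors below are illustrative placeholders, not the app's actual CSS, and Streamlit's internal DOM structure (`data-testid` attributes, class names) is not a stable public API, so any selector targeting it may break across versions. `SHARED_RULES` and `build_style_block` are hypothetical names.

```python
from typing import Dict

# Hypothetical shared rules; replace with the selectors the app actually needs.
SHARED_RULES: Dict[str, Dict[str, str]] = {
    "div.stButton > button": {"border-radius": "8px", "padding": "0.4rem 1rem"},
    "div[data-testid='column']": {"padding": "0 0.25rem"},
}


def build_style_block(rules: Dict[str, Dict[str, str]]) -> str:
    """Render a selector -> declarations mapping into one <style> element."""
    lines = ["<style>"]
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        lines.append(f"{selector} {{ {body} }}")
    lines.append("</style>")
    return "\n".join(lines)
```

The block would be injected once near the top of app.py with `st.markdown(build_style_block(SHARED_RULES), unsafe_allow_html=True)`, so each rule is defined in exactly one place.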
503-538: Minor: collapse upload status steps into a single status container
UX copy looks good. Consider wrapping step messages in a single st.status or st.container to avoid jitter.
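A sketch of the single-container approach, assuming a Streamlit version that provides `st.status` (1.26 or later). `run_upload_steps` and the step list are hypothetical; the helper relies only on the container's `write()` and `update()` methods, so a stub can stand in for it when testing.

```python
from typing import Callable, Iterable, Protocol, Tuple


class StatusLike(Protocol):
    """The subset of Streamlit's status container that this helper uses."""

    def write(self, *args: object) -> None: ...

    def update(self, **kwargs: object) -> None: ...


def run_upload_steps(status: StatusLike, steps: Iterable[Tuple[str, Callable[[], None]]]) -> None:
    """Log each step into one status container, then mark it complete or failed."""
    try:
        for label, action in steps:
            status.write(label)  # messages accumulate in one box instead of jittering
            action()
    except Exception:
        status.update(label="Upload failed", state="error", expanded=True)
        raise
    status.update(label="Upload complete", state="complete", expanded=False)
```

In app.py the call would look like `with st.status("Uploading document...", expanded=True) as status: run_upload_steps(status, steps)`, where `steps` pairs each existing status message with the work it describes.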
482-486: Clean up temp files after processing
NamedTemporaryFile(delete=False) leaves files behind. After successful processing, unlink the file unless you intentionally keep it for re-processing.
- st.session_state.uploaded_file_path = tmp_file.name
+ st.session_state.uploaded_file_path = tmp_file.name
+ # TODO: after processing completes, consider: Path(tmp_file.name).unlink(missing_ok=True)
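A minimal sketch of the try/finally pattern, assuming the temporary file is only needed for the duration of one processing call. `process_with_temp_file` is a hypothetical helper and `process` stands in for whatever consumes the path (for example, the Ground X ingest step). Because the app currently stores the path in `st.session_state.uploaded_file_path`, deleting the file this way is only safe once nothing reads that path afterwards.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def process_with_temp_file(data: bytes, suffix: str, process: Callable[[Path], T]) -> T:
    """Write data to a temp file, run process(path), and always remove the file."""
    # delete=False so the file can be reopened by path (required on Windows);
    # leaving the `with` block closes it before process() runs.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        tmp_path = Path(tmp.name)
    try:
        return process(tmp_path)
    finally:
        tmp_path.unlink(missing_ok=True)  # Python 3.8+; no error if already gone
```

With a Streamlit upload, `data` would come from `uploaded.getvalue()` and `suffix` from the original file name, so the cleanup runs whether processing succeeds or raises.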
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
groundX-doc-pipeline/app.py (9 hunks)
groundX-doc-pipeline/groundx_utils.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
groundX-doc-pipeline/groundx_utils.py (1)
  groundX-doc-pipeline/evaluation_geval.py (2)
    _poll_until_complete (129-139)
    process_invoice (117-127)
groundX-doc-pipeline/app.py (2)
  groundX-doc-pipeline/groundx_utils.py (1)
    process_document (107-128)
  firecrawl-agent/app.py (1)
    display_pdf (60-71)
🪛 Ruff (0.12.2)
groundX-doc-pipeline/groundx_utils.py
64-64: Avoid specifying long messages outside the exception class (TRY003)
groundX-doc-pipeline/app.py
262-262: SyntaxError: Expected a statement (×4)
262-262: SyntaxError: Simple statements must be separated by newlines or semicolons
265-265: SyntaxError: Expected a statement (×4)
265-266: SyntaxError: Expected a statement
266-266: SyntaxError: Expected a statement (×4)
266-266: SyntaxError: Simple statements must be separated by newlines or semicolons
267-267: SyntaxError: Unexpected indentation
271-271: SyntaxError: Expected a statement (×4)
271-271: SyntaxError: Simple statements must be separated by newlines or semicolons
274-274: SyntaxError: Unexpected indentation
277-277: SyntaxError: Expected a statement (×4)
277-278: SyntaxError: Expected a statement
278-278: SyntaxError: Unexpected indentation
280-280: SyntaxError: Expected a statement (×4)
280-280: SyntaxError: Simple statements must be separated by newlines or semicolons
405-405: SyntaxError: Expected a statement (×4)
405-405: SyntaxError: Simple statements must be separated by newlines or semicolons
407-407: SyntaxError: Unexpected indentation
434-434: SyntaxError: Expected a statement (×4)
434-435: SyntaxError: Expected a statement
435-435: SyntaxError: Unexpected indentation
448-448: SyntaxError: Expected a statement (×4)
448-448: SyntaxError: Simple statements must be separated by newlines or semicolons
552-552: SyntaxError: Expected a statement (×4)
552-552: SyntaxError: Simple statements must be separated by newlines or semicolons
554-554: SyntaxError: Unexpected indentation
598-598: SyntaxError: Expected a statement (×4)
598-599: SyntaxError: Expected a statement
600-600: SyntaxError: Unexpected indentation
601-601: SyntaxError: Expected a statement (×4)
601-601: SyntaxError: Simple statements must be separated by newlines or semicolons
603-603: SyntaxError: Unexpected indentation
742-742: SyntaxError: Expected a statement (×4)
742-742: SyntaxError: Simple statements must be separated by newlines or semicolons
744-744: SyntaxError: Expected a statement (×4)
744-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement (×4)
745-745: SyntaxError: Simple statements must be separated by newlines or semicolons
746-746: SyntaxError: Unexpected indentation
762-762: SyntaxError: unindent does not match any outer indentation level
770-770: SyntaxError: unindent does not match any outer indentation level
772-772: SyntaxError: Expected a statement (×4)
772-772: SyntaxError: Simple statements must be separated by newlines or semicolons
774-774: SyntaxError: Expected a statement (×4)
774-775: SyntaxError: Expected a statement
775-775: SyntaxError: Expected a statement (×4)
775-775: SyntaxError: Simple statements must be separated by newlines or semicolons
776-776: SyntaxError: Unexpected indentation
792-792: SyntaxError: unindent does not match any outer indentation level
794-794: SyntaxError: Expected a statement (×4)
794-794: SyntaxError: Simple statements must be separated by newlines or semicolons
796-796: SyntaxError: Expected a statement (×4)
796-797: SyntaxError: Expected a statement
797-797: SyntaxError: Expected a statement (×4)
797-797: SyntaxError: Simple statements must be separated by newlines or semicolons
798-798: SyntaxError: Unexpected indentation
812-812: SyntaxError: unindent does not match any outer indentation level
820-820: SyntaxError: unindent does not match any outer indentation level
820-820: SyntaxError: Invalid annotated assignment target
820-821: SyntaxError: Expected an expression
821-821: SyntaxError: Unexpected indentation
835-835: SyntaxError: Expected a statement (×4)
835-835: SyntaxError: Simple statements must be separated by newlines or semicolons
837-837: SyntaxError: Unexpected indentation
840-840: SyntaxError: Expected a statement (×4)
840-841: SyntaxError: Expected a statement
841-841: SyntaxError: Expected a statement (×4)
841-841: SyntaxError: Simple statements must be separated by newlines or semicolons
857-857: SyntaxError: Expected a statement
def display_pdf(file):
    """Display PDF preview using embedded iframe"""
    st.markdown("### PDF Preview")
    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
<<<<<<< Updated upstream

    # Embedding PDF in HTML
=======
>>>>>>> Stashed changes
    pdf_display = f"""<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="400" type="application/pdf"
        style="border: 1px solid #ddd; border-radius: 8px;"
        >
        </iframe>"""
<<<<<<< Updated upstream

    # Displaying File
    st.markdown(pdf_display, unsafe_allow_html=True)

# Chat Interface Functions
=======
    st.markdown(pdf_display, unsafe_allow_html=True)

>>>>>>> Stashed changes
def prepare_chat_context(xray_data, prompt):
Resolve merge conflict in display_pdf and reset file pointer
Unresolved conflict markers will crash the app. Also reset the file object after reading for downstream use.
-def display_pdf(file):
-    """Display PDF preview using embedded iframe"""
-    st.markdown("### PDF Preview")
-    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
-<<<<<<< Updated upstream
-
-    # Embedding PDF in HTML
-=======
->>>>>>> Stashed changes
-    pdf_display = f"""<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="400" type="application/pdf"
-        style="border: 1px solid #ddd; border-radius: 8px;"
-        >
-        </iframe>"""
-<<<<<<< Updated upstream
-
-    # Displaying File
-    st.markdown(pdf_display, unsafe_allow_html=True)
-
-=======
-    st.markdown(pdf_display, unsafe_allow_html=True)
-
->>>>>>> Stashed changes
+def display_pdf(file):
+    """Display PDF preview using embedded iframe"""
+    st.markdown("### PDF Preview")
+    data = file.getvalue() if hasattr(file, "getvalue") else file.read()
+    base64_pdf = base64.b64encode(data).decode("utf-8")
+    pdf_display = f'''<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="400" type="application/pdf"
+        style="border: 1px solid #ddd; border-radius: 8px;"></iframe>'''
+    st.markdown(pdf_display, unsafe_allow_html=True)
+    if hasattr(file, "seek"):
+        try:
+            file.seek(0)
+        except Exception:
+            pass
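The byte-reading and HTML-building parts of the suggestion can be separated from Streamlit so they are easy to test. A minimal sketch: `read_all_bytes` and `pdf_iframe_html` are hypothetical names. `getvalue()` is available on `io.BytesIO` objects such as Streamlit's `UploadedFile`, and other file-like objects fall back to `read()` followed by `seek(0)`.

```python
import base64
from typing import BinaryIO


def read_all_bytes(file: BinaryIO) -> bytes:
    """Return the full contents without leaving the stream consumed."""
    if hasattr(file, "getvalue"):
        return file.getvalue()  # BytesIO-like: does not move the read position
    data = file.read()
    if hasattr(file, "seek"):
        file.seek(0)  # rewind so later readers see the whole file
    return data


def pdf_iframe_html(data: bytes, height: int = 400) -> str:
    """Build the data-URI iframe markup used for the inline PDF preview."""
    b64 = base64.b64encode(data).decode("ascii")
    return (
        f'<iframe src="data:application/pdf;base64,{b64}" width="100%" '
        f'height="{height}" type="application/pdf" '
        'style="border: 1px solid #ddd; border-radius: 8px;"></iframe>'
    )
```

`display_pdf` then reduces to `st.markdown(pdf_iframe_html(read_all_bytes(file)), unsafe_allow_html=True)`. Base64 encoding inflates the payload by about a third, so very large PDFs produce correspondingly large HTML.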
🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 258 to 281, remove the unresolved Git conflict markers and duplicate lines in display_pdf, keep a single coherent implementation that builds the base64 PDF iframe and calls st.markdown once; after reading the file to create base64_pdf, call file.seek(0) to reset the file pointer so downstream code can reuse the file, and ensure there are no leftover "<<<<<<<", "=======" or ">>>>>>>" strings.
for key in ["xray_data", "uploaded_file_path", "uploaded_file_name", "uploaded_file_type", "processing_complete", "used_existing_file", "auto_loaded_file", "active_tab"]:
    if key not in st.session_state:
        st.session_state[key] = None if key == "xray_data" else False
🛠️ Refactor suggestion
Initialize session_state keys with correct types
Current defaults set string and path keys to False, but later code checks them against None (e.g., line 852), so the wrong branch is taken and the UX misbehaves.
-for key in ["xray_data", "uploaded_file_path", "uploaded_file_name", "uploaded_file_type", "processing_complete", "used_existing_file", "auto_loaded_file", "active_tab"]:
-    if key not in st.session_state:
-        st.session_state[key] = None if key == "xray_data" else False
+defaults = {
+    "xray_data": None,
+    "uploaded_file_path": None,
+    "uploaded_file_name": None,
+    "uploaded_file_type": None,
+    "processing_complete": False,
+    "used_existing_file": False,
+    "auto_loaded_file": False,
+    "active_tab": None,
+}
+for k, v in defaults.items():
+    if k not in st.session_state:
+        st.session_state[k] = v
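A small, testable version of the same idea, using a plain dict in place of `st.session_state` (which supports the same `in` checks and item assignment). `SESSION_DEFAULTS` and `init_session_defaults` are hypothetical names. With separate defaults, "not set yet" keys start as `None`, so a check such as `if st.session_state.uploaded_file_path is None:` behaves as intended; with the old `False` default that check is skipped, because `False is None` evaluates to `False`.

```python
from typing import Any, Dict, MutableMapping

# None for "not set yet" values, False only for genuine boolean flags.
SESSION_DEFAULTS: Dict[str, Any] = {
    "xray_data": None,
    "uploaded_file_path": None,
    "uploaded_file_name": None,
    "uploaded_file_type": None,
    "processing_complete": False,
    "used_existing_file": False,
    "auto_loaded_file": False,
    "active_tab": None,
}


def init_session_defaults(state: MutableMapping[str, Any],
                          defaults: Dict[str, Any] = SESSION_DEFAULTS) -> None:
    """Fill in missing keys only; values set on earlier reruns are preserved."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value
```

In app.py this would be called once near the top as `init_session_defaults(st.session_state)`. Because Streamlit reruns the script on every interaction, the "missing keys only" rule is what keeps existing state from being reset.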
🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 354 to 357, the session_state initialization sets string/path keys to False, which later breaks None checks; change the defaults so xray_data, uploaded_file_path, uploaded_file_name, uploaded_file_type, and active_tab are initialized to None, while processing_complete, used_existing_file, and auto_loaded_file remain initialized to False so boolean flags keep correct types and downstream None checks behave as expected.
<<<<<<< Updated upstream
    # Document Preview Section
    st.markdown("---")
    st.markdown("### 📄 Document Preview")

    # Show preview based on file type
    if uploaded.type == "application/pdf":
        # For PDF files, show the actual PDF preview using iframe
        display_pdf(uploaded)

    elif uploaded.type.startswith("image/"):
        # For image files, show the actual image
        st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)

    elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
        # For DOCX files
        st.info("📝 **Word Document** - Preview will be available after processing")
        st.markdown(f"**Content**: Text extraction in progress...")

    else:
        # For other file types
        st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")

    # Show file metadata
    st.markdown("**File Details:**")
    st.markdown(f"- **Name**: {uploaded.name}")
    st.markdown(f"- **Size**: {uploaded.size / 1024:.1f} KB")
    st.markdown(f"- **Type**: {uploaded.type}")
    st.markdown(f"- **Status**: Ready for processing")
=======
    st.markdown("---")
    st.markdown("### 📄 Document Preview")

    if uploaded.type == "application/pdf":
        display_pdf(uploaded)
    elif uploaded.type.startswith("image/"):
        st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
    elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
        st.info("📝 **Word Document** - Preview will be available after processing")
        st.markdown(f"**Content**: Text extraction in progress...")
    else:
        st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")

>>>>>>> Stashed changes
Resolve merge conflict in Document Preview block (keep metadata + preview)
Conflict markers remain and will break execution. Recommend keeping the richer version with file metadata.
-<<<<<<< Updated upstream
-    # Document Preview Section
-    st.markdown("---")
-    st.markdown("### 📄 Document Preview")
-
-    # Show preview based on file type
-    if uploaded.type == "application/pdf":
-        # For PDF files, show the actual PDF preview using iframe
-        display_pdf(uploaded)
-
-    elif uploaded.type.startswith("image/"):
-        # For image files, show the actual image
-        st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
-
-    elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
-        # For DOCX files
-        st.info("📝 **Word Document** - Preview will be available after processing")
-        st.markdown(f"**Content**: Text extraction in progress...")
-
-    else:
-        # For other file types
-        st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")
-
-    # Show file metadata
-    st.markdown("**File Details:**")
-    st.markdown(f"- **Name**: {uploaded.name}")
-    st.markdown(f"- **Size**: {uploaded.size / 1024:.1f} KB")
-    st.markdown(f"- **Type**: {uploaded.type}")
-    st.markdown(f"- **Status**: Ready for processing")
-=======
-    st.markdown("---")
-    st.markdown("### 📄 Document Preview")
-
-    if uploaded.type == "application/pdf":
-        display_pdf(uploaded)
-    elif uploaded.type.startswith("image/"):
-        st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
-    elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
-        st.info("📝 **Word Document** - Preview will be available after processing")
-        st.markdown(f"**Content**: Text extraction in progress...")
-    else:
-        st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")
-
->>>>>>> Stashed changes
+    st.markdown("---")
+    st.markdown("### 📄 Document Preview")
+    if uploaded.type == "application/pdf":
+        display_pdf(uploaded)
+    elif uploaded.type.startswith("image/"):
+        st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
+    elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
+        st.info("📝 Word Document — preview will be available after processing")
+        st.markdown("Content: text extraction in progress…")
+    else:
+        st.info(f"📄 {uploaded.type} — preview will be available after processing")
+    st.markdown("**File Details:**")
+    st.markdown(f"- Name: {uploaded.name}")
+    st.markdown(f"- Size: {uploaded.size / 1024:.1f} KB")
+    st.markdown(f"- Type: {uploaded.type}")
+    st.markdown("- Status: Ready for processing")
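The branching above can be expressed as a pure mapping from MIME type to preview kind, which keeps the Streamlit calls in one place and makes the dispatch testable. `preview_kind` and `file_detail_lines` are hypothetical helpers; the MIME strings are the ones the PR already checks.

```python
from typing import List, Optional

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def preview_kind(mime: Optional[str]) -> str:
    """Classify an upload's MIME type into the preview branch it should take."""
    mime = mime or ""  # guard against a missing type
    if mime == "application/pdf":
        return "pdf"
    if mime.startswith("image/"):
        return "image"
    if mime == DOCX_MIME:
        return "docx"
    return "other"


def file_detail_lines(name: str, size_bytes: int, mime: str) -> List[str]:
    """Markdown bullet lines for the File Details section (size shown in KB)."""
    return [
        f"- Name: {name}",
        f"- Size: {size_bytes / 1024:.1f} KB",
        f"- Type: {mime}",
        "- Status: Ready for processing",
    ]
```

The Streamlit side then becomes a single `if`/`elif` on `preview_kind(uploaded.type)`, followed by one `st.markdown(line)` per entry of `file_detail_lines(uploaded.name, uploaded.size, uploaded.type)`.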
🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 405 to 448, remove the leftover Git
conflict markers and merge the two variants so the richer block is kept: keep
the "Document Preview" heading, the conditional preview rendering for PDF,
images and DOCX, the fallback info message, and also retain the file metadata
lines (Name, Size, Type, Status). Replace the conflict markers (<<<<<<<,
=======, >>>>>>>) with a single coherent block that includes both the preview
logic and the metadata st.markdown lines.
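As a minimal sketch of the merged preview logic, the MIME-type dispatch and the metadata lines can be pulled out of the Streamlit calls into plain functions, which makes the resolved block testable without a running app. `preview_kind`, `file_details_lines`, and `DOCX_MIME` are hypothetical names; the PDF branch assumes the `display_pdf(file)` path described in the walkthrough, and the app would still call `st.image`, `st.info`, and `st.markdown` on the results.

```python
from typing import List

# Hypothetical constant: the DOCX MIME type checked in the upload branch.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def preview_kind(mime_type: str) -> str:
    """Map an uploaded file's MIME type to the preview branch the app renders."""
    if mime_type == "application/pdf":
        return "pdf"    # embedded via display_pdf(file)
    if mime_type.startswith("image/"):
        return "image"  # rendered with st.image
    if mime_type == DOCX_MIME:
        return "docx"   # "preview will be available after processing"
    return "other"      # generic fallback notice


def file_details_lines(name: str, size_bytes: int, mime_type: str) -> List[str]:
    """Build the metadata bullets shown under **File Details:** (kept after the merge)."""
    return [
        f"- Name: {name}",
        f"- Size: {size_bytes / 1024:.1f} KB",
        f"- Type: {mime_type}",
        "- Status: Ready for processing",
    ]
```

For example, `file_details_lines("report.pdf", 2048, "application/pdf")` yields the four bullets with `- Size: 2.0 KB`, and each line can be passed to `st.markdown` unchanged.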
<<<<<<< Updated upstream
    # Document Preview Section (after processing)
    with st.expander("📄 Document Preview", expanded=False):
        st.markdown("### 📋 Document Summary")
        file_summary = xray.get("fileSummary")
        if file_summary:
            st.markdown(file_summary)
        else:
            st.info("No summary available")

        st.markdown("### 📝 Sample Content")
        # Show first few chunks of extracted text
        if "documentPages" in xray and xray["documentPages"]:
            sample_texts = []
            for page in xray["documentPages"][:2]:  # First 2 pages
                if "chunks" in page:
                    for chunk in page["chunks"][:2]:  # First 2 chunks per page
                        if "text" in chunk and chunk["text"]:
                            text = chunk["text"]
                            if len(text) > 200:
                                text = text[:200] + "..."
                            sample_texts.append(text)

            if sample_texts:
                for i, text in enumerate(sample_texts, 1):
                    st.markdown(f"**Sample {i}:**")
                    st.markdown(text)
                    st.markdown("---")
            else:
                st.info("No text content available for preview")

        st.markdown("### 🏷️ Key Topics")
        if xray.get("fileKeywords"):
            keywords_list = xray["fileKeywords"].split(",")
            # Show first 10 keywords
            display_keywords = keywords_list[:10]
            keyword_tags = " ".join([f"`{kw.strip()}`" for kw in display_keywords])
            st.markdown(keyword_tags)
        else:
            st.info("No keywords available")

    # Primary interface tabs for analysis and interaction
    main_tabs = st.tabs([
        "📊 X-Ray Analysis",
        "💬 Chat"
    ])
=======
    # Create a left-aligned container for the tab buttons
    col1, col2 = st.columns([1, 4])
>>>>>>> Stashed changes
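The upstream expander's sample-content and key-topics logic can likewise be expressed as pure functions over the X-Ray dict, so it survives the conflict resolution independently of the layout choice. This sketch assumes the shapes implied by the code above (`documentPages` containing `chunks` with a `text` field, and `fileKeywords` as a comma-separated string); `collect_sample_texts` and `format_keyword_tags` are hypothetical helpers, not part of the PR.

```python
from typing import Any, Dict, List, Optional


def collect_sample_texts(xray: Dict[str, Any], max_pages: int = 2,
                         max_chunks: int = 2, max_chars: int = 200) -> List[str]:
    """Return short text samples from the first pages and chunks of X-Ray data."""
    samples: List[str] = []
    for page in (xray.get("documentPages") or [])[:max_pages]:
        for chunk in (page.get("chunks") or [])[:max_chunks]:
            text = chunk.get("text")
            if not text:
                continue  # skip missing or empty chunk text
            if len(text) > max_chars:
                text = text[:max_chars] + "..."
            samples.append(text)
    return samples


def format_keyword_tags(file_keywords: Optional[str], limit: int = 10) -> str:
    """Render up to `limit` comma-separated keywords as inline-code tags."""
    if not file_keywords:
        return ""  # caller shows the "No keywords available" notice
    keywords = [kw.strip() for kw in file_keywords.split(",")][:limit]
    return " ".join(f"`{kw}`" for kw in keywords)
```

An empty list or empty string maps directly onto the existing `st.info` fallbacks, so the expander body shrinks to a few display calls.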
Resolve conflict: choose segmented buttons over Streamlit tabs
Per PR summary, primary navigation moved to segmented buttons. Remove the old st.tabs() block and keep the segmented-control layout.
-<<<<<<< Updated upstream
- # Document Preview Section (after processing)
- with st.expander("📄 Document Preview", expanded=False):
- ...
- # Primary interface tabs for analysis and interaction
- main_tabs = st.tabs([
- "📊 X-Ray Analysis",
- "💬 Chat"
- ])
-=======
- # Create a left-aligned container for the tab buttons
- col1, col2 = st.columns([1, 4])
->>>>>>> Stashed changes
+ # Segmented buttons for primary nav (Analysis vs Chat)
+ col1, col2 = st.columns([1, 4])
Committable suggestion skipped: line range outside the PR's diff.
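The segmented-button navigation that replaces `st.tabs()` comes down to an `active_tab` value that persists across reruns and a dispatch on that value. The sketch below assumes the keys `"analysis"` and `"chat"` and uses a plain dict in place of `st.session_state`; the helper names are hypothetical, not the PR's code.

```python
from typing import Callable, Dict, MutableMapping

# Hypothetical tab keys; the PR's actual active_tab values may differ.
VALID_TABS = ("analysis", "chat")


def init_nav_state(state: MutableMapping[str, str], default: str = "analysis") -> None:
    """Seed active_tab once; later reruns keep whatever the user selected."""
    if "active_tab" not in state:
        state["active_tab"] = default


def select_tab(state: MutableMapping[str, str], tab: str) -> None:
    """Handler for a segmented-button click."""
    if tab not in VALID_TABS:
        raise ValueError(f"unknown tab: {tab!r}")
    state["active_tab"] = tab


def render_active_tab(state: MutableMapping[str, str],
                      renderers: Dict[str, Callable[[], None]]) -> None:
    """Run only the selected view (stands in for the old `with main_tabs[i]:` blocks)."""
    renderers[state["active_tab"]]()


# Rough Streamlit wiring (st.session_state supports dict-style access):
#   init_nav_state(st.session_state)
#   if st.button("💬 Chat"):
#       select_tab(st.session_state, "chat")
```

One practical difference from `st.tabs()`: by default Streamlit computes the content of every tab on each rerun and only hides the inactive ones in the browser, whereas dispatching on `active_tab` runs just the selected view.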
🧰 Tools
🪛 Ruff (0.12.2)
552-552: SyntaxError: Expected a statement (4 occurrences)
552-552: SyntaxError: Simple statements must be separated by newlines or semicolons
554-554: SyntaxError: Unexpected indentation
598-598: SyntaxError: Expected a statement (4 occurrences)
598-599: SyntaxError: Expected a statement
600-600: SyntaxError: Unexpected indentation
601-601: SyntaxError: Expected a statement (4 occurrences)
601-601: SyntaxError: Simple statements must be separated by newlines or semicolons
🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 552 to 601, resolve the merge
conflict by removing the old Streamlit tabs block and any conflict markers; keep
the segmented-control layout (the col1, col2 = st.columns([1, 4]) section) as
the primary navigation. Delete the entire st.tabs(...) block and the surrounding
conflict markers (<<<<<<<, =======, >>>>>>>) so only the segmented buttons
layout remains and follow-up code uses that layout for navigation.
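Because every Ruff error above traces back to leftover markers, a small check that fails on any remaining `<<<<<<<`, `=======`, `>>>>>>>` (or diff3 `|||||||`) line at column 0 would catch this class of problem before review. This is a sketch, not tooling the repository ships; `find_conflict_markers` and `check_files` are hypothetical names.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Git writes conflict markers at column 0: seven '<', '=', '>' (or '|' for the
# diff3 base section), optionally followed by a space and a label.
CONFLICT_MARKER = re.compile(r"^(?:<{7}|={7}|>{7}|\|{7})(?: .*)?$")


def find_conflict_markers(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every conflict-marker line."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


def check_files(paths: Iterable[str]) -> bool:
    """Print each leftover marker; return True only if every file is clean."""
    clean = True
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, line in find_conflict_markers(text):
            print(f"{path}:{lineno}: leftover conflict marker: {line}")
            clean = False
    return clean
```

`git diff --check` also warns when a change introduces conflict markers. A Markdown or reStructuredText heading underlined with exactly seven `=` would match the regex, so restricting the check to `.py` files is the safer default.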
with tabs[0]:
    st.subheader("🔍 Raw JSON Data")
    st.json(xray)

with tabs[1]:
    st.subheader("📝 Narrative Summary")
    # Extract and display narrative content from document chunks
    narratives = []
    if "documentPages" in xray:
        for page in xray["documentPages"]:
            if "chunks" in page:
                for chunk in page["chunks"]:
                    if "narrative" in chunk and chunk["narrative"]:
                        narratives.extend(chunk["narrative"])

    if narratives:
        for i, narrative in enumerate(narratives, 1):
            st.markdown(f"**Narrative {i}:**")
            st.markdown(narrative)
            st.divider()
    else:
        st.info("No narrative text found in the X-Ray data")
with tabs[1]:
    st.subheader("📝 Narrative Summary")
<<<<<<< Updated upstream
    # Extract and display narrative content from document chunks
=======
>>>>>>> Stashed changes
    narratives = []
    if "documentPages" in xray:
        for page in xray["documentPages"]:
            if "chunks" in page:
                for chunk in page["chunks"]:
                    if "narrative" in chunk and chunk["narrative"]:
                        narratives.extend(chunk["narrative"])

    if narratives:
        for i, narrative in enumerate(narratives, 1):
            st.markdown(f"**Narrative {i}:**")
            st.markdown(narrative)
            st.divider()
    else:
        st.info("No narrative text found in the X-Ray data")
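Since the two `with tabs[1]:` bodies above differ only in a comment, the narrative extraction could live in one helper so there is a single copy to keep conflict-free. A minimal sketch, assuming `narrative` is a list of strings as the `extend()` call implies; `collect_narratives` is a hypothetical name.

```python
from typing import Any, Dict, List


def collect_narratives(xray: Dict[str, Any]) -> List[str]:
    """Flatten every chunk's narrative list across all X-Ray document pages."""
    narratives: List[str] = []
    for page in xray.get("documentPages") or []:
        for chunk in page.get("chunks") or []:
            if chunk.get("narrative"):
                # extend(), as in app.py: assumes narrative is a list of strings
                narratives.extend(chunk["narrative"])
    return narratives
```

Each tab body would then reduce to `narratives = collect_narratives(xray)` followed by the existing display loop and `st.info` fallback.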
Remove remaining conflict markers in analysis sub-tabs
Only comments differ between versions; keep the code as-is and drop the markers.
-<<<<<<< Updated upstream
- # Extract and display narrative content from document chunks
-=======
->>>>>>> Stashed changes
-<<<<<<< Updated upstream
- # Extract and display suggested text content from document chunks
-=======
->>>>>>> Stashed changes
-<<<<<<< Updated upstream
- # Extract and display raw text content from document chunks
-=======
->>>>>>> Stashed changes
Also applies to: 770-791, 792-811
🧰 Tools
🪛 Ruff (0.12.2)
742-742: SyntaxError: Expected a statement (4 occurrences)
742-742: SyntaxError: Simple statements must be separated by newlines or semicolons
744-744: SyntaxError: Expected a statement (4 occurrences)
744-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement (4 occurrences)
745-745: SyntaxError: Simple statements must be separated by newlines or semicolons
746-746: SyntaxError: Unexpected indentation
🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 736 to 761 (and also at ranges
770-791 and 792-811), there are leftover Git conflict markers (<<<<<<<, =======,
>>>>>>>) in the analysis sub-tabs; remove those markers and retain the existing
code as-is (keep the narrative extraction/display logic unchanged), ensuring no
extra whitespace or commented markers remain and the file compiles/runs cleanly.
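The prompt's final requirement, that the file compiles cleanly, can be checked directly with the standard library's `ast.parse`, which raises `SyntaxError` on the same kind of marker lines Ruff reports above. A minimal sketch; `syntax_error_for` is a hypothetical helper.

```python
import ast
from typing import Optional


def syntax_error_for(source: str, filename: str = "app.py") -> Optional[str]:
    """Return 'file:line: message' for the first SyntaxError, or None if it parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None
```

From a shell, `python -m py_compile groundX-doc-pipeline/app.py` performs an equivalent check. Neither catches runtime errors such as a `NameError`, so launching the app once with `streamlit run` remains the final confirmation.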
Summary by CodeRabbit
New Features
UI/UX
Behavior Changes