@namanvirk18 namanvirk18 commented Sep 5, 2025

Summary by CodeRabbit

  • New Features

    • Inline PDF and image previews in the upload flow, with file metadata display.
    • Embedded document preview and expanded analysis results (summary, sample content, keywords, extracted text).
  • UI/UX

    • Streamlined header and redesigned tab navigation (analysis/chat) with segmented control.
    • Richer analysis sections: JSON Output, Narrative Summary, File Summary, Suggested Text, Extracted Text, Keywords.
    • Two-column processing steps with automatic switch to analysis on completion.
    • Persistent chat history and focused chat view.
    • Updated branding and styling for tabs, previews, buttons, and layout.
  • Behavior Changes

    • Progress bar replaced with a spinner during processing.

coderabbitai bot commented Sep 5, 2025

Walkthrough

Introduces PDF and image preview in the upload flow, restructures UI with an active_tab state and segmented tabs, expands analysis displays and chat handling, and replaces granular progress tracking with a spinner-based polling loop. Adds display_pdf(file) and updates processing/preview/analysis sequences and styling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| UI overhaul and previews: groundX-doc-pipeline/app.py | Added display_pdf(file) using base64 and iframe; integrated PDF/image previews and file metadata in upload flow; introduced active_tab session state and segmented tab UI; reworked processing steps with auto-tab switch; expanded analysis sections (document preview, summaries, extracted text, keywords, JSON); strengthened chat flow with context and history; updated branding and CSS. |
| Processing status polling simplification: groundX-doc-pipeline/groundx_utils.py | Replaced detailed progress parsing with st.spinner-based loop polling gx.documents.get_processing_status_by_id(...).ingest until terminal states or timeout; preserved timeout; now raises RuntimeError if not complete; no public signatures changed. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant UI as App UI (app.py)
  participant GX as Ground X API
  participant Utils as groundx_utils.poll_until_complete

  U->>UI: Upload file
  alt application/pdf
    UI->>UI: display_pdf(file) via base64 iframe
  else image/*
    UI->>UI: Render image preview
  else docx/other
    UI->>UI: Show "preview after processing" notice
  end
  U->>UI: Click "Process"
  UI->>GX: Start processing (create document)
  UI->>Utils: poll_until_complete(process_id)
  activate Utils
  Utils->>GX: get_processing_status_by_id(...).ingest (poll)
  GX-->>Utils: status (processing|complete|error|cancelled)
  loop until terminal or timeout
    Utils->>GX: poll status
    GX-->>Utils: status
  end
  Utils-->>UI: completion or raise error
  deactivate Utils
  alt complete
    UI->>GX: Fetch X-Ray data
    UI->>UI: Switch active_tab -> analysis
    UI->>UI: Render Analysis tabs (JSON, Summary, File Summary, Extracted Text, Keywords)
    UI->>UI: Embedded document preview and sample content
  else error/cancelled/timeout
    UI->>UI: Show error message
  end

  U->>UI: Open Chat tab
  UI->>UI: prepare_chat_context(xray, prompt)
  UI->>UI: generate_chat_response(prompt, context)
  UI-->>U: Stream/Show response (chat history maintained)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I nibbled bytes and flipped a tab,
A spinner twirled—no progress drab.
I framed a PDF with base64 flair,
Preview here, analysis there.
Chat squeaks wise with context tight—
Hop-hop! Your docs are clear in sight.
🐇📄✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
groundX-doc-pipeline/groundx_utils.py (1)

70-83: Add HTTP client timeouts when fetching X-Ray JSON

External GETs lack timeouts; this can hang the app indefinitely.

-        response = requests.get(document.xray_url)
+        response = requests.get(document.xray_url, timeout=15)
-                response = requests.get(doc.xray_url)
+                response = requests.get(doc.xray_url, timeout=15)

Also applies to: 92-105

🧹 Nitpick comments (7)
groundX-doc-pipeline/groundx_utils.py (3)

52-66: Use monotonic clock and configurable poll interval; keep UX spinner

Loop works, but timeouts should use time.monotonic and a poll_interval param to tune cadence. Also tolerate brief API hiccups without breaking the spinner.

-def poll_until_complete(gx: GroundX, process_id: str, timeout: int = 600) -> None:
+def poll_until_complete(gx: GroundX, process_id: str, timeout: int = 600, poll_interval: float = 3.0) -> None:
     """Monitor document processing status until completion"""
-    start_time = time.time()
+    start_time = time.monotonic()
     
     # Use a spinner container for better UX
     with st.spinner("Processing document..."):
         while True:
-            status = gx.documents.get_processing_status_by_id(process_id=process_id).ingest
+            try:
+                status = gx.documents.get_processing_status_by_id(process_id=process_id).ingest
+            except Exception as e:
+                # brief backoff on transient errors
+                if time.monotonic() - start_time > timeout:
+                    raise TimeoutError("Ground X ingest timed out.") from e
+                time.sleep(min(1.0, poll_interval))
+                continue
             
             if status.status in {"complete", "error", "cancelled"}:
                 break
-            if time.time() - start_time > timeout:
+            if time.monotonic() - start_time > timeout:
                 raise TimeoutError("Ground X ingest timed out.")
-            time.sleep(3)
+            time.sleep(poll_interval)
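
The polling pattern suggested above can be exercised without Streamlit or the Ground X SDK. The following is a minimal sketch under stated assumptions: `poll_until`, `get_status`, `sleep`, and `clock` are hypothetical names introduced for illustration, and only the terminal status strings are taken from the diff.

```python
import time
from typing import Callable, Optional

# Terminal states as they appear in the suggested diff.
TERMINAL_STATES = {"complete", "error", "cancelled"}


def poll_until(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    poll_interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it returns a terminal state or the deadline passes."""
    # A monotonic clock is unaffected by wall-clock adjustments.
    deadline = clock() + timeout
    while True:
        status: Optional[str]
        try:
            status = get_status()
        except Exception:
            # Assumption: any failure is transient. The deadline check below
            # still bounds the total number of retries.
            status = None
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("polling timed out")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable with fake time. In the app, `get_status` would wrap the SDK call and return its status string.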

25-35: Fix return type hint for ensure_bucket (actual id is int)

Function returns bucket.bucket_id which appears to be an int. Align the annotation (or use int | str) to avoid downstream confusion.

-@st.cache_resource(show_spinner=False)
-def ensure_bucket(_gx: GroundX, name: str = "gx_demo") -> str:
+@st.cache_resource(show_spinner=False)
+def ensure_bucket(_gx: GroundX, name: str = "gx_demo") -> int:

36-51: Align bucket_id typing across helpers

ingest_document accepts Union[str,int] at runtime; reflect this in type hints for clarity.

-def ingest_document(gx: GroundX, bucket_id: str, path: Path, mime: str) -> str:
+from typing import Union
+
+def ingest_document(gx: GroundX, bucket_id: Union[str, int], path: Path, mime: str) -> str:
groundX-doc-pipeline/app.py (4)

820-838: Remove unused in_chat_mode flag

in_chat_mode is set but never read; dead state.

-            # Ensure we stay in chat mode
-            st.session_state.in_chat_mode = True

47-246: Reduce CSS duplication and risky negative margins

Large repeated blocks with aggressive overrides/negative margins make the layout brittle and harder to maintain. Consolidate shared button/column styles into a single CSS block and avoid overlapping z-index hacks unless needed.


503-538: Minor: collapse upload status steps into a single status container

UX copy looks good. Consider wrapping step messages in a single st.status or st.container to avoid jitter.


482-486: Clean up temp files after processing

NamedTemporaryFile(delete=False) leaves files behind. After successful processing, unlink the file unless you intentionally keep it for re-processing.

-    st.session_state.uploaded_file_path = tmp_file.name
+    st.session_state.uploaded_file_path = tmp_file.name
+    # TODO: after processing completes, consider: Path(tmp_file.name).unlink(missing_ok=True)
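
One way to act on the TODO above is to tie the temporary file's lifetime to the processing call. This is a sketch only: `process_upload` and its `process` callback are hypothetical names, not functions from app.py, and the approach assumes the file is not needed again after processing.

```python
import tempfile
from pathlib import Path
from typing import Callable


def process_upload(data: bytes, suffix: str, process: Callable[[Path], object]) -> object:
    """Write data to a temporary file, run process(path), and always delete the file."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        tmp_path = Path(tmp.name)
    # The file is closed here, so process() can reopen it on any platform.
    try:
        return process(tmp_path)
    finally:
        # missing_ok (Python 3.8+) avoids an error if process() already removed it.
        tmp_path.unlink(missing_ok=True)
```

If the app intentionally keeps the file for re-processing, the cleanup would instead belong wherever the session's uploaded file is replaced or reset.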
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e597969 and 1fc4085.

📒 Files selected for processing (2)
  • groundX-doc-pipeline/app.py (9 hunks)
  • groundX-doc-pipeline/groundx_utils.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
groundX-doc-pipeline/groundx_utils.py (1)
groundX-doc-pipeline/evaluation_geval.py (2)
  • _poll_until_complete (129-139)
  • process_invoice (117-127)
groundX-doc-pipeline/app.py (2)
groundX-doc-pipeline/groundx_utils.py (1)
  • process_document (107-128)
firecrawl-agent/app.py (1)
  • display_pdf (60-71)
🪛 Ruff (0.12.2)
groundX-doc-pipeline/groundx_utils.py

64-64: Avoid specifying long messages outside the exception class

(TRY003)

groundX-doc-pipeline/app.py

262-262: SyntaxError: Expected a statement
262-262: SyntaxError: Simple statements must be separated by newlines or semicolons
265-265: SyntaxError: Expected a statement
265-266: SyntaxError: Expected a statement
266-266: SyntaxError: Expected a statement
266-266: SyntaxError: Simple statements must be separated by newlines or semicolons
267-267: SyntaxError: Unexpected indentation
271-271: SyntaxError: Expected a statement
271-271: SyntaxError: Simple statements must be separated by newlines or semicolons
274-274: SyntaxError: Unexpected indentation
277-277: SyntaxError: Expected a statement
277-278: SyntaxError: Expected a statement
278-278: SyntaxError: Unexpected indentation
280-280: SyntaxError: Expected a statement
280-280: SyntaxError: Simple statements must be separated by newlines or semicolons
405-405: SyntaxError: Expected a statement
405-405: SyntaxError: Simple statements must be separated by newlines or semicolons
407-407: SyntaxError: Unexpected indentation
434-434: SyntaxError: Expected a statement
434-435: SyntaxError: Expected a statement
435-435: SyntaxError: Unexpected indentation
448-448: SyntaxError: Expected a statement
448-448: SyntaxError: Simple statements must be separated by newlines or semicolons
552-552: SyntaxError: Expected a statement
552-552: SyntaxError: Simple statements must be separated by newlines or semicolons
554-554: SyntaxError: Unexpected indentation
598-598: SyntaxError: Expected a statement
598-599: SyntaxError: Expected a statement
600-600: SyntaxError: Unexpected indentation
601-601: SyntaxError: Expected a statement
601-601: SyntaxError: Simple statements must be separated by newlines or semicolons
603-603: SyntaxError: Unexpected indentation
742-742: SyntaxError: Expected a statement
742-742: SyntaxError: Simple statements must be separated by newlines or semicolons
744-744: SyntaxError: Expected a statement
744-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement
745-745: SyntaxError: Simple statements must be separated by newlines or semicolons
746-746: SyntaxError: Unexpected indentation
762-762: SyntaxError: unindent does not match any outer indentation level
770-770: SyntaxError: unindent does not match any outer indentation level
772-772: SyntaxError: Expected a statement
772-772: SyntaxError: Simple statements must be separated by newlines or semicolons
774-774: SyntaxError: Expected a statement
774-775: SyntaxError: Expected a statement
775-775: SyntaxError: Expected a statement
775-775: SyntaxError: Simple statements must be separated by newlines or semicolons
776-776: SyntaxError: Unexpected indentation
792-792: SyntaxError: unindent does not match any outer indentation level
794-794: SyntaxError: Expected a statement
794-794: SyntaxError: Simple statements must be separated by newlines or semicolons
796-796: SyntaxError: Expected a statement
796-797: SyntaxError: Expected a statement
797-797: SyntaxError: Expected a statement
797-797: SyntaxError: Simple statements must be separated by newlines or semicolons
798-798: SyntaxError: Unexpected indentation
812-812: SyntaxError: unindent does not match any outer indentation level
820-820: SyntaxError: unindent does not match any outer indentation level
820-820: SyntaxError: Invalid annotated assignment target
820-821: SyntaxError: Expected an expression
821-821: SyntaxError: Unexpected indentation
835-835: SyntaxError: Expected a statement
835-835: SyntaxError: Simple statements must be separated by newlines or semicolons
837-837: SyntaxError: Unexpected indentation
840-840: SyntaxError: Expected a statement
840-841: SyntaxError: Expected a statement
841-841: SyntaxError: Expected a statement
841-841: SyntaxError: Simple statements must be separated by newlines or semicolons
857-857: SyntaxError: Expected a statement

Comment on lines +258 to 281
def display_pdf(file):
    """Display PDF preview using embedded iframe"""
    st.markdown("### PDF Preview")
    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
<<<<<<< Updated upstream

    # Embedding PDF in HTML
=======
>>>>>>> Stashed changes
    pdf_display = f"""<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="400" type="application/pdf"
                        style="border: 1px solid #ddd; border-radius: 8px;"
                    >
                    </iframe>"""
<<<<<<< Updated upstream

    # Displaying File
    st.markdown(pdf_display, unsafe_allow_html=True)

# Chat Interface Functions
=======
    st.markdown(pdf_display, unsafe_allow_html=True)

>>>>>>> Stashed changes
def prepare_chat_context(xray_data, prompt):
⚠️ Potential issue

Resolve merge conflict in display_pdf and reset file pointer

Unresolved conflict markers will crash the app. Also reset the file object after reading for downstream use.

-def display_pdf(file):
-    """Display PDF preview using embedded iframe"""
-    st.markdown("### PDF Preview")
-    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
-<<<<<<< Updated upstream
-    
-    # Embedding PDF in HTML
-=======
->>>>>>> Stashed changes
-    pdf_display = f"""<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="400" type="application/pdf"
-                        style="border: 1px solid #ddd; border-radius: 8px;"
-                    >
-                    </iframe>"""
-<<<<<<< Updated upstream
-    
-    # Displaying File
-    st.markdown(pdf_display, unsafe_allow_html=True)
-
-=======
-    st.markdown(pdf_display, unsafe_allow_html=True)
-
->>>>>>> Stashed changes
+def display_pdf(file):
+    """Display PDF preview using embedded iframe"""
+    st.markdown("### PDF Preview")
+    data = file.getvalue() if hasattr(file, "getvalue") else file.read()
+    base64_pdf = base64.b64encode(data).decode("utf-8")
+    pdf_display = f'''<iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="400" type="application/pdf"
+                        style="border: 1px solid #ddd; border-radius: 8px;"></iframe>'''
+    st.markdown(pdf_display, unsafe_allow_html=True)
+    if hasattr(file, "seek"):
+        try:
+            file.seek(0)
+        except Exception:
+            pass
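
The two concerns in this suggestion, reading the upload without disturbing its position and building the iframe markup, can be separated into pure helpers that are testable without Streamlit. The sketch below is illustrative only: `read_all_preserving_position` and `pdf_iframe_html` are hypothetical names, and an `io.BytesIO` stands in for the uploaded file object.

```python
import base64
import io
from typing import BinaryIO


def read_all_preserving_position(file: BinaryIO) -> bytes:
    """Return the full contents of a seekable file and restore its position."""
    position = file.tell()
    file.seek(0)
    data = file.read()
    file.seek(position)
    return data


def pdf_iframe_html(pdf_bytes: bytes, height: int = 400) -> str:
    """Build an <iframe> whose src is a base64 data URI for the PDF."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return (
        f'<iframe src="data:application/pdf;base64,{encoded}" '
        f'width="100%" height="{height}" type="application/pdf"></iframe>'
    )


# Stand-in for an uploaded file; a real upload object is assumed to be seekable.
buffer = io.BytesIO(b"%PDF-1.4 example")
html = pdf_iframe_html(read_all_preserving_position(buffer))
```

With this split, display_pdf would only need to pass the returned string to st.markdown, and later readers of the same file object see it at its original position.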

🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 258 to 281, remove the unresolved
Git conflict markers and duplicate lines in display_pdf, keep a single coherent
implementation that builds the base64 PDF iframe and calls st.markdown once;
after reading the file to create base64_pdf call file.seek(0) to reset the file
pointer so downstream code can reuse the file, and ensure there are no leftover
"<<<<<<<", "=======" or ">>>>>>>" strings.

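Leftover conflict markers like the ones flagged here can be caught before commit with a small check. This is a rough sketch, not part of the PR: `find_conflict_markers` is a hypothetical helper, and the pattern covers only the three marker forms quoted in this review.

```python
import re
from typing import List, Tuple

# A marker line starts with seven '<' or seven '>' followed by whitespace or
# end of line, or consists of exactly seven '='.
CONFLICT_MARKER = re.compile(r"^(<{7}|>{7})(\s|$)|^={7}$")


def find_conflict_markers(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each line that looks like a Git conflict marker."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if CONFLICT_MARKER.match(line):
            hits.append((number, line))
    return hits
```

A line of exactly seven equals signs can also occur legitimately, for example as a text underline, so any hit should be reviewed rather than deleted automatically.
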
Comment on lines +354 to 357
for key in ["xray_data", "uploaded_file_path", "uploaded_file_name", "uploaded_file_type", "processing_complete", "used_existing_file", "auto_loaded_file", "active_tab"]:
    if key not in st.session_state:
        st.session_state[key] = None if key == "xray_data" else False

🛠️ Refactor suggestion

Initialize session_state keys with correct types

Current defaults set strings/paths to False, but later code checks for None (e.g., Line 852). This causes wrong branch selection and UX issues.

-for key in ["xray_data", "uploaded_file_path", "uploaded_file_name", "uploaded_file_type", "processing_complete", "used_existing_file", "auto_loaded_file", "active_tab"]:
-    if key not in st.session_state:
-        st.session_state[key] = None if key == "xray_data" else False
+defaults = {
+    "xray_data": None,
+    "uploaded_file_path": None,
+    "uploaded_file_name": None,
+    "uploaded_file_type": None,
+    "processing_complete": False,
+    "used_existing_file": False,
+    "auto_loaded_file": False,
+    "active_tab": None,
+}
+for k, v in defaults.items():
+    if k not in st.session_state:
+        st.session_state[k] = v
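
The effect of typed defaults can be checked in isolation. In this sketch a plain dict stands in for st.session_state, `init_state` is a hypothetical helper, and only three of the keys from the diff are shown.

```python
from typing import Any, Dict, MutableMapping

DEFAULTS: Dict[str, Any] = {
    "xray_data": None,
    "uploaded_file_path": None,
    "processing_complete": False,
}


def init_state(state: MutableMapping[str, Any], defaults: Dict[str, Any] = DEFAULTS) -> None:
    """Set each missing key to its typed default without overwriting existing values."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value


state: Dict[str, Any] = {"processing_complete": True}
init_state(state)
```

After this call a check such as `state["uploaded_file_path"] is None` is true, whereas with the original initialization the value would be False and the same check would take the other branch.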
🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 354 to 357, the session_state
initialization sets string/path keys to False which later breaks None checks;
change the defaults so xray_data, uploaded_file_path, uploaded_file_name,
uploaded_file_type, and active_tab are initialized to None, while
processing_complete, used_existing_file, and auto_loaded_file remain initialized
to False so boolean flags keep correct types and downstream None checks behave
as expected.

Comment on lines +405 to +448
<<<<<<< Updated upstream
        # Document Preview Section
        st.markdown("---")
        st.markdown("### 📄 Document Preview")

        # Show preview based on file type
        if uploaded.type == "application/pdf":
            # For PDF files, show the actual PDF preview using iframe
            display_pdf(uploaded)

        elif uploaded.type.startswith("image/"):
            # For image files, show the actual image
            st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)

        elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
            # For DOCX files
            st.info("📝 **Word Document** - Preview will be available after processing")
            st.markdown(f"**Content**: Text extraction in progress...")

        else:
            # For other file types
            st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")

        # Show file metadata
        st.markdown("**File Details:**")
        st.markdown(f"- **Name**: {uploaded.name}")
        st.markdown(f"- **Size**: {uploaded.size / 1024:.1f} KB")
        st.markdown(f"- **Type**: {uploaded.type}")
        st.markdown(f"- **Status**: Ready for processing")
=======
        st.markdown("---")
        st.markdown("### 📄 Document Preview")

        if uploaded.type == "application/pdf":
            display_pdf(uploaded)
        elif uploaded.type.startswith("image/"):
            st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
        elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
            st.info("📝 **Word Document** - Preview will be available after processing")
            st.markdown(f"**Content**: Text extraction in progress...")
        else:
            st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")

>>>>>>> Stashed changes
⚠️ Potential issue

Resolve merge conflict in Document Preview block (keep metadata + preview)

Conflict markers remain and will break execution. Recommend keeping the richer version with file metadata.

-<<<<<<< Updated upstream
-        # Document Preview Section
-        st.markdown("---")
-        st.markdown("### 📄 Document Preview")
-        
-        # Show preview based on file type
-        if uploaded.type == "application/pdf":
-            # For PDF files, show the actual PDF preview using iframe
-            display_pdf(uploaded)
-            
-        elif uploaded.type.startswith("image/"):
-            # For image files, show the actual image
-            st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
-            
-        elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
-            # For DOCX files
-            st.info("📝 **Word Document** - Preview will be available after processing")
-            st.markdown(f"**Content**: Text extraction in progress...")
-            
-        else:
-            # For other file types
-            st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")
-        
-        # Show file metadata
-        st.markdown("**File Details:**")
-        st.markdown(f"- **Name**: {uploaded.name}")
-        st.markdown(f"- **Size**: {uploaded.size / 1024:.1f} KB")
-        st.markdown(f"- **Type**: {uploaded.type}")
-        st.markdown(f"- **Status**: Ready for processing")
-=======
-        st.markdown("---")
-        st.markdown("### 📄 Document Preview")
-        
-        if uploaded.type == "application/pdf":
-            display_pdf(uploaded)
-        elif uploaded.type.startswith("image/"):
-            st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
-        elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
-            st.info("📝 **Word Document** - Preview will be available after processing")
-            st.markdown(f"**Content**: Text extraction in progress...")
-        else:
-            st.info(f"📄 **{uploaded.type}** - Preview will be available after processing")
-        
->>>>>>> Stashed changes
+        st.markdown("---")
+        st.markdown("### 📄 Document Preview")
+        if uploaded.type == "application/pdf":
+            display_pdf(uploaded)
+        elif uploaded.type.startswith("image/"):
+            st.image(uploaded, caption=f"Preview: {uploaded.name}", use_column_width=True)
+        elif uploaded.type == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
+            st.info("📝 Word Document — preview will be available after processing")
+            st.markdown("Content: text extraction in progress…")
+        else:
+            st.info(f"📄 {uploaded.type} — preview will be available after processing")
+        st.markdown("**File Details:**")
+        st.markdown(f"- Name: {uploaded.name}")
+        st.markdown(f"- Size: {uploaded.size / 1024:.1f} KB")
+        st.markdown(f"- Type: {uploaded.type}")
+        st.markdown("- Status: Ready for processing")
🧰 Tools
🪛 Ruff (0.12.2)

405-405: SyntaxError: Expected a statement
405-405: SyntaxError: Expected a statement
405-405: SyntaxError: Expected a statement
405-405: SyntaxError: Expected a statement
405-405: SyntaxError: Simple statements must be separated by newlines or semicolons
407-407: SyntaxError: Unexpected indentation
434-434: SyntaxError: Expected a statement
434-434: SyntaxError: Expected a statement
434-434: SyntaxError: Expected a statement
434-434: SyntaxError: Expected a statement
434-435: SyntaxError: Expected a statement
435-435: SyntaxError: Unexpected indentation
448-448: SyntaxError: Expected a statement
448-448: SyntaxError: Expected a statement
448-448: SyntaxError: Expected a statement
448-448: SyntaxError: Expected a statement
448-448: SyntaxError: Simple statements must be separated by newlines or semicolons

🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 405 to 448, remove the leftover Git
conflict markers and merge the two variants so the richer block is kept: keep
the "Document Preview" heading, the conditional preview rendering for PDF,
images and DOCX, the fallback info message, and also retain the file metadata
lines (Name, Size, Type, Status). Replace the conflict markers (<<<<<<<,
=======, >>>>>>>) with a single coherent block that includes both the preview
logic and the metadata st.markdown lines.
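
Before editing, it can help to list exactly where the markers sit. The following is a minimal sketch using only the standard library; the helper name `find_conflict_markers` and the sample string are illustrative assumptions, not part of this PR.

```python
import re

# Git writes conflict markers at the start of a line as exactly seven
# '<', '=' or '>' characters, optionally followed by a label.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CONFLICT_MARKER.match(line):
            hits.append((lineno, line.rstrip()))
    return hits


# Hypothetical sample mirroring the shape of the conflict in app.py.
sample = "\n".join([
    "<<<<<<< Updated upstream",
    'st.markdown("### Document Preview")',
    "=======",
    'st.markdown("---")',
    ">>>>>>> Stashed changes",
])
markers = find_conflict_markers(sample)
for lineno, line in markers:
    print(f"{lineno}: {line}")
```

Running it over the contents of `app.py` should report the same locations Ruff flags as "Expected a statement".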

Comment on lines +552 to +601
<<<<<<< Updated upstream
    # Document Preview Section (after processing)
    with st.expander("📄 Document Preview", expanded=False):
        st.markdown("### 📋 Document Summary")
        file_summary = xray.get("fileSummary")
        if file_summary:
            st.markdown(file_summary)
        else:
            st.info("No summary available")

        st.markdown("### 📝 Sample Content")
        # Show first few chunks of extracted text
        if "documentPages" in xray and xray["documentPages"]:
            sample_texts = []
            for page in xray["documentPages"][:2]:  # First 2 pages
                if "chunks" in page:
                    for chunk in page["chunks"][:2]:  # First 2 chunks per page
                        if "text" in chunk and chunk["text"]:
                            text = chunk["text"]
                            if len(text) > 200:
                                text = text[:200] + "..."
                            sample_texts.append(text)

            if sample_texts:
                for i, text in enumerate(sample_texts, 1):
                    st.markdown(f"**Sample {i}:**")
                    st.markdown(text)
                    st.markdown("---")
            else:
                st.info("No text content available for preview")

        st.markdown("### 🏷️ Key Topics")
        if xray.get("fileKeywords"):
            keywords_list = xray["fileKeywords"].split(",")
            # Show first 10 keywords
            display_keywords = keywords_list[:10]
            keyword_tags = " ".join([f"`{kw.strip()}`" for kw in display_keywords])
            st.markdown(keyword_tags)
        else:
            st.info("No keywords available")

    # Primary interface tabs for analysis and interaction
    main_tabs = st.tabs([
        "📊 X-Ray Analysis",
        "💬 Chat"
    ])
=======
    # Create a left-aligned container for the tab buttons
    col1, col2 = st.columns([1, 4])
>>>>>>> Stashed changes

⚠️ Potential issue

Resolve conflict: choose segmented buttons over Streamlit tabs

Per PR summary, primary navigation moved to segmented buttons. Remove the old st.tabs() block and keep the segmented-control layout.

-<<<<<<< Updated upstream
-    # Document Preview Section (after processing)
-    with st.expander("📄 Document Preview", expanded=False):
-        ...
-    # Primary interface tabs for analysis and interaction
-    main_tabs = st.tabs([
-        "📊 X-Ray Analysis",
-        "💬 Chat"
-    ])
-=======
-    # Create a left-aligned container for the tab buttons
-    col1, col2 = st.columns([1, 4])
->>>>>>> Stashed changes
+    # Segmented buttons for primary nav (Analysis vs Chat)
+    col1, col2 = st.columns([1, 4])

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

552-552: SyntaxError: Expected a statement
552-552: SyntaxError: Expected a statement
552-552: SyntaxError: Expected a statement
552-552: SyntaxError: Expected a statement
552-552: SyntaxError: Simple statements must be separated by newlines or semicolons
554-554: SyntaxError: Unexpected indentation
598-598: SyntaxError: Expected a statement
598-598: SyntaxError: Expected a statement
598-598: SyntaxError: Expected a statement
598-598: SyntaxError: Expected a statement
598-599: SyntaxError: Expected a statement
600-600: SyntaxError: Unexpected indentation
601-601: SyntaxError: Expected a statement
601-601: SyntaxError: Expected a statement
601-601: SyntaxError: Expected a statement
601-601: SyntaxError: Expected a statement
601-601: SyntaxError: Simple statements must be separated by newlines or semicolons

🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 552 to 601, resolve the merge
conflict by removing the old Streamlit tabs block and any conflict markers; keep
the segmented-control layout (the col1, col2 = st.columns([1, 4]) section) as
the primary navigation. Delete the entire st.tabs(...) block and the surrounding
conflict markers (<<<<<<<, =======, >>>>>>>) so only the segmented buttons
layout remains and follow-up code uses that layout for navigation.
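
When one side of a conflict should win outright, as here, the resolution can be done mechanically. This is a rough sketch, not a replacement for reviewing the result; `keep_side` and its `side` values are hypothetical names. It assumes plain two-way hunks (no diff3-style `|||||||` section), and it cannot do a hand merge of both sides like the one requested for lines 405 to 448.

```python
def keep_side(source: str, side: str = "stashed") -> str:
    """Resolve every two-way conflict hunk by keeping one side.

    side="upstream" keeps the text between <<<<<<< and =======;
    side="stashed" keeps the text between ======= and >>>>>>>.
    """
    if side not in ("upstream", "stashed"):
        raise ValueError("side must be 'upstream' or 'stashed'")
    out = []
    state = "outside"  # one of: outside, upstream, stashed
    for line in source.splitlines():
        if line.startswith("<<<<<<<"):
            state = "upstream"
        elif line.startswith("=======") and state == "upstream":
            state = "stashed"
        elif line.startswith(">>>>>>>") and state == "stashed":
            state = "outside"
        elif state == "outside" or state == side:
            # Keep lines outside any hunk, plus lines on the chosen side.
            out.append(line)
    return "\n".join(out)


# Hypothetical, shortened version of the hunk discussed above.
conflicted = "\n".join([
    "<<<<<<< Updated upstream",
    "main_tabs = st.tabs(['X-Ray Analysis', 'Chat'])",
    "=======",
    "col1, col2 = st.columns([1, 4])",
    ">>>>>>> Stashed changes",
    "st.write('after')",
])
resolved = keep_side(conflicted, side="stashed")
print(resolved)
```

With `side="stashed"` only the `st.columns` line and the code after the hunk survive, which matches the segmented-button choice.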

Comment on lines +736 to 761
        with tabs[0]:
            st.subheader("🔍 Raw JSON Data")
            st.json(xray)

        with tabs[1]:
            st.subheader("📝 Narrative Summary")
            # Extract and display narrative content from document chunks
            narratives = []
            if "documentPages" in xray:
                for page in xray["documentPages"]:
                    if "chunks" in page:
                        for chunk in page["chunks"]:
                            if "narrative" in chunk and chunk["narrative"]:
                                narratives.extend(chunk["narrative"])

            if narratives:
                for i, narrative in enumerate(narratives, 1):
                    st.markdown(f"**Narrative {i}:**")
                    st.markdown(narrative)
                    st.divider()
            else:
                st.info("No narrative text found in the X-Ray data")
        with tabs[1]:
            st.subheader("📝 Narrative Summary")
<<<<<<< Updated upstream
            # Extract and display narrative content from document chunks
=======
>>>>>>> Stashed changes
            narratives = []
            if "documentPages" in xray:
                for page in xray["documentPages"]:
                    if "chunks" in page:
                        for chunk in page["chunks"]:
                            if "narrative" in chunk and chunk["narrative"]:
                                narratives.extend(chunk["narrative"])

            if narratives:
                for i, narrative in enumerate(narratives, 1):
                    st.markdown(f"**Narrative {i}:**")
                    st.markdown(narrative)
                    st.divider()
            else:
                st.info("No narrative text found in the X-Ray data")

⚠️ Potential issue

Remove remaining conflict markers in analysis sub-tabs

Only comments differ between versions; keep the code as-is and drop the markers.

-<<<<<<< Updated upstream
-            # Extract and display narrative content from document chunks
-=======
->>>>>>> Stashed changes
-<<<<<<< Updated upstream
-            # Extract and display suggested text content from document chunks
-=======
->>>>>>> Stashed changes
-<<<<<<< Updated upstream
-            # Extract and display raw text content from document chunks
-=======
->>>>>>> Stashed changes

Also applies to: 770-791, 792-811

🧰 Tools
🪛 Ruff (0.12.2)

742-742: SyntaxError: Expected a statement
742-742: SyntaxError: Expected a statement
742-742: SyntaxError: Expected a statement
742-742: SyntaxError: Expected a statement
742-742: SyntaxError: Simple statements must be separated by newlines or semicolons
744-744: SyntaxError: Expected a statement
744-744: SyntaxError: Expected a statement
744-744: SyntaxError: Expected a statement
744-744: SyntaxError: Expected a statement
744-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement
745-745: SyntaxError: Expected a statement
745-745: SyntaxError: Simple statements must be separated by newlines or semicolons
746-746: SyntaxError: Unexpected indentation

🤖 Prompt for AI Agents
In groundX-doc-pipeline/app.py around lines 736 to 761 (and also at ranges
770-791 and 792-811), there are leftover Git conflict markers (<<<<<<<, =======,
>>>>>>>) in the analysis sub-tabs; remove those markers and retain the existing
code as-is (keep the narrative extraction/display logic unchanged), ensuring no
extra whitespace or commented markers remain and the file compiles/runs cleanly.
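
One lightweight way to confirm the file parses again after the markers are gone is to run it through Python's own parser. The sketch below uses the standard-library `ast` module; the helper name `syntax_error_of` and the two sample strings are assumptions made for illustration. A clean parse only rules out syntax errors, so the app still needs to be run.

```python
import ast


def syntax_error_of(source: str, filename: str = "<string>"):
    """Return None if the source parses as Python, else a short 'line: message' string."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:  # also covers IndentationError
        return f"{exc.lineno}: {exc.msg}"
    return None


# Hypothetical before/after snippets; in practice, pass the contents of app.py.
broken = "x = 1\n<<<<<<< Updated upstream\ny = 2\n"
clean = "x = 1\ny = 2\n"

print(syntax_error_of(broken))
print(syntax_error_of(clean))
```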
