feat: sitemapextractor #14

Merged
a-klos merged 58 commits into main from feat/sitemapextractor
Jun 11, 2025

a-klos
Member

@a-klos a-klos commented Jun 3, 2025

This pull request introduces significant updates to the admin-api-lib and extractor-api-lib documentation and OpenAPI specification. The changes include renaming endpoints and classes for improved clarity, expanding functionality to support non-file sources, and updating the OpenAPI specification to version 3.1.0. Below is a categorized summary of the most important changes:

Endpoint and Class Renaming

  • Renamed /upload_documents to /upload_file for file uploads and /load_confluence to /upload_source for non-file source uploads. Updated descriptions to reflect the changes. [1] [2]
  • Updated class names in README.md to align with the new endpoint names, e.g., document_uploader to file_uploader and confluence_loader to source_uploader.

Support for Non-File Sources

  • Replaced /extract_from_confluence with /extract_from_source to generalize data extraction for non-file sources. Updated descriptions to clarify the types of sources and data supported. [1] [2]
  • Introduced a new general_source_extractor class to handle various non-file sources, including Confluence, with appropriate extractor selection logic.
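For reviewers who want a mental model of the extractor selection logic, here is a minimal sketch of one way such a dispatcher could look. It is not the code in this PR: the class names, the `source_type` attribute, and the `extract` signature are all hypothetical, and the extractors return placeholder strings instead of real information pieces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class KeyValuePair:
    # Hypothetical shape; the real model's fields may differ.
    key: str
    value: str


class SourceExtractor(Protocol):
    # Hypothetical interface: each extractor declares the source type it handles.
    source_type: str

    def extract(self, params: list[KeyValuePair]) -> list[str]: ...


class ConfluenceExtractor:
    source_type = "confluence"

    def extract(self, params: list[KeyValuePair]) -> list[str]:
        # Placeholder output; a real extractor would call the Confluence API.
        return [f"[{self.source_type}] {p.key}={p.value}" for p in params]


class SitemapExtractor:
    source_type = "sitemap"

    def extract(self, params: list[KeyValuePair]) -> list[str]:
        # Placeholder output; a real extractor would fetch and parse the sitemap.
        return [f"[{self.source_type}] {p.key}={p.value}" for p in params]


class GeneralSourceExtractor:
    """Selects the extractor registered for the requested source type."""

    def __init__(self, extractors: list[SourceExtractor]) -> None:
        self._by_type = {e.source_type: e for e in extractors}

    def extract(self, source_type: str, params: list[KeyValuePair]) -> list[str]:
        extractor = self._by_type.get(source_type.lower())
        if extractor is None:
            raise ValueError(f"No extractor registered for source type '{source_type}'")
        return extractor.extract(params)


general = GeneralSourceExtractor([ConfluenceExtractor(), SitemapExtractor()])
pieces = general.extract("sitemap", [KeyValuePair("web_path", "https://example.com/sitemap.xml")])
print(pieces)
```

Keying the lookup on a declared source type means a new source can be added by registering another extractor, without touching the dispatcher.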

OpenAPI Specification Update

  • Upgraded OpenAPI version from 3.0.2 to 3.1.0.
  • Added detailed schemas for request and response bodies, including DocumentStatus, KeyValuePair, and ValidationError objects.
  • Updated endpoint paths and descriptions, including /upload_file, /upload_source, /delete_document, and /document_reference. Enhanced error handling with additional response codes (e.g., 422 for validation errors).
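To illustrate the 422 behavior mentioned above, here is a rough sketch of how a request body for `/upload_source` might be checked and how a validation error response could be shaped. The field names (`source_type`, `kwargs`), the success payload, and the `loc`/`msg`/`type` error layout are assumptions for illustration and are not taken from the spec in this PR.

```python
def validate_upload_source(payload: dict) -> tuple[int, dict]:
    """Return (status_code, body) for a hypothetical /upload_source request body."""
    errors = []

    # Hypothetical required field naming the kind of non-file source.
    source_type = payload.get("source_type")
    if not isinstance(source_type, str) or not source_type:
        errors.append(
            {"loc": ["body", "source_type"], "msg": "field required", "type": "missing"}
        )

    # Hypothetical list of KeyValuePair-like objects carrying source parameters.
    for index, item in enumerate(payload.get("kwargs", [])):
        if not isinstance(item, dict) or "key" not in item or "value" not in item:
            errors.append(
                {
                    "loc": ["body", "kwargs", index],
                    "msg": "expected an object with 'key' and 'value'",
                    "type": "invalid_key_value_pair",
                }
            )

    if errors:
        # 422 signals that the body was parsed but failed validation.
        return 422, {"detail": errors}
    return 200, {"status": "accepted"}


status, body = validate_upload_source({"kwargs": [{"key": "space_key"}]})
print(status, body)
```

Collecting every error before returning lets a client fix all problems in one round trip instead of discovering them one at a time.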

These changes improve the API's clarity, flexibility, and compliance with modern OpenAPI standards.

a-klos and others added 22 commits June 2, 2025 08:16
…e unused managed_page_summary_enhancer module
…r DefaultSourceUploader to use it

refactor: update JSON serialization in ExtractionParameters, ExtractionRequest, InformationPiece, and KeyValuePair models
refactor: remove unused test files for confluence and thread management integration
@a-klos a-klos marked this pull request as ready for review June 5, 2025 09:13
@a-klos a-klos requested a review from MirUlr June 11, 2025 05:35
@a-klos a-klos merged commit 4038bea into main Jun 11, 2025
6 checks passed
@a-klos a-klos deleted the feat/sitemapextractor branch June 11, 2025 11:25
3 participants