Add XML file ingestion support #560
Merged
KaifAhmad1 merged 3 commits on May 19, 2026
…n keys

- Add `test_xml_ingestor_ingests_string` to cover the public `ingest_string()` method, which had no test coverage
- Document all `source_type` return keys in the `ingest()` docstring so callers know to use `result["xml"]` rather than `result["data"]` for XML sources
KaifAhmad1 (Contributor) approved these changes on May 19, 2026 and left a comment:
PR #560 — XML File Ingestion Support
Author: @Luffy2208 | Reviewer: @KaifAhmad1 | Closes: #233
What It Does
- Adds `XMLIngestor` and `XMLIngestionData` for parsing local XML files into structured data
- Extracts the nested element tree, a flat element list, namespaces, attributes, and document metadata
- Optional XSD and DTD validation with detailed error reports
- Wired into the public API via `ingest_xml()`, `ingest_file(..., method="xml")`, and `ingest("file.xml")` auto-detection
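The kind of extraction listed above can be approximated with the standard library alone. This is a sketch, not `XMLIngestor`'s implementation. `extract_xml`, `element_to_dict`, and the result keys (`root`, `elements`, `namespaces`, `metadata`) are hypothetical names chosen for illustration. `xml.etree.ElementTree` stands in for the PR's own parser.

```python
import io
import xml.etree.ElementTree as ET
from typing import Any


def element_to_dict(elem: ET.Element) -> dict[str, Any]:
    """Recursively convert an element into a nested dict (tag, attributes, text, children)."""
    return {
        "tag": elem.tag,
        "attributes": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def extract_xml(xml_text: str) -> dict[str, Any]:
    """Hypothetical helper: nested tree, flat element list, namespaces, and basic metadata."""
    # "start-ns" events report each (prefix, uri) declaration; the default namespace has prefix "".
    namespaces: dict[str, str] = {}
    for _event, (prefix, uri) in ET.iterparse(io.StringIO(xml_text), events=("start-ns",)):
        namespaces[prefix] = uri

    root = ET.fromstring(xml_text)
    # Flat list in document order; namespaced tags use Clark notation, e.g. "{uri}title".
    elements = [{"tag": e.tag, "attributes": dict(e.attrib)} for e in root.iter()]
    return {
        "root": element_to_dict(root),
        "elements": elements,
        "namespaces": namespaces,
        "metadata": {"root_tag": root.tag, "element_count": len(elements)},
    }


if __name__ == "__main__":
    doc = (
        '<catalog xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<book id="b1"><dc:title>XML</dc:title></book>'
        "</catalog>"
    )
    result = extract_xml(doc)
    print(result["namespaces"])
    print(result["metadata"])
```

The nested tree and the flat list serve different consumers. The tree preserves hierarchy, while the flat list is convenient for scanning every element without recursion.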
Issues Found & Fixed
- Missing `ingest_string` test: the public method had no coverage; test added in bc6c443
- `ingest()` return key undocumented: XML returns `result["xml"]`, not `result["data"]`; all return keys documented in bc6c443
- Changelog missing: `[Unreleased]` entry added in a397fb5
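The return-key issue can be illustrated with a hypothetical dispatcher. The result dictionary is keyed by the detected `source_type`, so an XML file comes back under `"xml"` and a lookup of `"data"` fails. `ingest_sketch`, `detect_source_type`, the extension table, and the `"data"` fallback are all assumptions for illustration, not Semantica's `ingest()`.

```python
from pathlib import Path
from typing import Any, Callable

# Hypothetical extension table: only ".xml" maps to its own source_type in this sketch.
SOURCE_TYPES = {".xml": "xml"}


def detect_source_type(path: str) -> str:
    """Map a file extension (case-insensitive) to a source_type key, defaulting to "data"."""
    return SOURCE_TYPES.get(Path(path).suffix.lower(), "data")


def ingest_sketch(path: str, loaders: dict[str, Callable[[str], Any]]) -> dict[str, Any]:
    """Return the loaded payload under the detected source_type key."""
    source_type = detect_source_type(path)
    return {source_type: loaders[source_type](path)}


if __name__ == "__main__":
    loaders = {"xml": lambda p: f"parsed {p}", "data": lambda p: f"read {p}"}
    result = ingest_sketch("catalog.xml", loaders)
    # Callers must read result["xml"]; result["data"] would raise KeyError for XML sources.
    print(result["xml"])
```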
Security
- `resolve_entities=False` and `no_network=True` by default, which blocks XXE injection
- `huge_tree` and `allow_network` are explicit opt-ins
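Below is a minimal sketch of a parser configured with these secure defaults. It assumes lxml, whose `etree.XMLParser` accepts `resolve_entities`, `no_network`, and `huge_tree` as keyword arguments. `make_safe_parser` is hypothetical. Mapping the ingestor's `allow_network` opt-in onto `no_network=not allow_network` is an assumption about how that option is wired.

```python
from lxml import etree


def make_safe_parser(allow_network: bool = False, huge_tree: bool = False) -> etree.XMLParser:
    """Hypothetical helper: build an lxml parser with secure defaults; risky features are opt-in."""
    return etree.XMLParser(
        resolve_entities=False,        # leave entity references unexpanded (blocks XXE payloads)
        no_network=not allow_network,  # assumption: allow_network toggles lxml's no_network
        huge_tree=huge_tree,           # keep libxml2's depth/size security limits unless opted in
    )


if __name__ == "__main__":
    # Internal entity declaration: with resolve_entities=False its text is not substituted.
    doc = b'<!DOCTYPE r [<!ENTITY secret "SECRET">]><r>&secret;</r>'
    root = etree.fromstring(doc, make_safe_parser())
    print("SECRET" in "".join(root.itertext()))
```

Because entity references are left in the tree as references, an entity's replacement text (including the contents of an external file an attacker points it at) should not reach the extracted content.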
Tests
- 8/8 XML tests pass (7 original + 1 added)
- 18/18 existing ingest tests pass — no regressions
Verdict
Approved. Follows project conventions, secure by default. All review issues resolved on the PR branch before merge.
Description
Adds dedicated XML file ingestion support for Semantica.
This PR introduces a new `XMLIngestor` that parses local XML files into structured data instead of treating them only as plain text. It extracts the nested element hierarchy, a flat element list, namespaces, attributes, document metadata, and optional validation results.

XML ingestion is also wired into the public ingest API through:

- `ingest_xml()`
- `ingest_file(..., method="xml")`
- `.xml` auto-detection via `ingest("file.xml")`

Type of Change
Related Issues
Closes #233
Fixes #233
Changes Made
- Added `semantica/ingest/xml_ingestor.py` with:
  - `XMLIngestor`
  - `XMLIngestionData`
- Implemented XML parsing with:
- Added optional:
- Added malformed XML handling using:
  - `ProcessingError`
  - `ValidationError`
- Added:
  - `ingest_xml()` convenience method
  - `ingest()`
- Registered XML ingestion methods in the ingest method registry.
- Exported `XMLIngestor`, `XMLIngestionData`, and `ingest_xml` from `semantica.ingest`.
- Added XML ingestion tests covering:
- Updated ingest documentation and usage examples for XML ingestion.
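The malformed-XML and validation paths listed above can be sketched as follows, assuming lxml. Syntax errors are wrapped in `ProcessingError`, and XSD failures are wrapped in `ValidationError`, which carries per-line messages from lxml's `error_log`. Both exception classes are local stand-ins defined for the example, not imports from Semantica. Which exception the real ingestor raises in each case is an assumption, and `parse_and_validate` and `BOOK_XSD` are hypothetical.

```python
from typing import List, Optional

from lxml import etree


class ProcessingError(Exception):
    """Local stand-in for Semantica's ProcessingError (not imported from the library)."""


class ValidationError(Exception):
    """Local stand-in for Semantica's ValidationError; keeps one message per schema violation."""

    def __init__(self, message: str, errors: List[str]) -> None:
        super().__init__(message)
        self.errors = errors


# Hypothetical schema: a <book> element with a required id attribute.
BOOK_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def parse_and_validate(xml_bytes: bytes, xsd_bytes: Optional[bytes] = None) -> etree._Element:
    """Parse XML with entity resolution and network access disabled, then optionally validate."""
    parser = etree.XMLParser(resolve_entities=False, no_network=True)
    try:
        root = etree.fromstring(xml_bytes, parser)
    except etree.XMLSyntaxError as exc:
        # Malformed input: surface a single library-level error type to callers.
        raise ProcessingError(f"Malformed XML: {exc}") from exc

    if xsd_bytes is not None:
        schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
        if not schema.validate(root):
            # Each error_log entry records the offending line and libxml2's message.
            details = [f"line {entry.line}: {entry.message}" for entry in schema.error_log]
            raise ValidationError("XML failed XSD validation", details)
    return root
```

Keeping parse failures and schema failures as separate exception types lets callers distinguish "this is not XML at all" from "this is XML that does not match the expected schema".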
Testing
- … (`python -m build`)
- Live testing
Test Commands