Adding RDF ingestion capabilities #15473
base: master
…y value assignments Added logic to defer value assignments for structured properties until after all entities are processed. This ensures that definitions are committed before validating value assignments, improving the integrity of the ingestion process.
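The deferral described in this commit can be pictured as a two-phase pass. The sketch below is illustrative only and is not taken from the PR: `DeferredIngestor`, `process_entity`, and `flush` are hypothetical names, and property definitions are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredIngestor:
    """Hypothetical two-phase processor: definitions first, value assignments last."""

    definitions: set = field(default_factory=set)
    deferred: list = field(default_factory=list)
    applied: list = field(default_factory=list)

    def process_entity(self, defined_properties, assignments):
        # Phase 1: commit definitions right away, but only queue the assignments.
        self.definitions.update(defined_properties)
        self.deferred.extend(assignments)

    def flush(self):
        # Phase 2: run once, after every entity has been processed, so that
        # validation sees the full set of committed definitions.
        rejected = []
        for entity, prop, value in self.deferred:
            if prop in self.definitions:
                self.applied.append((entity, prop, value))
            else:
                rejected.append((entity, prop, value))
        self.deferred.clear()
        return rejected


ingestor = DeferredIngestor()
# The assignment arrives before the definition it refers to.
ingestor.process_entity([], [("dataset:a", "retention", "30d")])
ingestor.process_entity(["retention"], [])
rejected = ingestor.flush()
```

Because validation happens only in `flush`, the order in which entities arrive no longer decides whether an assignment is accepted.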
Introduced a new dependency management system for entity processing, allowing entities to be processed in a topological order based on their dependencies. This change includes the addition of a `dependencies` field in the `EntityMetadata` class and updates to the `EntityRegistry` to support this new ordering mechanism. The processing order now respects entity dependencies, improving the integrity and reliability of the ingestion process.
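A minimal sketch of dependency-ordered processing, assuming the `dependencies` field holds the names of other entity types. Only `EntityMetadata`, `EntityRegistry`, and `dependencies` come from the commit message; the `entity_type` field, the `register` and `processing_order` methods, and the example entity names are assumptions made for illustration, and the PR's actual implementation may differ. The ordering uses the standard-library `graphlib.TopologicalSorter`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class EntityMetadata:
    # Simplified stand-in: only `dependencies` is named in the commit message.
    entity_type: str
    dependencies: list = field(default_factory=list)


class EntityRegistry:
    def __init__(self):
        self._entities = {}

    def register(self, metadata):
        self._entities[metadata.entity_type] = metadata

    def processing_order(self):
        # Map each entity type to the set of types it must wait for.
        graph = {
            name: set(meta.dependencies) for name, meta in self._entities.items()
        }
        # static_order() yields dependencies before dependents and raises
        # graphlib.CycleError if the dependencies form a cycle.
        ordered = TopologicalSorter(graph).static_order()
        # Drop dependency names that were never registered.
        return [name for name in ordered if name in self._entities]


registry = EntityRegistry()
registry.register(EntityMetadata("relationship", dependencies=["glossary_term"]))
registry.register(EntityMetadata("glossary_term"))
order = registry.processing_order()
```

In this example `glossary_term` is placed before `relationship` even though it was registered second.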
…a sources This commit introduces a detailed capability summary JSON file that outlines the capabilities, descriptions, and support statuses for multiple data sources, including ABS, Athena, Azure AD, BigQuery, and many others. The summary is generated by the metadata ingestion script and includes information on features such as deletion detection, lineage support, and data profiling. This enhancement aims to improve the clarity and accessibility of data source capabilities within the ingestion framework.
This commit removes the DomainMCPBuilder as domains are now treated solely as data structures for organizing glossary terms, rather than being ingested as DataHub domain entities. Updates to documentation and comments throughout the codebase clarify that domains are not ingested and are only used to create glossary nodes and terms. Additionally, adjustments were made in the ingestion target to skip domain MCP creation, ensuring a clearer understanding of the domain's role in the ingestion process.
…files This commit deletes several autogenerated files related to capability summaries and lineage data, including capability_summary.json files from multiple directories and lineage_helper.py. These files are no longer needed as part of the ingestion process, streamlining the codebase and reducing clutter. The removal of these files is part of an effort to simplify the ingestion framework and improve maintainability.
This commit removes the QueryFactory and associated query classes from the RDF ingestion source, simplifying the architecture by eliminating unused query capabilities. Additionally, the Orchestrator class has been updated to remove query-related dependencies, focusing solely on the source and target interfaces. The export_targets.py file has also been deleted as it was no longer necessary. This refactor streamlines the ingestion process and enhances maintainability.
This commit deletes the rdf_README.md and SHACL_MIGRATION_GUIDE.md files from the RDF ingestion source. These files are no longer necessary, streamlining the documentation and focusing on essential components of the ingestion framework.
This commit corrects the links in the RDF specification documentation to point to the appropriate entity-specific specification files. The changes ensure that references to the glossary term, relationship, and domain specifications are accurate and accessible. Additionally, the SHACL Migration Guide section has been removed from the README to streamline the documentation.
…ting This commit removes redundant RDF imports from the constants file and updates the tooltip formatting for the RDF_SOURCE field in the rdf.ts file to enhance readability. The changes improve code organization and maintainability.
Codecov Report: All modified and coverable lines are covered by tests.
This commit updates type annotations across various RDF-related classes and methods to improve type safety and clarity. Additionally, it enhances error handling by adding warnings when extractors or converters are not found, ensuring better debugging and maintainability. The changes also include minor adjustments to method signatures for consistency.
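The "warn when not found" behavior could look roughly like the following. This is a hedged sketch, not the PR's code: the `_EXTRACTORS` table, the `get_extractor` function, and the dictionary-shaped input are all hypothetical.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical extractor signature: takes parsed input, returns a list of items.
Extractor = Callable[[dict], list]

_EXTRACTORS: Dict[str, Extractor] = {
    "glossary_term": lambda parsed: parsed.get("terms", []),
}


def get_extractor(entity_type: str) -> Optional[Extractor]:
    """Return the extractor for an entity type, warning instead of failing silently."""
    extractor = _EXTRACTORS.get(entity_type)
    if extractor is None:
        # The warning makes a missing registration visible during debugging.
        logger.warning(
            "No extractor registered for entity type %r; skipping", entity_type
        )
    return extractor
```

The `Optional[Extractor]` return type forces callers to handle the missing case explicitly, which is the kind of type-safety improvement the commit message describes.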
This commit refactors the RDF ingestion code by moving the TargetInterface to the orchestrator module and removing the obsolete target_factory module. Additionally, it simplifies the DataHubDomain and RDFGlossaryTerm classes by removing unnecessary properties, and updates tests to reflect these changes. The relationship extraction logic is also streamlined to only support BROADER and NARROWER types, enhancing clarity and maintainability.
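Restricting relationship extraction to two types can be sketched as a lookup table that ignores every other predicate. The mapping to SKOS predicates is an assumption (the commit message names only BROADER and NARROWER), and `RelationshipType`, `extract_relationships`, and the triple layout are hypothetical.

```python
from enum import Enum

# Assumption: the relationship types correspond to the SKOS core vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"


class RelationshipType(Enum):
    BROADER = "broader"
    NARROWER = "narrower"


_PREDICATE_TO_TYPE = {
    SKOS + "broader": RelationshipType.BROADER,
    SKOS + "narrower": RelationshipType.NARROWER,
}


def extract_relationships(triples):
    """Keep only (subject, predicate, object) triples with a supported predicate."""
    relationships = []
    for subject, predicate, obj in triples:
        rel_type = _PREDICATE_TO_TYPE.get(predicate)
        if rel_type is not None:
            relationships.append((subject, rel_type, obj))
    return relationships


triples = [
    ("ex:Loan", SKOS + "broader", "ex:Product"),
    ("ex:Loan", SKOS + "related", "ex:Mortgage"),  # unsupported, dropped
    ("ex:Product", SKOS + "narrower", "ex:Loan"),
]
relationships = extract_relationships(triples)
```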
Bundle Report: Changes will increase total bundle size by 3.42kB (0.01%). This is within the configured threshold.
Affected bundle: datahub-react-web-esm
No description provided.