@stephengoldbaum
No description provided.

…y value assignments

Added logic to defer value assignments for structured properties until after all entities are processed. This ensures that definitions are committed before validating value assignments, improving the integrity of the ingestion process.
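A minimal sketch of the deferral idea, assuming a simplified `WorkUnit` record and a `kind` field that are illustrative only and not taken from the PR. Value assignments are held back and emitted after everything else, so definitions are always committed first.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WorkUnit:
    """Hypothetical stand-in for one emitted metadata change."""
    kind: str  # e.g. "definition" or "value_assignment" (assumed labels)
    urn: str


def order_with_deferred_assignments(units: Iterable[WorkUnit]) -> List[WorkUnit]:
    """Return units with all value assignments moved to the end."""
    immediate: List[WorkUnit] = []
    deferred: List[WorkUnit] = []
    for unit in units:
        if unit.kind == "value_assignment":
            deferred.append(unit)  # hold back until every other unit is out
        else:
            immediate.append(unit)
    return immediate + deferred


units = [
    WorkUnit("value_assignment", "urn:li:dataset:a"),
    WorkUnit("definition", "urn:li:structuredProperty:p"),
    WorkUnit("glossary_term", "urn:li:glossaryTerm:t"),
]
ordered = order_with_deferred_assignments(units)
```

Relative order within each group is preserved, so the only change is that assignments move behind the definitions they depend on.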
Introduced dependency management for entity processing: entities are now processed in topological order based on their declared dependencies. This adds a `dependencies` field to the `EntityMetadata` class and updates the `EntityRegistry` to support the new ordering, improving the integrity and reliability of the ingestion process.
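The ordering described above can be sketched with the standard-library `graphlib` module. Only the `EntityMetadata` and `EntityRegistry` class names and the `dependencies` field come from the commit message; the `entity_type` field, the `register` and `processing_order` methods, and the example entity types are assumptions made for illustration, not the PR's actual code.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List


@dataclass
class EntityMetadata:
    entity_type: str  # assumed field name
    # Entity types that must be processed before this one.
    dependencies: List[str] = field(default_factory=list)


class EntityRegistry:
    def __init__(self) -> None:
        self._metadata: Dict[str, EntityMetadata] = {}

    def register(self, metadata: EntityMetadata) -> None:
        self._metadata[metadata.entity_type] = metadata

    def processing_order(self) -> List[str]:
        """Return entity types so that dependencies come before dependents."""
        graph = {
            name: set(meta.dependencies)
            for name, meta in self._metadata.items()
        }
        # static_order() raises graphlib.CycleError on circular dependencies.
        return list(TopologicalSorter(graph).static_order())


registry = EntityRegistry()
registry.register(EntityMetadata("relationship", ["glossary_term"]))
registry.register(EntityMetadata("glossary_term"))
order = registry.processing_order()
```

Here `glossary_term` is placed before `relationship` regardless of registration order. A circular dependency fails fast with `CycleError` instead of producing an arbitrary order.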
@github-actions github-actions bot added the ingestion and community-contribution labels Dec 3, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review label Dec 3, 2025
…a sources

This commit introduces a detailed capability summary JSON file that outlines the capabilities, descriptions, and support statuses for multiple data sources, including ABS, Athena, Azure AD, BigQuery, and many others. The summary is generated by the metadata ingestion script and includes information on features such as deletion detection, lineage support, and data profiling. This enhancement aims to improve the clarity and accessibility of data source capabilities within the ingestion framework.
This commit removes the DomainMCPBuilder as domains are now treated solely as data structures for organizing glossary terms, rather than being ingested as DataHub domain entities. Updates to documentation and comments throughout the codebase clarify that domains are not ingested and are only used to create glossary nodes and terms. Additionally, adjustments were made in the ingestion target to skip domain MCP creation, ensuring a clearer understanding of the domain's role in the ingestion process.
…files

This commit deletes several autogenerated files related to capability summaries and lineage data, including capability_summary.json files from multiple directories and lineage_helper.py. The ingestion process no longer needs these files, so removing them reduces clutter and simplifies the ingestion framework.
This commit removes the QueryFactory and associated query classes from the RDF ingestion source, simplifying the architecture by eliminating unused query capabilities. Additionally, the Orchestrator class has been updated to remove query-related dependencies, focusing solely on the source and target interfaces. The export_targets.py file has also been deleted as it was no longer necessary. This refactor streamlines the ingestion process and enhances maintainability.
This commit deletes the rdf_README.md and SHACL_MIGRATION_GUIDE.md files from the RDF ingestion source. These files are no longer necessary, streamlining the documentation and focusing on essential components of the ingestion framework.
This commit corrects the links in the RDF specification documentation to point to the appropriate entity-specific specification files. The changes ensure that references to the glossary term, relationship, and domain specifications are accurate and accessible. Additionally, the SHACL Migration Guide section has been removed from the README to streamline the documentation.
…ting

This commit removes redundant RDF imports from the constants file and updates the tooltip formatting for the RDF_SOURCE field in the rdf.ts file to enhance readability. The changes improve code organization and maintainability.
@codecov
codecov bot commented Dec 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

This commit updates type annotations across various RDF-related classes and methods to improve type safety and clarity. Additionally, it enhances error handling by adding warnings when extractors or converters are not found, ensuring better debugging and maintainability. The changes also include minor adjustments to method signatures for consistency.
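One way the missing-extractor warning might look, as a sketch only. The `get_extractor` helper, the `Extractor` type alias, and the dictionary-based lookup are hypothetical and do not reflect the PR's actual signatures.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical extractor shape: takes a resource URI, returns a dict.
Extractor = Callable[[str], dict]


def get_extractor(
    extractors: Dict[str, Extractor], entity_type: str
) -> Optional[Extractor]:
    """Look up an extractor, warning instead of failing silently."""
    extractor = extractors.get(entity_type)
    if extractor is None:
        logger.warning(
            "No extractor registered for entity type %r; skipping", entity_type
        )
    return extractor


def term_extractor(uri: str) -> dict:
    return {"uri": uri}


found = get_extractor({"glossary_term": term_extractor}, "glossary_term")
missing = get_extractor({"glossary_term": term_extractor}, "unknown_type")
```

Returning `Optional[Extractor]` makes the missing case explicit in the type annotation, which matches the commit's stated goal of tighter typing alongside better diagnostics.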
This commit refactors the RDF ingestion code by moving the TargetInterface to the orchestrator module and removing the obsolete target_factory module. Additionally, it simplifies the DataHubDomain and RDFGlossaryTerm classes by removing unnecessary properties, and updates tests to reflect these changes. The relationship extraction logic is also streamlined to only support BROADER and NARROWER types, enhancing clarity and maintainability.
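A sketch of restricting relationship extraction to two types. It assumes BROADER and NARROWER correspond to the SKOS predicates `skos:broader` and `skos:narrower`, which the commit message does not state. The `RelationshipType` enum, the `extract_relationships` function, and the plain string triples are illustrative stand-ins.

```python
from enum import Enum
from typing import Iterable, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"


class RelationshipType(Enum):
    # Assumed mapping to SKOS predicates.
    BROADER = SKOS + "broader"
    NARROWER = SKOS + "narrower"


Triple = Tuple[str, str, str]  # (subject, predicate, object) as URI strings


def extract_relationships(
    triples: Iterable[Triple],
) -> List[Tuple[str, RelationshipType, str]]:
    """Keep only triples whose predicate is a supported relationship type."""
    by_predicate = {rt.value: rt for rt in RelationshipType}
    result = []
    for subject, predicate, obj in triples:
        rel_type = by_predicate.get(predicate)
        if rel_type is not None:  # all other predicates are ignored
            result.append((subject, rel_type, obj))
    return result


triples = [
    ("ex:Loan", SKOS + "broader", "ex:Product"),
    ("ex:Loan", SKOS + "related", "ex:Mortgage"),
    ("ex:Product", SKOS + "narrower", "ex:Loan"),
]
relationships = extract_relationships(triples)
```

The `skos:related` triple is dropped because it has no entry in the enum, so supporting another type later would only require adding an enum member.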
@codecov
codecov bot commented Dec 6, 2025

Bundle Report

Changes will increase total bundle size by 3.42kB (0.01%) ⬆️. This is within the configured threshold ✅

Detailed changes

| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 28.77MB | 3.42kB (0.01%) ⬆️ |

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | 3.42kB | 19.15MB | 0.02% |

Files in assets/index-*.js:

  • ./src/app/ingestV2/source/builder/RecipeForm/constants.ts → Total Size: 10.05kB

  • ./src/app/ingestV2/source/multiStepBuilder/steps/step1SelectSource/sources.json → Total Size: 36.08kB

  • ./src/app/ingestV2/source/builder/RecipeForm/rdf.ts → Total Size: 2.48kB

  • ./src/app/ingestV2/source/builder/constants.ts → Total Size: 6.06kB

  • ./src/app/ingest/source/builder/sources.json → Total Size: 35.09kB

  • ./src/app/ingestV2/source/builder/sources.json → Total Size: 35.09kB
