
feat(ingestion/grafana): Add datasets and charts to dashboards with lineage and tags. Lineage back to source #12417


Open. Wants to merge 46 commits into base: master.

acrylJonny
Collaborator

@acrylJonny acrylJonny commented Jan 21, 2025

This PR adds functionality to the existing Grafana connector. The existing connector supports dashboard identification only. The changes implement the following:

  • Charts
    • Input datasets
    • Input Columns
  • Datasets
    • Dataset Schema
    • Subtyping, with the query definition included where the datasource is SQL
  • Lineage:
    • between upstream sources and datasets
    • between datasets and charts
    • between charts and dashboards
    • Supports SQL lineage reconciliation using datahub.sql_parsing.sqlglot_lineage falling back to sqlparse
  • Tag extraction with propagation back from dashboard through to chart and dataset
  • Owner extraction
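
For readers who want a concrete picture of the tag propagation item above, here is a minimal sketch. It is not the connector's code: the `Dashboard`, `Chart` and `Dataset` classes and the `propagate_tags` function are hypothetical stand-ins, and the sketch assumes that propagation simply copies every dashboard tag onto each chart and onto each chart's input datasets.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset entity emitted by the connector.
    name: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class Chart:
    # Hypothetical stand-in for a Grafana panel mapped to a chart.
    title: str
    inputs: List[Dataset] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)


@dataclass
class Dashboard:
    title: str
    tags: Set[str] = field(default_factory=set)
    charts: List[Chart] = field(default_factory=list)


def propagate_tags(dashboard: Dashboard) -> None:
    """Copy the dashboard's tags onto each chart and each chart's input datasets."""
    for chart in dashboard.charts:
        chart.tags |= dashboard.tags
        for dataset in chart.inputs:
            dataset.tags |= dashboard.tags


orders = Dataset(name="orders")
revenue = Chart(title="Revenue", inputs=[orders])
sales = Dashboard(title="Sales", tags={"finance", "prod"}, charts=[revenue])
propagate_tags(sales)
```

Using set union keeps any tags a chart or dataset already carries, so running the propagation more than once does not duplicate anything.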

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@acrylJonny acrylJonny marked this pull request as draft January 21, 2025 15:20
@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jan 21, 2025

codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅


@acrylJonny acrylJonny marked this pull request as ready for review January 21, 2025 22:21
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Jan 21, 2025
@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 22, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jan 22, 2025
@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 22, 2025
@qgab-flowdesk

Hi! Any update on this one? Really impatient to get this connector 👀

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jun 9, 2025

codecov bot commented Jun 10, 2025

Bundle Report

Changes will increase total bundle size by 31.41kB (0.16%) ⬆️. This is within the configured threshold ✅

Detailed changes
Bundle name | Size | Change
datahub-react-web-esm | 19.72MB | 31.41kB (0.16%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

Asset Name | Size Change | Total Size | Change (%)
assets/index-*.js | 401 bytes | 16.06MB | 0.0%
assets/grafana-*.png (New) | 31.01kB | 31.01kB | 100.0% 🚀

Files in assets/index-*.js:

  • ./src/images/grafana.png → Total Size: 47 bytes

  • ./src/app/ingest/source/builder/constants.ts → Total Size: 6.39kB

  • ./src/app/ingest/source/builder/sources.json → Total Size: 32.23kB


For optimal lineage extraction from SQL-based data sources:

- Queries should be well-formed and complete
Contributor

"Well-formed" might be a bit ambiguous. Could you clarify which kinds of queries are supported and which aren't?

service_account_token: "your_token"

# Lineage extraction (default: true)
extract_lineage: true
Contributor

Most sources call this config field include_lineage. It would be nice to keep names consistent across sources.
That would also align with the field a couple of lines below: include_column_lineage

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jun 20, 2025
props[key] = str(value)

if panel.targets:
    props["queryCount"] = str(len(panel.targets))
Contributor

Does queryCount describe the value?
From this code, I would call it targetsCount.

Comment on lines +62 to +67
basic_mode: bool = Field(
    default=False,
    description="Enable basic extraction mode for users with limited permissions. "
    "In basic mode, only dashboard metadata is extracted without detailed panel information, "
    "lineage, or folder hierarchy. This requires only basic dashboard read permissions.",
)
Contributor

What's the experience for a user running with limited permissions and basic_mode: False?
Will the user get errors or warnings about the permission failures, or any suggestion to set basic_mode: True?
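
One possible answer to the question above, sketched under assumptions: if the detailed panel fetch fails for lack of permissions, the source could record a warning that points the user at basic_mode instead of failing the run. Everything named here (`PermissionDenied`, `extract_dashboard`, the two fetch callables, the `warnings` list) is hypothetical and is not taken from the PR.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


class PermissionDenied(Exception):
    """Hypothetical error standing in for an HTTP 403 from the Grafana API."""


def extract_dashboard(
    fetch_summary: Callable[[], Dict[str, str]],
    fetch_panels: Callable[[], List[str]],
    basic_mode: bool,
    warnings: List[str],
) -> Dict[str, object]:
    """Return dashboard metadata, adding panel details when they are readable."""
    result: Dict[str, object] = {**fetch_summary()}
    if basic_mode:
        # Basic mode never asks for panel details, so no permission error can occur.
        return result
    try:
        result["panels"] = fetch_panels()
    except PermissionDenied as e:
        # Degrade to dashboard-only metadata and tell the user how to silence this.
        message = (
            f"Could not read panel details ({e}). "
            "If the token has limited permissions, consider setting basic_mode: true."
        )
        logger.warning(message)
        warnings.append(message)
    return result
```

With this shape, a limited-permission token and basic_mode: False still yields dashboard metadata plus one actionable warning, while basic_mode: True skips the failing call entirely.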

)

# Lineage configuration
extract_lineage: bool = Field(
Contributor

as suggested before, I would call this include_lineage

default_factory=dict,
description="Map of Grafana datasource types/UIDs to platform connection configs for lineage extraction",
)
stateful_ingestion: Optional[StatefulStaleMetadataRemovalConfig] = None
Contributor

is this needed considering the config already inherits from StatefulIngestionConfigBase?

Comment on lines +3 to +8
References:
- Grafana HTTP API: https://grafana.com/docs/grafana/latest/developers/http_api/
- Dashboard API: https://grafana.com/docs/grafana/latest/developers/http_api/dashboard/
- Folder API: https://grafana.com/docs/grafana/latest/developers/http_api/folder/
- Search API: https://grafana.com/docs/grafana/latest/developers/http_api/other/#search-api
- Dashboard JSON structure: https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/view-dashboard-json-model/
Contributor

💟

Comment on lines +25 to +28
targets: List[Dict[str, Any]] = Field(default_factory=list)
datasource: Optional[Dict[str, Any]] = None
field_config: Dict[str, Any] = Field(default_factory=dict, alias="fieldConfig")
transformations: List[Dict[str, Any]] = Field(default_factory=list)
Contributor

What are the keys in these dicts?

A comment may help, or better naming, e.g. targets_by_id, targets_by_name...

Also, can we narrow down the value types rather than just Any?
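
To illustrate what narrowing could look like, here is a rough sketch using the standard library's `TypedDict`. The key names (`refId`, `rawSql`, `expr`, and a `datasource` reference with `type` and `uid`) are assumptions based on commonly seen Grafana panel JSON, not on the PR's models, and the `PanelTarget`, `DatasourceRef` and `sql_of` names are hypothetical.

```python
from typing import List, Optional, TypedDict


class DatasourceRef(TypedDict, total=False):
    # Assumed shape of a panel's datasource reference.
    type: str
    uid: str


class PanelTarget(TypedDict, total=False):
    # Assumed subset of keys found on a panel target (one query).
    refId: str
    rawSql: str  # typically present for SQL datasources
    expr: str  # typically present for Prometheus-style datasources
    datasource: DatasourceRef


def sql_of(target: PanelTarget) -> Optional[str]:
    """Return the raw SQL of a target, or None when the target is not SQL-based."""
    return target.get("rawSql")


targets: List[PanelTarget] = [
    {"refId": "A", "rawSql": "SELECT id FROM orders"},
    {"refId": "B", "expr": "up"},
]
sql_queries = [q for q in (sql_of(t) for t in targets) if q is not None]
```

`total=False` keeps every key optional, which matches the fact that different datasource types populate different fields, while still documenting the expected keys and value types for a type checker.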

@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jun 20, 2025
Labels
ingestion PR or Issue related to the ingestion of metadata needs-review Label for PRs that need review from a maintainer.
5 participants