feat(ingestion/grafana): Add datasets and charts to dashboards with lineage and tags. Lineage back to source #12417

acrylJonny · 2025-01-21T15:20:10Z

Adds functionality to the existing Grafana connector. The existing connector supports dashboard identification only. These changes implement the following:

Charts:
- Input datasets
- Input columns
Datasets:
- Dataset schema
- Subtyping, with the definition included where the dataset comes from SQL
Lineage:
- between upstream sources and datasets
- between datasets and charts
- between charts and dashboards
- Supports SQL lineage reconciliation using datahub.sql_parsing.sqlglot_lineage, falling back to sqlparse
Tag extraction, with propagation from the dashboard through to charts and datasets
Owner extraction
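
To illustrate the tag propagation item above, here is a minimal sketch, not the connector's actual code. The `Dashboard`, `Chart`, and `Dataset` classes and the `propagate_tags` helper are hypothetical stand-ins; the only behaviour taken from the description is that dashboard tags flow down to charts and datasets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset derived from a panel's datasource.
    name: str
    tags: List[str] = field(default_factory=list)


@dataclass
class Chart:
    # Hypothetical stand-in for a Grafana panel.
    title: str
    datasets: List[Dataset] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


@dataclass
class Dashboard:
    title: str
    tags: List[str]
    charts: List[Chart] = field(default_factory=list)


def _merge(existing: List[str], inherited: List[str]) -> List[str]:
    # Keep first-seen order and drop duplicates.
    return list(dict.fromkeys([*existing, *inherited]))


def propagate_tags(dashboard: Dashboard) -> None:
    # Copy the dashboard's tags onto every chart and every dataset below it.
    for chart in dashboard.charts:
        chart.tags = _merge(chart.tags, dashboard.tags)
        for dataset in chart.datasets:
            dataset.tags = _merge(dataset.tags, dashboard.tags)
```

In this sketch, tags already present on a chart or dataset are kept and the inherited ones are appended, so propagation never removes information.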

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

codecov · 2025-01-21T15:58:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

metadata-ingestion/src/datahub/ingestion/source/grafana/models.py

metadata-ingestion/src/datahub/ingestion/source/grafana/field_utils.py

metadata-ingestion/src/datahub/ingestion/source/grafana/report.py

metadata-ingestion/tests/integration/grafana/test_grafana.py

metadata-ingestion/tests/unit/grafana/test_grafana_entity_mcp_builder.py

metadata-ingestion/src/datahub/ingestion/source/grafana/grafana_config.py

qgab-flowdesk · 2025-06-09T10:28:25Z

Hi! Any update on this one? Really impatient to get this connector 👀

codecov · 2025-06-10T15:48:39Z

Bundle Report

Changes will increase total bundle size by 31.41kB (0.16%) ⬆️. This is within the configured threshold ✅

Detailed changes

Bundle name	Size	Change
datahub-react-web-esm	19.72MB	31.41kB (0.16%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

Asset Name	Size Change	Total Size	Change (%)
`assets/index-*.js`	401 bytes	16.06MB	0.0%
`assets/grafana-*.png` (New)	31.01kB	31.01kB	100.0% 🚀

Files in assets/index-*.js:

./src/images/grafana.png → Total Size: 47 bytes
./src/app/ingest/source/builder/constants.ts → Total Size: 6.39kB
./src/app/ingest/source/builder/sources.json → Total Size: 32.23kB

sgomezvillamor · 2025-06-20T06:55:41Z

metadata-ingestion/docs/sources/grafana/grafana_pre.md

+
+For optimal lineage extraction from SQL-based data sources:
+
+- Queries should be well-formed and complete


"Well-formed" might be a bit ambiguous, could you clarify what kinds of queries are supported, and which ones aren’t?

sgomezvillamor · 2025-06-20T07:01:29Z

metadata-ingestion/docs/sources/grafana/grafana_pre.md

+    service_account_token: "your_token"
+
+    # Lineage extraction (default: true)
+    extract_lineage: true


In most sources, we call this config field `include_lineage`. It would be nice to keep names consistent across sources.
That would also align with the field a couple of lines below: `include_column_lineage`.

sgomezvillamor · 2025-06-20T07:06:27Z

metadata-ingestion/src/datahub/ingestion/source/grafana/entity_mcp_builder.py

+            props[key] = str(value)
+
+    if panel.targets:
+        props["queryCount"] = str(len(panel.targets))


does queryCount describe the value?
by checking this code, I would call it targetsCount

sgomezvillamor · 2025-06-20T07:14:32Z

metadata-ingestion/src/datahub/ingestion/source/grafana/grafana_config.py

+    basic_mode: bool = Field(
+        default=False,
+        description="Enable basic extraction mode for users with limited permissions. "
+        "In basic mode, only dashboard metadata is extracted without detailed panel information, "
+        "lineage, or folder hierarchy. This requires only basic dashboard read permissions.",
+    )


What's the experience for a user running with limited permissions and `basic_mode: False`?
Will the user get errors or warnings about the missing permissions, or any suggestion to set `basic_mode: True`?
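
One way the behaviour asked about here could look is sketched below. This is an assumption for discussion, not what the PR implements: `InsufficientPermissions` and `fetch_folders_or_hint` are hypothetical names, and only the `basic_mode` flag comes from the diff.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


class InsufficientPermissions(Exception):
    """Hypothetical error raised when the Grafana API denies access."""


def fetch_folders_or_hint(
    fetch_folders: Callable[[], List[Dict[str, str]]],
    basic_mode: bool,
) -> List[Dict[str, str]]:
    # In basic mode the folder hierarchy is skipped entirely.
    if basic_mode:
        return []
    try:
        return fetch_folders()
    except InsufficientPermissions as exc:
        # Surface a warning that points the user at the relevant option
        # instead of failing the whole run.
        logger.warning(
            "Could not list folders (%s). If the token only has dashboard "
            "read permissions, consider setting `basic_mode: true`.",
            exc,
        )
        return []
```

The design choice in this sketch is to degrade to an empty result with an actionable warning, so a run with limited permissions still produces dashboard metadata.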

sgomezvillamor · 2025-06-20T07:15:00Z

metadata-ingestion/src/datahub/ingestion/source/grafana/grafana_config.py

+    )
+
+    # Lineage configuration
+    extract_lineage: bool = Field(


as suggested before, I would call this include_lineage

sgomezvillamor · 2025-06-20T07:15:42Z

metadata-ingestion/src/datahub/ingestion/source/grafana/grafana_config.py

+        default_factory=dict,
+        description="Map of Grafana datasource types/UIDs to platform connection configs for lineage extraction",
+    )
+    stateful_ingestion: Optional[StatefulStaleMetadataRemovalConfig] = None


is this needed considering the config already inherits from StatefulIngestionConfigBase?

sgomezvillamor · 2025-06-20T07:21:27Z

metadata-ingestion/src/datahub/ingestion/source/grafana/models.py

+References:
+- Grafana HTTP API: https://grafana.com/docs/grafana/latest/developers/http_api/
+- Dashboard API: https://grafana.com/docs/grafana/latest/developers/http_api/dashboard/
+- Folder API: https://grafana.com/docs/grafana/latest/developers/http_api/folder/
+- Search API: https://grafana.com/docs/grafana/latest/developers/http_api/other/#search-api
+- Dashboard JSON structure: https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/view-dashboard-json-model/


sgomezvillamor · 2025-06-20T07:24:12Z

metadata-ingestion/src/datahub/ingestion/source/grafana/models.py

+    targets: List[Dict[str, Any]] = Field(default_factory=list)
+    datasource: Optional[Dict[str, Any]] = None
+    field_config: Dict[str, Any] = Field(default_factory=dict, alias="fieldConfig")
+    transformations: List[Dict[str, Any]] = Field(default_factory=list)


What are the keys in these dicts?

A comment may help, or better naming, e.g. `targets_by_id`, `targets_by_name`...

Also, can we narrow down the value types rather than just `Any`?
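
A possible way to narrow the types, sketched with `TypedDict`. The key names (`refId`, `datasource`, `rawSql`, `expr`, and `type`/`uid` on the datasource reference) are assumptions based on commonly seen Grafana dashboard JSON, not taken from this PR, and `PanelTarget`, `DatasourceRef`, and `raw_sql_queries` are hypothetical names.

```python
from typing import List, TypedDict


class DatasourceRef(TypedDict, total=False):
    # Assumed shape of a datasource reference in dashboard JSON.
    type: str
    uid: str


class PanelTarget(TypedDict, total=False):
    # total=False because the available keys vary by datasource type.
    refId: str
    datasource: DatasourceRef
    rawSql: str  # assumed key for SQL datasources
    expr: str  # assumed key for Prometheus-style datasources


def raw_sql_queries(targets: List[PanelTarget]) -> List[str]:
    # Collect only the targets that carry a non-empty SQL query.
    return [t["rawSql"] for t in targets if t.get("rawSql")]
```

With a type like this, `targets: List[PanelTarget]` documents the expected keys while still accepting plain dicts at runtime.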

Co-authored-by: Sergio Gómez Villamor <[email protected]>

acrylJonny added 2 commits January 21, 2025 15:17

initial commit

d4be2de

Delete grafana2

660daf8

acrylJonny marked this pull request as draft January 21, 2025 15:20

Merge branch 'master' into grafana-improvements

6cfbf5f

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Jan 21, 2025

vercel bot deployed to Preview January 21, 2025 15:55

acrylJonny and others added 3 commits January 21, 2025 21:44

second commit

23e4eca

updating tests and adding tags and ownership options

5468909

Merge branch 'master' into grafana-improvements

e684eed

acrylJonny marked this pull request as ready for review January 21, 2025 22:21

datahub-cyborg bot added the needs-review label (Label for PRs that need review from a maintainer.) Jan 21, 2025

acrylJonny added 2 commits January 21, 2025 22:39

error updates and better formatting of docs

eda52c4

Update grafana_api.py

507c811

vercel bot deployed to Preview January 21, 2025 23:39

hsheth2 reviewed Jan 22, 2025

metadata-ingestion/src/datahub/ingestion/source/grafana/models.py Outdated

hsheth2 requested review from sgomezvillamor and mayurinehate January 22, 2025 00:04

datahub-cyborg bot added pending-submitter-response (Issue/request has been reviewed but requires a response from the submitter) and removed needs-review (Label for PRs that need review from a maintainer.) labels Jan 22, 2025

acrylJonny and others added 3 commits January 22, 2025 11:08

updating to use basemodel. Cleaning up code

0e9f0dc

test updates

c832e91

Merge branch 'master' into grafana-improvements

0fe68d2

datahub-cyborg bot added needs-review (Label for PRs that need review from a maintainer.) and removed pending-submitter-response (Issue/request has been reviewed but requires a response from the submitter) labels Jan 22, 2025

reorder libraries to be alphabetical

c0e63d7

vercel bot had a problem deploying to Preview January 22, 2025 11:54 Failure

sgomezvillamor reviewed Jan 22, 2025

metadata-ingestion/src/datahub/ingestion/source/grafana/field_utils.py Outdated

datahub-cyborg bot added pending-submitter-response (Issue/request has been reviewed but requires a response from the submitter) and removed needs-review (Label for PRs that need review from a maintainer.) labels Jan 22, 2025