feat(ingestion/grafana): Add datasets and charts to dashboards with lineage and tags. Lineage back to source #12417
base: master
Conversation
Resolved (outdated) review threads on:
metadata-ingestion/src/datahub/ingestion/source/grafana/models.py
metadata-ingestion/src/datahub/ingestion/source/grafana/field_utils.py
def _deduplicate_fields(fields: List[SchemaFieldClass]) -> List[SchemaFieldClass]:
    """Remove duplicate fields based on fieldPath while preserving order."""
I found this in the codebase — does it fit the purpose?
def deduplicate_list(iterable: Iterable[_T]) -> List[_T]:
I gave it a try, but unfortunately it doesn't do the business; it was causing hashing issues.
Resolved (outdated) review thread on metadata-ingestion/src/datahub/ingestion/source/grafana/field_utils.py
Overall looks really good!
Reviewing lengthy PRs is always challenging.
Since the code is very well-structured (kudos!), this would be a great opportunity to add proper unit testing.
        for col in columns
    ]
except (IndexError, ValueError, StopIteration):
    return []
should we log some warning here?
I've put this in the logger, but were you looking for this to be in the report?
logger should be fine
Given we are catching three types of exceptions, adding the exception to the log trace may help to discriminate the issue:
logger.warning("Failed to parse SQL %s", target.get("rawSql"), exc_info=ex)
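A sketch of what the suggestion could look like in context; the function name and the target dict shape are assumptions reconstructed from the snippets above, not the PR's actual code:

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def parse_columns(target: dict) -> List[str]:
    """Hypothetical sketch: pull column names out of a panel target's raw
    SQL, logging the caught exception so the failure modes can be told
    apart in the trace."""
    try:
        sql = target["rawSql"].lower()
        select_part = sql[sql.index("select") + len("select"):sql.index("from")]
        return [col.strip().split()[-1] for col in select_part.split(",")]
    except (KeyError, IndexError, ValueError, StopIteration) as ex:
        # exc_info attaches the exception type and traceback to the record.
        logger.warning("Failed to parse SQL %s", target.get("rawSql"), exc_info=ex)
        return []
```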
"""Fetch all folders from Grafana with pagination.""" | ||
folders: List[Folder] = [] | ||
page = 1 | ||
per_page = 100 |
Based on the experience with other sources, it's nice to have this as a config parameter with a default value (page_size).
I was thinking to just specify the page size to reduce the overhead of anyone needing to think about changing this, but happy to change.
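A sketch of how the configurable page_size could look; fetch_page is a stand-in for the Grafana /api/folders call, and the default of 100 mirrors the hard-coded value above:

```python
from typing import Callable, List


def fetch_all_folders(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,  # would come from the source config, per the suggestion
) -> List[dict]:
    """Fetch all folders page by page, stopping on the first empty page."""
    folders: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            break
        folders.extend(batch)
        page += 1
    return folders
```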
platform_instance: Optional[str] = Field(
    default=None, description="Platform instance"
)
env: str = Field(default="PROD", description="Environment")
These two may be removed if using PlatformInstanceConfigMixin and EnvConfigMixin in GrafanaSourceConfig. Are they really needed at PlatformConnectionConfig?
Updated to use the parent class.
platform_instance and env are still there, as well as platform_instance in GrafanaSourceConfig 🤔
What parent class do you refer to?
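For illustration, a plain-Python sketch of the mixin layering the reviewer is asking for; DataHub's real PlatformInstanceConfigMixin and EnvConfigMixin are pydantic models with validation, so these stand-ins only show the inheritance shape, not the actual classes:

```python
from typing import Optional


class EnvConfigMixin:
    """Stand-in for DataHub's EnvConfigMixin: supplies env with a default."""
    env: str = "PROD"


class PlatformInstanceConfigMixin:
    """Stand-in for DataHub's PlatformInstanceConfigMixin."""
    platform_instance: Optional[str] = None


class GrafanaSourceConfig(PlatformInstanceConfigMixin, EnvConfigMixin):
    """With both mixins, platform_instance and env need not be redeclared
    as explicit Field(...) entries on this class."""
    url: str = "http://localhost:3000"
```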
Resolved (outdated) review threads on:
metadata-ingestion/src/datahub/ingestion/source/grafana/models.py
metadata-ingestion/src/datahub/ingestion/source/grafana/snapshots.py
Looking good. I'd much appreciate a "Concept Mapping" section in docs.
- Automated tag extraction
- Support for both HTTP and HTTPS connections with optional SSL verification

Prerequisites:
This is great. Can you upgrade Prerequisites to an H3 heading and maybe move it to metadata-ingestion/docs/sources/grafana/grafana_pre.md?
- Access data source configurations
- View user information

A sample configuration file:
While I absolutely enjoyed reading this doc in flow, we have a special place to keep the recipe. Can you move it to metadata-ingestion/docs/sources/grafana/grafana_recipe.yml?
)

# Generate dashboard container first
yield from gen_containers(
If I understand correctly, this is a new addition over the dashboard-only entity that was present earlier. So, one container corresponding to each dashboard, to encompass the panel (chart) and datasource (dataset) entities?
If so, I'm wondering whether datasources are dashboard-scoped or global in Grafana.
platform_instance=self.source_config.platform_instance,

yield from add_dataset_to_container(
    container_key=folder_key,
You can set parent_container_key=folder_key in gen_containers for the dashboard instead.
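A simplified sketch of the suggestion: passing parent_container_key to gen_containers emits the parent link together with the container, so no separate add_dataset_to_container call is needed. The names mirror DataHub's helpers, but the bodies here are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContainerKey:
    """Stand-in for DataHub's container key classes."""
    guid: str


def gen_containers(
    container_key: ContainerKey,
    name: str,
    parent_container_key: Optional[ContainerKey] = None,
) -> List[dict]:
    """Stand-in: emit the container's properties, plus a parent link
    when parent_container_key is given."""
    events = [{"entity": container_key.guid, "aspect": "containerProperties", "name": name}]
    if parent_container_key is not None:
        events.append(
            {"entity": container_key.guid, "aspect": "container", "parent": parent_container_key.guid}
        )
    return events


folder_key = ContainerKey("folder-1")
dashboard_key = ContainerKey("dashboard-1")
# One call covers both the dashboard container and its placement under the folder.
events = gen_containers(dashboard_key, "My Dashboard", parent_container_key=folder_key)
```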
yield lineage.as_workunit()

# Create chart MCE
dataset_urn, chart_mce = build_chart_mce(
Better to move away from Snapshot / MCE and use Aspect / MCP instead.
Also applicable for Chart and Dataset Snapshots below.
    return None

ds_type, ds_uid = self._extract_datasource_info(panel.datasource)
raw_sql = self._extract_raw_sql(panel.targets)
Just curious: do we ingest the raw query used in a panel anywhere in the DataHub model? I believe the query language depends on the source it integrates with.
    env=self.env,
)

def _process_platform_lineage(
Where is this used? Let's remove it if it's not used.
}


class GrafanaTypeMapper:
Is this class used anywhere?
service_account = grafana_client.create_service_account(
-    name="example-service-account", role="Viewer"
+    name="example-service-account", role="Admin"
Any specific reason for changing the role here? Do we require elevated permissions for the changes in this PR?
from_start = sql.index("from")
select_part = sql[select_start:from_start].strip()

columns = [col.strip().split()[-1].strip() for col in select_part.split(",")]
Assuming that splitting by "," will actually split by columns may be a wrong assumption, e.g.: SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users
Not sure if we need to address this complexity though. WDYT?
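If this complexity were to be addressed without pulling in a full SQL parser, a parenthesis-aware split is one option; a sketch, not a claim about what the connector should do:

```python
from typing import List


def split_select_columns(select_part: str) -> List[str]:
    """Split a SELECT list on commas, ignoring commas nested inside
    parentheses (e.g. inside CONCAT(...) or other function calls)."""
    columns: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in select_part:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: this is a real column boundary.
            columns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        columns.append("".join(current).strip())
    return columns
```

This handles the CONCAT example above correctly, though quoted strings containing commas or parentheses would still trip it up; a real parser like sqlglot would be the robust choice.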
Adding functionality to the existing Grafana connector. The existing connector supports dashboard identification only. The changes implement the following:
Checklist