implement get_lineage tool for complete lineage across dbt resources #461

Muizzkolapo · 2025-11-27T23:37:28Z

Summary

A user building a catalog-like experience with dbt and Snowflake via Claude Desktop requested full lineage capabilities in #110:

"Get the full lineage of a model, not only its children/parents. To parse the full lineage with the official MCP it would require quite a few calls to the tool to parse through all layers, which would be nice to get the server doing async instead providing the full list as a return."

What Changed

Discovery Phase: GraphQL Introspection

We used GraphQL introspection to validate design decisions and understand the Discovery API's capabilities.

Why we used Sequential Search Strategy

Question: Does LineageFilter support searching by name/identifier?

Finding: LineageFilter only accepts full unique IDs (it has no identifier or name field), so a sequential search across resource types is necessary.

Two API Calls for "both" Direction

Question: Does ModelLineageNode include direction metadata to categorize ancestors vs descendants?
View ModelLineageNode Introspection

Finding: No direction metadata exists (no relationshipDirection, isAncestor, isDescendant, or dependsOn fields), so two separate API calls are required for the "both" direction.
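
As a rough illustration of the two-call approach, the sketch below builds one selector per direction using the `+uniqueId` and `uniqueId+` forms mentioned in the design decisions. The function name `build_selectors` and the dictionary keys are assumptions made for this example, not names taken from the PR.

```python
def build_selectors(unique_id: str) -> dict[str, str]:
    """Return one graph selector per direction (hypothetical helper).

    A leading "+" selects the node's ancestors and a trailing "+" selects
    its descendants, so each direction can be requested and labeled
    separately.
    """
    return {
        "ancestors": f"+{unique_id}",
        "descendants": f"{unique_id}+",
    }


selectors = build_selectors("model.jaffle_shop.customers")
print(selectors["ancestors"])    # +model.jaffle_shop.customers
print(selectors["descendants"])  # model.jaffle_shop.customers+
```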

GraphQL Client-Side Pagination

Question: Does the lineage query support server-side pagination parameters?

Introspect

Finding: No pagination parameters are available (no first, after, offset, or limit arguments), so a client-side 50-node limit is necessary.
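
A minimal sketch of what a client-side cap could look like, assuming the lineage response has already been parsed into a list of node dictionaries. The constant name, the function name, and the shape of the returned dictionary are all illustrative assumptions; only the 50-node limit comes from this PR.

```python
MAX_NODES_PER_DIRECTION = 50  # limit stated in the PR description


def truncate_nodes(nodes: list[dict], limit: int = MAX_NODES_PER_DIRECTION) -> dict:
    """Cap a lineage result client-side and report whether nodes were dropped.

    The return shape (nodes / total / truncated) is hypothetical.
    """
    return {
        "nodes": nodes[:limit],
        "total": len(nodes),
        "truncated": len(nodes) > limit,
    }


# Example: 60 nodes come back from the API, only 50 are kept.
sample = [{"uniqueId": f"model.demo.m{i}"} for i in range(60)]
capped = truncate_nodes(sample)
print(len(capped["nodes"]), capped["total"], capped["truncated"])
```

Reporting `total` and `truncated` alongside the kept nodes lets the caller know the result is partial rather than silently losing nodes.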

List Of Resource Search Support

Question: Which resource types support searching by name (via identifier field)?

link Introspection

Finding: 4 out of 7 resource types support name search:

✅ Models (via ModelAppliedFilter.identifier)
✅ Sources (via SourceAppliedFilter.identifier)
✅ Seeds (via GenericMaterializedFilter.identifier)
✅ Snapshots (via GenericMaterializedFilter.identifier)
❌ Exposures (no identifier field)
❌ Tests (no identifier field)
❌ Metrics (no filter fields)

Design Approach

Complete Resource Search Coverage

Based on introspection findings:

Resource Type	Filter Type	Has `identifier`?	Search Support
Models	`ModelAppliedFilter`	✅ Yes	✅ Implemented
Sources	`SourceAppliedFilter`	✅ Yes	✅ Implemented
Seeds	`GenericMaterializedFilter`	✅ Yes	✅ Implemented
Snapshots	`GenericMaterializedFilter`	✅ Yes	✅ Implemented
Exposures	`ExposureFilter`	❌ No	❌ Requires unique_id
Tests	`TestAppliedFilter`	❌ No	❌ Requires unique_id
Metrics	`MetricFilter`	❌ No fields	❌ Requires unique_id

Key Design Decisions

Decision	Choice	Rationale
Name Resolution	Internal (resolve inside `get_lineage` tool)	Better UX - single call instead of requiring users to manually resolve names to unique IDs
Search Order	Models → Sources → Seeds → Snapshots	Optimized by frequency: Models (~90%), Sources (~8%), Seeds (~1%), Snapshots (~1%) - fast path for common cases
Search Implementation	Sequential search across resource types	No unified search API exists - must query each filter type separately
Disambiguation	MCP elicitation with a clarifying fallback prompt	Try interactive selection first via elicitation when the client supports it; fall back to a structured response on timeout or error. This guides both LLM and human users
Direction="both"	Two API calls + merge	No direction metadata in responses - we had to make separate `+uniqueId` and `uniqueId+` calls to categorize ancestors vs descendants
Pagination	Hard limit: 50 nodes/direction	No server-side pagination available - client-side limit protects LLM token budget
Unsearchable Types	Prompt-like error messages with examples that guide the LLM or user to next steps when they request an unsupported resource type	When name search finds no matches, return a helpful error that: (1) lists which types are searchable (models, sources, seeds, snapshots), (2) suggests the user might be searching for an exposure, test, or metric, (3) shows exact `unique_id` format examples for their input (e.g., `exposure.project.{name}`), and (4) explains the API limitation so users understand why
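
The sequential search described in the table could be sketched roughly as follows. This is an assumption-laden illustration: `search_in_order`, the `SearchFn` alias, and the fake searchers are invented for the example, and the real implementation issues GraphQL queries against each filter type instead of calling stand-in functions.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical signature: takes a name, returns the matching nodes.
SearchFn = Callable[[str], Awaitable[list[dict]]]


async def search_in_order(
    name: str, searchers: list[tuple[str, SearchFn]]
) -> list[dict]:
    """Query each resource type in the given order and collect every match."""
    matches: list[dict] = []
    for resource_type, search in searchers:
        for node in await search(name):
            matches.append({**node, "resourceType": resource_type})
    return matches


# Stand-ins for the per-type queries (models first, then sources).
async def fake_model_search(name: str) -> list[dict]:
    return [{"uniqueId": f"model.demo.{name}", "name": name}]


async def fake_source_search(name: str) -> list[dict]:
    return []


found = asyncio.run(
    search_in_order(
        "customers",
        [("Model", fake_model_search), ("Source", fake_source_search)],
    )
)
print(found)
```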

Critical Test Cases

1. Disambiguation Flow (one of the important UX features in this change). This model was selected because it will trigger elicitation.

Prompt:

"Get lineage for customers"

Use MCP Inspector to force elicitation.

2. Prompt-like Error Message (Resource Type Limitation Handling)

Prompt:

"Show lineage for customer_dashboard"

3. Full Lineage (Feature Completeness)

Prompt:

"Show full lineage for orders_snapshot"

Why

Implementing to close #110

Related Issues

Closes #110
Related to #110

Checklist

I have performed a self-review of my code
I have made corresponding changes to the documentation (in https://github.com/dbt-labs/docs.getdbt.com) if required -- Mention it here Add description for get_lineage Too docs.getdbt.com#8233
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Additional Notes

DevonFulcher

Awesome work!

DevonFulcher · 2025-12-01T21:49:08Z

src/dbt_mcp/discovery/client.py

        }
    """)

+    GET_SEEDS = textwrap.dedent("""


I just merged a PR that adds GQL queries for seeds & snapshots. Can we reuse those queries here?

I like the new GQL folder pattern, and I have placed the GQL queries in a folder, but I don't think we can reuse the existing ones. The existing queries use resources() with AppliedResourcesFilter, which requires knowing the unique_id upfront (via the uniqueIds array).

The new search GQL queries we created for lineage are lightweight searches that use the specific seeds()/snapshots() endpoints with GenericMaterializedFilter's identifier field for name-based matching.

Through introspection, I noticed that GenericMaterializedFilter has native identifier support for name matching, so we use that for 1-call lookups instead of the package enumeration pattern.

DevonFulcher · 2025-12-01T21:52:46Z

src/dbt_mcp/discovery/tools.py

+    direction: str = "both",
+    types: list[str] | None = None,


Would it be better if these were enums or some other stronger type than strings?

Good catch! I have implemented this.
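
For readers following the thread, a stronger type for `direction` might look roughly like the sketch below. `LineageDirection` and its `DESCENDANTS` member appear elsewhere in this PR; the string values, the `BOTH` and `ANCESTORS` members as written here, and the `parse_direction` helper are assumptions made for illustration.

```python
from enum import Enum


class LineageDirection(str, Enum):
    # Member values are assumed; only the class name and DESCENDANTS
    # are visible in the reviewed diff.
    ANCESTORS = "ancestors"
    DESCENDANTS = "descendants"
    BOTH = "both"


def parse_direction(value: str) -> LineageDirection:
    """Convert user input into the enum, with a readable error (hypothetical helper)."""
    try:
        return LineageDirection(value.lower())
    except ValueError:
        allowed = ", ".join(d.value for d in LineageDirection)
        raise ValueError(
            f"Invalid direction '{value}'. Expected one of: {allowed}"
        ) from None


print(parse_direction("Both"))
```

With an enum, an invalid value fails at the boundary with a message listing the accepted options, instead of propagating an arbitrary string into the selector logic.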

DevonFulcher · 2025-12-02T14:43:34Z

src/dbt_mcp/discovery/tools.py

+        matches = await context.lineage_fetcher.search_all_resources(name)
+        if not matches:
+            raise InvalidParameterError(
+                f"No resource found with name '{name}' in searchable resource types "
+                f"(models, sources, seeds, snapshots).\n\n"
+                f"If this is an exposure, test, or metric, you must use the full unique_id instead:\n"
+                f"  • For exposures: get_lineage(unique_id='exposure.project.{name}')\n"
+                f"  • For tests: get_lineage(unique_id='test.project.{name}')\n"
+                f"  • For metrics: get_lineage(unique_id='metric.project.{name}')\n\n"
+                f"Note: The Discovery API does not support searching exposures, tests, or metrics by name. "
+                f"You can find unique IDs in your dbt Cloud project or manifest.json."
+            )
+        if len(matches) == 1:
+            resolved_unique_id = matches[0]["uniqueId"]
+        else:
+            # Multiple matches - try elicitation first, fallback to disambiguation
+            try:
+                # Format matches for display
+                match_descriptions = [
+                    f"{m['resourceType']}: {m['uniqueId']}" for m in matches
+                ]
+                message = (
+                    f"Multiple resources found with name '{name}':\n"
+                    + "\n".join(f"  {i+1}. {desc}" for i, desc in enumerate(match_descriptions))
+                    + "\n\nSelect the unique_id of the resource you want:"
+                )
+
+                result = await ctx.elicit(
+                    message=message,
+                    schema=ResourceSelection
+                )
+
+                if result.action == "accept":
+                    # Validate the selected unique_id is in matches
+                    selected_id = result.data.unique_id
+                    if selected_id in [m["uniqueId"] for m in matches]:
+                        resolved_unique_id = selected_id
+                    else:
+                        raise InvalidParameterError(
+                            f"Selected unique_id '{selected_id}' not in available matches"
+                        )


This is pretty interesting! How do you think it compares to the approach I took in my recent changes here?

I actually came up with this pattern after introspecting the Discovery API. Since ModelAppliedFilter, SourceAppliedFilter, and GenericMaterializedFilter all have an identifier field for name-based searching, I used that native support instead, which means we can make fewer calls.

I think either approach works well, and I feel okay with trying this out. It will be our first use of elicitation, so that is exciting! It is a creative solution. Thanks for that.

At some point, we should consolidate our approaches, though. I would like the Discovery tools to work uniformly.

Hmm, yeah, definitely. I think we should be able to get elicitation to work with either approach. A question for you, then: can we assume the approach here will be the standard going forward? https://github.com/dbt-labs/dbt-mcp/blob/main/src/dbt_mcp/discovery/client.py#L625-L650

I think the approach could be: try elicitation. If the client doesn't support elicitation, fall back to the method you linked.

My point is that this could be the method for all discovery tools, not just lineage. That uniformity doesn't have to be done in this PR, though.

Alright, I will raise a separate issue to standardize the approach for discovery tools.

DevonFulcher · 2025-12-02T14:50:09Z

src/dbt_mcp/discovery/client.py

+            )
+            descendants_result = await self._fetch_lineage_single_direction(
+                unique_id, LineageDirection.DESCENDANTS, types
+            )


Is the reason for the two API calls to simplify labeling ancestors and descendants? The selector syntax allows for both directions here, but I assume the separate calls make it easier to label the nodes. Is that right?

One suggestion, if we keep two separate calls, they can be done concurrently with asyncio.gather().

Yup! The reason for the two calls is so we can group them into ancestors vs descendants. When using +uniqueId+, the API returns everything mixed together, so we'd need extra logic to figure out which nodes are ancestors vs descendants. I have implemented asyncio.gather() for parallel execution.
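
A minimal sketch of the concurrent pattern discussed here, assuming a single-direction fetch coroutine. `fetch_one_direction` is a stand-in invented for this example (the real method performs a Discovery API request), and the returned node IDs are fake.

```python
import asyncio


async def fetch_one_direction(unique_id: str, direction: str) -> list[str]:
    """Stand-in for the real single-direction lineage request (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    suffix = "parent" if direction == "ancestors" else "child"
    return [f"{unique_id}.{suffix}"]


async def fetch_both(unique_id: str) -> dict[str, list[str]]:
    """Run both direction requests concurrently and keep them labeled."""
    ancestors, descendants = await asyncio.gather(
        fetch_one_direction(unique_id, "ancestors"),
        fetch_one_direction(unique_id, "descendants"),
    )
    return {"ancestors": ancestors, "descendants": descendants}


print(asyncio.run(fetch_both("model.demo.orders")))
```

`asyncio.gather()` returns results in the order the awaitables were passed, so the ancestor and descendant lists stay correctly labeled regardless of which request finishes first.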

Copilot

Pull request overview

This PR adds a new get_lineage tool that retrieves complete upstream and downstream lineage for dbt resources in a single tool call, addressing user feedback about needing multiple calls to traverse the lineage graph. The implementation includes name-based search with disambiguation support and fallback mechanisms for resource types that require unique IDs.

Key Changes:

New LineageFetcher class handles resource search and lineage retrieval across models, sources, seeds, and snapshots
Interactive disambiguation via MCP elicitation when multiple resources match a name
Client-side pagination limiting results to 50 nodes per direction

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

File	Description
`src/dbt_mcp/discovery/client.py`	Adds `LineageFetcher` class with search and lineage fetching capabilities, including GraphQL queries for seeds, snapshots, and lineage
`src/dbt_mcp/discovery/tools.py`	Implements `get_lineage` tool with parameter validation, name resolution, and elicitation-based disambiguation
`src/dbt_mcp/prompts/discovery/get_lineage.md`	Documents tool usage, parameters, limitations, and examples
`src/dbt_mcp/tools/tool_names.py`	Registers new `GET_LINEAGE` tool name
`src/dbt_mcp/tools/toolsets.py`	Adds `GET_LINEAGE` to discovery toolset
`tests/unit/discovery/test_lineage_fetcher.py`	Tests for resource search, selector building, lineage fetching, and pagination
`tests/unit/discovery/test_get_lineage_tool.py`	Tests for parameter validation, name resolution, and disambiguation flow
`tests/unit/discovery/conftest.py`	Adds fixtures for `LineageFetcher` and mock contexts
`.changes/unreleased/Enhancement or New Feature-20251129-121002.yaml`	Changelog entry for the new feature

Copilot · 2025-12-02T15:42:48Z

src/dbt_mcp/discovery/tools.py

+                    "matches": matches,
+                    "instruction": "Please call get_lineage again with the unique_id parameter set to one of the matches above.",
+                }
+


The type ignore comment suggests that resolved_unique_id could potentially be None at runtime. Since the code paths ensure it's always assigned (either from unique_id parameter, single match, or elicitation), consider adding an assertion to make this guarantee explicit: assert resolved_unique_id is not None before line 291.

Suggested change

assert resolved_unique_id is not None, "resolved_unique_id must not be None at this point"

Copilot · 2025-12-02T15:42:48Z

tests/unit/discovery/test_get_lineage_tool.py

+            {"uniqueId": "source.test.raw.customers", "name": "customers", "resourceType": "Source"},
+        ]
+        # Simulate elicitation failure
+        mock_mcp_context.elicit.side_effect = Exception("Elicitation not supported")


The test simulates a generic Exception, but the actual code catches all exceptions at line 281 in tools.py. Consider testing with more specific exception types (e.g., TimeoutError, ConnectionError) to ensure the fallback behavior works correctly for different failure scenarios.
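
One way to exercise several failure types, sketched with only the standard library. `resolve_or_fallback` is a simplified, hypothetical stand-in for the tool's disambiguation branch, not the code under review; the `disambiguation_required` status string is taken from the diff.

```python
import asyncio
from unittest.mock import AsyncMock


async def resolve_or_fallback(elicit, matches: list[dict]) -> dict:
    """Hypothetical stand-in: try elicitation, fall back on any failure."""
    try:
        selected = await elicit(message="pick one")
        return {"status": "resolved", "unique_id": selected}
    except Exception:
        return {"status": "disambiguation_required", "matches": matches}


def run_with_failure(exc: Exception) -> dict:
    """Make the mocked elicit call raise `exc` and return the tool response."""
    elicit = AsyncMock(side_effect=exc)
    matches = [{"uniqueId": "model.test.customers"}]
    return asyncio.run(resolve_or_fallback(elicit, matches))


# Each failure type should lead to the same structured fallback.
for exc in (
    TimeoutError("timed out"),
    ConnectionError("dropped"),
    Exception("Elicitation not supported"),
):
    assert run_with_failure(exc)["status"] == "disambiguation_required"
```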

Copilot · 2025-12-02T15:42:48Z

src/dbt_mcp/discovery/client.py

+            "first": PAGE_SIZE,
+        }
+
+        # Execute query - already a direct reference to the query string


[nitpick] The comment on line 1026 'already a direct reference to the query string' is unclear about why this reassignment is necessary. Consider clarifying that this extracts the GraphQL query string from the configuration dictionary for readability.

Suggested change

# Execute query - already a direct reference to the query string

# Extract the GraphQL query string from the configuration dictionary for readability

Clarified the comment in the branch.

Copilot · 2025-12-02T15:42:49Z

src/dbt_mcp/discovery/tools.py

+                # Format matches for display
+                match_descriptions = [
+                    f"{m['resourceType']}: {m['uniqueId']}" for m in matches
+                ]
+                message = (
+                    f"Multiple resources found with name '{name}':\n"
+                    + "\n".join(f"  {i+1}. {desc}" for i, desc in enumerate(match_descriptions))
+                    + "\n\nSelect the unique_id of the resource you want:"


[nitpick] The message construction spans multiple operations (list comprehension, string concatenation, enumeration). Consider extracting this into a helper method _format_disambiguation_message(name: str, matches: list[dict]) -> str to improve readability and testability.

added _format_disambiguation_message

DevonFulcher · 2025-12-03T15:57:50Z

src/dbt_mcp/discovery/client.py

+# ============================================================================
+# Lineage Tool Classes and Configuration
+# ============================================================================


Rather than a demarcation like this, should we move this to a different file? It seems we could break this file up a bit more in general.

DevonFulcher · 2025-12-03T16:05:15Z

src/dbt_mcp/discovery/tools.py

+                logger.debug(f"Elicitation not completed: {type(e).__name__}: {e}")
+                return {
+                    "status": "disambiguation_required",
+                    "message": f"Multiple resources found with name '{name}'",
+                    "matches": matches,
+                    "instruction": "Please call get_lineage again with the unique_id parameter set to one of the matches above.",
+                }


I think we should handle this situation in a different way. Not all clients support elicitation, so we should fall back to a different method. Perhaps we should return a list of trees? Additionally, are you familiar with how elicitation is handled with remote MCP? We should ensure this tool is compatible with remote MCP.

I think I will remove elicitation for now, as I am not sure how it would work with remote MCP. I will create a separate issue to look into this approach.

Muizzkolapo marked this pull request as ready for review November 28, 2025 21:38

Muizzkolapo requested review from a team, b-per and jasnonaz as code owners November 28, 2025 21:38

Muizzkolapo mentioned this pull request Nov 28, 2025

Add description for get_lineage Too dbt-labs/docs.getdbt.com#8233

Open

3 tasks

Muizzkolapo changed the title ~~implement-get-lineage~~ implement get_lineage tool for complete lineage across dbt resources Nov 28, 2025

Muizzkolapo marked this pull request as draft November 29, 2025 11:44

Muizzkolapo marked this pull request as ready for review November 29, 2025 12:18

Muizzkolapo requested a review from a team as a code owner November 29, 2025 12:18

DevonFulcher reviewed Dec 2, 2025

DevonFulcher requested a review from Copilot December 2, 2025 15:41

Copilot AI reviewed Dec 2, 2025

Muizzkolapo added 15 commits December 2, 2025 18:58

conflict-resolve

faaaee2

fix-rebase

622efa1

refactor-long-chain

967b7f5

revert-change

a6f212c

revert-change

ab236e9

add-refactored-code

7f8c8d9

refactor-RESOURCE_SEARCH_CONFIG

7d0bbe8

remove-old-configs

9b3b4ff

add-new-fixez

d0c84ec

add-new-fixez

63a381a

add-changie

daf2015

rm-gitignore

456dda4

fix-pr-comments

12d2671

add-aysnc-query

b461a62

fix-copilot

62cf578

Muizzkolapo force-pushed the feat/110-get-lineage-tool branch from 461f5fc to 62cf578 Compare December 2, 2025 20:07

DevonFulcher reviewed Dec 3, 2025
