
SeaDatabricksClient: Add Metadata Commands #593


Open: wants to merge 109 commits into base branch `sea-migration`.

Conversation

varun-edachali-dbx (Collaborator) commented on Jun 11, 2025

What type of PR is this?

  • Feature

Description

Add metadata command implementations to the SeaDatabricksClient (execution phase): get_catalogs, get_schemas, get_tables, and get_columns.
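One plausible shape for these methods is delegation to SQL `SHOW` commands, since the SEA API has no dedicated metadata endpoints. The sketch below is a hedged illustration only: the class, method bodies, and exact SQL are assumptions, not the PR's code.

```python
# Hypothetical sketch only: the real SeaDatabricksClient signatures and SQL
# differ; this just illustrates metadata methods delegating to SQL execution.
class MetadataCommandSketch:
    def __init__(self):
        self.executed = []  # record of SQL sent, for illustration

    def execute_command(self, operation: str) -> str:
        # The real client would submit this to the SEA endpoint and
        # return a SeaResultSet; here we just record and echo it.
        self.executed.append(operation)
        return operation

    def get_catalogs(self) -> str:
        return self.execute_command("SHOW CATALOGS")

    def get_schemas(self, catalog_name: str) -> str:
        return self.execute_command(f"SHOW SCHEMAS IN {catalog_name}")

    def get_tables(self, catalog_name: str) -> str:
        return self.execute_command(f"SHOW TABLES IN CATALOG {catalog_name}")
```

Each metadata method then returns whatever `execute_command` returns, so the rest of the client treats metadata queries like any other statement.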

How is this tested?

The coverage of the added functionality (by test_filters.py and the new tests in test_sea_backend.py) is as follows:

| Module | Statements | Missing | Coverage | Notes |
| --- | --- | --- | --- | --- |
| filters.py | 33 | 1 | 97% | Line 21: `from databricks.sql.result_set import ResultSet, SeaResultSet` (TYPE_CHECKING import) |
| sea/backend.py (metadata methods) | 121 | 0 | 100% | Fully covered |

Related Tickets & Documents

https://docs.google.com/document/d/1Y-eXLhNqqhrMVGnOlG8sdFrCxBTN1GdQvuKG4IfHmo0/edit?usp=sharing

Signed-off-by: varun-edachali-dbx <[email protected]>
@jayantsing-db (Contributor) left a comment:

Added some comments

```python
"""

@staticmethod
def _filter_sea_result_set(
```
Contributor:

this is specific to SEA result set and can't be used for a generic result set class? let's try to make it generic for a result set

varun-edachali-dbx (Collaborator) replied:
I think we need some service specific methods at some point during the filtering process to know what kind of result set to return, since our concrete instances are service specific. I tried to keep the root methods invoked (filter by table type) general, following which they invoke the service specific builders based on the type of the instance passed to them.
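That pattern (a generic entry point that dispatches to a service-specific builder based on the concrete type of the instance) could be sketched as follows; the class and builder names here are illustrative stand-ins, not the PR's actual classes:

```python
# Illustrative sketch: a generic filter front door that dispatches to a
# service-specific builder based on the concrete result-set type.
class GenericResultSet:
    pass

class SeaLikeResultSet(GenericResultSet):  # stand-in for SeaResultSet
    pass

def _build_filtered_sea(rows):
    # Stand-in for constructing a new SeaResultSet from filtered rows.
    return ("sea", rows)

def filter_rows(result_set, rows, predicate):
    filtered = [row for row in rows if predicate(row)]
    if isinstance(result_set, SeaLikeResultSet):
        return _build_filtered_sea(filtered)
    raise NotImplementedError(
        f"no filtered-result builder for {type(result_set).__name__}"
    )
```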

```python
sea_client=cast(SeaDatabricksClient, result_set.backend),
buffer_size_bytes=result_set.buffer_size_bytes,
arraysize=result_set.arraysize,
result_data=result_data,
```
Contributor:
could you remind me what is the significance of this result_data param in result set? is this present in the base class? Is this an optional param and is used to create a result set with hard-coded rows?

varun-edachali-dbx (Collaborator) replied:
It is not present in the base class; it is an instance of a ResultData model, which represents the results returned during SEA execution. In our case, we set the filtered rows in the data array of this ResultData to effectively create a filtered SeaResultSet.
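In other words, if `ResultData` is essentially a container whose `data` field carries in-line rows, filtering amounts to rebuilding that field. A minimal sketch with a stand-in dataclass (not the real model):

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class ResultDataSketch:
    # Stand-in for databricks.sql.backend.sea.models.base.ResultData.
    data: Optional[List[Any]] = None
    external_links: Optional[List[Any]] = None

rows = [("main", "t1", "TABLE"), ("main", "v1", "VIEW")]
filtered_rows = [row for row in rows if row[2] == "TABLE"]
result_data = ResultDataSketch(data=filtered_rows, external_links=None)
```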

Comment on lines 50 to 85
```python
all_rows = result_set.results.remaining_rows()

# Filter rows
filtered_rows = [row for row in all_rows if filter_func(row)]

# Import SeaResultSet here to avoid circular imports
from databricks.sql.result_set import SeaResultSet

# Reuse the command_id from the original result set
command_id = result_set.command_id

# Create an ExecuteResponse with the filtered data
execute_response = ExecuteResponse(
    command_id=command_id,
    status=result_set.status,
    description=result_set.description,
    has_been_closed_server_side=result_set.has_been_closed_server_side,
    lz4_compressed=result_set.lz4_compressed,
    arrow_schema_bytes=result_set._arrow_schema_bytes,
    is_staging_operation=False,
)

# Create a new ResultData object with filtered data
from databricks.sql.backend.sea.models.base import ResultData

result_data = ResultData(data=filtered_rows, external_links=None)

# Create a new SeaResultSet with the filtered data
filtered_result_set = SeaResultSet(
    connection=result_set.connection,
    execute_response=execute_response,
    sea_client=cast(SeaDatabricksClient, result_set.backend),
    buffer_size_bytes=result_set.buffer_size_bytes,
    arraysize=result_set.arraysize,
    result_data=result_data,
)
```
Contributor:
I think the whole implementation can be improved. You are essentially downloading the complete result set first and then initialising a new one; a filter method should ideally just take the object to be filtered and return true/false on it.

varun-edachali-dbx (Collaborator) replied:
Introducing a filter method that is utilised in the fetch phase would lead to a lot of specialised code for the table queries during the fetch phase.

Currently, all that the execution relevant methods (execute_command and the metadata methods like get_tables, get_schemas, etc.) do is return a ResultSet that is set as the active result set of the Cursor.

The fetch phase from this step on is completely invariant of the kind of query that took place. If we want to use a separate filter method, then we would have to add custom logic during the fetch (if table metadata then filter result; return result;). Maintaining the generality of the fetch phase is likely worth the tradeoff involved in creating a new copy.
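The tradeoff can be shown in miniature: with eager, execute-time filtering, the object handed to the fetch path is already filtered, so fetch needs no query-kind branching. The functions below are illustrative only, not the connector's API:

```python
# Eager filtering at execute time: the "result set" returned here is already
# filtered, so the fetch path stays completely generic.
def execute_metadata_query(rows, table_types):
    return [row for row in rows if row[2] in table_types]

def fetchall(result_set):
    # No "if table metadata then filter result" branch needed here.
    return result_set

rows = [("c", "t", "TABLE"), ("c", "v", "VIEW"), ("c", "s", "SYSTEM TABLE")]
filtered = execute_metadata_query(rows, ["TABLE", "VIEW"])
```

The cost is one copy of the (typically small) metadata result; the benefit is that `fetchall` is identical for every query kind.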

Comment on lines 90 to 106
```python
def filter_by_column_values(
    result_set: "ResultSet",
    column_index: int,
    allowed_values: List[str],
    case_sensitive: bool = False,
) -> "ResultSet":
    """
    Filter a result set by values in a specific column.

    Args:
        result_set: The result set to filter
        column_index: The index of the column to filter on
        allowed_values: List of allowed values for the column
        case_sensitive: Whether to perform case-sensitive comparison

    Returns:
        A filtered result set
```
Contributor:
same as above
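For what the case handling implies, here is a stand-alone sketch of the column-value filter semantics, operating on plain row tuples rather than a ResultSet (names and details are illustrative, not the module's code):

```python
from typing import Any, List, Sequence, Tuple

def filter_by_column_values_sketch(
    rows: List[Tuple[Any, ...]],
    column_index: int,
    allowed_values: Sequence[str],
    case_sensitive: bool = False,
) -> List[Tuple[Any, ...]]:
    # Case-insensitive comparison uppercases both sides, matching the
    # documented default of case_sensitive=False.
    if not case_sensitive:
        allowed = {v.upper() for v in allowed_values}
        return [r for r in rows if str(r[column_index]).upper() in allowed]
    return [r for r in rows if r[column_index] in allowed_values]
```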

Comment on lines 138 to 165
```python
def filter_tables_by_type(
    result_set: "ResultSet", table_types: Optional[List[str]] = None
) -> "ResultSet":
    """
    Filter a result set of tables by the specified table types.

    This is a client-side filter that processes the result set after it has been
    retrieved from the server. It filters out tables whose type does not match
    any of the types in the table_types list.

    Args:
        result_set: The original result set containing tables
        table_types: List of table types to include (e.g., ["TABLE", "VIEW"])

    Returns:
        A filtered result set containing only tables of the specified types
    """

    # Default table types if none specified
    DEFAULT_TABLE_TYPES = ["TABLE", "VIEW", "SYSTEM TABLE"]
    valid_types = (
        table_types if table_types and len(table_types) > 0 else DEFAULT_TABLE_TYPES
    )

    # Table type is the 6th column (index 5)
    return ResultSetFilter.filter_by_column_values(
        result_set, 5, valid_types, case_sensitive=True
    )
```
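Concretely, the defaulting above means a `None` or empty `table_types` falls back to the three standard types, while any non-empty list is used as-is. A small extraction of just that expression:

```python
DEFAULT_TABLE_TYPES = ["TABLE", "VIEW", "SYSTEM TABLE"]

def resolve_table_types(table_types):
    # Mirrors the valid_types expression in filter_tables_by_type.
    return table_types if table_types and len(table_types) > 0 else DEFAULT_TABLE_TYPES
```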
Contributor:
same as above


Thanks for your contribution! To satisfy the DCO policy in our contributing guide, every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (`git rebase -i main`).

@databricks databricks deleted a comment from github-actions bot Jun 20, 2025