Skip to content

Conversation

@dreadatour
Copy link
Contributor

Improve UDF error handling: return raw error from UDF to handle it properly in the parent process.

@dreadatour dreadatour requested a review from Copilot November 17, 2025 15:41
@dreadatour dreadatour self-assigned this Nov 17, 2025
@dreadatour dreadatour marked this pull request as draft November 17, 2025 15:42
Copilot finished reviewing on behalf of dreadatour November 17, 2025 15:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves UDF error handling by removing the intermediate process_safe wrapper and allowing raw exceptions from UDFs to propagate to the parent process, while adding proper stacktrace support for parallel execution.

Key Changes

  • Introduced UdfRunError exception class to wrap errors from parallel UDF execution with stacktrace information
  • Removed the process_safe method that was wrapping exceptions in DataChainError, allowing raw exceptions to propagate
  • Updated exception handling in parallel workers to capture and propagate stacktraces
  • Updated tests to expect the actual exception types raised by UDFs instead of DataChainError

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/datachain/lib/udf.py Introduced UdfRunError class, removed process_safe wrapper method and unused imports, replaced all process_safe calls with direct process calls
src/datachain/query/dispatch.py Added stacktrace capture in worker error handling and UdfRunError wrapping for parallel execution failures
tests/unit/lib/test_checkpoints.py Updated tests to expect CustomMapperError instead of DataChainError, added custom exception class for testing
tests/func/test_udf.py Updated test to expect RuntimeError instead of DataChainError for UDF execution errors
tests/func/test_to_database.py Updated test to expect ValueError instead of DataChainError for processing failures

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/datachain/query/dispatch.py 40.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@dreadatour dreadatour force-pushed the job-errors-observability branch from f166140 to 0f85aaa Compare November 18, 2025 01:53
@dreadatour dreadatour force-pushed the job-errors-observability branch from 6a51332 to ba760ea Compare November 20, 2025 04:05
Copy link
Contributor Author

@dreadatour dreadatour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few highlights for the changes in this PR.

return self.__class__, (self.message,)


class UdfRunError(Exception):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new exception type that is caught by the parent process, capturing both the traceback and the UDF name for display in the UI.

Comment on lines -315 to -327
def process_safe(self, obj_rows):
try:
result_objs = self.process(*obj_rows)
except Exception as e: # noqa: BLE001
msg = f"============== Error in user code: '{self.name}' =============="
print(msg)
exc_type, exc_value, exc_traceback = sys.exc_info()
traceback.print_exception(exc_type, exc_value, exc_traceback.tb_next)
print("=" * len(msg))
raise DataChainError(
f"Error in user code in class '{self.name}': {e!s}"
) from None
return result_objs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to catch and log this here — the parent process now catches the exception and handles logging and user-facing output.

{
"status": FAILED_STATUS,
"exception": e,
"stacktrace": traceback.format_exc(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send also the traceback.

raise exc
if isinstance(exc, KeyboardInterrupt):
raise exc
raise UdfRunError(exc, stacktrace=result.get("stacktrace"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UdfRunError wrapper for exc exception is needed to preserve the traceback.


# The export should fail during processing, after table creation
with pytest.raises(dc.DataChainError, match="Processing failed on third item"):
with pytest.raises(ValueError, match="Processing failed on third item"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct exception is now raised.

Comment on lines +8 to +13
class CustomMapperError(Exception):
pass


def mapper_fail(num: int) -> int:
raise CustomMapperError("Error")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This to check if we are raising and catching any exceptions as is.

@dreadatour dreadatour marked this pull request as ready for review November 20, 2025 11:10
Copilot finished reviewing on behalf of dreadatour November 20, 2025 11:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ref: ${{ env.STUDIO_BRANCH }}
token: ${{ secrets.ITERATIVE_STUDIO_READ_ACCESS_TOKEN }}

- uses: astral-sh/setup-uv@v7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using this trick to test datachain and studio related changes from the git branch in case both repos have branches with the same name.

  • Here, in datachain, we are already checking out changes from the desired branch.
  • Next, we do check if Studio have the branch with the same name — we will checkout the same branch from Studio.
  • But Studio itself have datachain as a dependency, and we need to make sure it will use the same datachain branch as in this PR.

Before Studio tests in datachain repo was broken sometimes because of this — rarely, only if API changes, but still. Same happens in this PR.

This change is not related directly to this PR, let me separate it 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants