Test server sometimes fails to include signal in first WFT #2127

Open
@dandavison

Using the Python SDK, I did the following:

  1. handle = await start_workflow()
  2. await handle.signal()
  3. run worker

Expected Behavior

I expect Python to process a signal_workflow job and then a start_workflow job in the activation for the first WFT (workflow task).

Actual Behavior

Nearly always, we see the expected behavior. Occasionally (on macos-intel builds) Python processes a start_workflow activation job first. Almost certainly this is because the first WFT has no signal in it, although I have not yet investigated further and actually proved that (the test in question exits immediately if it sees start_workflow before signal_workflow).
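The difference between the expected and the actual behavior can be sketched with a toy model. This is not SDK or test server code: the job names are borrowed from this issue, the helper functions are hypothetical, and the model assumes that signal jobs are applied before the start job whenever both are in the same workflow task.

```python
from typing import List

# Job names as used in this issue; treated here as plain strings.
SIGNAL = "signal_workflow"
START = "start_workflow"


def activation_jobs(first_wft_events: List[str]) -> List[str]:
    """Toy model: order the jobs of the first activation.

    Assumption: signal jobs are applied before all other jobs
    (including the start job) when they share a workflow task.
    """
    signals = [e for e in first_wft_events if e == SIGNAL]
    others = [e for e in first_wft_events if e != SIGNAL]
    return signals + others


def start_seen_before_signal(jobs: List[str]) -> bool:
    """Return True if start_workflow is processed before any signal_workflow."""
    for job in jobs:
        if job == SIGNAL:
            return False
        if job == START:
            return True
    return False


# Signal included in the first WFT: the signal job comes first.
expected = activation_jobs([START, SIGNAL])
# Signal omitted from the first WFT: the start job is the first one seen.
suspected = activation_jobs([START])
```

Under these assumptions, `start_seen_before_signal(expected)` is false and `start_seen_before_signal(suspected)` is true, which matches the failure the test reports.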

Steps to Reproduce the Problem

Run the sdk-python test tests/worker/test_workflow.py::test_unfinished_signal_handler_with_workflow_failure under --workflow-environment=time-skipping repeatedly on a GitHub macos-intel runner until you see this failure.
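Because the failure is intermittent, a small rerun loop may help. The following is a minimal sketch: the test id and the --workflow-environment flag are taken from this issue, while invoking pytest through `python -m pytest`, the helper name `rerun_until_failure`, and the cap of 100 runs are assumptions.

```python
import subprocess
import sys
from typing import Callable, List, Optional

# Test id and flag come from this issue; running pytest via
# `python -m pytest` is an assumption about the local setup.
PYTEST_CMD = [
    sys.executable,
    "-m",
    "pytest",
    "tests/worker/test_workflow.py::test_unfinished_signal_handler_with_workflow_failure",
    "--workflow-environment=time-skipping",
]


def _run(cmd: List[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(cmd).returncode


def rerun_until_failure(
    cmd: List[str],
    max_runs: int = 100,
    run: Callable[[List[str]], int] = _run,
) -> Optional[int]:
    """Rerun cmd until it exits non-zero.

    Returns the 1-based attempt number of the first failure,
    or None if all max_runs attempts passed.
    """
    for attempt in range(1, max_runs + 1):
        if run(cmd) != 0:
            return attempt
    return None
```

Calling `rerun_until_failure(PYTEST_CMD)` from an sdk-python checkout would stop at the first failing run. Since the failure has only been observed on macos-intel runners, a local loop may never hit it.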

Note: There are two variants of the Python test. In one, the workflow throws ApplicationError; in the other, the client sends a cancel request, again before the worker is started. Interestingly, I've only seen the error described in this ticket for the ApplicationError variant. This suggests that handling the cancel request somehow causes the test server to include all of the events in the first WFT, whereas without the cancel request the signal event is sometimes omitted.

See failures in build history of temporalio/sdk-python#556

Labels: test server (related to the test server)