live tests broken after nexus lockstep API change #9327

@davepacheco

I tried running the live tests on main and found:

root@oxz_switch:~# TMPDIR=/var/tmp ./cargo-nextest nextest run --profile=live-tests --archive-file live-tests-archive/omicron-live-tests.tar.zst --workspace-remap live-tests-archive
  Extracting 2 binaries, 1 build script output directory, and 5 linked paths to /var/tmp/nextest-archive-DdPJCV
   Extracted 79 files to /var/tmp/nextest-archive-DdPJCV in 2.62s
warning: this repository recommends nextest version 0.9.108, but the current version is 0.9.98
info: experimental features enabled: setup-scripts
------------
 Nextest run ID 22d911c0-349d-40de-8a33-0571c046ebb0 with nextest profile: live-tests
    Starting 2 tests across 2 binaries
        SLOW [> 60.000s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
        FAIL [ 102.539s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
  stdout ---

    running 1 test
    test test_nexus_add_remove has been running for over 60 seconds
    test test_nexus_add_remove ... FAILED

    failures:

    failures:
        test_nexus_add_remove

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 102.47s
    
  stderr ---
    log file: /var/tmp/test_nexus_add_remove-8099c2d419da57f2-test_nexus_add_remove.18929.0.log
    note: configured to log to "/var/tmp/test_nexus_add_remove-8099c2d419da57f2-test_nexus_add_remove.18929.0.log"
    note: using DNS server for subnet fd00:1122:3344::/48

    thread 'test_nexus_add_remove' panicked at live-tests/tests/test_nexus_add_remove.rs:165:6:
    new Nexus to be usable: TimedOut(90.02520805s)
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

  Cancelling due to test failure
------------
     Summary [ 102.563s] 1/2 tests run: 0 passed, 1 failed, 0 skipped
        FAIL [ 102.539s] omicron-live-tests::test_nexus_add_remove test_nexus_add_remove
warning: 1/2 tests were not run due to test failure (run with --no-fail-fast to run all tests, or run with --max-fail)
warning: this repository recommends nextest version 0.9.108, but the current version is 0.9.98
info: update nextest with cargo nextest self update, or bypass check with --override-version-check
error: test run failed

The log file ended with repeated 404 responses to GET /sagas:

17:40:24.532Z DEBG test_nexus_add_remove: client request
    body = None
    method = GET
    nexus_internal_url = http://[fd00:1122:3344:103::21]:12221
    uri = http://[fd00:1122:3344:103::21]:12221/sagas
17:40:24.533Z DEBG test_nexus_add_remove: client response
    nexus_internal_url = http://[fd00:1122:3344:103::21]:12221
    result = Ok(Response { url: "http://[fd00:1122:3344:103::21]:12221/sagas", status: 404, headers: {"content-type": "application/json", "x-request-id": "d739f6e6-9de7-4eb2-9260-c9d77f08b222", "content-length": "84", "date": "Fri, 31 Oct 2025 17:40:24 GMT"} })
17:40:24.533Z DEBG test_nexus_add_remove: waiting for new Nexus to be available: listing sagas: listing sagas: Error Response: status: 404 Not Found; headers: {"content-type": "application/json", "x-request-id": "d739f6e6-9de7-4eb2-9260-c9d77f08b222", "content-length": "84", "date": "Fri, 31 Oct 2025 17:40:24 GMT"}; value: Error { error_code: None, message: "Not Found", request_id: "d739f6e6-9de7-4eb2-9260-c9d77f08b222" }
17:40:24.585Z DEBG test_nexus_add_remove: client request
    body = None
    method = GET
    nexus_internal_url = http://[fd00:1122:3344:103::21]:12221
    uri = http://[fd00:1122:3344:103::21]:12221/sagas
17:40:24.586Z DEBG test_nexus_add_remove: client response
    nexus_internal_url = http://[fd00:1122:3344:103::21]:12221
    result = Ok(Response { url: "http://[fd00:1122:3344:103::21]:12221/sagas", status: 404, headers: {"content-type": "application/json", "x-request-id": "9f581f42-d0c0-4c20-ad7d-0eedd3a66f62", "content-length": "84", "date": "Fri, 31 Oct 2025 17:40:24 GMT"} })
17:40:24.586Z DEBG test_nexus_add_remove: waiting for new Nexus to be available: listing sagas: listing sagas: Error Response: status: 404 Not Found; headers: {"content-type": "application/json", "x-request-id": "9f581f42-d0c0-4c20-ad7d-0eedd3a66f62", "content-length": "84", "date": "Fri, 31 Oct 2025 17:40:24 GMT"}; value: Error { error_code: None, message: "Not Found", request_id: "9f581f42-d0c0-4c20-ad7d-0eedd3a66f62" }
17:40:24.667Z ERRO test_nexus_add_remove: Pool dropped without invoking `terminate`. qorb background tasks

It looks like test_nexus_add_remove.rs explicitly uses NEXUS_INTERNAL_PORT instead of NEXUS_LOCKSTEP_PORT to construct its clients, so the saga-listing requests go to the internal API port (12221 in the log above), which answers 404. If that's all it is (testing it now), I've got a fix forthcoming.
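To illustrate the suspected mismatch, here is a minimal standalone sketch, not the actual test code. It assumes the saga-listing endpoint now lives on the lockstep API. The helper `nexus_base_url` is hypothetical, the internal port value is taken from the log above, and the lockstep port value is a placeholder that should be checked against omicron's address constants.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

// Port that appears in the failing requests in the log (the internal API).
const NEXUS_INTERNAL_PORT: u16 = 12221;
// ASSUMPTION: placeholder value for illustration only; use the real
// NEXUS_LOCKSTEP_PORT constant from omicron rather than this number.
const NEXUS_LOCKSTEP_PORT: u16 = 12232;

/// Hypothetical helper: build the base URL for a Nexus server on `port`.
fn nexus_base_url(ip: Ipv6Addr, port: u16) -> String {
    // SocketAddrV6's Display impl adds the brackets around the address.
    format!("http://{}", SocketAddrV6::new(ip, port, 0, 0))
}

fn main() {
    let ip: Ipv6Addr = "fd00:1122:3344:103::21".parse().unwrap();

    // What the test appears to do today: target the internal API port.
    let internal = nexus_base_url(ip, NEXUS_INTERNAL_PORT);
    // What the fix would presumably do: target the lockstep API port.
    let lockstep = nexus_base_url(ip, NEXUS_LOCKSTEP_PORT);

    println!("internal: {internal}/sagas");
    println!("lockstep: {lockstep}/sagas");
}
```

The first URL matches the one in the log that returns 404. If the diagnosis is right, only the port passed when constructing the client needs to change.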
