Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent Unit Test Failures #4485

Open
ShohamBit opened this issue Jan 12, 2025 · 6 comments
Open

Intermittent Unit Test Failures #4485

ShohamBit opened this issue Jan 12, 2025 · 6 comments
Labels

Comments

@ShohamBit
Copy link
Collaborator

Description

While running make test-unit on the main branch to validate my PR, I noticed that some tests are failing intermittently. This issue appears to affect the following tests:

  • TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector
  • Test_setupOutput/happy_path_with_tracee_event_and_default_output

I observed that sometimes one of these tests passes while the other fails, indicating an inconsistency in the test suite.

Steps to Reproduce

  1. Start the Docker environment:

    make -f ./builder/Makefile.tracee-make ubuntu-shell
  2. Run the unit tests:

    make test-unit 

Expected Behavior

All unit tests should pass consistently.

Actual Behavior

The tests mentioned above fail intermittently, producing the following output:

=== NAME  TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector
    engine_test.go:380: 
        	Error Trace:	/tracee/pkg/signatures/engine/engine_test.go:380
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector
        	Messages:   	happy path - with one matching selector
=== RUN   TestEngine_LoadSignature/load_two_signatures
--- PASS: TestEngine_LoadSignature (0.02s)
    --- PASS: TestEngine_LoadSignature/load_one_signature (0.01s)
    --- PASS: TestEngine_LoadSignature/load_two_signatures (0.00s)
=== RUN   TestEngine_ConsumeSources/happy_path_-_with_no_matching_event_selector
=== RUN   TestEngine_ConsumeSources/happy_path_-_with_all_events_selector
=== RUN   TestEngine_ConsumeSources/happy_path_-_with_all_events_selector,_no_name
=== RUN   TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector_including_event_origin_from_container
=== RUN   TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector_with_mismatching_event_origin_from_container
=== RUN   TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector_including_event_origin_from_host
=== RUN   TestEngine_ConsumeSources/sad_path_-_with_all_events_selector,_no_source
=== RUN   TestEngine_ConsumeSources/sad_path_-_signature_init_fails
=== RUN   TestEngine_ConsumeSources/sad_path_-_getSelectedEvents_returns_an_error
=== RUN   TestEngine_ConsumeSources/sad_path_-_getMetadata_returns_an_error
=== RUN   TestEngine_ConsumeSources/happy_path_-_signature_receives_a_non_tracee_event
=== RUN   TestEngine_ConsumeSources/happy_path_-_signature_with_partial_selector_and_non_tracee_source
--- FAIL: TestEngine_ConsumeSources (14.14s)
    --- FAIL: TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector (1.00s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_with_no_matching_event_selector (1.19s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_with_all_events_selector (1.23s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_with_all_events_selector,_no_name (1.00s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector_including_event_origin_from_container (1.53s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector_with_mismatching_event_origin_from_container (1.00s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_with_one_matching_selector_including_event_origin_from_host (1.02s)
    --- PASS: TestEngine_ConsumeSources/sad_path_-_with_all_events_selector,_no_source (1.01s)
    --- PASS: TestEngine_ConsumeSources/sad_path_-_signature_init_fails (1.01s)
    --- PASS: TestEngine_ConsumeSources/sad_path_-_getSelectedEvents_returns_an_error (1.02s)
    --- PASS: TestEngine_ConsumeSources/sad_path_-_getMetadata_returns_an_error (1.01s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_signature_receives_a_non_tracee_event (1.08s)
    --- PASS: TestEngine_ConsumeSources/happy_path_-_signature_with_partial_selector_and_non_tracee_source (1.00s)
FAIL
coverage: 11.9% of statements
FAIL	github.com/aquasecurity/tracee/pkg/signatures/engine	14.765s
=== NAME   Test_setupOutput/happy_path_with_tracee_event_and_default_output
    output_test.go:115: 
            Error Trace:     /tracee/cmd/tracee-rules/output_test.go:115
                                    /tracee/cmd/tracee-rules/output_test.go:104
            Error:          Received unexpected error:
                            parsing time "" as "2006-01-02T15:04:05Z": cannot parse "" as "2006"
            Test:           Test_setupOutput/happy_path_with_tracee_event_and_default_output
            Messages:       happy path with tracee event and default output
=== RUN   Test_setupOutput/happy_path_with_tracee_event_and_simple_custom_output_template
=== CONT  Test_sendToWebhook/sad_path,_no_--webhook-template_flag_specified
=== CONT  Test_sendToWebhook/sad_path,_error_reaching_webhook
{"level":"warn","ts":1736716196.6274612,"msg":"Content-type was not set for the custom template: templates/simple.tmpl"}
=== RUN   Test_setupOutput/sad_path_with_unknown_event
{"level":"warn","ts":1736716196.6328578,"msg":"Unsupported event detected: something wrong"}
=== RUN   Test_setupOutput/sad_path_with_invalid_custom_template
--- PASS: Test_sendToWebhook (0.21s)
    --- PASS: Test_sendToWebhook/sad_path,_with_an_invalid_template (0.06s)
    --- PASS: Test_sendToWebhook/sad_path,_with_missing_template (0.06s)
    --- PASS: Test_sendToWebhook/sad_path,_no_--webhook-template_flag_specified (0.02s)
    --- PASS: Test_sendToWebhook/sad_path,_error_reaching_webhook (0.03s)
    --- PASS: Test_sendToWebhook/happy_path_with_functions_from_sprig_template (0.14s)
{"level":"error","ts":1736716196.6840057,"msg":"Writing to output: template: broken.tmpl:1:3: executing \"broken.tmpl\" at <.InvalidField>: can't evaluate field InvalidField in type *detect.Finding"}
--- FAIL: Test_setupOutput (0.36s)
    --- FAIL: Test_setupOutput/happy_path_with_tracee_event_and_default_output (0.24s)
    --- PASS: Test_setupOutput/happy_path_with_tracee_event_and_simple_custom_output_template (0.05s)
    --- PASS: Test_setupOutput/sad_path_with_unknown_event (0.01s)
    --- PASS: Test_setupOutput/sad_path_with_invalid_custom_template (0.05s)
FAIL
coverage: 12.4% of statements
FAIL     [github.com/aquasecurity/tracee/cmd/tracee-rules [invalid URL removed]     1.090s

Output of tracee version:

Tracee version: main-4cdea40dce

Output of uname -a:

Linux shoham 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Additional details

  • Potential Impact: This intermittent test failure could indicate an underlying race condition or timing issue within the codebase, potentially leading to unpredictable behavior in Tracee.
  • Troubleshooting Steps: I have not yet been able to pinpoint the exact cause of this failure. Further investigation is needed to determine if it's related to specific environment settings, resource constraints, or concurrency issues.
  • Call to Action: It would be beneficial to have someone with more familiarity with the Tracee codebase review these tests and investigate the potential causes of this intermittent failure.
@ShohamBit ShohamBit changed the title Test_setupOutput/happy_path_with_tracee_event_and_default_output fails Intermittent Unit Test Failures Jan 12, 2025
@geyslan
Copy link
Member

geyslan commented Jan 13, 2025

Tks @ShohamBit.

@NDStrahilevitz if you're able (time) would you mind checking those?

The first is already documented:

// FIXME We should use another Go concurrency pattern to make this test logically correct.
// The gotNumEvents variable is causing data race, because it's accessed
// by two goroutines. Temporary workaround is to change from int to uint32 and
// use atomic.AddUint32 and atomic.LoadUint32 methods.
// go test -v -run=TestConsumeSources -race ./pkg/signatures/engine/...

tracee-rules one is also well known, but it was decided to leave it as is as a reminder of its deprecation (expected to be removed soon).

@ShohamBit
Copy link
Collaborator Author

@geyslan could it be that there is also an issue in the TestMetricsandPprofExist in performance tests?

@geyslan
Copy link
Member

geyslan commented Jan 13, 2025

@geyslan could it be that there is also an issue in the TestMetricsandPprofExist in performance tests?

Have you seen something new of it?

The last were:

#4281
#3993

@ShohamBit
Copy link
Collaborator Author

no, I didn't see any change. I checked the performance tests in the main and they failed his is the output of make test-performance:

-test.shuffle 1736783919560271707
=== RUN   TestMetricsandPprofExist
    metrics_test.go:121: tracee timedout to start
    panic.go:626: found unexpected goroutines:
        [Goroutine 9 in state select, with github.com/aquasecurity/tracee/tests/testutils.ExecCmdBgWithSudoAndCtx.func2 on top of the stack:
        github.com/aquasecurity/tracee/tests/testutils.ExecCmdBgWithSudoAndCtx.func2(0xc0003f6ee8)
        	/tracee/tests/testutils/exec.go:165 +0xdd
        created by github.com/aquasecurity/tracee/tests/testutils.ExecCmdBgWithSudoAndCtx in goroutine 7
        	/tracee/tests/testutils/exec.go:164 +0x866
         Goroutine 19 in state syscall, with syscall.Syscall6 on top of the stack:
        syscall.Syscall6(0xf7, 0x1, 0x2c394, 0xc000082d90, 0x1000004, 0x0, 0x0)
        	/home/tracee/go/pkg/mod/golang.org/[email protected]/src/syscall/syscall_linux.go:91 +0x45
        os.(*Process).blockUntilWaitable(0xc0000d4000)
        	/home/tracee/go/pkg/mod/golang.org/[email protected]/src/os/wait_waitid.go:32 +0x9f
        os.(*Process).wait(0xc0000d4000)
        	/home/tracee/go/pkg/mod/golang.org/[email protected]/src/os/exec_unix.go:22 +0x4b
        os.(*Process).Wait(...)
        	/home/tracee/go/pkg/mod/golang.org/[email protected]/src/os/exec.go:134
        os/exec.(*Cmd).Wait(0xc000337b80)
        	/home/tracee/go/pkg/mod/golang.org/[email protected]/src/os/exec/exec.go:897 +0xb9
        github.com/aquasecurity/tracee/tests/testutils.ExecCmdBgWithSudoAndCtx.func1.1()
        	/tracee/tests/testutils/exec.go:143 +0x8f
        created by github.com/aquasecurity/tracee/tests/testutils.ExecCmdBgWithSudoAndCtx.func1 in goroutine 8
        	/tracee/tests/testutils/exec.go:140 +0x31a
        ]
--- FAIL: TestMetricsandPprofExist (10.99s)
FAIL
FAIL	github.com/aquasecurity/tracee/tests/perftests	71.452s
FAIL

@geyslan
Copy link
Member

geyslan commented Jan 13, 2025

no, I didn't see any change. I checked the performance tests in the main and they failed his is the output of make test-performance:

It seems a flaky test, see that tracee timedout in your env: metrics_test.go:121: tracee timedout to start. A possible fix is by undoing all the test steps on the test and, for sure, increase the timeout.

@ShohamBit
Copy link
Collaborator Author

no, I didn't see any change. I checked the performance tests in the main and they failed his is the output of make test-performance:

It seems a flaky test, see that tracee timedout in your env: metrics_test.go:121: tracee timedout to start. A possible fix is by undoing all the test steps on the test and, for sure, increase the timeout.

let me check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants