docs(policy)_: Reintroduced the testing policy for full review #6413

Open · wants to merge 1 commit into `develop`

121 changes: 121 additions & 0 deletions _docs/policies/testing.md
@@ -0,0 +1,121 @@
# `status-go` Test Policy

## Purpose

This policy defines the requirements and guidelines for testing within the `status-go` repository. It aims to ensure that all new functionality is introduced with robust, reliable, and repeatable tests, while preventing flaky tests from being merged into the codebase. This aligns with Policy Zero’s principles of transparency, collaboration, and quality assurance in the development process.
**Contributor:**

> while preventing flaky tests from being merged into the codebase

In my experience, many flaky tests are tricky and end up in the main branch because they resist verification attempts. Beyond addressing flaky tests during PRs, an important process is to define how/who/when to handle flaky tests after they infect the main branch. But perhaps the details of such a process can be defined in a non-policy doc.

**Contributor:**

Most lines don't have spurious line breaks, but some do, like this one. For consistency with the other policy, `introduce-policy.md`, it's better to fill all paragraphs at ~80 chars.


## Submitting and Maintaining the Test Policy

- Any individual MAY propose amendments to this policy.
- Proposed changes MUST be submitted as a pull request (PR) to the `_docs/policies` directory in the `status-go` repository.
- The PR MUST include a brief justification for the amendment, answering: "Why is this change needed?"
- The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
**Collaborator** (on lines +8 to +13):

I think this is redundant, as this is defined in Policy Zero.


## Creating Tests

- All new functionality MUST be introduced with tests that:
**Contributor:**

When it comes to the quality of tests, there are other characteristics not covered in the policy, and we could go into more detail. But by going into more detail, I think we are already failing to use policies adequately. Policies are not great documents to iterate on, in my opinion; they are too rigid and require too much coordination.

A few other characteristics:

- How sociable is the test?
- How does the test perform?
- How is the test structured in terms of arrange/act/assert?
- How much coverage should a test provide at a minimum?
- Is the test written at the best test-pyramid level? Example: could the integration test be rewritten as one or more unit tests, or a combination of different levels?
- Is the test verifying possible security threats?

> This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.

An example (to me) of something that seems too detailed to live in the policy.

  - Prove that the functionality performs as described
  - Can be falsified
**Contributor:**

> Can be falsified

I think it can be elaborated. Maybe it's just me, but I think these 2 possible interpretations are different:

- Tests SHOULD not only verify the correct behaviour ("happy path"). Edge-case and failure-scenario testing proactively validates the new feature's resilience to unusual or invalid input.
- Falsifiable testing retroactively ensures that a bug is demonstrably captured by a test, ensuring that fixes address concrete, reproducible problems rather than hypothetical scenarios.
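To make the two readings concrete, here is a minimal Go sketch; `ParseChatID` is a hypothetical function used purely for illustration and is not part of status-go:

```go
// file: example_test.go (hypothetical)
package example

import (
	"errors"
	"testing"
)

// ParseChatID is a hypothetical parser used only to illustrate the point.
func ParseChatID(id string) (string, error) {
	if id == "" {
		return "", errors.New("empty chat ID")
	}
	return id, nil
}

// TestParseChatID covers both readings: the happy path, plus a failure
// case that falsifies the implementation if the validation is removed.
func TestParseChatID(t *testing.T) {
	// Happy path: a well-formed ID parses.
	if _, err := ParseChatID("0x1234"); err != nil {
		t.Fatalf("expected valid ID to parse, got: %v", err)
	}
	// Falsifying case: malformed input must be rejected, so the test
	// demonstrably fails when the guarded behaviour regresses.
	if _, err := ParseChatID(""); err == nil {
		t.Fatal("expected empty ID to be rejected")
	}
}
```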

  - Are resistant to fuzzing
**Collaborator** (on lines +18 to +20):

Actually I don't think we need to specify the quality of the tests. I find our developers reasonable enough to implement tests that make sense and cover the feature. This basically means "be a good developer, write good tests" 😄 We are good developers here!

**Collaborator:**

So the point is: IMO just specifying that "the new functionality MUST be covered with tests" is enough here.
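As a hedged illustration of the "resistant to fuzzing" bullet above, a Go native fuzz target (Go 1.18+) might look like this; `ParseChatID` is the same hypothetical parser as in the earlier sketch:

```go
// FuzzParseChatID asserts only invariants (no panics, no empty ID on
// success), whatever input the fuzzer generates. Run with:
//
//	go test -fuzz=FuzzParseChatID
func FuzzParseChatID(f *testing.F) {
	f.Add("0x1234") // seed corpus
	f.Fuzz(func(t *testing.T, id string) {
		out, err := ParseChatID(id)
		if err == nil && out == "" {
			t.Errorf("accepted input %q but returned an empty ID", id)
		}
	})
}
```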

- All new `functional tests` MUST be validated via a minimum of 1000 test runs.
**Member:**

I don't know functional tests that well, but integration tests are the usual culprits of flakiness, so either this is a swap or we need to add integration there too

**Collaborator:**

This should probably apply to unit/integration (Go) tests, not functional (Python) ones. The latter have not yet been noted to be flaky, so I wouldn't apply such strict rules there.

We have test type definitions here: status-go/Test Types.

- This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
**Collaborator:**

I think this is unnecessary detail.

Suggested change:

```diff
- - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
+ - This can be achieved using the `-count` or `-test.count` flag with the `go test` command
```

**Collaborator:**

Btw, there's also the `stress` tool, which is helpful: status-go/Detect flaky tests.
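For reference, a hedged sketch of both approaches; the package path and test name below are hypothetical:

```bash
# Run one test 1000 times; -failfast stops at the first failure.
go test ./protocol -run 'TestMessengerProfile' -count=1000 -failfast

# Or compile the test binary once and hammer it with the stress tool.
go install golang.org/x/tools/cmd/stress@latest
go test ./protocol -c -o messenger.test
stress -p 4 ./messenger.test -test.run 'TestMessengerProfile'
```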

- Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
**Contributor:**

How will the PR author prove the N number of test runs were executed? Should the CI be responsible for detecting new functional tests and increase their test count by default for the duration of the PR? Is it always feasible to mandate a minimum of 1000 test runs when some tests can take a significant amount of time to finish?

**Collaborator:**

I agree that a CI check would be nice, but the effort to implement it might not be worth the results.

There are 3 cases when a test can become flaky, and each requires a different effort to implement a CI check:

| Reason | How to CI check | Implementation difficulty (out of 5) | Warnings |
|---|---|---|---|
| 1. When implementing a new test | Check if this test exists on `develop` | 🔴 🔴 | - |
| 2. When changing old test code | Check the tests diff and find amended tests | 🔴 🔴 🔴 🔴 | - |
| 3. When changing tested code (without changing the test itself) | Use external tools (e.g. Sentry Carryforward flags) | 🔴 🔴 🔴 🔴 | Some code may be used in ALL tests, which would make CI verify all tests 1000 times |

I think we could some day implement (1), but I would not bother with (2), and there will always be (3), which is wrong to do (note the warning above).

👉 So for now I believe it's fine to require devs to run new/amended tests locally and perhaps attach the execution report to the PR description/comments. At the same time, teach devs to prevent flaky tests, so that we won't even need to run tests 1000 times.

**Contributor:**

I had a feeling CI detection would be tricky, but I wasn't expecting such a detailed answer. Thanks @igor-sirotin 💯 I completely agree with your conclusions.

One idea to improve the effectiveness of the test suite is to randomly select a subset of tests and increase their run count (randomly as well, if we like). Essentially, we'd extend the total test run time by a controlled amount. This would raise the chances of catching flaky tests before they hit `develop`. As the suite gets more stable, we could gradually reduce the randomness factor. We could also increase the number of tests affected by the random selection in CI by using dedicated jobs (e.g. twice per day with 10% of the test suite running between 10-200 times).

Is the idea above practical and perhaps straightforward?
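A rough bash sketch of the idea for a single package (purely illustrative; the package path is hypothetical and the 10%/10-200 selection policy is a placeholder):

```bash
#!/usr/bin/env bash
set -euo pipefail
pkg=./protocol  # hypothetical package path

# List the package's tests, shuffle them, and keep roughly a tenth.
mapfile -t tests < <(go test "$pkg" -list '^Test' | grep '^Test' | shuf)
subset=$(( ${#tests[@]} / 10 + 1 ))

for name in "${tests[@]:0:$subset}"; do
  runs=$(( RANDOM % 191 + 10 ))  # random run count in [10, 200]
  go test "$pkg" -run "^${name}\$" -count "$runs"
done
```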

**Collaborator:**

Suggested change:

```diff
- - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
+ - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests locally.
```

- `TODO` Add link to issue for CI automation of validation test runs of new `functional tests`.
- Ensuring that the test passes consistently every time gives confidence that the test is not flaky.
**Contributor:**

We know one source of flakiness is external network dependency. I wonder if we can just define in status-go that certain tests (e.g. within a certain standardized directory like ./unit/ or following a specific naming pattern) should automatically block outbound traffic. A hard block using a firewall in the CI could be practical, but I think it would be better if we could do it from Go directly.

I'm just looking for ways to rely less on conventions and trust.
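One hedged sketch of the "from Go directly" variant, assuming the tests funnel HTTP through `http.DefaultTransport`; the package and helper name are hypothetical, and raw `net.Dial` calls would still need the firewall approach:

```go
package testutil

import (
	"context"
	"fmt"
	"net"
	"net/http"
)

// BlockOutboundHTTP swaps the default transport's dialer for one that
// refuses everything except loopback. Call it from TestMain in packages
// whose tests are expected to stay offline.
func BlockOutboundHTTP() {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, err
		}
		ip := net.ParseIP(host)
		if host == "localhost" || (ip != nil && ip.IsLoopback()) {
			var d net.Dialer
			return d.DialContext(ctx, network, addr)
		}
		return nil, fmt.Errorf("outbound network access blocked in tests: %s", addr)
	}
	http.DefaultTransport = transport
}
```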


## Flaky Tests

Flaky tests are defined as tests that fail intermittently.

- All flaky tests / failing tests MUST be resolved.
- No flaky tests may be introduced into the codebase.

### Steps to Resolving or Reporting Flaky Tests

#### Is it Me?

Determine who caused the flaky test.

- Is a new test you’ve written flaky or failing?
  - It was you.
  - You MUST fix the test before merge is acceptable.
- Has an existing test become flaky?
  - Check rerun reports. `TODO` Add link to rerun reports.
  - If the test does not appear in https://github.com/status-im/status-go/labels/E%3AFlaky%20Test or in the last three nightly test runs, it is most likely that the flakiness was introduced by your changes and needs to be addressed before proceeding with the merge.
  - Otherwise, the test is already documented as a flaky test (it appears in the GitHub issues or in the nightly test runs); proceed to the reporting flow below.

```mermaid
flowchart TB
    A([PR ready for merge]) --> B{Have any tests failed?}
    B -->|No| C[🎉 Proceed with merge 🪄]
    B -->|Yes| D{Is the failing test introduced<br/>or altered by this PR?}
    D -->|No| E[Check rerun reports]
    D -->|Yes| F[It is likely your changes introduced the flakiness.<br/>You MUST fix the test before merge is acceptable.]
    F --> A
    E --> G{"Does the test appear in E:Flaky Test issues<br/>or in the last three nightly test runs?"}
    G -->|Yes| I[The flakiness needs reporting]
    G -->|No| F
    I --> J([Proceed to Reporting flow])
```

#### Reporting Flaky Tests

If an old test fails and/or seems flaky, either locally or in CI, you MUST report the event.
**Contributor:**

Is the lack of a blank line after the headline intentional? The other heading, `## Flaky Tests` at level 2 above, has a blank line.


- Check the `status-go` GitHub repo issues for the failing test name(s).
- If the test appears in the list of flaky test issues:
  - If the issue is open:
    - Add a comment to the issue.
    - Detail that you have experienced the test being flaky and in what context (local vs CI, link to the PR or branch).
  - If the issue is closed:
    - Reopen the issue OR create a new issue referencing the previous issue.
      - Either is fine; use your best judgment in this case.
    - Detail that you have experienced the test being flaky and in what context (local vs CI, link to the PR or branch).
- If the test does not appear in the list of flaky test issues:
  - Create a new issue.
    - The issue title should include the flaky test name.
    - The issue should use the https://github.com/status-im/status-go/labels/E%3AFlaky%20Test label.
  - Detail that you have experienced the test being flaky and in what context (local vs CI, link to the PR or branch).

```mermaid
flowchart TB
    A([Ready to report a flaky test]) --> B[Check the status-go GitHub repo<br/>issues for the failing test name]
    B --> C{"Does the test appear in<br/>the list of E:Flaky Test issues?"}
    C -->|No| D["Create a new issue.<br/>The issue title should include the flaky test name.<br/>The issue should use the E:Flaky Test label."]
    D --> E["Detail which test is flaky and in what context:<br/>local vs CI, link to the PR or branch."]
    E --> J
    C -->|Yes| F{Is the issue open?}
    F -->|No| G((Either))
    G --> I[Reopen the issue]
    G --> D
    I --> H
    F -->|Yes| H[Add a comment to the issue]
    H --> E
    J([End])
```

## Policy Amendments and Archival

- Amendments:
- This policy MAY be amended at any time.
- Amendments MUST be submitted via a PR to the `status-go` repository.
- Archival:
- This policy MAY be archived if it becomes obsolete or replaced by newer policies.
- Archival MUST be performed by submitting a PR that moves the policy to `_docs/policies/archived`.
- The PR MUST include a rationale for the proposed amendment or archival in the PR description.
- The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
**Collaborator** (on lines +110 to +119):

This, I believe, is also redundant, as it is specified in Policy Zero, which applies to all policies.


**Contributor:**

Two extra lines.