docs(policy)_: Reintroduced the testing policy for full review #6413


Open · wants to merge 1 commit into base: develop

Conversation

@Samyoul Samyoul commented Mar 12, 2025

PR Description: Reintroducing the status-go Testing Policy

Summary

This PR reintroduces the status-go Testing Policy, providing clear guidelines for test creation, validation, and handling flaky tests. The policy has been updated to align with Policy Zero, ensuring a structured, transparent, and inclusive approach to test management in the repository.

Background

The status-go Testing Policy previously existed in the repository but was removed to allow all Core Contributors (CCs) to review, discuss, and provide input before formal reintroduction. This aligns with Policy Zero, which establishes a framework for collaboratively developing policies in key areas of status-go development. The goal is to ensure that all policies reflect the needs and perspectives of contributors across the project.

What's Changed?

  • Reintroduces clear guidelines for creating tests, ensuring they are falsifiable, reliable, and resistant to fuzzing.
  • Defines flaky tests and mandates their resolution before merging.
  • Introduces structured steps for identifying, resolving, and reporting flaky tests.
  • Ensures compliance with Policy Zero’s submission, review, and approval processes.
  • Includes mermaid flowcharts for visual guidance on test resolution and reporting.

Why Change?

  • Establishes a consistent testing framework for status-go.
  • Prevents flaky or unreliable tests from being merged into the codebase.
  • Ensures that testing policies are maintained in an inclusive and transparent manner, following Policy Zero’s principles.
  • Gives all Core Contributors (CCs) an opportunity to engage with and shape this policy.

@Samyoul Samyoul self-assigned this Mar 12, 2025
@Samyoul Samyoul added the policy label Mar 12, 2025

status-im-auto commented Mar 12, 2025

Jenkins Builds

| | Commit | #️⃣ | Finished (UTC) | Duration | Platform | Result |
|---|---|---|---|---|---|---|
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:16:08 | ~3 min | ios | 📦zip |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:16:16 | ~3 min | android | 📦aar |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:17:43 | ~4 min | macos | 📦zip |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:18:17 | ~5 min | macos | 📦zip |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:18:42 | ~5 min | windows | 📦zip |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:18:50 | ~5 min | linux | 📦zip |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:33:58 | ~20 min | tests-rpc | 📄log |
| ✔️ | 8f0b579 | #1 | 2025-03-12 14:49:22 | ~36 min | tests | 📄log |


codecov bot commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.81%. Comparing base (068537e) to head (8f0b579).
Report is 1 commit behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6413      +/-   ##
===========================================
- Coverage    60.83%   60.81%   -0.02%     
===========================================
  Files          874      874              
  Lines       112420   112420              
===========================================
- Hits         68389    68368      -21     
- Misses       36085    36101      +16     
- Partials      7946     7951       +5     
| Flag | Coverage Δ |
|---|---|
| functional | 23.03% <ø> (+0.01%) ⬆️ |
| unit | 59.09% <ø> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

see 29 files with indirect coverage changes


- All new functionality MUST be introduced with tests that:
- Prove that the functionality performs as described
- Can be falsified
Contributor

Can be falsified

I think it can be elaborated. Maybe it's just me, but I think these 2 possible interpretations are different:

  • Tests SHOULD not only verify the correct behaviour (the "happy path"). Edge-case and failure-scenario testing proactively validates the new feature's resilience to unusual or invalid input.

  • Falsifiable testing retroactively ensures that a bug is demonstrably captured by a test, ensuring that fixes address concrete, reproducible problems rather than hypothetical scenarios.
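
Purely as an illustration of the second reading, a minimal sketch of a regression-style falsifiable test in Go; the function, behaviour, and names are hypothetical, not taken from status-go:

```go
package example

import "testing"

// normalizeChatID is a hypothetical stand-in for the code under test.
// The (illustrative) pre-fix behaviour returned "" for empty input.
func normalizeChatID(id string) string {
	if id == "" {
		return "unknown"
	}
	return id
}

// TestNormalizeChatID_EmptyInput pins a concrete, reproducible bug:
// it fails against the pre-fix behaviour, so it falsifies the claim
// "empty chat IDs are handled" if the fix is ever reverted.
func TestNormalizeChatID_EmptyInput(t *testing.T) {
	if got := normalizeChatID(""); got != "unknown" {
		t.Fatalf(`normalizeChatID("") = %q; want "unknown"`, got)
	}
}
```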


Samyoul commented Mar 13, 2025

Hey, as we refine this Testing Policy, I wanted to raise some questions for further discussion. Let me know what you think.

API-level Functional Testing

Should API-level functional tests be explicitly included in this policy?
Right now, the policy covers functional testing in general, requiring tests to be falsifiable, reliable, and resistant to fuzzing, but it doesn’t explicitly differentiate API-level functional tests from other types of functional tests (e.g., integration or system tests).

Should API-level functional tests have their own specific requirements in this policy?
Are there any different considerations for API-level tests compared to other functional tests?
Do we need additional tooling, automation, or CI support to ensure API tests are properly covered in this policy?

Performance & Load Testing

Should we introduce any policy items about performance benchmarks for functional tests, particularly API tests?
Should functional tests measure response times and resource consumption, or should that be handled separately, for example in a benchmarking policy?

Handling Test Debt & Maintenance

Do we need a process for reviewing and cleaning up outdated tests? 😬
How do we prevent an accumulation of unreliable or useless tests over time?

Fuzzing & Security Considerations

This policy mentions resistance to fuzzing, but should we also consider security testing within our tests?
Are we able to mock threats and attacks that are automatable via testing?
Is there specific tooling, or are there best practices, that we should standardise for security resilience?

Automation & CI Integration

Should we mention in this policy the CI/CD automation steps currently in place that enforce test coverage?
Should we define requirements for test parallelisation, isolation, or dependencies?
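
On parallelisation and isolation specifically, a minimal Go sketch of the kind of thing such requirements could standardise (the test name and file contents are illustrative): tests opt in with `t.Parallel()` and keep state isolated via per-test helpers such as `t.TempDir()`.

```go
package example

import (
	"os"
	"path/filepath"
	"testing"
)

// TestWriteConfig runs in parallel with other parallel tests and keeps
// its state in a per-test temporary directory, so concurrent runs
// cannot interfere with each other.
func TestWriteConfig(t *testing.T) {
	t.Parallel()

	dir := t.TempDir() // removed automatically when the test ends
	path := filepath.Join(dir, "config.json")

	if err := os.WriteFile(path, []byte(`{"enabled":true}`), 0o600); err != nil {
		t.Fatalf("write config: %v", err)
	}
	if _, err := os.Stat(path); err != nil {
		t.Fatalf("config not written: %v", err)
	}
}
```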

Flaky Test Management Process

Should we set a maximum timeframe for how long a test can remain flaky before it must be resolved or rewritten?

@jrainville (Member) left a comment

Nice to see this back

- Prove that the functionality performs as described
- Can be falsified
- Are resistant to fuzzing
- All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
Member

I don't know functional tests that well, but integration tests are the usual culprits of flakiness, so either this is a swap or we need to add integration there too

Collaborator

This should probably apply to unit/integration (go) tests, not functional (python). The latter have not yet been noted to be flaky, so I wouldn't apply such strict rules there.

We have test type definitions here: status-go/Test Types.

@ilmotta (Contributor) left a comment

This is a welcome policy!

My impression while reading it is that it contains a bit too much detail, which leaves room for more amendments, something we mentioned in the introductory policy that we'd like to minimize.

For example, I don't think we need the policy to describe in detail how to resolve or report flaky tests (like in section Steps to Resolving or Reporting Flaky Tests). Another approach could be to link to external guidelines that can evolve independently of the policy. Essentially, I'm questioning what belongs in the policy versus what should live in non-policy docs.

## Purpose

This policy defines the requirements and guidelines for testing within the `status-go` repository. It aims to ensure all new functionality is introduced with robust, reliable, and repeatable tests
while preventing flaky tests from being merged into the codebase. This aligns with Policy Zero’s principles of transparency, collaboration, and quality assurance in the development process.
Contributor

while preventing flaky tests from being merged into the codebase

In my experience, many flaky tests are tricky and end up in the main branch because they resist verification attempts. Beyond addressing flaky tests during PRs, an important process is to define how/who/when to handle flaky tests after they infect the main branch. But perhaps the details of such a process can be defined in a non-policy doc.

- Are resistant to fuzzing
- All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
- This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
- Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
Contributor

How will the PR author prove that the N test runs were executed? Should the CI be responsible for detecting new functional tests and increasing their test count by default for the duration of the PR? Is it always feasible to mandate a minimum of 1000 test runs when some tests can take a significant amount of time to finish?

Collaborator

I agree that a CI check would be nice, but the effort to implement it might not be worth the results.

There are 3 cases where a test can become flaky, and each requires a different effort to implement a CI check:

| Reason | How to CI check | Implementation difficulty (out of 5) | Warnings |
|---|---|---|---|
| 1. When implementing a new test | Check if this test exists on develop | 🔴 🔴 | - |
| 2. When changing an old test's code | Check the tests diff and find amended tests | 🔴 🔴 🔴 🔴 | - |
| 3. When changing tested code (without changing the test itself) | Use external tools (e.g. Sentry Carryforward flags) | 🔴 🔴 🔴 🔴 | Some code may be used in ALL tests, which would make CI verify all tests 1000 times |

I think we could some day implement (1), but I would not bother with (2), and there will always be (3), for which such a check would be wrong anyway (note the warning above).

👉 So for now I believe it's fine to require devs to run new/amended tests locally and perhaps attach the execution report to the PR description/comment. At the same time, we should teach devs to prevent flaky tests, so that we won't even need to run tests 1000 times.
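
(For illustration, such a local validation run might look like `go test -run TestNewFeature -count=1000 -failfast ./protocol/...`, where the test name and package path are hypothetical; `-failfast` stops the run at the first failure, and the captured output could be attached to the PR as the execution report.)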

Contributor

I had a feeling CI detection would be tricky, but I wasn't expecting such a detailed answer. Thanks @igor-sirotin 💯 I completely agree with your conclusions.

One idea to improve the effectiveness of the test suite is to randomly select a subset of tests and increase their run count (randomly as well, if we like). Essentially, we'd extend the total test run time by a controlled amount. This would raise the chances of catching flaky tests before they hit develop. As the suite gets more stable, we could gradually reduce the randomness factor. We could also increase the number of tests affected by the random selection in CI by using dedicated jobs (e.g. twice per day with 10% of the test suite running between 10-200 times).

Is the idea above practical and perhaps straightforward?
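
A rough sketch of what such a dedicated job could run, assuming a plain `go list` / `go test` workflow; the 10% subset size and the 10-200 count range mirror the numbers above, and everything else is illustrative:

```go
// Hypothetical CI helper: re-run a random ~10% of packages with a
// randomly increased -count to surface flaky tests before they land.
package main

import (
	"fmt"
	"math/rand"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// Enumerate every package in the module.
	out, err := exec.Command("go", "list", "./...").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	pkgs := strings.Fields(string(out))
	rand.Shuffle(len(pkgs), func(i, j int) { pkgs[i], pkgs[j] = pkgs[j], pkgs[i] })

	// Pick ~10% of the packages and a run count between 10 and 200.
	subset := pkgs[:max(1, len(pkgs)/10)] // max builtin needs Go 1.21+
	count := 10 + rand.Intn(191)

	// Run the subset with the increased count, stopping on first failure.
	args := append([]string{"test", "-count", fmt.Sprint(count), "-failfast"}, subset...)
	cmd := exec.Command("go", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```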

- This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
- Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
- `TODO` Add link to issue for CI automation of validation test runs of new `functional tests`.
- Ensuring that the test passes consistently every time gives confidence that the test is not flaky.
Contributor

We know one source of flakiness is external network dependency. I wonder if we can just define in status-go that certain tests (e.g. within a certain standardized directory like ./unit/ or following a specific naming pattern) should automatically block outbound traffic. A hard block using a firewall in the CI could be practical, but I think it would be better if we could do it from Go directly.

I'm just looking for ways to rely less on conventions and trust.
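
Purely as a sketch of the Go-side variant (not something status-go implements today, and all names here are illustrative): a `TestMain` could install dial guards on the default HTTP transport and on Go's built-in resolver, so any test in the package fails fast on outbound access. Direct `net.Dial` calls and the cgo resolver would still slip through, which is where a CI-level firewall stays stronger.

```go
package unit_test

import (
	"context"
	"errors"
	"net"
	"net/http"
	"os"
	"testing"
)

var errNoNetwork = errors.New("outbound network access is blocked in unit tests")

func TestMain(m *testing.M) {
	// Refuse connections made through the default HTTP transport.
	http.DefaultTransport.(*http.Transport).DialContext =
		func(ctx context.Context, network, addr string) (net.Conn, error) {
			return nil, errNoNetwork
		}

	// Force the pure-Go resolver and refuse its DNS dials as well.
	net.DefaultResolver.PreferGo = true
	net.DefaultResolver.Dial =
		func(ctx context.Context, network, addr string) (net.Conn, error) {
			return nil, errNoNetwork
		}

	os.Exit(m.Run())
}
```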


#### Reporting Flaky Tests
If an old test fails and/or seems flaky either locally or in CI, you MUST report the event.
Contributor

Is the lack of a blank line after the headline intentional? The other heading at level 2, ## Flaky Tests above, has a blank line.

- Archival MUST be performed by submitting a PR that moves the policy to `_docs/policies/archived`.
- The PR MUST include a rationale for the proposed amendment or archival in the PR description.
- The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.

Contributor

Two extra lines.

## Purpose

This policy defines the requirements and guidelines for testing within the `status-go` repository. It aims to ensure all new functionality is introduced with robust, reliable, and repeatable tests
while preventing flaky tests from being merged into the codebase. This aligns with Policy Zero’s principles of transparency, collaboration, and quality assurance in the development process.
Contributor

Most lines don't have spurious line breaks, but some do like this one. For consistency with the other policy introduce-policy.md, it's better to fill all paragraphs at ~80 chars.


## Creating Tests

- All new functionality MUST be introduced with tests that:
Contributor

When it comes to the quality of tests, there are other characteristics not covered in the policy, and we could go into more detail, but by going into more detail I think we are already failing to use policies adequately. Policies are not great documents to iterate on, in my opinion: they are too rigid and require too much coordination.

A few other characteristics:

  • How sociable is the test?
  • How does the test perform?
  • How is the test structured in terms of arrange/act/assert?
  • How much coverage should a test provide at a minimum?
  • Is the test written using the best test pyramid level? Example: could the integration test be rewritten to be one or more unit tests, or a combination of different levels?
  • Is the test verifying possible security threats?

This can be achieved using the -count or -test.count flag with the test command, e.g., -count 1000 / -test.count 1000.

An example (to me) of something that seems too detailed to live in the policy.

@igor-sirotin (Collaborator) left a comment

Thank you @Samyoul!

I believe it is essential for this policy to exist, and we should agree on common rules to introduce new tests, otherwise we will continue to introduce more and more flaky tests, and just keep skipping them when they become unbearably annoying.

P.S. This is just part of my review; I will come back with more comments. Just leaving this in the meantime.

Comment on lines +8 to +13
## Submitting and Maintaining the Test Policy

- Any individual MAY propose amendments to this policy.
- Proposed changes MUST be submitted as a pull request (PR) to the `_docs/policies` directory in the `status-go` repository.
- The PR MUST include a brief justification for the amendment, answering: "Why is this change needed?"
- The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
Collaborator

I think this is redundant, as this is defined in Policy Zero.

Comment on lines +18 to +20
- Prove that the functionality performs as described
- Can be falsified
- Are resistant to fuzzing
Collaborator

Actually, I don't think we need to specify the quality of the tests. I find our developers reasonable enough to implement tests that make sense and cover the feature. This basically means "be a good developer, write good tests" 😄 We are good developers here!

Collaborator

So the point is: IMO just specifying that "the new functionality MUST be covered with tests" is enough here.



- Can be falsified
- Are resistant to fuzzing
- All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
- This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
Collaborator

I think this is unnecessary detail

Suggested change
- This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
- This can be achieved using the `-count` or `-test.count` flag with the `go test` command

Collaborator

Btw, there's also the stress tool, which is helpful: status-go/Detect flaky tests
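
(For reference, and assuming the standard tool from golang.org/x/tools with an illustrative test name: install it with `go install golang.org/x/tools/cmd/stress@latest`, build the package's test binary with `go test -c -o pkg.test ./protocol`, then run `stress ./pkg.test -test.run=TestNewFeature`; it keeps re-running the binary and reports failures as they accumulate.)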

- Are resistant to fuzzing
- All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
- This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
- Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
Collaborator

Suggested change
- Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
- Where the CI cannot support this workflow automatically, the developer MUST perform validation tests locally.

Comment on lines +110 to +119
## Policy Amendments and Archival

- Amendments:
- This policy MAY be amended at any time.
- Amendments MUST be submitted via a PR to the `status-go` repository.
- Archival:
- This policy MAY be archived if it becomes obsolete or replaced by newer policies.
- Archival MUST be performed by submitting a PR that moves the policy to `_docs/policies/archived`.
- The PR MUST include a rationale for the proposed amendment or archival in the PR description.
- The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
Collaborator

This, I believe, is also redundant, as it is specified in Policy Zero, which applies to all policies.
