# docs(policy): Reintroduced the testing policy for full review #6413
base: develop
## Conversation
### Jenkins Builds
### Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@             Coverage Diff             @@
##           develop    #6413      +/-   ##
===========================================
- Coverage    60.83%   60.81%   -0.02%
===========================================
  Files          874      874
  Lines       112420   112420
===========================================
- Hits         68389    68368      -21
- Misses       36085    36101      +16
- Partials      7946     7951       +5
```

Flags with carried forward coverage won't be shown.
> - All new functionality MUST be introduced with tests that:
>   - Prove that the functionality performs as described
>   - Can be falsified
> Can be falsified
I think it can be elaborated. Maybe it's just me, but I think these two possible interpretations are different (see the sketch below):

- Tests SHOULD not only verify correct behavior (the "happy path"). Edge-case and failure-scenario testing proactively validates the new feature's resilience to unusual or invalid input.
- Falsifiable testing retroactively ensures that a bug is demonstrably captured by a test, ensuring that fixes address concrete, reproducible problems rather than hypothetical scenarios.
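To make the distinction concrete, here is a minimal Go sketch; the `parseAmount` function and both tests are hypothetical, purely for illustration. The happy-path test proves the functionality performs as described, while the failure-scenario test is what makes the suite falsifiable: deleting the validation inside `parseAmount` makes it fail.

```go
package example

import (
	"errors"
	"strconv"
	"testing"
)

// parseAmount is a hypothetical function under test: it parses a
// non-negative integer amount from a string.
func parseAmount(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil || n < 0 {
		return 0, errors.New("invalid amount: " + s)
	}
	return n, nil
}

// Happy path: proves the documented behavior.
func TestParseAmountValid(t *testing.T) {
	got, err := parseAmount("42")
	if err != nil || got != 42 {
		t.Fatalf(`parseAmount("42") = %d, %v; want 42, nil`, got, err)
	}
}

// Failure scenarios: edge cases and invalid input. Removing the
// validation in parseAmount makes this test fail.
func TestParseAmountInvalid(t *testing.T) {
	for _, s := range []string{"", "abc", "-1"} {
		if _, err := parseAmount(s); err == nil {
			t.Errorf("parseAmount(%q): expected an error", s)
		}
	}
}
```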
Hey, as we refine this Testing Policy, I wanted to raise some questions for further discussion. Let me know what you think!

**API-level Functional Testing**: Should API-level functional tests be explicitly included in this policy? Should they have their own specific requirements?

**Performance & Load Testing**: Should we introduce any policy items about performance benchmarks for functional tests, particularly API tests?

**Handling Test Debt & Maintenance**: Do we need a process for reviewing and cleaning up outdated tests? 😬

**Fuzzing & Security Considerations**: This policy mentions resistance to fuzzing, but should we also consider security testing within our tests?

**Automation & CI Integration**: Should we mention in this policy the CI/CD automation steps currently in place that enforce test coverage?

**Flaky Test Management Process**: Should we set a maximum timeframe for how long a test can remain flaky before it must be resolved or rewritten?
Nice to see this back
>   - Prove that the functionality performs as described
>   - Can be falsified
>   - Are resistant to fuzzing
> - All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
I don't know functional tests that well, but integration tests are the usual culprits of flakiness, so either this is a swap or we need to add integration there too.
This should probably apply to unit/integration (Go) tests, not functional (Python) tests. The latter have not yet been noted to be flaky, so I wouldn't apply such strict rules there.

We have test types defined here: status-go/Test Types.
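On the "Are resistant to fuzzing" bullet quoted above, here is a minimal sketch of a native Go fuzz target (Go 1.18+), reusing the hypothetical `parseAmount` from the earlier sketch; run with `go test -fuzz=FuzzParseAmount`:

```go
// Fuzzing checks an invariant over arbitrary generated inputs rather
// than a fixed list of cases. parseAmount is the hypothetical function
// defined in the sketch above.
func FuzzParseAmount(f *testing.F) {
	// Seed corpus: one known-good and one known-bad input.
	f.Add("42")
	f.Add("-1")
	f.Fuzz(func(t *testing.T, s string) {
		n, err := parseAmount(s)
		// Invariant: a successful parse never yields a negative amount.
		if err == nil && n < 0 {
			t.Errorf("parseAmount(%q) = %d with nil error", s, n)
		}
	})
}
```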
This is a welcome policy!
My impression while reading it is that it contains a bit too much detail, which leaves room for more amendments, something we mentioned in the introductory policy that we'd like to minimize.
For example, I don't think we need the policy to describe in detail how to resolve or report flaky tests (like in the section "Steps to Resolving or Reporting Flaky Tests"). Another approach could be to link to external guidelines that can evolve independently of the policy. Essentially, I'm questioning what belongs in the policy versus what should live in non-policy docs.
> ## Purpose
>
> This policy defines the requirements and guidelines for testing within the `status-go` repository. It aims to ensure all new functionality is introduced with robust, reliable, and repeatable tests while preventing flaky tests from being merged into the codebase. This aligns with Policy Zero’s principles of transparency, collaboration, and quality assurance in the development process.
> while preventing flaky tests from being merged into the codebase
In my experience, many flaky tests are tricky and end up in the main branch because they resist verification attempts. Beyond addressing flaky tests during PRs, an important process is to define how/who/when to handle flaky tests after they infect the main branch. But perhaps the details of such a process can be defined in a non-policy doc.
>   - Are resistant to fuzzing
> - All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
>   - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
>   - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
How will the PR author prove that the N test runs were executed? Should the CI be responsible for detecting new functional tests and increasing their test count by default for the duration of the PR? Is it always feasible to mandate a minimum of 1000 test runs when some tests can take a significant amount of time to finish?
I agree that a CI check would be nice, but the effort to implement it might not be worth the results.

There are 3 cases when a test can become flaky, and each reason requires a different effort to implement a CI check:

| Reason | How to CI check | Implementation difficulty (out of 5) | Warnings |
|---|---|---|---|
| When implementing a new test | Check if this test exists on `develop` | 🔴 🔴 | - |
| When changing old test code | Check the tests diff and find amended tests | 🔴 🔴 🔴 🔴 | - |
| When changing tested code (without changing the test itself) | Use external tools (e.g. Sentry Carryforward flags) | 🔴 🔴 🔴 🔴 | Some code may be used in ALL tests, which would make CI verify all tests 1000 times |

I think we could some day implement (1), but I would not bother with (2), and there will always be (3), which is wrong to do (note the warning above).

👉 So for now I believe it's fine to require devs to run new/amended tests locally and perhaps attach the execution report to the PR description/comment. At the same time, teach devs to prevent flaky tests, so that we won't even need to run tests 1000 times.
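For reference, a local validation run along these lines might look like the following sketch (the package path and test name are hypothetical; `-failfast` stops the run at the first failure, so a flake surfaces immediately):

```sh
# Run one new test 1000 times, aborting as soon as it flakes.
go test ./protocol -run '^TestMessengerProfile$' -count=1000 -failfast
```

The terminal output of such a run is the kind of execution report that could be attached to the PR.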
I had a feeling CI detection would be tricky, but I wasn't expecting such a detailed answer. Thanks @igor-sirotin 💯 I completely agree with your conclusions.

One idea to improve the effectiveness of the test suite is to randomly select a subset of tests and increase their run count (randomly as well, if we like). Essentially, we'd extend the total test run time by a controlled amount. This would raise the chances of catching flaky tests before they hit `develop`. As the suite gets more stable, we could gradually reduce the randomness factor. We could also increase the number of tests affected by the random selection in CI by using dedicated jobs (e.g. twice per day, with 10% of the test suite running between 10 and 200 times).

Is the idea above practical and perhaps straightforward?
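For what it's worth, here is a rough shell sketch of the random-subset idea (the package path, subset size, and run count are all hypothetical; `go test -list` prints matching test names without running them):

```sh
# Pick a random subset of tests in a package and stress each one.
go test -list '.*' ./protocol | grep '^Test' | shuf | head -n 20 |
while read -r t; do
	go test ./protocol -run "^${t}$" -count=100 -failfast || exit 1
done
```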
>   - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
>   - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
>   - `TODO` Add link to issue for CI automation of validation test runs of new `functional tests`.
> - Ensuring that the test passes consistently every time gives confidence that the test is not flaky.
We know one source of flakiness is external network dependency. I wonder if we can just define in status-go that certain tests (e.g. within a certain standardized directory like `./unit/`, or following a specific naming pattern) should automatically block outbound traffic. A hard block using a firewall in the CI could be practical, but it would be better, I think, if we could do it from Go directly. I'm just looking for ways to rely less on conventions and trust.
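A sketch of the from-Go direction, assuming the tests in question reach the network through `net/http`'s default transport (raw `net.Dial` calls would still need the firewall approach): a package-level `TestMain` swaps in a dialer that fails fast.

```go
package unit

import (
	"context"
	"errors"
	"net"
	"net/http"
	"os"
	"testing"
)

// TestMain blocks outbound HTTP for every test in this package, so a
// test that silently depends on the network fails deterministically
// instead of flaking on connectivity.
func TestMain(m *testing.M) {
	http.DefaultTransport = &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return nil, errors.New("outbound network access blocked in unit tests: " + addr)
		},
	}
	os.Exit(m.Run())
}
```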
> ```
>
> #### Reporting Flaky Tests
> If an old test fails and/or seems flaky either locally or in CI, you MUST report the event.
Is the lack of a blank line after the headline intentional? The other one at level 2 (`## Flaky Tests`, above) has a blank line.
>   - Archival MUST be performed by submitting a PR that moves the policy to `_docs/policies/archived`.
> - The PR MUST include a rationale for the proposed amendment or archival in the PR description.
> - The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
>
>
Two extra lines.
> ## Purpose
>
> This policy defines the requirements and guidelines for testing within the `status-go` repository. It aims to ensure all new functionality is introduced with robust, reliable, and repeatable tests while preventing flaky tests from being merged into the codebase. This aligns with Policy Zero’s principles of transparency, collaboration, and quality assurance in the development process.
Most lines don't have spurious line breaks, but some do, like this one. For consistency with the other policy, `introduce-policy.md`, it's better to fill all paragraphs at ~80 chars.
> ## Creating Tests
>
> - All new functionality MUST be introduced with tests that:
When it comes to the quality of tests, there are other characteristics not covered in the policy, and we could go into more detail; but by doing so, I think we are already failing to use policies adequately. Policies are not great documents to iterate on, in my opinion: they are too rigid and require too much coordination.
A few other characteristics:
- How sociable is the test?
- How does the test perform?
- How is the test structured in terms of arrange/act/assert? (see the sketch after this list)
- How much coverage should a test provide at a minimum?
- Is the test written using the best test pyramid level? Example: could the integration test be rewritten to be one or more unit tests, or a combination of different levels?
- Is the test verifying possible security threats?
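As a small illustration of the arrange/act/assert point above, a Go sketch (the `memoryStore` type is hypothetical, only there to keep the example self-contained):

```go
package example

import "testing"

// memoryStore is a hypothetical in-memory key/value store.
type memoryStore struct{ m map[string]string }

func newMemoryStore() *memoryStore                  { return &memoryStore{m: map[string]string{}} }
func (s *memoryStore) Put(k, v string) error        { s.m[k] = v; return nil }
func (s *memoryStore) Get(k string) (string, error) { return s.m[k], nil }

func TestPutThenGet(t *testing.T) {
	// Arrange: build the unit under test and its inputs.
	s := newMemoryStore()

	// Act: perform exactly one behavior.
	if err := s.Put("key", "value"); err != nil {
		t.Fatalf("Put: %v", err)
	}
	got, err := s.Get("key")

	// Assert: verify the observable outcome.
	if err != nil || got != "value" {
		t.Fatalf(`Get("key") = %q, %v; want "value", nil`, got, err)
	}
}
```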
> - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
An example (to me) of something that seems too detailed to live in the policy.
Thank you @Samyoul!

I believe it is essential for this policy to exist, and we should agree on common rules for introducing new tests; otherwise we will continue to introduce more and more flaky tests and just keep skipping them when they become unbearably annoying.

P.S. This is just part of my review; I will come back with more comments. Just leaving this here in the meantime.
> ## Submitting and Maintaining the Test Policy
>
> - Any individual MAY propose amendments to this policy.
> - Proposed changes MUST be submitted as a pull request (PR) to the `_docs/policies` directory in the `status-go` repository.
> - The PR MUST include a brief justification for the amendment, answering: "Why is this change needed?"
> - The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
I think this is redundant, as it is defined in Policy Zero.
>   - Prove that the functionality performs as described
>   - Can be falsified
>   - Are resistant to fuzzing
Actually, I don't think we need to specify the quality of the tests. I find our developers reasonable enough to implement tests that make sense and cover the feature. This basically means "be a good developer, write good tests" 😄 We are good developers here!
So the point is: IMO just specifying that "the new functionality MUST be covered with tests" is enough here.
- Prove that the functionality performs as described | ||
- Can be falsified | ||
- Are resistant to fuzzing | ||
- All new `functional tests` MUST BE validated via a minimum of 1000 test runs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably apply to unit/integration (go) tests, not functional (python). The latter have not yet been noted to be flaky, so I wouldn't apply such strict rules there.
We have tests types definition here: status-go/Test Types.
>   - Can be falsified
>   - Are resistant to fuzzing
> - All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
>   - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
I think this is unnecessary detail:
```diff
- - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
+ - This can be achieved using the `-count` or `-test.count` flag with the `go test` command
```
Btw, there's also the `stress` tool, which is helpful: status-go/Detect flaky tests.
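For context, the linked `stress` tool (golang.org/x/tools/cmd/stress) repeatedly runs a compiled test binary; a hypothetical invocation might look like:

```sh
# Build the package's test binary, then hammer one test in parallel.
go test -c -o protocol.test ./protocol
stress -p 4 ./protocol.test -test.run '^TestMessengerProfile$'
```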
>   - Are resistant to fuzzing
> - All new `functional tests` MUST BE validated via a minimum of 1000 test runs.
>   - This can be achieved using the `-count` or `-test.count` flag with the test command, e.g., `-count 1000` / `-test.count 1000`.
>   - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
```diff
- - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests via local testing.
+ - Where the CI cannot support this workflow automatically, the developer MUST perform validation tests locally.
```
> ## Policy Amendments and Archival
>
> - Amendments:
>   - This policy MAY be amended at any time.
>   - Amendments MUST be submitted via a PR to the `status-go` repository.
> - Archival:
>   - This policy MAY be archived if it becomes obsolete or replaced by newer policies.
>   - Archival MUST be performed by submitting a PR that moves the policy to `_docs/policies/archived`.
> - The PR MUST include a rationale for the proposed amendment or archival in the PR description.
> - The PR MUST follow the [Review and Approval Process](./introduce-policy.md#review-and-approval-process) outlined in Policy Zero.
This, I believe, is also redundant, as it is specified in Policy Zero, which applies to all policies.
## PR Description: Reintroducing the `status-go` Testing Policy

### Summary

This PR reintroduces the `status-go` Testing Policy, providing clear guidelines for test creation, validation, and handling flaky tests. The policy has been updated to align with Policy Zero, ensuring a structured, transparent, and inclusive approach to test management in the repository.

### Background

The `status-go` Testing Policy previously existed in the repository but was removed to allow all Core Contributors (CCs) to review, discuss, and provide input before formal reintroduction. This aligns with Policy Zero, which establishes a framework for collaboratively developing policies in key areas of `status-go` development. The goal is to ensure that all policies reflect the needs and perspectives of contributors across the project.

### What's Changed?

### Why Change?