Antithesis blogpost #1051
@serathius: GitHub didn't allow me to request PR reviews from the following users: marcus-hodgson-antithesis. Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Looking good, @serathius. I left a couple of comments :)
a8344d7 to 784068f
The platform works by running the entire etcd cluster inside a deterministic hypervisor.
This specialized environment gives the testing software complete control over every source of non-determinism,
such as network behavior, thread scheduling, and system clocks.
This means any bug it discovers can be perfectly and reliably reproduced.

Within this simulated environment, the testing methodology shifts away from traditional, scenario-based tests.
Instead of writing tests with strict assertions that check for one specific outcome,
this approach relies on flexible, property-based assertions.
These properties are high-level invariants about the system that must always hold true. For example,
"data consistency is never violated" or "a watch event is never dropped."

The platform then treats these properties not as passive checks, but as targets to break.
It combines automated exploration with targeted fault injection,
actively searching for the precise sequence of events and failures that will cause a property to be violated.
This active search for violations is what allows the platform to uncover subtle bugs that result from complex combinations of factors.
Antithesis refers to this approach as autonomous testing.

This builds upon etcd's existing robustness tests, which also use a property-based approach.
However, without a deterministic environment or automated exploration,
the original framework resembled throwing darts while blindfolded and hoping to hit the bullseye.
A bug might be found, but the process relies heavily on random chance and is difficult to reproduce.
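To make the excerpt's argument concrete, here is a minimal Python sketch of the idea, not the Antithesis platform or etcd's robustness framework. It assumes a toy "watch stream" in which a hypothetical injected fault silently loses events, and it checks the property "a watch event is never dropped". All names (`run_simulation`, `watch_never_drops`, `find_violation`, `drop_rate`) are invented for illustration. Because every random decision comes from one seeded generator, a failing seed replays the same failure each time.

```python
import random


def run_simulation(seed, steps=50, drop_rate=0.05):
    """Run one toy scenario and return the revisions the watcher never saw.

    Hypothetical model: each step writes one revision, and an injected
    fault may silently lose the matching watch event.
    """
    rng = random.Random(seed)  # the only source of randomness, fixed by the seed
    written = []
    delivered = set()
    for revision in range(1, steps + 1):
        written.append(revision)
        if rng.random() < drop_rate:
            continue  # injected fault: the event is lost in transit
        delivered.add(revision)
    return [r for r in written if r not in delivered]


def watch_never_drops(seed):
    """Property under test: no written revision is missing from the watch."""
    return not run_simulation(seed)


def find_violation(max_seeds=1000):
    """Search seeds until the property breaks; return the failing seed or None."""
    for seed in range(max_seeds):
        if not watch_never_drops(seed):
            return seed
    return None


if __name__ == "__main__":
    failing_seed = find_violation()
    if failing_seed is None:
        print("no violation found")
    else:
        # Replaying the same seed yields the same missed revisions.
        print("seed", failing_seed, "missed", run_simulation(failing_seed))
```

This sketch only pins down one source of non-determinism (a random number generator) and searches seeds blindly. The excerpt describes something stronger: a hypervisor that also controls scheduling, clocks, and the network, plus exploration that actively steers toward violations.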
@CraigAlfieri please review, thanks!
@jberkus Serdar (@sbenderli) and Akshay (@akshayjshah) are on the Antithesis Team and will be contributing thoughts here too.
1da93c7 to 2c82898
LGTM
It's great to see system-level failpoints being used for simulation.
Just a couple minor changes. This looks ready to submit to CNCF marketing, if you agree.
bf94e3c to 03f85b8
Sure, go ahead. Could we merge this PR, since the post is marked as a draft?
LGTM :)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fuweid, ivanvc, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
03f85b8 to bd03cb8
/lgtm
/hold Ready, but on hold until the CNCF blog decides if they want it.
Please note that this is configured as a draft. Do we still want to hold it?
@jberkus any update from CNCF here?
| [Watch on future revision might receive old events][bug-1-issue] | [Triage Report][bug-1-report] | Fixed in 3.6.2 ([\#20281][bug-1-fix]) | Medium | New bug discovered by Antithesis |
| [Watch on future revision might receive old notifications][bug-2-issue] | [Triage Report][bug-2-report] | Fixed in 3.6.2 ([\#20221][bug-2-fix]) | Medium | New bug discovered by both Antithesis and robustness tests |
| [Panic when two snapshots are received in short period][bug-3-issue] | [Triage Report][bug-3-report] | Open | Low | Previously discovered by robustness |
| [Panic from db page expected to be 5][bug-4-issue] | [Triage Report][bug-4-report] | Open | Low | New bug discovered by Antithesis |
this one has been fixed
Updated
New changes are detected. LGTM label has been removed.
Co-authored-by: Iván Valdés Castillo <[email protected]>
Co-authored-by: Josh Berkus <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
70cba96 to d13b1a1
@jberkus fixed, can we move forward with merging?
https://www.cncf.io/blog/2025/09/25/autonomous-testing-of-etcds-robustness/ it has been published 🤣
Blogpost to summarize the Antithesis engagement.
/cc @jberkus @nate-double-u @fuweid @siyuanfoundation @ahrtr @ivanvc @marcus-hodgson-antithesis