serathius
Member

Blogpost to summarize the Antithesis engagement.

/cc @jberkus @nate-double-u @fuweid @siyuanfoundation @ahrtr @ivanvc @marcus-hodgson-antithesis

@k8s-ci-robot

@serathius: GitHub didn't allow me to request PR reviews from the following users: marcus-hodgson-antithesis.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

Member

@ivanvc ivanvc left a comment

Looking good, @serathius. I left a couple of comments :)

@serathius serathius force-pushed the antithesis branch 9 times, most recently from a8344d7 to 784068f Compare August 27, 2025 09:16
Comment on lines 21 to 42
The platform works by running the entire etcd cluster inside a deterministic hypervisor.
This specialized environment gives the testing software complete control over every source of non-determinism,
such as network behavior, thread scheduling, and system clocks.
This means any bug it discovers can be perfectly and reliably reproduced.

Within this simulated environment, the testing methodology shifts away from traditional, scenario-based tests.
Instead of writing tests with strict assertions that check for one specific outcome,
this approach relies on flexible, property-based assertions.
These properties are high-level invariants about the system that must always hold true,
for example "data consistency is never violated" or "a watch event is never dropped."
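
As an illustration only, the "a watch event is never dropped" property could be written as a check over a recorded history. This is a minimal sketch, not etcd's robustness test code: the function name, its parameters, and the simplified model of revisions as plain integers are all assumptions made for the example.

```python
from typing import List


def watch_events_never_dropped(written_revisions: List[int],
                               observed_revisions: List[int],
                               start_revision: int) -> bool:
    """Hypothetical property check (not etcd's actual API).

    A watch opened at start_revision must observe every write at or after
    that revision, in order, with no gaps and no older events mixed in.
    """
    expected = [r for r in sorted(written_revisions) if r >= start_revision]
    # The watcher may not have caught up yet, so the observed events only
    # need to match a prefix of the expected ones.
    return observed_revisions == expected[:len(observed_revisions)]
```

Because the check describes what must hold for any history rather than one expected outcome, the same function can validate every run the test harness produces. Note that this simplified version accepts a watcher that has not yet caught up, so it cannot detect events missing from the tail.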

The platform then treats these properties not as passive checks, but as targets to break.
It combines automated exploration with targeted fault injection,
actively searching for the precise sequence of events and failures that will cause a property to be violated.
This active search for violations is what allows the platform to uncover subtle bugs that result from complex combinations of factors.
Antithesis refers to this approach as autonomous testing.
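
The idea of searching for a violating sequence of faults and then replaying it can be sketched with a toy model. The example below is an assumption-laden illustration, not how Antithesis works internally: it uses plain seeded random search rather than guided exploration, and the fault names, the toy bug, and both function names are invented for the sketch.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical fault types for the toy model.
FAULTS = ("none", "delay", "restart")


def run_once(seed: int, steps: int = 6) -> Tuple[List[str], bool]:
    """Run one simulated fault schedule.

    All randomness comes from `seed`, so the same seed always produces
    the same schedule and the same outcome.
    """
    rng = random.Random(seed)
    schedule = [rng.choice(FAULTS) for _ in range(steps)]
    # Toy bug: the property breaks only when a restart directly follows a delay.
    property_holds = not any(
        a == "delay" and b == "restart"
        for a, b in zip(schedule, schedule[1:])
    )
    return schedule, property_holds


def find_violation(max_seeds: int = 1000) -> Optional[int]:
    """Try seeds in order and return the first one that violates the property."""
    for seed in range(max_seeds):
        _, ok = run_once(seed)
        if not ok:
            return seed
    return None
```

Calling `run_once` again with the seed returned by `find_violation` replays exactly the same failing schedule, which is the property that makes a bug found this way reproducible.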

This builds upon etcd's existing robustness tests, which also use a property-based approach.
However, without a deterministic environment or automated exploration,
the original framework resembled throwing darts while blindfolded and hoping to hit the bullseye.
A bug might be found, but the process relies heavily on random chance and is difficult to reproduce.
Contributor

@CraigAlfieri please review, thanks!

@jberkus Serdar (@sbenderli) and Akshay (@akshayjshah) are on the Antithesis Team and will be contributing thoughts here too.

@serathius serathius force-pushed the antithesis branch 3 times, most recently from 1da93c7 to 2c82898 Compare August 28, 2025 09:45
Member

@fuweid fuweid left a comment

LGTM

It's great to see system-level failpoints being used for simulation.

Contributor

@jberkus jberkus left a comment

Just a couple minor changes. This looks ready to submit to CNCF marketing, if you agree.

@serathius serathius force-pushed the antithesis branch 2 times, most recently from bf94e3c to 03f85b8 Compare September 4, 2025 08:16
@serathius
Member Author

serathius commented Sep 4, 2025

This looks ready to submit to CNCF marketing, if you agree.

Sure, go ahead. Could we merge this PR, since the post is marked as a draft?

Member

@ivanvc ivanvc left a comment

LGTM :)

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fuweid, ivanvc, serathius

@siyuanfoundation
Contributor

/lgtm

@jberkus
Contributor

jberkus commented Sep 5, 2025

/hold

Ready, but on hold until the CNCF blog decides if they want it.

@ivanvc
Member

ivanvc commented Sep 5, 2025

/hold

Ready, but on hold until the CNCF blog decides if they want it.

Please note that this is configured as a draft. Do we still want to hold it?

@marcus-hodgson-antithesis

@jberkus any update from CNCF here?

| [Watch on future revision might receive old events][bug-1-issue] | [Triage Report][bug-1-report] | Fixed in 3.6.2 ([\#20281][bug-1-fix]) | Medium | New bug discovered by Antithesis |
| [Watch on future revision might receive old notifications][bug-2-issue] | [Triage Report][bug-2-report] | Fixed in 3.6.2 ([\#20221][bug-2-fix]) | Medium | New bug discovered by both Antithesis and robustness tests |
| [Panic when two snapshots are received in short period][bug-3-issue] | [Triage Report][bug-3-report] | Open | Low | Previously discovered by robustness tests |
| [Panic from db page expected to be 5][bug-4-issue] | [Triage Report][bug-4-report] | Open | Low | New bug discovered by Antithesis |
Member

this one has been fixed

Member Author

Updated

@k8s-ci-robot

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 27, 2025
Co-authored-by: Iván Valdés Castillo <[email protected]>
Co-authored-by: Josh Berkus <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
@serathius
Member Author

@jberkus fixed, can we move forward with merging?

@fuweid
Member

fuweid commented Sep 29, 2025
