kvnemesis: introduce swarm testing #150495

Draft
wants to merge 1 commit into
base: master

miraradeva
Contributor

This commit adds a variant of multi-node kvnemesis that uses swarm testing. Instead of choosing the operation for each step randomly from the full set of possible operations (Get, Put, etc.), a run of kvnemesis uses a smaller subset of operations and still chooses each step randomly from that subset. This has been shown to increase test coverage and expose certain types of bugs in fewer test runs. Swarm testing paper: https://users.cs.utah.edu/~regehr/papers/swarm12.pdf.

For example, consider a stack implementation with Push and Pop APIs. If we always generate test cases by randomly choosing some Pushes and some Pops, it may take many test runs to generate a case with enough Pushes (without Pops) to reach the maximum size of the stack and catch a potential stack overflow bug. Instead, we can generate subsets of APIs, `{Push}`, `{Pop}`, `{Push, Pop}`, and run the same randomized testing on each. The subset `{Push}` is guaranteed to hit the stack overflow bug quickly.

In kvnemesis, we have a lot of operations to choose from, so it's not feasible to generate all possible subsets. The swarm testing paper shows that generating random configurations (e.g., including each operation in the config independently with probability 1/2) is simple and effective at improving test coverage.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@stevendanna
Collaborator

Maybe worth adding to the extra-stress nightly too?

@miraradeva
Contributor Author

> Maybe worth adding to the extra-stress nightly too?

I'm stressing it on my GCE worker, and so far nothing. I'll see if I can get it to find something (at least a sysbytes issue) before I un-draft it.

3 participants