chore: Otter chaos bot #21937
Open
netopyr wants to merge 7 commits into main from 21921-chaos-bot-v1
+861 −13
Commits
42f6fcb First version of chaos bot (netopyr)
7729b08 Merge branch 'main' into 21921-chaos-bot-v1 (netopyr)
eca2bef Review comments (netopyr)
54dcc88 Merge branch 'main' into 21921-chaos-bot-v1 (netopyr)
c310b06 Spotless (netopyr)
6058bb7 Disable chaos test by default (netopyr)
083a69f Merge branch 'main' into 21921-chaos-bot-v1 (netopyr)
18 changes: 18 additions & 0 deletions
...nsensus-otter-tests/src/testFixtures/java/org/hiero/otter/fixtures/chaosbot/ChaosBot.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| package org.hiero.otter.fixtures.chaosbot; | ||
|
|
||
| import edu.umd.cs.findbugs.annotations.NonNull; | ||
| import java.time.Duration; | ||
|
|
||
| /** | ||
| * A chaos bot that introduces randomized faults into the network. | ||
| */ | ||
| public interface ChaosBot { | ||
|
|
||
| /** | ||
| * Run chaos experiments for the specified duration. | ||
| * | ||
| * @param duration the duration to run chaos experiments | ||
| */ | ||
| void runChaos(@NonNull Duration duration); | ||
| } |
148 changes: 148 additions & 0 deletions
...-tests/src/testFixtures/java/org/hiero/otter/fixtures/chaosbot/internal/ChaosBotImpl.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,148 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| package org.hiero.otter.fixtures.chaosbot.internal; | ||
|
|
||
| import static java.util.Objects.requireNonNull; | ||
| import static org.hiero.otter.fixtures.chaosbot.internal.RandomUtil.randomGaussianDuration; | ||
|
|
||
| import com.swirlds.common.test.fixtures.Randotron; | ||
| import edu.umd.cs.findbugs.annotations.NonNull; | ||
| import java.time.Duration; | ||
| import java.time.Instant; | ||
| import java.util.Comparator; | ||
| import java.util.HashMap; | ||
| import java.util.Map; | ||
| import java.util.PriorityQueue; | ||
| import org.apache.logging.log4j.LogManager; | ||
| import org.apache.logging.log4j.Logger; | ||
| import org.hiero.otter.fixtures.Network; | ||
| import org.hiero.otter.fixtures.Node; | ||
| import org.hiero.otter.fixtures.TestEnvironment; | ||
| import org.hiero.otter.fixtures.TimeManager; | ||
| import org.hiero.otter.fixtures.chaosbot.ChaosBot; | ||
| import org.hiero.otter.fixtures.result.SingleNodeConsensusResult; | ||
|
|
||
| /** | ||
| * Implementation of a chaos bot that creates random failures in the test environment. | ||
| */ | ||
| public class ChaosBotImpl implements ChaosBot { | ||
|
|
||
| private static final Logger log = LogManager.getLogger(); | ||
|
|
||
| // These values will become configurable in the future. | ||
| private static final Duration CHAOS_INTERVAL = Duration.ofMinutes(3L); | ||
| private static final Duration CHAOS_DEVIATION = Duration.ofMinutes(2L); | ||
|
|
||
| private final TestEnvironment env; | ||
| private final Randotron randotron; | ||
| private final ExperimentFactory factory; | ||
| private final Map<Class<?>, Integer> statistics = new HashMap<>(); | ||
|
|
||
| /** | ||
| * Create a new chaos bot. | ||
| * | ||
| * @param env the test environment | ||
| */ | ||
| public ChaosBotImpl(@NonNull final TestEnvironment env) { | ||
| this(env, Randotron.create()); | ||
| } | ||
|
|
||
| /** | ||
| * Create a new chaos bot with a specific random seed. | ||
| * | ||
| * @param env the test environment | ||
| * @param seed the random seed | ||
| */ | ||
| public ChaosBotImpl(@NonNull final TestEnvironment env, final long seed) { | ||
| this(env, Randotron.create(seed)); | ||
| } | ||
|
|
||
| private ChaosBotImpl(@NonNull final TestEnvironment env, @NonNull final Randotron randotron) { | ||
| this.env = requireNonNull(env); | ||
| this.randotron = requireNonNull(randotron); | ||
| this.factory = new ExperimentFactory(env, randotron); | ||
| } | ||
|
|
||
| /** | ||
| * {@inheritDoc} | ||
| */ | ||
| @Override | ||
| public void runChaos(@NonNull final Duration duration) { | ||
| final Network network = env.network(); | ||
| final TimeManager timeManager = env.timeManager(); | ||
| final Instant endTime = timeManager.now().plus(duration); | ||
|
|
||
| final PriorityQueue<Experiment> runningExperiments = | ||
| new PriorityQueue<>(Comparator.comparing(Experiment::endTime)); | ||
| Instant nextStart = calculateNextStart(randotron, timeManager.now()); | ||
|
|
||
| while (timeManager.now().isBefore(endTime)) { | ||
|
Contributor
nit: Some simple comments in this method would help readers understand more quickly.
|
||
| final Instant nextBreak = findEarliestInstant(endTime, nextStart, nextExperimentEnd(runningExperiments)); | ||
| timeManager.waitFor(Duration.between(timeManager.now(), nextBreak)); | ||
|
|
||
| while (nextExperimentEnd(runningExperiments) | ||
| .isBefore(timeManager.now().plusNanos(1L))) { | ||
| final Experiment finishedExperiment = runningExperiments.poll(); | ||
| assert finishedExperiment != null; // nextExperimentEnd would have returned Instant.MAX if empty | ||
| finishedExperiment.end(); | ||
| } | ||
|
|
||
| if (nextStart.isBefore(timeManager.now().plusNanos(1L))) { | ||
| final Experiment experiment = factory.createExperiment(); | ||
| if (experiment != null) { | ||
| statistics.merge(experiment.getClass(), 1, Integer::sum); | ||
| runningExperiments.add(experiment); | ||
| } | ||
| nextStart = calculateNextStart(randotron, timeManager.now()); | ||
| } | ||
| } | ||
|
|
||
| log.info("Chaos bot finished. Statistics of experiments run:"); | ||
| for (final Map.Entry<Class<?>, Integer> entry : statistics.entrySet()) { | ||
| log.info(" {}: {}", entry.getKey().getSimpleName(), entry.getValue()); | ||
| } | ||
|
|
||
| // End any remaining experiments. | ||
| network.restoreConnectivity(); | ||
| for (final Node node : network.nodes()) { | ||
| if (!node.isAlive()) { | ||
| node.start(); | ||
| } | ||
| } | ||
|
|
||
| // Wait until all nodes are active again. | ||
| timeManager.waitForCondition( | ||
| network::allNodesAreActive, | ||
| Duration.ofMinutes(5L), | ||
| "Not all nodes became active again after chaos bot finished"); | ||
|
|
||
| // Check that all nodes make progress | ||
| for (final Node node : network.nodes()) { | ||
| final SingleNodeConsensusResult consensusResult = node.newConsensusResult(); | ||
| final long currentRound = consensusResult.lastRoundNum(); | ||
| timeManager.waitForCondition( | ||
| () -> consensusResult.lastRoundNum() > currentRound, | ||
| Duration.ofSeconds(30L), | ||
| "Node " + node.selfId() + " did not make progress after chaos bot finished"); | ||
| } | ||
| } | ||
|
|
||
| @NonNull | ||
| private static Instant nextExperimentEnd(@NonNull final PriorityQueue<Experiment> runningExperiments) { | ||
| return runningExperiments.isEmpty() | ||
| ? Instant.MAX | ||
| : runningExperiments.peek().endTime(); | ||
| } | ||
|
|
||
| @NonNull | ||
| private static Instant findEarliestInstant( | ||
| @NonNull final Instant i1, @NonNull final Instant i2, @NonNull final Instant i3) { | ||
| return i1.isBefore(i2) ? (i1.isBefore(i3) ? i1 : i3) : (i2.isBefore(i3) ? i2 : i3); | ||
| } | ||
|
|
||
| @NonNull | ||
| private Instant calculateNextStart(@NonNull final Randotron randotron, @NonNull final Instant now) { | ||
| return now.plus(randomGaussianDuration(randotron, CHAOS_INTERVAL, CHAOS_DEVIATION)); | ||
| } | ||
| } | ||
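
The scheduling idea in `runChaos` (sleep until the earliest of the overall end, the next experiment start, and the next experiment end, then handle whatever is due) can be tried out in isolation. The following is a minimal sketch, not the PR's code: `ChaosLoopSketch`, `TimedExperiment`, `simulate`, and `gaussianDuration` are hypothetical names, the clock is a plain variable instead of `TimeManager`, new experiments start at a fixed interval, and `gaussianDuration` only guesses at what `RandomUtil.randomGaussianDuration` might do (that helper is not shown in this diff), including the assumption that negative samples are clamped to zero.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical, simplified stand-in for the loop in runChaos; not the PR's code. */
final class ChaosLoopSketch {

    /** Hypothetical experiment that only carries a name and an end time. */
    record TimedExperiment(String name, Instant endTime) {}

    /** Returns the earliest of three instants. */
    static Instant earliest(final Instant a, final Instant b, final Instant c) {
        final Instant min = a.isBefore(b) ? a : b;
        return min.isBefore(c) ? min : c;
    }

    /** End time of the experiment that finishes first, or Instant.MAX if none is running. */
    static Instant nextEnd(final PriorityQueue<TimedExperiment> running) {
        return running.isEmpty() ? Instant.MAX : running.peek().endTime();
    }

    /** Assumption: mean plus a Gaussian sample scaled by the deviation, clamped at zero. */
    static Duration gaussianDuration(final Random random, final Duration mean, final Duration deviation) {
        final long nanos = Math.round(mean.toNanos() + random.nextGaussian() * deviation.toNanos());
        return Duration.ofNanos(Math.max(0L, nanos));
    }

    /** Runs the loop against a manual clock and records what happened and when. */
    static List<String> simulate(
            final Instant start, final Duration total, final Duration interval, final Duration length) {
        final List<String> events = new ArrayList<>();
        final PriorityQueue<TimedExperiment> running =
                new PriorityQueue<>(Comparator.comparing(TimedExperiment::endTime));
        final Instant chaosEndTime = start.plus(total);
        Instant now = start;
        Instant nextStart = now.plus(interval);
        int counter = 0;

        while (now.isBefore(chaosEndTime)) {
            // Jump straight to the next point in time at which something has to happen.
            now = earliest(chaosEndTime, nextStart, nextEnd(running));

            // End every experiment whose end time has been reached.
            while (!nextEnd(running).isAfter(now)) {
                final TimedExperiment finished = running.poll();
                events.add("end " + finished.name() + " @" + Duration.between(start, now).toSeconds() + "s");
            }

            // Start a new experiment if one is due and schedule the following one.
            if (!nextStart.isAfter(now)) {
                final TimedExperiment started = new TimedExperiment("exp" + counter++, now.plus(length));
                running.add(started);
                events.add("start " + started.name() + " @" + Duration.between(start, now).toSeconds() + "s");
                nextStart = now.plus(interval);
            }
        }
        return events;
    }
}
```

With a total of 10 s, an interval of 4 s, and an experiment length of 3 s, `simulate` returns `[start exp0 @4s, end exp0 @7s, start exp1 @8s]`. The second experiment is still running when the loop exits, which is the situation the cleanup after the loop in `runChaos` has to handle.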
43 changes: 43 additions & 0 deletions
...er-tests/src/testFixtures/java/org/hiero/otter/fixtures/chaosbot/internal/Experiment.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| package org.hiero.otter.fixtures.chaosbot.internal; | ||
|
|
||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| import edu.umd.cs.findbugs.annotations.NonNull; | ||
| import java.time.Instant; | ||
| import org.hiero.otter.fixtures.Network; | ||
|
|
||
| /** | ||
| * An experiment that modifies the network or individual nodes in some way for a limited time. | ||
| */ | ||
| public abstract class Experiment { | ||
|
|
||
| protected final Network network; | ||
| protected final Instant endTime; | ||
|
|
||
| /** | ||
| * Create a new experiment. | ||
| * | ||
| * @param network the network of the test environment | ||
| * @param endTime the moment this experiment should end | ||
| */ | ||
| protected Experiment(@NonNull final Network network, @NonNull final Instant endTime) { | ||
| this.network = requireNonNull(network); | ||
| this.endTime = requireNonNull(endTime); | ||
| } | ||
|
|
||
| /** | ||
| * The moment this experiment should end. | ||
| * | ||
| * @return the end time of the experiment | ||
| */ | ||
| @NonNull | ||
| public Instant endTime() { | ||
| return endTime; | ||
| } | ||
|
|
||
| /** | ||
| * End the experiment, reverting any changes. | ||
| */ | ||
| public abstract void end(); | ||
| } |
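
The concrete experiments are not visible in this part of the diff, so the following is only a sketch of how a subclass might follow the contract of `Experiment`: apply a fault when created, expose an end time, and revert the fault in `end()`. Everything here is hypothetical: `FakeNetwork` stands in for the real `Network` fixture, `IsolationExperiment` is not the PR's `NodeIsolationExperiment`, and returning `null` from `create` when no suitable target exists is an assumption based on the `@Nullable` return type of `ExperimentFactory.createExperiment()`.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical illustration of the Experiment contract; none of these types are from the PR. */
final class ExperimentSketch {

    /** Stand-in for the Network fixture: it only tracks which node ids are isolated. */
    static final class FakeNetwork {
        final Set<Long> isolated = new HashSet<>();
    }

    /** Simplified copy of the abstract base class, using FakeNetwork instead of Network. */
    abstract static class Experiment {
        protected final FakeNetwork network;
        protected final Instant endTime;

        protected Experiment(final FakeNetwork network, final Instant endTime) {
            this.network = Objects.requireNonNull(network);
            this.endTime = Objects.requireNonNull(endTime);
        }

        public Instant endTime() {
            return endTime;
        }

        /** End the experiment, reverting any changes. */
        public abstract void end();
    }

    /** Isolates a single node until the experiment ends. */
    static final class IsolationExperiment extends Experiment {
        private final long nodeId;

        private IsolationExperiment(final FakeNetwork network, final Instant endTime, final long nodeId) {
            super(network, endTime);
            this.nodeId = nodeId;
        }

        /** Applies the fault; returns null if the node is already isolated (assumed convention). */
        static IsolationExperiment create(final FakeNetwork network, final long nodeId, final Instant endTime) {
            if (!network.isolated.add(nodeId)) {
                return null;
            }
            return new IsolationExperiment(network, endTime, nodeId);
        }

        @Override
        public void end() {
            // Revert exactly the change made in create().
            network.isolated.remove(nodeId);
        }
    }
}
```

Keeping the fault and its reversal in the same class means the chaos loop only needs `endTime()` and `end()` and never has to know what kind of fault it is undoing.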
48 changes: 48 additions & 0 deletions
...s/src/testFixtures/java/org/hiero/otter/fixtures/chaosbot/internal/ExperimentFactory.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| package org.hiero.otter.fixtures.chaosbot.internal; | ||
|
|
||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| import com.swirlds.common.test.fixtures.Randotron; | ||
| import edu.umd.cs.findbugs.annotations.NonNull; | ||
| import edu.umd.cs.findbugs.annotations.Nullable; | ||
| import org.hiero.otter.fixtures.TestEnvironment; | ||
|
|
||
| /** | ||
| * Factory for creating random experiments. | ||
| */ | ||
| public class ExperimentFactory { | ||
|
|
||
| private final TestEnvironment env; | ||
| private final Randotron randotron; | ||
|
|
||
| /** | ||
| * Create a new experiment factory. | ||
| * | ||
| * @param env the test environment | ||
| * @param randotron the random number generator | ||
| */ | ||
| public ExperimentFactory(@NonNull final TestEnvironment env, @NonNull final Randotron randotron) { | ||
| this.env = requireNonNull(env); | ||
| this.randotron = requireNonNull(randotron); | ||
| } | ||
|
|
||
| /** | ||
| * Create a new random experiment. | ||
| * | ||
| * @return the created experiment, or {@code null} if no suitable experiment could be created | ||
| */ | ||
| @Nullable | ||
| public Experiment createExperiment() { | ||
| // For now, we assume all experiments are equally likely. Will become configurable in the future. | ||
| final int experimentType = randotron.nextInt(5); | ||
| return switch (experimentType) { | ||
| case 0 -> HighLatencyNodeExperiment.create(env, randotron); | ||
| case 1 -> LowBandwidthNodeExperiment.create(env, randotron); | ||
| case 2 -> NetworkPartitionExperiment.create(env, randotron); | ||
| case 3 -> NodeFailureExperiment.create(env, randotron); | ||
| case 4 -> NodeIsolationExperiment.create(env, randotron); | ||
| default -> throw new IllegalStateException("Unreachable code reached in ExperimentFactory"); | ||
netopyr marked this conversation as resolved.
|
||
| }; | ||
| } | ||
| } | ||
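
Because the factory draws the experiment type with `nextInt(5)` from a generator that can be seeded (see the `ChaosBotImpl(env, seed)` constructor), the sequence of experiment types should be reproducible for a given seed. The sketch below illustrates only that property. It is not the PR's code: `java.util.Random` stands in for `Randotron`, and `SeededSelectionSketch`, `KINDS`, and `pick` are hypothetical names, with strings in place of the real experiment classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of uniform, seed-reproducible experiment selection. */
final class SeededSelectionSketch {

    /** Labels standing in for the five experiment classes named in the factory. */
    static final List<String> KINDS = List.of(
            "HighLatencyNode", "LowBandwidthNode", "NetworkPartition", "NodeFailure", "NodeIsolation");

    /** Draws count experiment kinds, each with equal probability, from a generator seeded with seed. */
    static List<String> pick(final long seed, final int count) {
        final Random random = new Random(seed); // stand-in for Randotron.create(seed)
        final List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(KINDS.get(random.nextInt(KINDS.size())));
        }
        return result;
    }
}
```

Calling `pick` twice with the same seed yields the same list, which is what makes a failing chaos run repeatable as long as the seed is logged.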
nit: I suggest calling this `chaosEndTime` so as to distinguish it more from the `endTime` of each experiment.