-
Notifications
You must be signed in to change notification settings - Fork 235
Add BDD-based rules engine trait #2703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
25c0e7f
to
16503fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not done reviewing, but I thought I should at least post what I have. I still need to look at the bdd sifting and tests.
overall looks great
...ware/amazon/smithy/rulesengine/language/syntax/expressions/functions/FunctionDefinition.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/Bdd.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/Bdd.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/cfg/Cfg.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/cfg/Cfg.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/resources/META-INF/smithy/smithy.rules.smithy
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddBuilder.java
Show resolved
Hide resolved
...gine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/DefaultOrderingStrategy.java
Outdated
Show resolved
Hide resolved
...engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddEquivalenceChecker.java
Outdated
Show resolved
Hide resolved
...-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddNodeHelpers.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When are we going to be running the BDD optimizations? I think it would make sense to do either prior to code generation, or better as a sort of pre-compile/formatting step. The latter would make sure it's only done once, but maybe a generator wouldn't want to trust that
...ules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/OrderConstraints.java
Outdated
Show resolved
Hide resolved
...ules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/OrderConstraints.java
Outdated
Show resolved
Hide resolved
I don't think anyone do code generation from a Bdd trait will want to optimize at all. We'll only ship already optimized BDDs. In the future, I want us to eventually ship just the BDD trait and not the current decision tree trait. We'd do the optimizations at the end of the build process that computes the BDD (sifting, reversal, etc). When building BDDs manually because you just have the decision tree and no BDD trait, you can choose to either optimize or not based on your "budget". |
@range(min: 0) | ||
nodeCount: Integer | ||
|
||
/// Base64-encoded array of BDD nodes representing the decision graph structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the base64 encoding will make this trait difficult to write and review. How do you envision the development process for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to embed thousands of numbers in arrays of arrays, which is just as unreadable and significantly more JSON to parse. The zig-zag encoding of the binary of the numbers gives a much more compact representation and lets consumers of the trait parse it directly into whatever data structure they want (e.g., in Java, we'd use int[][] instead of List for performance).
I don't envision people authoring BDDs by hand. They're going to typically generate them from something else. I will probably add some code in future PRs to make that easier too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the expectation is that people write an endpointRuleset
and then use some transformer like convertRulesetToBdd
that also filters the endpointRuleset
trait? That should be fine.
We'd also want some other tooling to make it easy to work with, like being able to compile/optimize from the command line, e.g. smithy rules optimize --timeout X --exhaustiveness X ...
. Back-porting the optimizations to the endpointRuleset
trait would also be cool. We do have one in the CFG that we use while optimizing. And something to pretty-print the BDD.
All that can be done later though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I plan to add an API that contributes paths to results to a CFG, and then combines them all into one big ITE chain that the BDD can turn into a compressed DAG representation.
8ed7d56
to
f453664
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, will continue tomorrow but posting comments so they can be discussed/addressed.
...e/src/main/java/software/amazon/smithy/rulesengine/language/syntax/parameters/Parameter.java
Show resolved
Hide resolved
...rules-engine/src/main/java/software/amazon/smithy/rulesengine/language/syntax/rule/Rule.java
Outdated
Show resolved
Hide resolved
c648fa5
to
219e5e1
Compare
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddBuilder.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddCompiler.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddCompiler.java
Outdated
Show resolved
Hide resolved
...engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddEquivalenceChecker.java
Outdated
Show resolved
Hide resolved
...engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddEquivalenceChecker.java
Outdated
Show resolved
Hide resolved
smithy-rules-engine/src/main/java/software/amazon/smithy/rulesengine/logic/bdd/BddTrait.java
Outdated
Show resolved
Hide resolved
This commit updates the smithy-rules-engine package to support binary decision diagrams (BDD) to more efficiently resolve endpoints. We create the BDD by converting the decision tree into a control flow graph (CFG), then compile the CFG to a BDD. The CFG canonicalizes conditions for better sharing (e.g., sorts commutative functions, expands simple string templates, etc), and strips all conditions from results and hash-conses them as well. Later, we'll migrate to emitting the BDD directly in order to shave off many conditions and results that can be simplified. Our decision-tree based rules engine requires deep branching logic to find results. When evaluating the path to a result based on given input, decision trees require descending into a branch, and if at any point a condition in the branch fails, you bail out and go back up to the next branch. This can cause pathological searches of a tree (e.g., 60+ repeated checks on things like isset and booleanEquals to resolve S3 endpoints). In fact, there are currently ~73,000 unique paths through the current decision tree for S3 rules. Using a BDD (a fully reduced one at least) guarantees that we only evaluate any given condition at most once, and only when that condition actually discriminates the result. This is achieved by recursively converting the CFG into BDD nodes using ITE (if-then-else) operations, choosing a variable ordering that honors dependencies between conditions and variable bindings. The BDD builder applies Shannon expansion during ITE operations and uses hash-consing to share common subgraphs. The "bdd" trait has most of the same information as the endpointRuleset trait, but doesn't include "rules". Instead it contains a base64 encoded "nodes" value that contains the zig-zag variable-length encoded node triples, one after the other (this is much more compact and efficient to decode than 1000+ JSON array nodes). The BDD implementation uses CUDD-style complement edges where negative node references represent logical NOT, further reducing BDD size.
Rather than have the Bdd class contain Condition, Results, Parameters, etc, it now just deals with nodes. It also now hides the implementation detail of how the BDD nodes are laid out internally. BDD evaluation is internalized to the BDD as well rather than a separate BddEvaluator. This change provides faster evaluation, makes it possible to change the internal node data layout if necessary, and cleans up all the interacts we had with BddTrait (no need to always reach into Bdd).
We were using the wrong condition ordering in BddTrait after compiling a Bdd from the CFG, leading to a totally broken BDD. Also adds some tests, fixes, and generalizes BddTrait transforms
This also revealed a bug in the BDD compilation process that was causing negated nodes to get added twice.
The varint encoding does help compact the binary node array, but adds maybe a bit to much decoding complexity for only a 20-30% size reduction, and most of the size comes from conditions and results.
Our previous initial ordering could result in pathalogical orderings if it decided to moving something very early from the CFG to very late. This is in fact what happened when I added a coalesce method: it moved an early discriminating condition to very late, which blew up the BDD from ~40K nodes to 5.1M. This taught me that we really shouldn't throw away the ordering found in the CFG, and instead should leverage it when determining the initial ordering since it inherently gates logic and keeps related conditions together. So now the initial ordering is based on the CFG ordering and also on cone analysis (basically how many downstream nodes a node affects). We now get an initial ordering ~3K nodes, and with the coalesce method, we can now sift S3 down to ~800 nodes instead of ~1000. The coalesce function is added here so that we can fold bind-then-test conditions into a single condition. The current endpoints type system has strict nullability requirements. So you can't do a substring test and pass that directly into something that expects a non-null value. You have to first do the nullable function, then assign that to a value, then the next condition is inherently guarded and only called if the assigned value is non-null (the assignment acts as an implicit guard). The coalsce function allows us to identify these patterns and inline the test into a single condition by defaulting null to the zero value of the return type (integer=0, string="", array=[]). We only coalesce when the comparison is not to literally the zero value. When coalesce was added, it uncovered the original brittle ordering, leading to the much improved ordering in this PR.
This commit updates the smithy-rules-engine package to support binary decision diagrams (BDD) to more efficiently resolve endpoints.
We create the BDD by converting the decision tree into a control flow graph (CFG), then compile the CFG to a BDD. The CFG canonicalizes conditions for better sharing (e.g., sorts commutative functions, expands simple string templates, etc), and strips all conditions from results and hash-conses them as well. Later, we'll migrate to emitting the BDD directly in order to shave off many conditions and results that can be simplified.
Our decision-tree based rules engine requires deep branching logic to find results. When evaluating the path to a result based on given input, decision trees require descending into a branch, and if at any point a condition in the branch fails, you bail out and go back up to the next branch. This can cause pathological searches of a tree (e.g., 60+ repeated checks on things like isset and booleanEquals to resolve S3 endpoints). In fact, there are currently ~73,000 unique paths through the current decision tree for S3 rules.
Using a BDD (a fully reduced one at least) guarantees that we only evaluate any given condition at most once, and only when that condition actually discriminates the result. This is achieved by recursively converting the CFG into BDD nodes using ITE (if-then-else) operations, choosing a variable ordering that honors dependencies between conditions and variable bindings. The BDD builder applies Shannon expansion during ITE operations and uses hash-consing to share common subgraphs.
The "bdd" trait has most of the same information as the endpointRuleset trait, but doesn't include "rules". Instead it contains a base64 encoded "nodes" value that contains the zig-zag variable-length encoded node triples, one after the other (this is much more compact and efficient to decode than 1000+ JSON array nodes).
The BDD implementation uses CUDD-style complement edges where negative node references represent logical NOT, further reducing BDD size.
BDD output examples
AWS Connect BDD output
bdd trait
Endpoint rules: BDD vs Decision tree size comparison
Regional service
S3
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.