-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Description
as part of #12688,
I'm trying to develop an algorithm that can intersect FST and FSA efficiently. For FSA this means we can leverage the sorted transitions of a given state and perform binary search in order to advance to a target.
So the algorithm uses TransitionAccessor
and there are three relevant APIs
int initTransition(int state, Transition t);
/** Iterate to the next transition after the provided one */
void getNextTransition(Transition t);
/**
* Fill the provided {@link Transition} with the index'th transition leaving the specified state.
*/
void getTransition(int state, int index, Transition t);
One could call initTransition
followed by many getNextTransition
to iterate the transitions one by one. Or in my case, Binary search needs to use getTransition
to randomly access the middle point.
I debugged for hours and realized for a given test case my algo works as expected but the transitions I got were messed up. The original tests uses quite complicated automatons so I tried to find a small and simple enough to exhibit the behavior. Here is the test
public void testAutomaton() {
RegExp regExp = new RegExp("+*.", RegExp.NONE);
Automaton a = regExp.toAutomaton();
CompiledAutomaton compiledAutomaton = new CompiledAutomaton(a);
System.out.println("isFinite: " + compiledAutomaton.finite);
var byteRunnable = compiledAutomaton.getByteRunnable();
var transitionAccessor = compiledAutomaton.getTransitionAccessor();
// dfsAutomaton(byteRunnable, transitionAccessor, 0, "");
// dumpTransitionsViaNext(byteRunnable, transitionAccessor, 0, new HashSet<>());
dumpTransitionsViaRA(byteRunnable, transitionAccessor, 0, new HashSet<>());
}
void dumpTransitionsViaNext(ByteRunnable a, TransitionAccessor transitionAccessor, int currentState, Set<Integer> seenStates) {
if (seenStates.contains(currentState)) {
return;
}
seenStates.add(currentState);
var t = new Transition();
var numStates = transitionAccessor.initTransition(currentState, t);
for (int i = 0; i < numStates; i++) {
transitionAccessor.getNextTransition(t);
System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max + "] "
+ "dest: " + t.dest + " is dest accept: "
+ (a.isAccept(t.dest) ? "yes" : "no"));
dumpTransitionsViaNext(a, transitionAccessor, t.dest, seenStates);
}
}
void dumpTransitionsViaRA(ByteRunnable a, TransitionAccessor transitionAccessor, int currentState, Set<Integer> seenStates) {
if (seenStates.contains(currentState)) {
return;
}
seenStates.add(currentState);
var t = new Transition();
var numStates = transitionAccessor.initTransition(currentState, t);
// transitionAccessor.getTransition(currentState, numStates - 1, t);
for (int i = 0; i < numStates; i++) {
transitionAccessor.getTransition(currentState, i, t);
System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max + "] "
+ "dest: " + t.dest + " is dest accept: "
+ (a.isAccept(t.dest) ? "yes" : "no"));
dumpTransitionsViaRA(a, transitionAccessor, t.dest, seenStates);
}
}
dumpTransitionsViaNext
gives expected transitions but dumpTransitionsViaRA
produces a mess.
Via next
At: src: 0 arcIdx: 0 [0, 42] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 1 [43, 43] dest: 2 is dest accept: yes
At: src: 2 arcIdx: 0 [0, 42] dest: 1 is dest accept: yes
At: src: 2 arcIdx: 1 [43, 43] dest: 2 is dest accept: yes
At: src: 2 arcIdx: 2 [44, 127] dest: 1 is dest accept: yes
At: src: 2 arcIdx: 3 [194, 223] dest: 7 is dest accept: no
At: src: 7 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 7 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 7 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 7 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 2 arcIdx: 4 [224, 239] dest: 8 is dest accept: no
At: src: 8 arcIdx: 0 [128, 142] dest: 11 is dest accept: no
At: src: 11 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 11 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 11 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 11 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 8 arcIdx: 1 [143, 143] dest: 11 is dest accept: no
At: src: 8 arcIdx: 2 [144, 190] dest: 11 is dest accept: no
At: src: 8 arcIdx: 3 [191, 191] dest: 11 is dest accept: no
At: src: 2 arcIdx: 5 [240, 243] dest: 9 is dest accept: no
At: src: 9 arcIdx: 0 [128, 142] dest: 12 is dest accept: no
At: src: 12 arcIdx: 0 [128, 142] dest: 13 is dest accept: no
At: src: 13 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 13 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 13 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 13 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 12 arcIdx: 1 [143, 143] dest: 13 is dest accept: no
At: src: 12 arcIdx: 2 [144, 190] dest: 13 is dest accept: no
At: src: 12 arcIdx: 3 [191, 191] dest: 13 is dest accept: no
At: src: 9 arcIdx: 1 [143, 143] dest: 12 is dest accept: no
At: src: 9 arcIdx: 2 [144, 190] dest: 12 is dest accept: no
At: src: 9 arcIdx: 3 [191, 191] dest: 12 is dest accept: no
At: src: 2 arcIdx: 6 [244, 244] dest: 10 is dest accept: no
At: src: 10 arcIdx: 0 [128, 142] dest: 14 is dest accept: no
At: src: 14 arcIdx: 0 [128, 142] dest: 16 is dest accept: no
At: src: 16 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 16 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 16 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 16 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 14 arcIdx: 1 [143, 143] dest: 16 is dest accept: no
At: src: 14 arcIdx: 2 [144, 190] dest: 16 is dest accept: no
At: src: 14 arcIdx: 3 [191, 191] dest: 16 is dest accept: no
At: src: 10 arcIdx: 1 [143, 143] dest: 15 is dest accept: no
At: src: 15 arcIdx: 0 [128, 142] dest: 17 is dest accept: no
At: src: 17 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 17 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 17 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 17 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 15 arcIdx: 1 [143, 143] dest: 17 is dest accept: no
At: src: 15 arcIdx: 2 [144, 190] dest: 17 is dest accept: no
At: src: 15 arcIdx: 3 [191, 191] dest: 18 is dest accept: no
At: src: 18 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 18 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 18 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 18 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 2 [44, 127] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 3 [194, 223] dest: 3 is dest accept: no
At: src: 3 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 3 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 3 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 3 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 4 [224, 239] dest: 4 is dest accept: no
At: src: 4 arcIdx: 0 [128, 142] dest: 19 is dest accept: no
At: src: 19 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 19 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 19 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 19 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 4 arcIdx: 1 [143, 143] dest: 19 is dest accept: no
At: src: 4 arcIdx: 2 [144, 190] dest: 19 is dest accept: no
At: src: 4 arcIdx: 3 [191, 191] dest: 19 is dest accept: no
At: src: 0 arcIdx: 5 [240, 243] dest: 5 is dest accept: no
At: src: 5 arcIdx: 0 [128, 142] dest: 20 is dest accept: no
At: src: 20 arcIdx: 0 [128, 142] dest: 21 is dest accept: no
At: src: 21 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 21 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 21 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 21 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 20 arcIdx: 1 [143, 143] dest: 21 is dest accept: no
At: src: 20 arcIdx: 2 [144, 190] dest: 21 is dest accept: no
At: src: 20 arcIdx: 3 [191, 191] dest: 21 is dest accept: no
At: src: 5 arcIdx: 1 [143, 143] dest: 20 is dest accept: no
At: src: 5 arcIdx: 2 [144, 190] dest: 20 is dest accept: no
At: src: 5 arcIdx: 3 [191, 191] dest: 20 is dest accept: no
At: src: 0 arcIdx: 6 [244, 244] dest: 6 is dest accept: no
At: src: 6 arcIdx: 0 [128, 142] dest: 22 is dest accept: no
At: src: 22 arcIdx: 0 [128, 142] dest: 24 is dest accept: no
At: src: 24 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 24 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 24 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 24 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 22 arcIdx: 1 [143, 143] dest: 24 is dest accept: no
At: src: 22 arcIdx: 2 [144, 190] dest: 24 is dest accept: no
At: src: 22 arcIdx: 3 [191, 191] dest: 24 is dest accept: no
At: src: 6 arcIdx: 1 [143, 143] dest: 23 is dest accept: no
At: src: 23 arcIdx: 0 [128, 142] dest: 25 is dest accept: no
At: src: 25 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 25 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 25 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 25 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 23 arcIdx: 1 [143, 143] dest: 25 is dest accept: no
At: src: 23 arcIdx: 2 [144, 190] dest: 25 is dest accept: no
At: src: 23 arcIdx: 3 [191, 191] dest: 26 is dest accept: no
At: src: 26 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 26 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 26 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 26 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
Via RA
isFinite: false
At: src: 0 arcIdx: 0 [0, 42] dest: 0 is dest accept: no
At: src: 0 arcIdx: 1 [43, 43] dest: 0 is dest accept: no
At: src: 0 arcIdx: 2 [44, 127] dest: 0 is dest accept: no
At: src: 0 arcIdx: 3 [194, 223] dest: 0 is dest accept: no
At: src: 0 arcIdx: 4 [224, 239] dest: 0 is dest accept: no
At: src: 0 arcIdx: 5 [240, 243] dest: 0 is dest accept: no
At: src: 0 arcIdx: 6 [244, 244] dest: 0 is dest accept: no
I shared this with @zhaih and he seemed to have an idea of the fix.
Version and environment details
No response