-
Notifications
You must be signed in to change notification settings - Fork 16
Description
Currently, in generate_exec_code, with :goto-generators, the expression bound to each action is NOT spliced into the generated code once. Instead, it is spliced into the generated code once for every edge in the FSM it appears on.
This is due to simplicity. Now, every edge generates its own little code snippet. For example, if the N'th edge leading to state X contains the action :foo with the Julia code do_foo(), the generated code will be
@label state_X_action_N
begin
do_foo()
end
@goto state_X
The problem, of course, is that the action foo can be present on many edges. So instead of using the gotos to go to the SAME code section with do_foo(), do_foo() is spliced in multiple times leading to huge codegen.
Worse, the codegen potentially scales with the number of states squared.
Here is a MWE that makes the codegen blow up
import Automa
import Automa.RegExp: @re_str
const re = Automa.RegExp
const LETTERS = 'a':'m'
machine = let
letters = []
for i in eachindex(LETTERS)
pat = re.parse("[$(join(deleteat!(collect(LETTERS), i)))]")
pat.actions[:enter] = [Symbol(LETTERS[i])]
push!(letters, pat)
end
states = []
for i in eachindex(letters)
push!(states, re.alt(deleteat!(copy(letters), i)...))
end
Automa.compile(re.cat(states...))
end
context = Automa.CodeGenContext(generator=:goto)
actions = Dict(Symbol(L) => :(foo[$i] += 0x01) for (i,L) in enumerate(LETTERS))
# Have a look at code now:
code = string(Automa.generate_exec_code(context, machine, actions));
It can get much worse. Setting LETTERS to e.g. 'a':'z' will cause each of the 26 actions to be duplicated 26^2 times, leading to 5.4 MiB of native code.
And here is a counter of how many times each action is multiplied in the code in FASTX.jl's FASTA machine:
- :countline = 17
- :mark = 8
- :letters = 8
- :record = 7
- :identifier = 6
- :header = 6
- :pos = 5
- :description = 2
We really, really should fix this.