Skip to content

Generated code is highly redundant #53

@jakobnissen

Description

@jakobnissen

Currently, in generate_exec_code, with :goto-generators, the expression bound to each action is NOT spliced into the generated code once. Instead, it is spliced into the generated code once for every edge in the FSM it appears on.

This is due to simplicity. Now, every edge generates its own little code snippet. For example, if the N'th edge leading to state X contains the action :foo with the Julia code do_foo(), the generated code will be

@label state_X_action_N
begin
    do_foo()
end
@goto state_X

The problem, of course, is that the action foo can be present on many edges. So instead of using the gotos to go to the SAME code section with do_foo(), do_foo() is spliced in multiple times leading to huge codegen.
Worse, the codegen potentially scales with the number of states squared.

Here is a MWE that makes the codegen blow up

import Automa
import Automa.RegExp: @re_str
const re = Automa.RegExp
const LETTERS = 'a':'m'

machine = let
    letters = []
    for i in eachindex(LETTERS)
        pat = re.parse("[$(join(deleteat!(collect(LETTERS), i)))]")
        pat.actions[:enter] = [Symbol(LETTERS[i])]
        push!(letters, pat)
    end

    states = []
    for i in eachindex(letters)
        push!(states, re.alt(deleteat!(copy(letters), i)...))
    end

    Automa.compile(re.cat(states...))
end

context = Automa.CodeGenContext(generator=:goto)
actions = Dict(Symbol(L) => :(foo[$i] += 0x01) for (i,L) in enumerate(LETTERS))

# Have a look at code now:
code = string(Automa.generate_exec_code(context, machine, actions));

It can get much worse. Setting LETTERS to e.g. 'a':'z' will cause each of the 26 actions to be duplicated 26^2 times, leading to 5.4 MiB of native code.

And here is a counter of how many times each action is multiplied in the code in FASTX.jl's FASTA machine:

  • :countline = 17
  • :mark = 8
  • :letters = 8
  • :record = 7
  • :identifier = 6
  • :header = 6
  • :pos = 5
  • :description = 2

We really, really should fix this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions