Audit codebase for duplication and refactor #1412

@milkyskies

While reviewing #1407 we found real duplication in the parser. The same pattern likely exists across the codebase, so this issue tracks a systematic audit of the entire repo, followed by refactoring.

Scope: the whole repo

Rust crates

crates/floe-core/ (the largest):

  • cst/: covered by #1411 (Refactor parser duplication in cst.rs), can skip
  • lexer.rs, parser.rs
  • lower/: visitor/lowering passes
  • checker/: type checking, many check_* methods
  • codegen/typescript/: emission (the 49 unused_self allows from #1405, "chore: [#969] add strict clippy lints and fix all violations", are concentrated here, which smells like dispatch repetition)
  • formatter/: formatting passes (similar shape to codegen)
  • desugar.rs, exhaustiveness.rs, resolve.rs, walk.rs, sourcemap.rs, type_layout.rs
  • interop/dts.rs + interop/tsgo/probe_gen.rs: large files
  • stdlib/ (21 files): likely has repeated registration boilerplate
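
To make the "dispatch repetition" suspicion in codegen concrete, here is a minimal sketch of what such a cluster might look like and one way to collapse it. Everything in it (Emitter, BinOp, emit_add, emit_binary) is hypothetical and not taken from floe-core; it only illustrates how per-operator methods that never read self tend to attract unused_self allows, and how a single data-driven helper removes both the repetition and the allows.

```rust
// Hypothetical operator enum and emitter; these names are assumptions,
// not identifiers from floe-core.
#[derive(Clone, Copy)]
enum BinOp {
    Add,
    Sub,
    Mul,
}

struct Emitter;

impl Emitter {
    // Before: one method per operator. None of them read `self`, so under
    // strict lints each one needs its own allow.
    #[allow(clippy::unused_self)]
    fn emit_add(&self, lhs: &str, rhs: &str) -> String {
        format!("{lhs} + {rhs}")
    }

    #[allow(clippy::unused_self)]
    fn emit_sub(&self, lhs: &str, rhs: &str) -> String {
        format!("{lhs} - {rhs}")
    }

    #[allow(clippy::unused_self)]
    fn emit_mul(&self, lhs: &str, rhs: &str) -> String {
        format!("{lhs} * {rhs}")
    }
}

impl BinOp {
    // After: the only thing that differed between the methods becomes data.
    fn symbol(self) -> &'static str {
        match self {
            BinOp::Add => "+",
            BinOp::Sub => "-",
            BinOp::Mul => "*",
        }
    }
}

// After: a free function, so there is no unused `self` to allow.
fn emit_binary(op: BinOp, lhs: &str, rhs: &str) -> String {
    format!("{lhs} {} {rhs}", op.symbol())
}

fn main() {
    let e = Emitter;
    // The refactor should be behavior-preserving for every operator.
    assert_eq!(e.emit_add("a", "b"), emit_binary(BinOp::Add, "a", "b"));
    assert_eq!(e.emit_sub("a", "b"), emit_binary(BinOp::Sub, "a", "b"));
    assert_eq!(e.emit_mul("a", "b"), emit_binary(BinOp::Mul, "a", "b"));
    println!("{}", emit_binary(BinOp::Mul, "x", "y"));
}
```

Whether the real codegen methods have this shape is exactly what the audit should establish; if they need emitter state after all, the helper can stay a method and the allow goes away for a different reason.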

Other crates:

  • crates/floe-cli/ — CLI commands
  • crates/floe-lsp/ — LSP handlers (handler-per-feature pattern often duplicates)
  • crates/floe-doc-check/
  • crates/floe-test-helpers/
  • crates/floe-wasm/

Non-Rust

  • integrations/ — npm packages (@floeorg/core, vite-plugin, esbuild-plugin, register, hono)
  • editors/ — VS Code extension, tree-sitter grammar, neovim queries (highlights.scm copies are explicitly maintained dupes per syntax-sources.md, but check for accidental ones)
  • docs/site/ — docs (likely fine, but check for repeated example boilerplate)
  • tests/lsp/ + tests/cst/

What to look for

  1. Byte-for-byte identical functions: like expect/expect_kind from cst.rs (#1411, Refactor parser duplication in cst.rs). Grep for suspiciously similar names.
  2. Template duplication: N functions with identical structure differing only in operator/variant/string. Replace with one generic helper.
  3. Repeated boolean disjunctions: self.x(A) || self.x(B) || self.x(C) patterns that beg for an any/in helper.
  4. Copy-paste branches: match arms that look slightly different but do the same thing under different names.
  5. Repeated registration/setup boilerplate: common in stdlib registration, codegen tables, lexer keyword maps.
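
As an illustration of the repeated-disjunction pattern, the sketch below shows a chain of self.at(...) calls next to an at_any helper that replaces it. The Parser, TokenKind, at, and at_any names are assumptions made up for this example; the real parser's types and method names may differ.

```rust
// Hypothetical token kinds and parser; none of these names come from floe-core.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TokenKind {
    Plus,
    Minus,
    Star,
    Slash,
    Ident,
    Eof,
}

struct Parser {
    tokens: Vec<TokenKind>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<TokenKind>) -> Self {
        Self { tokens, pos: 0 }
    }

    /// Kind of the current token, or `Eof` once the input is exhausted.
    fn current(&self) -> TokenKind {
        self.tokens.get(self.pos).copied().unwrap_or(TokenKind::Eof)
    }

    /// Advances to the next token.
    fn bump(&mut self) {
        self.pos += 1;
    }

    fn at(&self, kind: TokenKind) -> bool {
        self.current() == kind
    }

    /// Replaces chains like `self.at(A) || self.at(B) || self.at(C)`.
    fn at_any(&self, kinds: &[TokenKind]) -> bool {
        kinds.contains(&self.current())
    }

    /// Before: the disjunction is spelled out at the call site.
    fn at_binary_op_before(&self) -> bool {
        self.at(TokenKind::Plus)
            || self.at(TokenKind::Minus)
            || self.at(TokenKind::Star)
            || self.at(TokenKind::Slash)
    }

    /// After: one named set of kinds and one helper call.
    fn at_binary_op_after(&self) -> bool {
        const BINARY_OPS: &[TokenKind] = &[
            TokenKind::Plus,
            TokenKind::Minus,
            TokenKind::Star,
            TokenKind::Slash,
        ];
        self.at_any(BINARY_OPS)
    }
}

fn main() {
    let mut p = Parser::new(vec![TokenKind::Star, TokenKind::Ident]);
    // Both forms must agree at every position, including past the end.
    for _ in 0..3 {
        assert_eq!(p.at_binary_op_before(), p.at_binary_op_after());
        p.bump();
    }
    println!("before and after agree");
}
```

Naming the set (BINARY_OPS here) also gives grep a single place to find, which helps when the same group of kinds is tested in several functions.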

Suggested approach

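  1. Use tokei to find the largest files (for example, tokei --files --sort code). cargo tree shows the crate dependency graph rather than file sizes, so it only helps for seeing how crates relate.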
  2. For each large file, scan for repetitive structure (visual inspection works; an LLM agent can be useful here).
  3. File one PR per cluster of duplication so reviews stay manageable. Don't roll everything into one giant refactor.
  4. Track sub-issues under this epic if the work is large.
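
If a line-counting tool is not at hand, the first step above can be approximated with a few lines of standard-library Rust. This is only a sketch: the function names are made up for this example, it counts raw lines (comments and blanks included), it looks at .rs files only, and it skips any directory named target on the assumption that it holds build output.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects `(line_count, path)` for every `.rs` file under `dir`.
fn collect_rs_line_counts(dir: &Path, out: &mut Vec<(usize, PathBuf)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Assumption: a directory named `target` is Cargo build output.
            if path.file_name().map_or(false, |n| n == "target") {
                continue;
            }
            collect_rs_line_counts(&path, out)?;
        } else if path.extension().map_or(false, |e| e == "rs") {
            // Lossy decoding so one non-UTF-8 file does not abort the scan.
            let bytes = fs::read(&path)?;
            let text = String::from_utf8_lossy(&bytes);
            out.push((text.lines().count(), path));
        }
    }
    Ok(())
}

/// Returns the `n` largest `.rs` files under `root`, biggest first.
/// Ties are broken by path so the order is deterministic.
fn largest_rs_files(root: &Path, n: usize) -> io::Result<Vec<(usize, PathBuf)>> {
    let mut files = Vec::new();
    collect_rs_line_counts(root, &mut files)?;
    files.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(&b.1)));
    files.truncate(n);
    Ok(files)
}

fn main() -> io::Result<()> {
    // Root directory to scan; defaults to the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for (lines, path) in largest_rs_files(Path::new(&root), 20)? {
        println!("{lines:>7}  {}", path.display());
    }
    Ok(())
}
```

Run against crates/, the top of the list is a reasonable starting order for the per-file scan in step 2.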

Tests

Existing tests should cover all behavior. After each refactor:

  • cargo clippy --profile ci --workspace -- -D warnings (zero errors)
  • cargo test --workspace (all pass)
  • Frontend / integration tests pass where applicable
