CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project

OpenLess is a menu-bar/tray voice-input layer. Hold or toggle a global hotkey, speak, and the dictated text is polished and inserted at the current cursor in any app. Product principles, state machine, and module list live in docs/openless-development.md and docs/openless-overall-logic.md — read those before changing product behavior.

The active codebase lives at openless-all/app/ and is Tauri 2 + Rust backend + React/TS frontend, targeting macOS 12+ and Windows. The legacy Swift implementation (Sources/, Tests/, Package.swift, appcast.xml, Sparkle pipeline) was removed in commit 34d2823; do not resurrect it.

UI must match openless-all/design_handoff_openless/*.jsx pixel-for-pixel; the JSX is reference-only, never imported.

Adjacent docs:

  • AGENTS.md is the parallel of this file for Codex sessions; the research-before-coding rules at the bottom of this file delegate to it.
  • README.md / README.zh.md (root) are user-facing install + feature guides; USAGE.md covers runtime usage. Update them when shipping user-visible features, not for internal refactors.

Build, Run, Test

Tauri (current — start here)

cd "openless-all/app"
npm ci

# Dev: vite at :1420 + tauri shell
npm run tauri dev

# Build .app (+ DMG) — use this script, not `tauri build` directly,
# because it threads Apple signing env vars and validates Info.plist.
./scripts/build-mac.sh           # build, sign, install to /Applications, reset TCC
INSTALL=0 ./scripts/build-mac.sh # build only

# Frontend-only TS check
npm run build   # = tsc && vite build

# Rust type-check without full compile
cargo check --manifest-path src-tauri/Cargo.toml

Windows (cross-check only — no macOS runner in CI)

# Preflight: verify toolchain
.\scripts\windows-preflight.ps1

# Build (requires Windows host or cross-compile target)
.\scripts\windows-build-gnu.ps1

Generated artifacts:

  • openless-all/app/src-tauri/target/release/bundle/macos/OpenLess.app
  • openless-all/app/src-tauri/target/release/bundle/dmg/OpenLess_<version>_aarch64.dmg

Logs: ~/Library/Logs/OpenLess/openless.log (macOS) / %LOCALAPPDATA%\OpenLess\Logs\openless.log (Windows).

There is no test runner wired in for the frontend. src/lib/providerSetup.test.ts is a hand-rolled assertion script — run with npx tsx src/lib/providerSetup.test.ts if you need it. Rust backend unit tests are run with cargo test --manifest-path src-tauri/Cargo.toml --lib; hardware / OS-integration behavior is still verified by running the app.

Architecture

coordinator::Coordinator is the single owner of all session state — both the dictation phase machine (Idle → Starting → Listening → Processing → Inserting → Done) and the parallel QA phase machine (Idle → Recording → Processing). Hotkey edges drive both. Recorder, ASR, polish, insertion, selection capture, and history are wired here and nowhere else. Leaf modules never call across each other — they each depend only on types.rs.

The coordinator was split into a module: coordinator.rs is the public entry; coordinator/{dictation,qa,resources}.rs carry per-pipeline logic; coordinator_state.rs is the pure (no Tauri / audio / clipboard) state-transition layer that makes phase decisions unit-testable.
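As an illustration of what a pure transition layer can look like, here is a minimal sketch of the dictation phase machine. The phase names come from this document; the `Event` enum and the `next_phase` function are hypothetical names invented for the example, and the real coordinator_state.rs may model events differently.

```rust
/// Dictation phases as listed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DictationPhase {
    Idle,
    Starting,
    Listening,
    Processing,
    Inserting,
    Done,
}

/// Hypothetical event set, invented for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyEdge,
    RecorderReady,
    PolishFinished,
    InsertFinished,
    Reset,
}

/// Pure transition function: no Tauri, audio, or clipboard access.
/// Returns `None` when the event is not valid in the current phase.
pub fn next_phase(phase: DictationPhase, event: Event) -> Option<DictationPhase> {
    use DictationPhase::*;
    use Event::*;
    match (phase, event) {
        (Idle, HotkeyEdge) => Some(Starting),        // first edge starts a session
        (Starting, RecorderReady) => Some(Listening),
        (Listening, HotkeyEdge) => Some(Processing), // second edge ends recording
        (Processing, PolishFinished) => Some(Inserting),
        (Inserting, InsertFinished) => Some(Done),
        (Done, Reset) => Some(Idle),
        _ => None,
    }
}
```

Because the function only maps values to values, each phase decision can be covered by `cargo test --lib` without launching the app.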

Rust (openless-all/app/src-tauri/src)            Purpose
──────────────────────────────────────────       ────────────────────────────────
types.rs                                          Pure value types: sessions, PolishMode, HotkeyBinding, errors, QaChatMessage
coordinator.rs                                    Public entry; owns Inner, hotkey wiring, capsule emits
coordinator/{dictation,qa,resources}.rs           Dictation pipeline / QA pipeline / shared helpers (begin/end/cancel)
coordinator_state.rs                              Pure state transitions — Tauri-free, unit-testable
commands.rs + lib.rs + main.rs                    IPC surface (`invoke_handler!`), tray icon, window plumbing, entry
permissions.rs                                    TCC checks (Accessibility / Microphone / AppleEvents)

— Hotkeys (three parallel monitors) —
hotkey.rs                                         Modifier-only hotkey via native CGEventTap (macOS) / rdev (Win/Linux)
combo_hotkey.rs                                   Custom-combo dictation hotkey (when user picks combo over modifier-only)
qa_hotkey.rs                                      QA toggle hotkey (default Cmd/Ctrl+Shift+;) via `global-hotkey` crate
global_hotkey_runtime.rs                          Shared `global-hotkey` Carbon/Win event runtime (combo + QA share it)
shortcut_binding.rs                               Shared parse/validate of user-configurable bindings

— Audio / ASR / LLM —
recorder.rs                                       Mic → 16 kHz mono Int16 PCM, RMS callback
audio_mute.rs                                     System-output mute guard while recording (RAII)
asr/{mod,frame,volcengine,whisper}.rs + asr/local/*   ASR providers: Volcengine streaming WS, Whisper HTTP, Bailian, local Foundry
polish.rs                                         OpenAI-compatible chat completions (Ark / DeepSeek / Codex OAuth reuse)
llm_gemini.rs                                     Native Google Gemini client — NOT OpenAI-compatible (separate auth, thinkingConfig, role:model)
correction.rs                                     User-defined correction rules (separate from vocab dictionary)

— Insertion (two paths) —
insertion.rs                                      AX focused-element write → clipboard + paste shortcut → copy-only fallback
windows_ime_{ipc,profile,protocol,session}.rs    Windows IME-side text injection over IPC (parallel insertion path; activates OpenLess TSF profile and submits text via named pipe)
selection.rs                                      Cross-platform selection capture for QA: macOS AX → Cmd/Ctrl+C simulate-copy → Linux PRIMARY (best-effort)

persistence.rs                                    history.json / preferences.json / dictionary.json + platform credential vault

Frontend (openless-all/app/src)
src/components/Capsule.tsx                        Capsule view + state enum
src/ (React)                                      Main window UI: Overview / History / Vocab / Style / Settings
src/i18n/                                         react-i18next init + zh-CN / en resources (zh-CN is source of truth)
src/pages/_atoms.tsx                              Recoil atoms — global frontend state
src/state/HotkeySettingsContext.tsx               HotkeySettings React context (capability + binding from backend)

Dictation pipeline

hotkey edge (1st)  →  beginSession:  Recorder.start → ASR.openSession → BufferingAudioConsumer.attach
hotkey edge (2nd)  →  endSession:    Recorder.stop → ASR.sendLastFrame → awaitFinal → Polish → Insert → History.save
.cancelled         →  ASR.cancel, Recorder.stop, capsule .cancelled

Invariants:

  • Polish/ASR fallbacks are silent. Missing Ark creds → insert raw transcript. Missing Volcengine creds → mock pipeline copies a placeholder. The contract is "the user's words don't get lost" — don't add hard errors here.
  • BufferingAudioConsumer queues PCM until the WebSocket is ready, then drains. Recorder always pushes to it; ASR is attached after openSession resolves.
  • Hotkey is toggle-only, not press-and-hold. The monitor yields one edge per modifier-key keydown; the coordinator interprets odd/even.
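The buffering invariant above can be sketched as follows. This is a simplified, single-threaded illustration: `PcmSink`, `BufferingConsumer`, and `CollectingSink` are hypothetical names, and the real BufferingAudioConsumer presumably adds locking and a real ASR session behind the sink.

```rust
use std::collections::VecDeque;

/// Hypothetical sink for PCM frames; stands in for the ASR WebSocket session.
pub trait PcmSink {
    fn send(&mut self, frame: &[i16]);
}

/// Queues frames until a sink is attached, then forwards directly.
pub struct BufferingConsumer<S: PcmSink> {
    pending: VecDeque<Vec<i16>>,
    sink: Option<S>,
}

impl<S: PcmSink> BufferingConsumer<S> {
    pub fn new() -> Self {
        Self { pending: VecDeque::new(), sink: None }
    }

    /// Called by the recorder for every captured frame.
    pub fn push(&mut self, frame: &[i16]) {
        match self.sink.as_mut() {
            Some(sink) => sink.send(frame),
            None => self.pending.push_back(frame.to_vec()),
        }
    }

    /// Called once the session is open: drain queued frames in order,
    /// then keep the sink so later frames bypass the queue.
    pub fn attach(&mut self, mut sink: S) {
        while let Some(frame) = self.pending.pop_front() {
            sink.send(&frame);
        }
        self.sink = Some(sink);
    }

    pub fn sink(&self) -> Option<&S> {
        self.sink.as_ref()
    }
}

/// Test double that records every frame it receives.
#[derive(Default)]
pub struct CollectingSink {
    pub frames: Vec<Vec<i16>>,
}

impl PcmSink for CollectingSink {
    fn send(&mut self, frame: &[i16]) {
        self.frames.push(frame.to_vec());
    }
}
```

Frames pushed before `attach` arrive at the sink in capture order, so audio spoken while the connection is still opening is not dropped.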

Q&A pipeline (selection-based ask-the-LLM)

Parallel state machine, lives in coordinator/qa.rs + qa_hotkey.rs + selection.rs. Default trigger: Cmd+Shift+; (macOS) / Ctrl+Shift+; (Win/Linux).

QA hotkey edge          →  toggle panel:    open → capture front_app, clear messages, show QA window
                                            close → cancel session, hide window, sweep capsule
Option/dictation edge   →  routed by panel_visible flag (see below):
  while panel_visible & dictation Idle → handle_qa_option_edge:
    QaPhase::Idle      →  begin_qa_session: capture_selection() → Recorder.start → ASR.openSession
    QaPhase::Recording →  end_qa_session:   Recorder.stop → ASR final → LLM (with selection as context) → emit qa:state
    QaPhase::Processing→  ignored (LLM in flight)
  otherwise                                  handle_pressed (normal dictation)
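The routing table above can be read as a pure function of three inputs. The sketch below is an illustration only: `EdgeAction` and `route_dictation_edge` are hypothetical names, and it assumes that any dictation phase other than Idle (including Done) keeps the edge with dictation, which is how the diagram reads.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DictationPhase {
    Idle,
    Starting,
    Listening,
    Processing,
    Inserting,
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QaPhase {
    Idle,
    Recording,
    Processing,
}

/// Hypothetical result type describing where a dictation hotkey edge goes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeAction {
    BeginQaSession,
    EndQaSession,
    Ignore,
    Dictation,
}

/// Decide who handles a dictation hotkey edge.
pub fn route_dictation_edge(
    panel_visible: bool,
    dictation: DictationPhase,
    qa: QaPhase,
) -> EdgeAction {
    if panel_visible && dictation == DictationPhase::Idle {
        match qa {
            QaPhase::Idle => EdgeAction::BeginQaSession,
            QaPhase::Recording => EdgeAction::EndQaSession,
            QaPhase::Processing => EdgeAction::Ignore, // LLM call in flight
        }
    } else {
        // Panel hidden, or a dictation session already owns the microphone.
        EdgeAction::Dictation
    }
}
```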

Invariants & gotchas:

  • Hotkey routing. When the QA panel is visible, the dictation hotkey edge routes to QA — unless a dictation session is already mid-flight (Starting/Listening/Processing/Inserting), in which case the edge stays with dictation. Otherwise QA's begin_qa_session would race for the same mic device (cpal rejects the second build_input_stream on macOS/Win, PipeWire opens two streams on Linux — neither is recoverable from the QA panel UI). See audit 3.3.1 in coordinator/dictation.rs.
  • Capsule sweep on panel open. Open emits a fresh CapsuleState::Idle only if dictation is Idle. If dictation is Recording/Polishing/Inserting/Done, the sweep is suppressed so the user's in-flight feedback isn't wiped. See audit 3.3.4.
  • Selection capture is a 3-tier fallback (selection.rs): (1) macOS AX kAXSelectedTextAttribute direct read, no clipboard touched; (2) macOS/Windows simulate Cmd/Ctrl+C → snapshot + restore original clipboard, 80 ms read window; (3) Linux PRIMARY via wl-paste / xclip / xsel, best-effort. Returns None when the user genuinely selected nothing.
  • Selection truncation. Hard cap 4000 chars; over → keep first 2000 + […truncated…] + last 2000. Don't raise this without checking LLM context budgeting — Gemini and Ark have different limits.
  • Multi-turn memory. QaSessionState.messages accumulates user→assistant pairs across turns within a single panel session; closing the panel clears them.
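The truncation rule above can be sketched like this. It is a minimal illustration that assumes "chars" means Unicode scalar values (Rust `char`); the constant and function names are hypothetical, and selection.rs may count differently.

```rust
/// Selections at or under this many chars pass through unchanged.
pub const MAX_SELECTION_CHARS: usize = 4000;
/// Chars kept from each end when the selection is over the cap.
pub const KEEP_EACH_SIDE: usize = 2000;
/// Marker placed between the kept head and tail.
pub const TRUNCATION_MARKER: &str = "[…truncated…]";

/// Keep the first 2000 and last 2000 chars of an oversized selection.
/// Works on `char`s, so multi-byte text is never split mid-character.
pub fn truncate_selection(text: &str) -> String {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() <= MAX_SELECTION_CHARS {
        return text.to_string();
    }
    let head: String = chars[..KEEP_EACH_SIDE].iter().collect();
    let tail: String = chars[chars.len() - KEEP_EACH_SIDE..].iter().collect();
    format!("{head}{TRUNCATION_MARKER}{tail}")
}
```

Note that the truncated result is slightly longer than the cap itself because the marker is added on top of the 4000 kept chars.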

Insertion paths

insertion.rs is the cross-platform default. On Windows there is a second insertion path in windows_ime_{ipc,profile,protocol,session}.rs that activates a TSF profile (CLSID + GUID baked in windows_ime_profile.rs) and submits text over a named-pipe IPC. The coordinator picks one based on user preference / fallback status; both routes return the same InsertStatus (Inserted / CopiedFallback). When changing insertion behavior, decide which path you're touching — they don't share code.
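For the cross-platform path, the fallback order listed in the module table (AX focused-element write, then clipboard + paste shortcut, then copy-only) can be sketched as below. `InsertStatus` and its two variants are named in this document; the `InsertBackend` trait, `insert_text`, and `FakeBackend` are hypothetical and only illustrate the ordering, not the actual insertion.rs code or the Windows IME path.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InsertStatus {
    Inserted,
    CopiedFallback,
}

/// Hypothetical abstraction over the platform calls.
pub trait InsertBackend {
    /// Try to write directly into the focused element; true on success.
    fn write_focused_element(&mut self, text: &str) -> bool;
    fn copy_to_clipboard(&mut self, text: &str);
    /// Try to send the paste shortcut; true on success.
    fn send_paste_shortcut(&mut self) -> bool;
}

/// Walk the fallback chain; the text always ends up somewhere the user can reach.
pub fn insert_text<B: InsertBackend>(backend: &mut B, text: &str) -> InsertStatus {
    if backend.write_focused_element(text) {
        return InsertStatus::Inserted;
    }
    backend.copy_to_clipboard(text);
    if backend.send_paste_shortcut() {
        InsertStatus::Inserted
    } else {
        // The text stays on the clipboard for a manual paste.
        InsertStatus::CopiedFallback
    }
}

/// Test double with switchable outcomes.
pub struct FakeBackend {
    pub ax_ok: bool,
    pub paste_ok: bool,
    pub clipboard: Option<String>,
}

impl InsertBackend for FakeBackend {
    fn write_focused_element(&mut self, _text: &str) -> bool {
        self.ax_ok
    }
    fn copy_to_clipboard(&mut self, text: &str) {
        self.clipboard = Some(text.to_string());
    }
    fn send_paste_shortcut(&mut self) -> bool {
        self.paste_ok
    }
}
```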

Permissions, credentials, on-disk state

  • Bundle ID com.openless.app is hard-coded in openless-all/app/src-tauri/tauri.conf.json and CredentialsVault.serviceName. Changing it breaks system credential vault lookups and every existing TCC grant.
  • TCC: Microphone + Accessibility + AppleEvents. NSMicrophoneUsageDescription / NSAccessibilityUsageDescription / NSAppleEventsUsageDescription live in openless-all/app/src-tauri/Info.plist. After a fresh build that resets TCC, the app must be fully quit and relaunched after granting Accessibility before the global hotkey tap installs.
  • Credentials live in the OS credential vault (macOS Keychain, Windows Credential Manager, Linux keyring) under service com.openless.app. The legacy plaintext JSON (~/.openless/credentials.json on macOS/Linux, %APPDATA%\OpenLess\credentials.json on Windows) is only a migration source and is removed after a successful vault write. Never hard-code keys or include legacy credential files in logs, exports, build artifacts, or bug reports.
  • Per-user data:
    • macOS: ~/Library/Application Support/OpenLess/{history.json, preferences.json, dictionary.json} — capped at 200 history entries. Do not rename dictionary.json to vocab.json (drops user data).
    • Windows: %APPDATA%\OpenLess\
    • Linux: $XDG_DATA_HOME/OpenLess
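The per-platform data locations above could be resolved along these lines. This is a sketch under stated assumptions: `Platform` and `data_dir` are hypothetical names, the environment lookup is injected so the function stays testable, and the `~/.local/share` fallback for an unset `XDG_DATA_HOME` follows the XDG default rather than anything this document confirms about persistence.rs.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    MacOs,
    Windows,
    Linux,
}

/// Resolve the OpenLess data directory from injected environment values.
/// Returns `None` when a required variable is missing.
pub fn data_dir(platform: Platform, env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    match platform {
        Platform::MacOs => Some(
            PathBuf::from(env("HOME")?)
                .join("Library")
                .join("Application Support")
                .join("OpenLess"),
        ),
        Platform::Windows => Some(PathBuf::from(env("APPDATA")?).join("OpenLess")),
        Platform::Linux => {
            let base = match env("XDG_DATA_HOME") {
                Some(dir) if !dir.is_empty() => PathBuf::from(dir),
                // Assumption: fall back to the XDG default location.
                _ => PathBuf::from(env("HOME")?).join(".local").join("share"),
            };
            Some(base.join("OpenLess"))
        }
    }
}
```

In production code the closure would be `|key| std::env::var(key).ok()`.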

Release pipeline

Push a v*-tauri tag → .github/workflows/release-tauri.yml builds macOS arm64 .dmg and Windows x64 .msi. macOS Developer ID signing + notarization runs only when APPLE_CERTIFICATE / APPLE_CERTIFICATE_PASSWORD / APPLE_ID / APPLE_PASSWORD / APPLE_TEAM_ID secrets are set; otherwise it falls back to ad-hoc signing with a CI warning.

When bumping versions, update all three version fields: openless-all/app/package.json, openless-all/app/src-tauri/tauri.conf.json, and openless-all/app/src-tauri/Cargo.toml.

Branch & release-channel workflow

Two-channel branching. Branch name = release channel.

  • beta: Beta channel (development builds). Default branch, integration buffer. All PRs target beta (never main). Beta builds may exist but are not pushed to general users; only opt-in users on the Beta channel see them.
  • main: Stable channel (official releases). Always-releasable. Updated only by beta → main merges performed by maintainers after a two-platform smoke build. Release tags v<version>-tauri are pushed on main and trigger release-tauri.yml (tag-driven; unaffected by branch renames).

Per-PR contract:

  • Run the change locally on your target platform before opening the PR (build green + manual verification of the affected feature).
  • pr-agent.yml runs one AI review pass per PR — treat it as advisory, do not iterate on it.
  • Keep AI rework rounds tight (1–2). If a fix resists, escalate to a human or restart with fresh context.
  • ci.yml runs on push/PR for both main and beta; no extra wiring needed when adding new branches off beta.

For maintainers:

  • Merge beta → main only after the two-platform (macOS + Windows) smoke build passes. Beta work must not leak to Stable — that gate exists for a reason.
  • Tag v<version>-tauri on main, not on beta. The release workflow keys off the tag, but tagging on main keeps the release commit linear with the always-releasable line.
  • Avoid direct pushes to main outside the beta → main merge — it bypasses the smoke-test gate.

Channel distribution (manual-download opt-in):

  • Tag convention. v<v>-tauri → Stable release (GitHub prerelease=false, manifest latest-{tgt}-{arch}.json). v<v>-beta-tauri → Beta release (GitHub prerelease=true, manifest latest-{tgt}-{arch}-beta.json). The two manifest filenames never overlap, so the in-app updater endpoint (which is fixed at compile time to the no-suffix file) cannot pick up Beta releases. This is the physical isolation that guarantees Beta does not leak to Stable users.
  • Why not auto-update for Beta. tauri-plugin-updater 2.10's Builder does not expose endpoints() — endpoints are only readable from tauri.conf.json at build time and cannot be swapped at runtime. Rather than fork the plugin or write a custom updater (~500 lines, high risk), Beta opt-in is implemented as a manual-download flow: Settings → About has a "Join Beta channel" toggle that, when on, calls fetch_latest_beta_release (GitHub Releases API), shows the latest pre-release tag, and routes the user to the GitHub release page to download manually. No installer signing/install path needs to be re-implemented.
  • Where the wiring lives. Pref field: UserPreferences::update_channel (types.rs). IPC: get_update_channel / set_update_channel / fetch_latest_beta_release (commands.rs). UI: BetaChannelControl inside AboutMini (SettingsModal.tsx). i18n: settings.about.betaChannel* keys.
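The tag convention above maps mechanically to a channel and a manifest filename. The sketch below illustrates that mapping in Rust; `Channel`, `channel_for_tag`, and `manifest_name` are hypothetical names, and the actual detection lives in release-tauri.yml, not in Rust code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Stable,
    Beta,
}

/// Classify a release tag. The Beta suffix is checked first because
/// every `-beta-tauri` tag also ends with `-tauri`.
pub fn channel_for_tag(tag: &str) -> Option<Channel> {
    if !tag.starts_with('v') {
        return None;
    }
    if tag.ends_with("-beta-tauri") {
        Some(Channel::Beta)
    } else if tag.ends_with("-tauri") {
        Some(Channel::Stable)
    } else {
        None
    }
}

/// Updater manifest filename for a channel, target, and architecture.
pub fn manifest_name(channel: Channel, target: &str, arch: &str) -> String {
    match channel {
        Channel::Stable => format!("latest-{target}-{arch}.json"),
        Channel::Beta => format!("latest-{target}-{arch}-beta.json"),
    }
}
```

Since the Stable and Beta filenames never coincide for the same target and architecture, an updater pointed at the no-suffix file cannot resolve a Beta manifest.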

Release verification checklist (run after every tag push)

Run after pushing either a v*-tauri or v*-beta-tauri tag, before announcing the release:

  1. GitHub Release page matches expectation:
    • Stable tag: not marked Pre-release, in the releases/latest redirect.
    • Beta tag: marked Pre-release, not the target of releases/latest.
  2. Release assets are channel-correct:
    • Stable tag includes latest-{darwin,windows,linux}-{aarch64,x86_64}.json + their -mirror.json siblings, without -beta suffix.
    • Beta tag includes latest-{tgt}-{arch}-beta.json + -beta-mirror.json, without the no-suffix variant.
  3. Stable user flow. Install a Stable build, click Settings → About → Check for updates. After a Stable release: should offer the new version. After a Beta release only: should report "up to date" (Beta must not appear).
  4. Beta user flow. In the same Stable build, toggle on Join Beta channel. The latest Beta tag should appear (or "no Beta released yet"). Clicking the download button should open the corresponding GitHub release page.
  5. Updater endpoint sanity. curl -fsSL https://github.com/appergb/openless/releases/latest/download/latest-darwin-aarch64.json returns the Stable manifest (version field matches the latest Stable tag). It should never return a Beta version, regardless of which tag was pushed most recently.

If any step fails, do not announce the release; investigate release-tauri.yml channel detection (endsWith(github.ref_name, '-beta-tauri')) and the OPENLESS_RELEASE_CHANNEL env propagation in the run logs.
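Step 2 of the checklist can be partly automated. The helper below is a hypothetical sketch, not part of the repo: given the asset names of a release, it returns the updater manifests that belong to the other channel, assuming manifests are exactly the files named `latest-*.json` and that Beta manifests (including mirrors) always contain `-beta`.

```rust
/// Return manifest assets that sit on the wrong channel.
/// Non-manifest assets (installers, signatures) are ignored.
pub fn misplaced_manifests<'a>(is_beta_release: bool, assets: &[&'a str]) -> Vec<&'a str> {
    assets
        .iter()
        .copied()
        .filter(|name| name.starts_with("latest-") && name.ends_with(".json"))
        // A manifest is misplaced when its "-beta" marking disagrees with the release.
        .filter(|name| name.contains("-beta") != is_beta_release)
        .collect()
}
```

An empty result for both a Stable and a Beta release is the expected outcome; any returned name points at a channel-detection problem in the workflow.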

Repo conventions

  • Comments, log messages, user-facing strings, and most docs are in Simplified Chinese. UI strings additionally route through react-i18next (src/i18n/{zh-CN,en}.ts) so we ship English alongside; zh-CN.ts is source of truth.
  • macOS hotkey monitor must use native CGEventTap, never rdev. rdev synchronously calls TSMGetInputSourceProperty from non-main threads, which macOS 14+ aborts via dispatch_assert_queue_fail → SIGTRAP. macOS uses CGEventTap; rdev is only used on Linux/Windows.
  • Don't NSApp.activate on the dictation path — it steals focus and breaks insertion. Only call set_activation_policy(Regular) + activateIgnoringOtherApps from show_main_window / mic-permission prompts, never from start_dictation.
  • Rust modules wrap shared mutable state with Arc<Mutex<...>> (parking_lot). Keep that locking discipline when adding fields.
  • Leaf Rust modules depend only on types.rs. New cross-module wiring goes in coordinator.rs, not in the leaf modules.

Adding a new module

  1. Add a <name>.rs (or directory) under openless-all/app/src-tauri/src/, importing only from types.
  2. Register it in lib.rs (mod <name>;).
  3. Wire it into coordinator.rs and expose any frontend-callable surface via commands.rs + invoke_handler!.
  4. Add the matching TS wrapper in openless-all/app/src/lib/ipc.ts (with a mock branch for browser dev).

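Research before coding: dispatch a sub-agent to check API / library / platform docs

The full rules are in the AGENTS.md section "Third-party service integrations & library / platform API research" (lines 171-191). This section lists the concrete tool mapping that applies once Claude Code is working in the repo.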

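Triggers: if any of these applies, dispatch a sub-agent to research first, then write code

  • Third-party HTTP APIs (ASR vendors / LLM endpoints / GitHub API / Tauri plugin services, etc.)
  • Unfamiliar Rust crates / npm packages: when you are unsure even of the signatures and feature flags
  • Platform APIs: Apple Security framework / CoreFoundation / Win32 / Carbon / AppKit
  • What the version pinned in the repo's lock files actually supports: training memory and the real versions in Cargo.lock / package-lock.json may disagree
  • Any interface that has changed since the training cutoff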

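No sub-agent needed

  • The repo already contains a working call → find a reference with rg / grep (the repo is the documentation)
  • General programming / algorithms / language features you can derive yourself
  • Single-file surgical changes where the API at the change site already has a usage example
  • Looking up existing modules in this repo (types.rs / coordinator.rs, etc.): just Read them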

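Tool priority

1. Context7 MCP (highest priority: broad coverage of mainstream libraries, version-aware)
   - mcp__context7__resolve-library-id  → get the library id
   - mcp__context7__query-docs          → official doc snippets for the current version

2. documentation-lookup skill
   /skill documentation-lookup: wraps Context7, with routing + caching.

3. Agent sub-agent (subagent_type=general-purpose)
   When: Context7 has no coverage (niche crate / new SDK / non-English docs),
   or several sources must be cross-checked (official docs + GitHub README + Stack Overflow).
   The sub-agent combines WebFetch / WebSearch / Context7 and returns a structured summary of 200-400 Chinese characters.

4. Single-page fallback: WebFetch one documentation page directly (when only the single most authoritative page is needed)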

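Required fields in the sub-agent prompt

  1. Target question: state in one sentence the specific technical problem to solve (no vague targets such as "learn about X")
  2. Current repo state: the currently locked version (pull it from Cargo.lock / package-lock.json) + existing call sites as file:line (if any)
  3. Required return structure: function/endpoint signature → minimal runnable example (≤20 lines) → version compatibility range (the key check against training memory) → known pitfalls / platform differences / deprecation plans
  4. Prohibitions: do not modify this repo's code; do not paste documentation verbatim (distill the key parts so the context does not blow up); dispatch separate agents for independent services, one agent per service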

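Anti-patterns

  • ✗ Writing third-party API calls from training memory and assuming the parameter signatures are as remembered
  • ✗ Pasting whole passages of official documentation into the main context
  • ✗ Writing code first and checking the docs afterwards
  • ✗ Having a single sub-agent research 5 unrelated libraries at once (each needs its own prompt + its own context)
  • ✗ Skipping cross-verification after the sub-agent returns and going straight to code. Step 4 in AGENTS.md requires at least one WebFetch that hits the official source directly to verify one key fact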