Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 12 additions & 8 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed
- **Firmware: SPI flash cache crash under high CSI callback pressure** (RuView#396, #397) — ESP32-S3 nodes crashed in `cache_ll_l1_resume_icache` / `wDev_ProcessFiq` after ~2400 callbacks when the promiscuous filter admitted DATA frames at 100–500 Hz. Fixed by narrowing the filter mask to `WIFI_PROMIS_FILTER_MASK_MGMT` (~10 Hz beacons), adding a 50 Hz early callback rate gate (`CSI_MIN_PROCESS_INTERVAL_US`) that drops excess callbacks before any processing work, and enabling `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` as defense-in-depth. Stability validated with a 4-min-per-node soak.
- **Firmware: `filter_mac` / `node_id` clobber by WiFi driver init** (#232, #375, #385, #386, #390, #397) — `g_nvs_config` can be corrupted during `wifi_init_sta()` on some devices (confirmed on `80:b5:4e:c1:be:b8`), reverting `node_id` to the Kconfig default and producing garbage MAC-filter reads in the CSI callback (100–500 Hz). New `csi_collector_set_node_id()` API called from `app_main()` **before** `wifi_init_sta()` captures both fields into module-local statics (`s_node_id`, `s_filter_mac`, `s_filter_mac_set`). `csi_collector_init()` now runs a canary that distinguishes "early≠g_nvs_config" (corruption confirmed) from a no-op match. All CSI runtime paths use the defensive copies exclusively.
- **Firmware: `edge_processing` sample rate mismatch** (#397) — `estimate_bpm_zero_crossing()` was called with a hard-coded `sample_rate = 20.0f`, but MGMT-only promiscuous delivers ~10 Hz. Breathing and heart-rate reports were 2× too high. Corrected to `10.0f` with an explicit comment tying it to the callback rate.
- **`provision.py` esptool command form** (#391, #397) — ESP-IDF v5.4 bundles `esptool 4.10.0`, which only accepts `write_flash` (underscore). Standalone `pip install esptool` v5.x accepts both forms but prefers `write-flash`. #391 switched to `write-flash` which broke the documented ESP-IDF Python venv flow; #397 reverts to `write_flash` (works with both esptool 4.x and 5.x) with an inline comment warning future maintainers not to "re-fix" it.
- **`provision.py` esptool v5 dry-run hint** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct.
- **`provision.py` silent NVS wipe** (#391) — The script replaces the entire `csi_cfg` NVS namespace on every run, so partial invocations were silently erasing WiFi credentials and causing `Retrying WiFi connection (10/10)` in the field. Now refuses to run without `--ssid`, `--password`, and `--target-ip` unless `--force-partial` is passed. `--force-partial` prints a warning listing which keys will be wiped.
- **Firmware: defensive `node_id` capture** (#232, #375, #385, #386, #390) — Users on multi-node deployments reported `node_id` reverting to the Kconfig default (`1`) in UDP frames and in the `csi_collector` init log, despite NVS loading the correct value. The root cause (memory corruption of `g_nvs_config`) has not been definitively isolated, but the UDP frame header is now tamper-proof: `csi_collector_init()` captures `g_nvs_config.node_id` into a module-local `s_node_id` once, and `csi_serialize_frame()` plus all other consumers (`edge_processing.c`, `wasm_runtime.c`, `display_ui.c`, `swarm_bridge_init`) read it via the new `csi_collector_get_node_id()` accessor. A canary logs `WARN` if `g_nvs_config.node_id` diverges from `s_node_id` at end-of-init, helping isolate the upstream corruption path. Validated on attached ESP32-S3 (COM8): NVS `node_id=2` propagates through boot log, capture log, init log, and byte[4] of every UDP frame.

### Docs
- **CHANGELOG catch-up** (#367) — Added missing entries for v0.5.5, v0.6.0, and v0.7.0 releases.

## [v0.6.2-esp32] — 2026-04-20

Firmware release cutting ADR-081 and the Timer Svc stack fix discovered during
Expand Down Expand Up @@ -121,14 +133,6 @@ firing cleanly, HEALTH mesh packets sent.
conservative (`enable_channel_switch` and `enable_role_change` off);
Kconfig surface added under "Adaptive Controller (ADR-081)".

### Fixed
- **`provision.py` esptool v5 compat** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct.
- **`provision.py` silent NVS wipe** (#391) — The script replaces the entire `csi_cfg` NVS namespace on every run, so partial invocations were silently erasing WiFi credentials and causing `Retrying WiFi connection (10/10)` in the field. Now refuses to run without `--ssid`, `--password`, and `--target-ip` unless `--force-partial` is passed. `--force-partial` prints a warning listing which keys will be wiped.
- **Firmware: defensive `node_id` capture** (#232, #375, #385, #386, #390) — Users on multi-node deployments reported `node_id` reverting to the Kconfig default (`1`) in UDP frames and in the `csi_collector` init log, despite NVS loading the correct value. The root cause (memory corruption of `g_nvs_config`) has not been definitively isolated, but the UDP frame header is now tamper-proof: `csi_collector_init()` captures `g_nvs_config.node_id` into a module-local `s_node_id` once, and `csi_serialize_frame()` plus all other consumers (`edge_processing.c`, `wasm_runtime.c`, `display_ui.c`, `swarm_bridge_init`) read it via the new `csi_collector_get_node_id()` accessor. A canary logs `WARN` if `g_nvs_config.node_id` diverges from `s_node_id` at end-of-init, helping isolate the upstream corruption path. Validated on attached ESP32-S3 (COM8): NVS `node_id=2` propagates through boot log, capture log, init log, and byte[4] of every UDP frame.

### Docs
- **CHANGELOG catch-up** (#367) — Added missing entries for v0.5.5, v0.6.0, and v0.7.0 releases.

## [v0.7.0] — 2026-04-06

Model release (no new firmware binary). Firmware remains at v0.6.0-esp32.
Expand Down
133 changes: 106 additions & 27 deletions firmware/esp32-csi-node/main/csi_collector.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,20 @@
/* ADR-060: Access the global NVS config for MAC filter and channel override. */
extern nvs_config_t g_nvs_config;

/* Defensive fix (#232, #375, #385, #386, #390): capture node_id at init-time
* into a module-local static. Using the global g_nvs_config.node_id directly
* at every callback is vulnerable to any memory corruption that clobbers the
* struct (which users have reported reverting node_id to the Kconfig default
* of 1). The local copy is set once at csi_collector_init() and then used
* exclusively by csi_serialize_frame(). */
/* Defensive fix (#232, #375, #385, #386, #390): capture NVS config fields into
* module-local statics BEFORE wifi_init_sta() runs, because WiFi driver init
* can corrupt g_nvs_config (confirmed on device 80:b5:4e:c1:be:b8).
* main.c calls csi_collector_set_node_id() immediately after nvs_config_load(),
* and all runtime paths use the local copies exclusively. */
static uint8_t s_node_id = 1;
static bool s_node_id_early_set = false;

/* Defensive copy of MAC filter config — the CSI callback fires at 100-500 Hz
* and reads filter_mac_set + filter_mac on every invocation. If wifi_init_sta()
* corrupts g_nvs_config, the callback would read garbage, potentially causing
* LoadProhibited panics (observed: Core 0 panic after ~2400 callbacks). */
static uint8_t s_filter_mac[6] = {0};
static bool s_filter_mac_set = false;

/* ADR-057: Build-time guard — fail early if CSI is not enabled in sdkconfig.
* Without this, the firmware compiles but crashes at runtime with:
Expand Down Expand Up @@ -60,6 +67,24 @@ static uint32_t s_rate_skip = 0;
#define CSI_MIN_SEND_INTERVAL_US (20 * 1000)
static int64_t s_last_send_us = 0;

/**
* Minimum interval between processing ANY CSI callback in microseconds.
* Promiscuous MGMT+DATA can fire 100-500+ times/sec. At rates above ~50 Hz,
* the WiFi FIQ handler (wDev_ProcessFiq) races with SPI flash cache operations,
* causing Core 0 LoadProhibited panics in cache_ll_l1_resume_icache.
*
* This early gate drops excess callbacks BEFORE any processing (serialization,
* UDP, edge enqueue), keeping the effective callback rate at ~50 Hz while
* preserving the full MGMT+DATA promiscuous filter and HT-LTF/STBC CSI quality.
*
* The WiFi hardware still captures all frames and the CSI data is generated,
* but we simply discard the excess in software. This reduces the time spent
* in callback context per second, giving the WiFi ISR more headroom.
*/
#define CSI_MIN_PROCESS_INTERVAL_US (20 * 1000) /* 50 Hz */
static int64_t s_last_process_us = 0;
static uint32_t s_early_drop = 0;

/* ---- ADR-029: Channel-hop state ---- */

/** Channel hop table (populated from NVS at boot or via set_hop_table). */
Expand Down Expand Up @@ -165,9 +190,20 @@ static void wifi_csi_callback(void *ctx, wifi_csi_info_t *info)
{
(void)ctx;

/* ADR-060: MAC address filtering — drop frames from non-matching sources. */
if (g_nvs_config.filter_mac_set) {
if (memcmp(info->mac, g_nvs_config.filter_mac, 6) != 0) {
/* Early rate gate: drop excess callbacks to ~50 Hz to prevent
* SPI flash cache crash in WiFi ISR (wDev_ProcessFiq). */
int64_t now_us = esp_timer_get_time();
if ((now_us - s_last_process_us) < CSI_MIN_PROCESS_INTERVAL_US) {
s_early_drop++;
return;
}
s_last_process_us = now_us;

/* ADR-060: MAC address filtering — drop frames from non-matching sources.
* Uses defensively-copied s_filter_mac instead of g_nvs_config (which can
* be corrupted by wifi_init_sta — same root cause as the node_id clobber). */
if (s_filter_mac_set) {
if (memcmp(info->mac, s_filter_mac, 6) != 0) {
return; /* Source MAC doesn't match filter — skip frame. */
}
}
Expand Down Expand Up @@ -222,14 +258,60 @@ static void wifi_promiscuous_cb(void *buf, wifi_promiscuous_pkt_type_t type)
(void)type;
}

void csi_collector_set_node_id(uint8_t node_id)
{
s_node_id = node_id;
s_node_id_early_set = true;
ESP_LOGI(TAG, "Early capture node_id=%u (before WiFi init, #232/#390)",
(unsigned)node_id);

/* Also capture MAC filter config now — same struct, same corruption risk.
* The CSI callback reads filter_mac_set on every invocation (100-500 Hz),
* so a corrupted value could cause erratic filtering or crash. */
s_filter_mac_set = (g_nvs_config.filter_mac_set != 0);
if (s_filter_mac_set) {
memcpy(s_filter_mac, g_nvs_config.filter_mac, 6);
ESP_LOGI(TAG, "Early capture filter_mac=%02x:%02x:%02x:%02x:%02x:%02x",
s_filter_mac[0], s_filter_mac[1], s_filter_mac[2],
s_filter_mac[3], s_filter_mac[4], s_filter_mac[5]);
}
}

void csi_collector_init(void)
{
/* Capture node_id into module-local static at init time. After this point
* csi_serialize_frame() uses s_node_id exclusively, isolating the UDP
* frame node_id field from any memory corruption of g_nvs_config. */
s_node_id = g_nvs_config.node_id;
ESP_LOGI(TAG, "Captured node_id=%u at init (defensive copy for #232/#375/#385/#390)",
(unsigned)s_node_id);
if (!s_node_id_early_set) {
/* Fallback: no early capture — use current g_nvs_config (may be clobbered). */
s_node_id = g_nvs_config.node_id;
ESP_LOGW(TAG, "Late capture node_id=%u (no early set_node_id call)",
(unsigned)s_node_id);
} else if (g_nvs_config.node_id != s_node_id) {
/* Canary: early capture disagrees with current g_nvs_config — corruption
* happened between nvs_config_load() and here (likely wifi_init_sta). */
ESP_LOGW(TAG, "node_id clobber CONFIRMED: early=%u g_nvs_config=%u "
"(WiFi init likely corrupted struct, using early value)",
(unsigned)s_node_id, (unsigned)g_nvs_config.node_id);
} else {
ESP_LOGI(TAG, "node_id=%u verified (early capture matches g_nvs_config)",
(unsigned)s_node_id);
}

/* Canary for filter_mac: check if WiFi init corrupted the filter fields. */
if (s_node_id_early_set) {
bool mac_set_now = (g_nvs_config.filter_mac_set != 0);
if (mac_set_now != s_filter_mac_set) {
ESP_LOGW(TAG, "filter_mac_set clobber CONFIRMED: early=%d g_nvs_config=%d",
(int)s_filter_mac_set, (int)mac_set_now);
} else if (s_filter_mac_set &&
memcmp(s_filter_mac, g_nvs_config.filter_mac, 6) != 0) {
ESP_LOGW(TAG, "filter_mac clobber CONFIRMED: bytes differ after WiFi init");
}
} else {
/* No early capture — grab filter config now (may already be corrupted). */
s_filter_mac_set = (g_nvs_config.filter_mac_set != 0);
if (s_filter_mac_set) {
memcpy(s_filter_mac, g_nvs_config.filter_mac, 6);
}
}

/* ADR-060: Determine the CSI channel.
* Priority: 1) NVS override (--channel), 2) connected AP channel, 3) Kconfig default. */
Expand Down Expand Up @@ -260,12 +342,19 @@ void csi_collector_init(void)
ESP_ERROR_CHECK(esp_wifi_set_promiscuous(true));
ESP_ERROR_CHECK(esp_wifi_set_promiscuous_rx_cb(wifi_promiscuous_cb));

/* MGMT-only promiscuous filter + active probe injection (RuView#396).
*
* DATA frames cause 100-500+ WiFi HW interrupts/sec which crashes Core 0
* in wDev_ProcessFiq (SPI flash cache race in ESP-IDF WiFi blob).
* MGMT-only gives ~10 Hz (beacons). Probe request injection at 10 Hz
* adds ~10 Hz probe responses from APs → ~20 Hz total, matching the
* edge processing designed sample rate of 20 Hz. */
wifi_promiscuous_filter_t filt = {
.filter_mask = WIFI_PROMIS_FILTER_MASK_MGMT | WIFI_PROMIS_FILTER_MASK_DATA,
.filter_mask = WIFI_PROMIS_FILTER_MASK_MGMT,
};
ESP_ERROR_CHECK(esp_wifi_set_promiscuous_filter(&filt));

ESP_LOGI(TAG, "Promiscuous mode enabled for CSI capture");
ESP_LOGI(TAG, "Promiscuous mode enabled (MGMT-only, RuView#396)");

wifi_csi_config_t csi_config = {
.lltf_en = true,
Expand All @@ -290,16 +379,6 @@ void csi_collector_init(void)

ESP_LOGI(TAG, "CSI collection initialized (node_id=%u, channel=%u)",
(unsigned)s_node_id, (unsigned)csi_channel);

/* Clobber-detection canary: if g_nvs_config.node_id no longer matches the
* value we captured, something corrupted the struct between nvs_config_load
* and here. This is the historic #232/#375 symptom. */
if (g_nvs_config.node_id != s_node_id) {
ESP_LOGW(TAG, "node_id clobber detected: captured=%u but g_nvs_config=%u "
"(frames will use captured value %u). Please report to #390.",
(unsigned)s_node_id, (unsigned)g_nvs_config.node_id,
(unsigned)s_node_id);
}
}

/* Accessor for other modules that need the authoritative runtime node_id. */
Expand Down
22 changes: 16 additions & 6 deletions firmware/esp32-csi-node/main/csi_collector.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,14 +30,24 @@
void csi_collector_init(void);

/**
* Get the runtime node_id captured at csi_collector_init().
* Capture node_id BEFORE wifi_init_sta() or any other heavy init.
*
* This is a defensive copy of g_nvs_config.node_id taken at init time. Other
* modules (edge_processing, wasm_runtime, display_ui) should prefer this
* accessor over reading g_nvs_config.node_id directly, because the global
* struct can be clobbered by memory corruption (see #232, #375, #385, #390).
* Must be called from app_main() immediately after nvs_config_load().
* WiFi driver initialization can corrupt g_nvs_config.node_id (confirmed
* on device 80:b5:4e:c1:be:b8, NVS=3 but post-WiFi reads as 1).
* This early capture shields s_node_id from that corruption window.
*
* @return Node ID (0-255) as loaded from NVS or Kconfig default at boot.
* @param node_id Value from g_nvs_config.node_id, read right after NVS load.
*/
void csi_collector_set_node_id(uint8_t node_id);

/**
* Get the runtime node_id (early capture if available, otherwise init-time).
*
* Other modules (edge_processing, wasm_runtime, display_ui) should prefer
* this accessor over reading g_nvs_config.node_id directly.
*
* @return Node ID (0-255) as loaded from NVS at boot.
*/
uint8_t csi_collector_get_node_id(void);

Expand Down
7 changes: 5 additions & 2 deletions firmware/esp32-csi-node/main/edge_processing.c
Original file line number Diff line number Diff line change
Expand Up @@ -714,8 +714,11 @@ static void process_frame(const edge_ring_slot_t *slot)
s_frame_count++;
s_latest_rssi = slot->rssi;

/* Assumed CSI sample rate (~20 Hz for typical ESP32 CSI). */
const float sample_rate = 20.0f;
/* CSI sample rate. MGMT-only promiscuous filter (RuView#396, csi_collector.c)
* yields ~10 Hz from beacons; keep this value aligned with csi_collector's
* effective callback rate or estimate_bpm_zero_crossing() reports the wrong
* BPM (2× rate mismatch → 2× wrong breathing/HR). */
const float sample_rate = 10.0f;

/* --- Step 1-2: Phase extraction + unwrapping per subcarrier --- */
float phases[EDGE_MAX_SUBCARRIERS];
Expand Down
5 changes: 5 additions & 0 deletions firmware/esp32-csi-node/main/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,11 @@ void app_main(void)
/* Load runtime config (NVS overrides Kconfig defaults) */
nvs_config_load(&g_nvs_config);

/* Capture node_id IMMEDIATELY — before wifi_init_sta() can corrupt
* g_nvs_config. See #232/#375/#390: WiFi driver init clobbers the struct
* on some devices, reverting node_id to the Kconfig default of 1. */
csi_collector_set_node_id(g_nvs_config.node_id);

const esp_app_desc_t *app_desc = esp_app_get_description();
ESP_LOGI(TAG, "ESP32-S3 CSI Node (ADR-018) — v%s — Node ID: %d",
app_desc->version, g_nvs_config.node_id);
Expand Down
5 changes: 4 additions & 1 deletion firmware/esp32-csi-node/provision.py
Original file line number Diff line number Diff line change
Expand Up @@ -155,7 +155,10 @@ def flash_nvs(port, baud, nvs_bin):
"--chip", "esp32s3",
"--port", port,
"--baud", str(baud),
"write-flash",
# Keep underscore form — ESP-IDF v5.4 bundles esptool 4.10.0 which only
# accepts "write_flash". pip's esptool >=5.x accepts both (hyphenated
# form preferred) but keeps underscore working. Do not "correct" this.
"write_flash",
hex(NVS_PARTITION_OFFSET), bin_path,
]
print(f"Flashing NVS partition ({len(nvs_bin)} bytes) to {port}...")
Expand Down
3 changes: 3 additions & 0 deletions firmware/esp32-csi-node/sdkconfig.defaults
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,9 @@ CONFIG_LWIP_SO_RCVBUF=y
# FreeRTOS: increase task stack for CSI processing
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192

# Extra WiFi IRAM placement (defense-in-depth for RuView#396 SPI cache race)
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y

# ADR-081: adaptive_controller runs emit_feature_state + stream_sender
# network I/O inside Timer Svc callbacks, exceeding the 2 KiB default.
CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192
3 changes: 3 additions & 0 deletions firmware/esp32-csi-node/sdkconfig.defaults.4mb
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,9 @@ CONFIG_LOG_DEFAULT_LEVEL_INFO=y
CONFIG_LWIP_SO_RCVBUF=y
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192

# Extra WiFi IRAM placement (defense-in-depth for RuView#396 SPI cache race)
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y

# ADR-081: adaptive_controller runs emit_feature_state + stream_sender
# network I/O inside Timer Svc callbacks, exceeding the 2 KiB default.
CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192
3 changes: 3 additions & 0 deletions firmware/esp32-csi-node/sdkconfig.defaults.template
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,9 @@ CONFIG_LWIP_SO_RCVBUF=y
# FreeRTOS: increase task stack for CSI processing
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192

# Extra WiFi IRAM placement (defense-in-depth for RuView#396 SPI cache race)
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y

# ADR-081: adaptive_controller runs emit_feature_state + stream_sender
# network I/O inside Timer Svc callbacks, exceeding the 2 KiB default.
CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192
2 changes: 1 addition & 1 deletion firmware/esp32-csi-node/version.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.6.2
0.6.3