fix: retry parental control rule writes through firewall settle window #1264

Open
paul43210 wants to merge 1 commit into Vaskivskyi:dev from paul43210:fix/pc-rule-retry-during-firewall-settle
@paul43210

Problem

After any parental control rule write, BT8 returns:

{ "modify": "1", "run_service": "restart_firewall", "restart_needed_time": 53 }

The restart_needed_time value (typically 53 seconds on BT8) is the window the router needs to settle a firewall restart before it'll accept another PC rule change. Writes inside that window are silently rejected at the firewall enforcement layer: the underlying asusrouter library's async_set_state(rule) returns False, even though the previous write went through correctly.

Today the integration's response on rejection is to log a WARNING and move on:

# bridge.py async_pc_rule
result = await self.api.async_set_state(rule)
if result is True:
    _LOGGER.debug("Parental control rule set: %s", rule)
else:
    _LOGGER.warning("Cannot set parental control rule: %s", rule)

ClientInternetSwitch._set_state has the same shape (silent failure on the switch entity's own async_turn_off).

Consumers (the device_internet_access service, the switch entity, downstream automations) see this as a permanent failure when the write would have succeeded if retried after the settle window.

In practice this surfaces as:

  • Stuck switch.X_block_internet entities (on after a remove that "didn't take")
  • Downstream automations declaring failure on what should have been a transient race
  • Repeated user retries that each fail the same way until ~60s have passed

Fix

Add a helper async_set_pc_rule_with_retry() on ARDevice (router.py) that:

  1. Calls bridge.api.async_set_state(rule). On success, returns True.
  2. On rejection, waits settle_seconds (default 60s — the observed 53s + margin).
  3. Refreshes via self.update_pc_rules() (the library's 5s cache is long expired after the sleep).
  4. If the cache now reflects the requested state (router applied the rule late), returns True without retry.
  5. Otherwise retries the write once. If THAT fails too, returns False — this is a genuine refusal.
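
The five steps above can be sketched as follows. This is a minimal illustration, not the code in this PR: `Rule`, `RouterSketch`, the injected `set_state` / `fetch_rules` callables, and the `_matches_intent` check are hypothetical stand-ins for `ParentalControlRule`, `ARDevice`, `bridge.api.async_set_state`, and the `_pc_rules` cache. The sleep function is injectable only so the flow can be exercised without waiting a real 60 seconds.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict

SETTLE_SECONDS = 60.0  # observed 53s on BT8 plus margin, as described above


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for the library's ParentalControlRule."""

    mac: str
    blocked: bool  # True = BLOCK, False = REMOVE


class RouterSketch:
    """Hypothetical stand-in for ARDevice; only the retry flow is modelled."""

    def __init__(
        self,
        set_state: Callable[[Rule], Awaitable[bool]],
        fetch_rules: Callable[[], Awaitable[Dict[str, Rule]]],
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self._set_state = set_state      # stands in for bridge.api.async_set_state
        self._fetch_rules = fetch_rules  # stands in for querying the router's rules
        self._sleep = sleep              # injectable so tests need not wait 60s
        self._pc_rules: Dict[str, Rule] = {}

    async def update_pc_rules(self) -> None:
        """Refresh the local rule cache from the router."""
        self._pc_rules = await self._fetch_rules()

    def _matches_intent(self, rule: Rule) -> bool:
        """Return True if the cached state already reflects the requested rule."""
        current = self._pc_rules.get(rule.mac)
        if rule.blocked:
            return current is not None and current.blocked
        return current is None  # a REMOVE is satisfied once the rule is gone

    async def async_set_pc_rule_with_retry(
        self, rule: Rule, settle_seconds: float = SETTLE_SECONDS
    ) -> bool:
        if await self._set_state(rule) is True:     # step 1: first attempt
            return True
        await self._sleep(settle_seconds)           # step 2: wait out the settle window
        await self.update_pc_rules()                # step 3: refresh the cache
        if self._matches_intent(rule):              # step 4: router applied it late
            return True
        return await self._set_state(rule) is True  # step 5: single retry
```

Capping the retry at one attempt keeps the worst-case latency bounded at roughly one settle window plus two writes, which is the trade-off the PR describes between masking a transient race and hiding a genuine refusal.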

Wired in three places:

  • bridge.async_pc_rule() — accepts an optional router= kwarg. When provided, routes each rule's write through the helper. Without it, falls back to the current direct api.async_set_state call (backward-compat for any external caller).
  • router.async_service_device_internet_access() — passes router=self when calling bridge.async_pc_rule().
  • switch.ClientInternetSwitch._set_state() — calls self._router.async_set_pc_rule_with_retry(state) instead of going direct to the API.
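
A rough sketch of the optional `router=` routing described for `bridge.async_pc_rule()`, under stated assumptions: `BridgeSketch` is a hypothetical stand-in for the real bridge class, the `rules` iterable and the boolean return value are illustrative choices rather than the integration's confirmed signature, and the log messages are the ones quoted in the Problem section.

```python
import logging
from typing import Any, Iterable, Optional

_LOGGER = logging.getLogger(__name__)


class BridgeSketch:
    """Hypothetical stand-in for the integration's bridge class."""

    def __init__(self, api: Any) -> None:
        self.api = api  # any object exposing `async_set_state(rule)`

    async def async_pc_rule(
        self, rules: Iterable[Any], router: Optional[Any] = None
    ) -> bool:
        """Write each rule, using the router's retry helper when one is given.

        The bool return (True only if every write succeeded) is an assumption
        made for this sketch.
        """
        all_ok = True
        for rule in rules:
            if router is not None:
                # New path: the router-side helper handles the settle window.
                result = await router.async_set_pc_rule_with_retry(rule)
            else:
                # Backward-compatible path: direct write, no retry.
                result = await self.api.async_set_state(rule)

            if result is True:
                _LOGGER.debug("Parental control rule set: %s", rule)
            else:
                _LOGGER.warning("Cannot set parental control rule: %s", rule)
                all_ok = False
        return all_ok
```

Keeping `router` optional means an external caller that only holds a bridge instance sees the same behavior as before, while the service path and the switch entity opt in to the retry.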

Why a helper on the router

The retry needs access to BOTH bridge.api (to do the write) AND _pc_rules (to verify post-settle state). The router owns both. Placing the helper on the bridge would require a backref the bridge currently doesn't have, and would break the bridge's "pure adapter to the library" role.

Settle window source

The restart_needed_time value is currently logged at DEBUG by the underlying asusrouter library (asusrouter.modules.service) but not surfaced to the integration. Reading it through the public library API would require a separate upstream change.

This PR uses a hardcoded 60s default (observed 53s on BT8 + ~7s margin). A follow-up could surface restart_needed_time from the library and feed it dynamically — out of scope here to keep the PR small.
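
For the possible follow-up, a dynamic settle window could be derived along these lines. This is only a sketch: it assumes the library would hand the integration the raw response mapping shown in the Problem section, which it does not do today, and `settle_seconds_from_response` and `MARGIN_SECONDS` are hypothetical names.

```python
from typing import Any, Mapping, Optional

DEFAULT_SETTLE_SECONDS = 60.0  # the hardcoded default used in this PR
MARGIN_SECONDS = 7.0           # assumed margin, matching 53s + ~7s = 60s


def settle_seconds_from_response(response: Optional[Mapping[str, Any]]) -> float:
    """Return how long to wait before retrying a rejected write.

    Falls back to the hardcoded default whenever `restart_needed_time`
    is missing, non-numeric, or negative.
    """
    if not response:
        return DEFAULT_SETTLE_SECONDS
    raw = response.get("restart_needed_time")
    try:
        needed = float(raw)
    except (TypeError, ValueError):
        return DEFAULT_SETTLE_SECONDS
    if needed < 0:
        return DEFAULT_SETTLE_SECONDS
    return needed + MARGIN_SECONDS


# The BT8 response quoted in the Problem section yields the same 60s default.
bt8 = {"modify": "1", "run_service": "restart_firewall", "restart_needed_time": 53}
print(settle_seconds_from_response(bt8))  # 60.0
```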

Empirical evidence

Captured during a live debug session:

# Successful BLOCK (no preceding write, fresh write window):
DEBUG asusrouter.modules.service Service `restart_firewall` run with arguments
  {'MULTIFILTER_MAC': 'A0:92:08:B6:F5:56>90:39:5F:DF:FF:16>A0:F2:62:85:B2:20', ...}
  Result: {'modify': '1', 'run_service': 'restart_firewall', 'restart_needed_time': 53}

# REMOVE fired 43s later (inside the 53s window):
WARNING custom_components.asusrouter.bridge Cannot set parental control rule:
  ParentalControlRule(mac='A0:F2:62:85:B2:20', ..., type=<PCRuleType.REMOVE: -1>)

# Same REMOVE fired again 3+ hours later (outside the window):
DEBUG asusrouter.modules.service Service `restart_firewall` run with arguments
  {'MULTIFILTER_MAC': 'A0:92:08:B6:F5:56>90:39:5F:DF:FF:16', 'MULTIFILTER_ENABLE': '2>2', ...}
  Result: {'modify': '1', 'restart_needed_time': 53}
DEBUG custom_components.asusrouter.bridge Parental control rule set:
  ParentalControlRule(mac='A0:F2:62:85:B2:20', ..., type=<PCRuleType.REMOVE: -1>)

Same rule, same payload, same MULTIFILTER format. Only difference: time-since-previous-write. Confirms the rejection is a transient settle-window issue, not a malformed-payload or auth issue.

Testing

Manual reproducer

  1. Pick a throwaway MAC (e.g., an ESP32 you don't mind blocking briefly).

  2. Fire a BLOCK from HA developer tools:

    service: asusrouter.device_internet_access
    data:
      devices: [{ mac: "AA:BB:CC:DD:EE:FF", name: "throwaway" }]
      state: block

  3. Within 60 seconds, fire a REMOVE for the same MAC.

    Before this PR: WARNING "Cannot set parental control rule" in logs; switch entity stays on; HA state stuck.

    After this PR: INFO log "PC rule write rejected by router (likely firewall settle window); waiting 60s ...". ~60s later, INFO log "PC rule removal already applied during settle window" OR "PC rule set on retry". Switch entity transitions correctly. HA state correct.

Switch entity reproducer

  1. Same as above but step 3 uses the switch UI: tap-toggle switch.<mac>_block_internet off within 60s of the initial block.
  2. Expected behavior matches above — retry kicks in via ClientInternetSwitch._set_state.

Regression checks

  • BLOCK on a fresh MAC (no preceding write) still completes in one call without hitting the retry path.
  • Operations done OUTSIDE the settle window (>60s after the previous write) still succeed in one call.
  • A genuine refusal (e.g., router in some other failure mode) returns False from the helper after the single retry — surfaces as a WARNING and propagates up.
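
To exercise these checks without a physical BT8, a test double along the following lines might help. It is a sketch under assumptions: `FakeSettlingApi` is hypothetical, rules are simplified to `(mac, block)` tuples, and the model (a write is rejected if it arrives within `restart_needed_time` of the last accepted write, and a rejected write does not restart the window) is inferred from the logs in this PR rather than from documented router behavior.

```python
import asyncio
from typing import Optional, Set, Tuple


class FakeSettlingApi:
    """Test double that rejects writes arriving inside the settle window."""

    def __init__(self, restart_needed_time: float = 53.0) -> None:
        self.restart_needed_time = restart_needed_time
        self.now = 0.0  # fake clock, advanced manually by the test
        self._last_accepted: Optional[float] = None
        self.blocked: Set[str] = set()

    def advance(self, seconds: float) -> None:
        """Move the fake clock forward."""
        self.now += seconds

    async def async_set_state(self, rule: Tuple[str, bool]) -> bool:
        mac, block = rule
        in_window = (
            self._last_accepted is not None
            and self.now - self._last_accepted < self.restart_needed_time
        )
        if in_window:
            return False  # mimics the rejection seen 43s after the BLOCK
        if block:
            self.blocked.add(mac)
        else:
            self.blocked.discard(mac)
        self._last_accepted = self.now
        return True


async def demo() -> Tuple[bool, bool, bool]:
    api = FakeSettlingApi()
    mac = "AA:BB:CC:DD:EE:FF"
    first = await api.async_set_state((mac, True))   # fresh window: accepted
    api.advance(43)
    early = await api.async_set_state((mac, False))  # inside the window: rejected
    api.advance(17)                                  # fake clock now at 60s
    late = await api.async_set_state((mac, False))   # outside the window: accepted
    return first, early, late


print(asyncio.run(demo()))  # (True, False, True)
```

Passing such a fake as the API, together with a sleep function that calls `advance()`, would let the retry helper be tested in milliseconds.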

Notes / open questions

After any PC rule write, the router returns restart_needed_time (~53s on
BT8) indicating how long the firewall daemon needs to apply the change.
Writes that arrive inside that window are accepted at the NVRAM layer
but rejected at the firewall enforcement layer — the underlying
asusrouter library's async_set_state(rule) returns False for them.

Today the integration's response is to log a WARNING and move on,
exposing the rejection to downstream consumers (the
device_internet_access service, the ClientInternetSwitch entity, and any
automation built on top) as a permanent failure when the write would
have succeeded if retried after the settle window.

This change adds async_set_pc_rule_with_retry on ARDevice that:
  - waits the settle window on rejection (default 60s)
  - refreshes update_pc_rules() so the cache reflects BT8's actual state
  - returns True if the cache now matches intent (BT8 applied late)
  - otherwise retries the write once before declaring genuine failure

Routes both the device_internet_access service path (via
bridge.async_pc_rule with new optional router= kwarg) and
ClientInternetSwitch._set_state through the helper. Backward-compat
preserved on bridge.async_pc_rule for callers that don't pass a router.

@paul43210

Author

Heads-up before review: same disclosure pattern as #1255 — this PR was largely AI-drafted (Claude, Anthropic). The code transform, prose, and reproducer text are AI-generated. The diagnosis (capturing the BT8 restart_needed_time: 53 response and correlating the timing of a successful BLOCK followed by a rejected REMOVE 43s later vs a successful REMOVE 3 hours later with identical payload) is mine — Python is not my strong suit, hence the AI-assisted code transform.

Happy to elaborate on the diagnosis, rework the change, or close if the timing / approach isn't a fit. 😅
