Description
Is it platform specific
generic
Importance or Severity
High
Description of the bug
Why it happens:
- arp_update runs in a loop: it periodically compares APPL_DB neighbors with the kernel and, for any (IP, interface) that is in APPL_DB but not in the kernel, it pings to “repair” the mismatch.
- When the test runs ip neigh flush, the kernel table is cleared while arp_update is in the middle of that loop (or right before its next pass).
- Right after the flush, the kernel has no neighbors, but APPL_DB still has the entries that existed before the flush (e.g. 172.16.x.x from the test).
- arp_update then sees a large “mismatch”: many APPL_DB entries are missing from the kernel. It treats that as “kernel is missing these neighbors” and starts pinging those IPs to repopulate the kernel.
- Those pings recreate neighbor entries (which go FAILED/INCOMPLETE again for 172.16.x.x). Neighsyncd and/or orchagent react to that and recreate the tunnel routes.
- So the flush clears the kernel once, but arp_update’s “mismatch” logic immediately refills the kernel and brings tunnel routes back.
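The sequence above can be modeled with plain sets. This is only a sketch of the described behavior, not the actual arp_update code: the function name find_mismatches and the in-memory sets standing in for APPL_DB and the kernel neighbor table are hypothetical.

```python
def find_mismatches(appl_db_neighbors, kernel_neighbors):
    """Return (ip, interface) pairs that are in APPL_DB but not in the kernel."""
    return sorted(set(appl_db_neighbors) - set(kernel_neighbors))


# Hypothetical state before the flush: both tables agree.
appl_db = {
    ("172.16.12.50", "Vlan1000"),
    ("172.16.4.147", "Vlan1000"),
    ("172.16.25.81", "Vlan1000"),
}
kernel = set(appl_db)
before_flush = find_mismatches(appl_db, kernel)  # nothing to repair

# Model "ip neigh flush all": the kernel table is emptied, while APPL_DB
# has not yet been updated and still holds the pre-flush entries.
kernel.clear()
to_ping = find_mismatches(appl_db, kernel)  # every stale entry is a ping target

# Model the pings: each one recreates a kernel neighbor entry
# (FAILED/INCOMPLETE on a real device), undoing the flush.
kernel.update(to_ping)
after_repair = find_mismatches(appl_db, kernel)

print(len(before_flush), len(to_ping), len(after_repair))
```

In this model the "repair" pass runs against a stale APPL_DB snapshot, so the flush is reverted before the deletions can propagate.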
The race between neighsyncd and arp_update has two implications:
- Test case failures caused by leftover neighbors and tunnel routes
- Real-world scenarios in which neighbor and tunnel route synchronization may repeat endlessly, depending on timing
Steps to Reproduce
- Run test_stress_arp.py:
python3 -m pytest arp/test_stress_arp.py --inventory="../ansible/inventory,../ansible/veos" --host-pattern <dut-1>,<dut-2> --module-path ../ansible/library/ --testbed <testbed_name> --setup_name=<setup_name> --testbed_file ../ansible/testbed.yaml --allow_recover --assert plain --log-cli-level info --show-capture=no -ra --showlocals --skip_sanity --store_la_logs --ignore_la_failure -k "ipv4"
Actual Behavior and Expected Behavior
SONiC:
root@sonic:/home/admin# ip -4 ne
172.16.27.98 dev Vlan1000 FAILED
172.16.31.31 dev Vlan1000 FAILED
172.16.12.153 dev Vlan1000 FAILED
172.16.26.112 dev Vlan1000 FAILED
172.16.35.201 dev Vlan1000 FAILED
172.16.15.234 dev Vlan1000 FAILED
172.16.30.109 dev Vlan1000 FAILED
172.16.21.65 dev Vlan1000 FAILED
172.16.24.35 dev Vlan1000 INCOMPLETE
172.16.12.2 dev Vlan1000 FAILED
172.16.13.93 dev Vlan1000 FAILED
172.16.34.215 dev Vlan1000 FAILED
...
root@sonic:/home/admin# redis-cli -n 1 KEYS "*" | grep ":.172.16\|:.fc02:1000" | wc -l
921
The expectation is that the neighbors and tunnel routes are removed after ip neigh flush all.
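To check for leftovers programmatically, the ip -4 ne output can be tallied by neighbor state. The helper below is a rough sketch: the name count_leftovers is hypothetical, the 172.16.0.0/16 prefix is an assumption inferred from the addresses shown above, and the non-172.16 line in the sample is invented purely to show filtering.

```python
import ipaddress
from collections import Counter

# Assumption: the stress test populates neighbors inside this prefix.
TEST_NET = ipaddress.ip_network("172.16.0.0/16")


def count_leftovers(ip_neigh_output, network=TEST_NET):
    """Count neighbor entries inside `network`, grouped by their state."""
    states = Counter()
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a neighbor entry (e.g. blank line or "...")
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if addr in network:
            states[fields[-1]] += 1  # the state is the last field
    return states


sample = """172.16.27.98 dev Vlan1000 FAILED
172.16.24.35 dev Vlan1000 INCOMPLETE
172.16.12.2 dev Vlan1000 FAILED
10.0.0.1 dev Ethernet0 lladdr 00:11:22:33:44:55 REACHABLE"""

leftovers = count_leftovers(sample)
print(dict(leftovers))
```

After a successful flush, the expectation in this sketch would be an empty result for the test prefix.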
Relevant log output
SYSLOG:
syslog:2026 Feb 25 16:40:55.641700 sonic INFO python3.13[679724]: ansible-ansible.legacy.command Invoked with _raw_params=ip -stats neigh flush all _uses_shell=True expand_argument_vars=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
syslog:2026 Feb 25 16:40:55.649495 sonic NOTICE swss#arp_update[17851]: 114 mismatch arp entry, pinging 172.16.12.50 on Vlan1000
syslog:2026 Feb 25 16:40:55.792361 sonic NOTICE swss#orchagent: :- create_route: Created tunnel route to 172.16.24.124/32
syslog:2026 Feb 25 16:40:55.860267 sonic NOTICE swss#arp_update[17858]: 114 mismatch arp entry, pinging 172.16.4.147 on Vlan1000
syslog:2026 Feb 25 16:40:55.867902 sonic NOTICE swss#orchagent: :- create_route: Created tunnel route to 172.16.28.159/32
syslog:2026 Feb 25 16:40:55.918404 sonic NOTICE swss#orchagent: :- create_route: Created tunnel route to 172.16.31.216/32
syslog:2026 Feb 25 16:40:56.071206 sonic NOTICE swss#arp_update[17865]: 114 mismatch arp entry, pinging 172.16.25.81 on Vlan1000
syslog:2026 Feb 25 16:40:56.120168 sonic NOTICE swss#orchagent: :- create_route: Created tunnel route to 172.16.4.199/32
syslog:2026 Feb 25 16:40:56.281553 sonic NOTICE swss#arp_update[17872]: 114 mismatch arp entry, pinging 172.16.31.128 on Vlan1000
syslog:2026 Feb 25 16:40:56.304025 sonic NOTICE swss#orchagent: :- create_route: Created tunnel route to 172.16.5.236/32
syslog:2026 Feb 25 16:40:56.317230 sonic NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 172.16.7.101/32
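To quantify how much activity follows the flush, the syslog excerpt can be summarized with a small script. This is a sketch only: the regular expressions are derived from the message formats visible in the lines above, and the function name summarize is hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this report.
PING_RE = re.compile(
    r"arp_update\[\d+\]: \d+ mismatch arp entry, pinging (\S+) on (\S+)"
)
ROUTE_RE = re.compile(
    r"orchagent: :- (create|remove)_route: (?:Created|Removed) tunnel route to (\S+)"
)


def summarize(lines):
    """Count arp_update pings and tunnel route create/remove events."""
    counts = Counter()
    for line in lines:
        if PING_RE.search(line):
            counts["ping"] += 1
            continue
        match = ROUTE_RE.search(line)
        if match:
            counts[match.group(1) + "_route"] += 1
    return counts


sample = [
    "syslog:2026 Feb 25 16:40:55.649495 sonic NOTICE swss#arp_update[17851]: "
    "114 mismatch arp entry, pinging 172.16.12.50 on Vlan1000",
    "syslog:2026 Feb 25 16:40:55.792361 sonic NOTICE swss#orchagent: "
    ":- create_route: Created tunnel route to 172.16.24.124/32",
    "syslog:2026 Feb 25 16:40:56.317230 sonic NOTICE swss#orchagent: "
    ":- remove_route: Removed tunnel route to 172.16.7.101/32",
]

summary = summarize(sample)
print(dict(summary))
```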
Output of show version, show techsupport
- N/A
Attach files (if any)
- N/A