
frrcfgd: create unnumbered BGP neighbors and peer-groups from CONFIG_DB #27724

Open
zedzean wants to merge 1 commit into sonic-net:master from zedzean:zaahir/frrcfgd-unnumbered-bgp-from-configdb

zedzean commented Jun 4, 2026

Why I did it

In frr_mgmt_framework mode, frrcfgd does not create unnumbered (interface-based) BGP neighbors or their peer-groups from CONFIG_DB when FRR starts from an empty / freshly-provisioned state. The BGP_NEIGHBOR entries are present, but router bgp ends up with no working neighbors. Full root-cause analysis in #27723.

frrcfgd.__init__ pre-seeds its "already created in FRR" trackers (bgp_intf_nbr, bgp_peer_group) from CONFIG_DB. On a cold boot the one-time create commands are therefore skipped, and the subsequent neighbor <ifname> remote-as ... command is rejected by FRR with "% Create the peer-group or interface first". The problem is masked on in-service switches because FRR loads its persisted config before frrcfgd starts, so the objects already exist.
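To make the failure mode concrete, here is a simplified Python model of the pre-fix gating. The function commands_before_fix and the shape of its input are hypothetical; only the tracker names bgp_intf_nbr and bgp_peer_group come from this PR, and in the real code bgp_peer_group is a model dict rather than a set. Because both trackers are seeded from CONFIG_DB, the create branches never fire on a cold boot and only the remote-as command is emitted:

```python
# Simplified, hypothetical model of the pre-fix gating in frrcfgd.
# Only the tracker names come from the PR; the function and data shapes
# are illustrative and do not mirror the real frrcfgd code.

def commands_before_fix(config_db_neighbors):
    """Return the FRR commands emitted on a cold boot before the fix.

    config_db_neighbors maps an interface name to a BGP_NEIGHBOR-like
    dict with "asn" and "peer_group_name" keys.
    """
    # Pre-fix behaviour: both trackers are seeded from CONFIG_DB at init,
    # so every neighbor and peer-group is assumed to already exist in FRR.
    bgp_intf_nbr = set(config_db_neighbors)
    bgp_peer_group = {nbr["peer_group_name"] for nbr in config_db_neighbors.values()}

    cmds = []
    for ifname, nbr in config_db_neighbors.items():
        pg = nbr["peer_group_name"]
        if pg not in bgp_peer_group:       # always False after pre-seeding
            cmds.append(f"neighbor {pg} peer-group")
            bgp_peer_group.add(pg)
        if ifname not in bgp_intf_nbr:     # always False after pre-seeding
            cmds.append(f"neighbor {ifname} interface")
            bgp_intf_nbr.add(ifname)
        # Only the follow-up command is emitted; on an empty FRR it is
        # rejected with "% Create the peer-group or interface first".
        cmds.append(f"neighbor {ifname} remote-as {nbr['asn']}")
    return cmds


print(commands_before_fix({"PortChannel1": {"asn": "64999", "peer_group_name": "PG_V6"}}))
# ['neighbor PortChannel1 remote-as 64999']
```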

Fixes #27723. Related to #26960 (same capability gap in bgpcfgd).

Work item tracking
  • Microsoft ADO (number only):

How I did it

Do not treat "present in CONFIG_DB" as "already created in FRR":

  • Do not pre-seed bgp_intf_nbr from CONFIG_DB; the apply path creates and records each interface neighbor.
  • Gate peer-group creation on a new bgp_pg_created set (empty at init) instead of the bgp_peer_group model dict, so the create runs while the model dict and its ref_nbrs stay intact.

The create commands (neighbor <ifname> interface, neighbor <name> peer-group) are idempotent in FRR.
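The fixed gating can be sketched the same way. FakeBgpd and FrrcfgdSketch are hypothetical stand-ins that model only what this PR states: the create commands are idempotent, and a follow-up command for an interface or peer-group that does not exist yet is rejected. The order of commands inside apply_neighbor() is an assumption made for illustration. Applying the same entry with a second FrrcfgdSketch against the same FakeBgpd mimics an frrcfgd restart while bgpd keeps running:

```python
# Illustrative sketch of the fixed gating; not the actual frrcfgd code.
# FakeBgpd and FrrcfgdSketch are hypothetical, and the command order in
# apply_neighbor() is an assumption.

CREATE_FIRST = "% Create the peer-group or interface first"


class FakeBgpd:
    """Toy bgpd: models only idempotent creates and the 'create first' check."""

    def __init__(self):
        self.objects = set()  # interface neighbors and peer-groups that exist

    def run(self, cmd):
        words = cmd.split()
        name = words[1]
        if words[2:] in (["interface"], ["peer-group"]):
            self.objects.add(name)  # re-creating an existing object is a no-op
            return None
        if name not in self.objects:
            return CREATE_FIRST
        return None


class FrrcfgdSketch:
    def __init__(self, bgpd):
        self.bgpd = bgpd
        # The fix: neither tracker is pre-seeded from CONFIG_DB.
        self.bgp_intf_nbr = set()
        self.bgp_pg_created = set()

    def apply_neighbor(self, ifname, asn, pg):
        """Push one BGP_NEIGHBOR-like entry; return (command, error) pairs."""
        cmds = []
        if pg not in self.bgp_pg_created:
            cmds.append(f"neighbor {pg} peer-group")
        if ifname not in self.bgp_intf_nbr:
            cmds.append(f"neighbor {ifname} interface")
        cmds.append(f"neighbor {ifname} peer-group {pg}")
        cmds.append(f"neighbor {ifname} remote-as {asn}")
        errors = []
        for cmd in cmds:
            err = self.bgpd.run(cmd)
            if err:
                errors.append((cmd, err))
        # Record only after the create commands were sent by this process.
        self.bgp_pg_created.add(pg)
        self.bgp_intf_nbr.add(ifname)
        return errors


bgpd = FakeBgpd()
# Cold boot: empty FRR, CONFIG_DB is the only source of config.
assert FrrcfgdSketch(bgpd).apply_neighbor("PortChannel1", 64999, "PG_V6") == []
# frrcfgd restart against a running bgpd: creates are re-issued, state unchanged.
before = set(bgpd.objects)
assert FrrcfgdSketch(bgpd).apply_neighbor("PortChannel1", 64999, "PG_V6") == []
assert bgpd.objects == before
```

Recording into the trackers only after the create commands have been issued keeps "present in CONFIG_DB" and "already created in FRR" separate; the cost on a restart is a few redundant but idempotent create commands.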

How to verify it

On a switch with frr_mgmt_framework_config: true, configure an unnumbered neighbor purely via CONFIG_DB (no pre-existing frr.conf):

sonic-db-cli CONFIG_DB hset 'BGP_PEER_GROUP|default|PG_V6' admin_status up
sonic-db-cli CONFIG_DB hset 'BGP_NEIGHBOR|default|PortChannel1' asn 64999 peer_group_name PG_V6
  • Before: show ip bgp summary shows no working neighbor, and syslog contains "failed running FRR command: neighbor PortChannel1 remote-as ...".
  • After: bgpd running-config contains neighbor PortChannel1 interface peer-group PG_V6 and the neighbor is programmed.
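The before/after check can also be scripted. The helper below is hypothetical and works on text captured separately (for example from vtysh -c "show running-config" and from syslog); the running-config fragment, including its local ASN, is made-up sample data:

```python
# Hypothetical helper for the before/after check above. Inputs are plain
# text captured separately from the running-config and from syslog.

def check_unnumbered_neighbor(running_config, syslog_text, ifname, pg):
    """Return a list of problems; an empty list means the neighbor looks programmed."""
    problems = []
    expected = f"neighbor {ifname} interface peer-group {pg}"
    config_lines = {line.strip() for line in running_config.splitlines()}
    if expected not in config_lines:
        problems.append(f"running-config is missing: {expected}")
    rejected = f"failed running FRR command: neighbor {ifname} remote-as"
    if rejected in syslog_text:
        problems.append(f"syslog contains: {rejected} ...")
    return problems


# Made-up sample running-config fragment (the ASN 65100 is illustrative).
after_config = """router bgp 65100
 neighbor PG_V6 peer-group
 neighbor PortChannel1 interface peer-group PG_V6
"""
print(check_unnumbered_neighbor(after_config, "", "PortChannel1", "PG_V6"))  # []
```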

Regression checks performed:

  • Warm restart / no flap: with established unnumbered sessions, restarting frrcfgd repeatedly (which now re-issues the idempotent create commands) did not reset peer uptime or change received-prefix counts. Re-creating existing interface peers / peer-groups does not flap sessions.
  • Create + delete: creating and deleting an interface neighbor and a peer-group via CONFIG_DB both applied and removed cleanly (neighbor ... interface / neighbor ... peer-group created; no neighbor ... on delete).
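The no-flap check can be automated by diffing two snapshots of FRR's show bgp summary json output, taken before and after restarting frrcfgd. This is a sketch: the JSON layout (per-address-family objects, each with a "peers" map whose entries carry "peerUptimeMsec" and "pfxRcd") is an assumption to verify against the FRR version in the image, and disturbed_peers is a hypothetical helper:

```python
import json

# Assumed layout of `show bgp summary json`: one object per address
# family, each with a "peers" map keyed by peer name. The field names
# "peerUptimeMsec" and "pfxRcd" are assumptions; check your FRR version.

def disturbed_peers(before_json, after_json):
    """List peers whose uptime went backwards or whose received-prefix
    count changed between two summary snapshots."""
    before, after = json.loads(before_json), json.loads(after_json)
    issues = []
    for afi, summary in before.items():
        if not isinstance(summary, dict):
            continue  # skip any non-object top-level fields
        for peer, old in summary.get("peers", {}).items():
            new = after.get(afi, {}).get("peers", {}).get(peer)
            if new is None:
                issues.append((afi, peer, "missing after restart"))
            elif new["peerUptimeMsec"] < old["peerUptimeMsec"]:
                issues.append((afi, peer, "uptime reset"))
            elif new["pfxRcd"] != old["pfxRcd"]:
                issues.append((afi, peer, "received-prefix count changed"))
    return issues
```

An empty result for every restart iteration corresponds to the "did not reset peer uptime or change received-prefix counts" outcome described above.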

Which release branch to backport

  • 202511

Tested branch (Please provide the tested image version)

  • master (frrcfgd frr_mgmt_framework daemon)

Description for the changelog

frrcfgd: create unnumbered (interface) BGP neighbors and peer-groups from CONFIG_DB on a fresh boot

A picture of a cute animal (not mandatory but encouraged)

https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg

zedzean requested a review from lguohan as a code owner June 4, 2026 12:04
mssonicbld (Collaborator)

/azp run Azure.sonic-buildimage

azure-pipelines
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

frrcfgd seeds its "already created in FRR" trackers (bgp_intf_nbr and
bgp_peer_group) from CONFIG_DB at init and uses them to gate the one-time
create commands ('neighbor <ifname> interface', 'neighbor <name>
peer-group'). On a from-scratch boot (empty FRR, CONFIG_DB as the sole
source of FRR config) every interface neighbor / peer-group in CONFIG_DB is
marked "already created" before it has been created, so the create is
skipped and the subsequent 'neighbor <ifname> remote-as ...' is rejected
with '% Create the peer-group or interface first'. router bgp ends up with
no working unnumbered neighbors.

This is masked on in-service switches because FRR loads its persisted
config before frrcfgd starts, so the objects already exist.

Fix: do not treat presence in CONFIG_DB as proof of FRR state.
- Do not pre-seed bgp_intf_nbr from CONFIG_DB; the apply path creates and
  records each interface neighbor.
- Gate peer-group creation on a new bgp_pg_created set (empty at init)
  instead of the bgp_peer_group model dict, so the create runs while the
  model dict and its ref_nbrs stay intact.

The create commands are idempotent in FRR, so re-issuing them on a warm
restart does not flap established sessions.

Signed-off-by: Zaahir Ahmed Syed <zaahir@cloudflare.com>
zedzean force-pushed the zaahir/frrcfgd-unnumbered-bgp-from-configdb branch from 370a558 to 422711d June 4, 2026 12:12
mssonicbld (Collaborator)

/azp run Azure.sonic-buildimage

azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld (Collaborator)

This PR has backport request for branch(es): 202511.
Added label(s) for branch(es) 202511.

---Powered by SONiC BuildBot


Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

frrcfgd: unnumbered (interface) BGP neighbors not created from CONFIG_DB on a fresh boot

3 participants