frrcfgd: create unnumbered BGP neighbors and peer-groups from CONFIG_DB #27724
Open
zedzean wants to merge 1 commit into
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
frrcfgd seeds its "already created in FRR" trackers (bgp_intf_nbr and
bgp_peer_group) from CONFIG_DB at init and uses them to gate the one-time
create commands ('neighbor <ifname> interface', 'neighbor <name>
peer-group'). On a from-scratch boot (empty FRR, CONFIG_DB as the sole
source of FRR config) every interface neighbor / peer-group in CONFIG_DB is
marked "already created" before it has been created, so the create is
skipped and the subsequent 'neighbor <ifname> remote-as ...' is rejected
with '% Create the peer-group or interface first'. router bgp ends up with
no working unnumbered neighbors.
This is masked on in-service switches because FRR loads its persisted
config before frrcfgd starts, so the objects already exist.
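The failure mode above can be reproduced in a few lines. This is a toy sketch, not frrcfgd code: `FakeFrr`, `apply_neighbor`, and the `created` tracker are hypothetical names, and the fake only models the two commands quoted in this description.

```python
class FakeFrr:
    """Toy stand-in for FRR's 'router bgp' context (hypothetical, not vtysh)."""

    def __init__(self):
        self.peers = set()

    def run(self, cmd):
        words = cmd.split()
        # 'neighbor <ifname> interface' creates the interface peer.
        if len(words) == 3 and words[0] == "neighbor" and words[2] == "interface":
            self.peers.add(words[1])
            return True
        # 'neighbor <ifname> remote-as ...' only succeeds if the peer exists.
        if len(words) >= 3 and words[0] == "neighbor" and words[2] == "remote-as":
            return words[1] in self.peers
        return False


def apply_neighbor(frr, created, ifname, remote_as):
    """Issue the one-time create only if the tracker says it is still needed."""
    if ifname not in created:
        frr.run(f"neighbor {ifname} interface")
        created.add(ifname)
    return frr.run(f"neighbor {ifname} remote-as {remote_as}")


config_db_neighbors = ["PortChannel1"]

# Tracker pre-seeded from CONFIG_DB while FRR is empty (cold boot):
# the create is skipped, so remote-as is rejected.
preseeded_ok = apply_neighbor(
    FakeFrr(), set(config_db_neighbors), "PortChannel1", "external"
)

# Tracker empty at init: the apply path creates and records the peer first.
fresh_ok = apply_neighbor(FakeFrr(), set(), "PortChannel1", "external")

print(preseeded_ok, fresh_ok)  # False True
```

On an in-service switch the fake's `peers` set would already contain `PortChannel1` (loaded from persisted config), which is why the pre-seeded tracker appears to work there.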
Fix: do not treat presence in CONFIG_DB as proof of FRR state.
- Do not pre-seed bgp_intf_nbr from CONFIG_DB; the apply path creates and
records each interface neighbor.
- Gate peer-group creation on a new bgp_pg_created set (empty at init)
instead of the bgp_peer_group model dict, so the create runs while the
model dict and its ref_nbrs stay intact.
The create commands are idempotent in FRR, so re-issuing them on a warm
restart does not flap established sessions.
Signed-off-by: Zaahir Ahmed Syed <zaahir@cloudflare.com>
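The peer-group half of the fix described in the commit message can be sketched as follows. This is a minimal illustration under assumptions, not the actual patch: `PeerGroupModel`, `PeerGroupApplier`, `ensure_created`, `attach`, and the `issued` list are hypothetical; only the names `bgp_peer_group`, `bgp_pg_created`, and `ref_nbrs` come from the description.

```python
class PeerGroupModel:
    """Hypothetical model entry: tracks which neighbors reference the group."""

    def __init__(self, name):
        self.name = name
        self.ref_nbrs = set()


class PeerGroupApplier:
    def __init__(self, config_db_peer_groups):
        # The model dict mirrors CONFIG_DB and may be populated at init.
        self.bgp_peer_group = {
            name: PeerGroupModel(name) for name in config_db_peer_groups
        }
        # Separate "created in FRR" tracker, deliberately empty at init.
        self.bgp_pg_created = set()
        # Commands that would be sent to FRR, recorded for inspection.
        self.issued = []

    def ensure_created(self, name):
        # Gate on bgp_pg_created, not on membership in the model dict.
        if name not in self.bgp_pg_created:
            self.issued.append(f"neighbor {name} peer-group")
            self.bgp_pg_created.add(name)

    def attach(self, nbr, pg_name):
        self.ensure_created(pg_name)
        self.bgp_peer_group[pg_name].ref_nbrs.add(nbr)
        self.issued.append(f"neighbor {nbr} interface peer-group {pg_name}")


applier = PeerGroupApplier(["PG_V6"])
applier.attach("PortChannel1", "PG_V6")
applier.attach("PortChannel2", "PG_V6")
for cmd in applier.issued:
    print(cmd)
```

Keeping the tracker separate means the model dict and its `ref_nbrs` can still be built from CONFIG_DB at init, while the create command is issued once per process lifetime regardless of what CONFIG_DB already contains.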
370a558 to
422711d
Compare
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
|
This PR has backport request for branch(es): 202511. ---Powered by SONiC BuildBot
|
Why I did it
In frr_mgmt_framework mode, frrcfgd does not create unnumbered (interface-based) BGP neighbors or their peer-groups from CONFIG_DB when FRR starts from an empty / freshly-provisioned state. The BGP_NEIGHBOR entries are present, but router bgp ends up with no working neighbors. Full root-cause analysis in #27723.
frrcfgd.__init__ pre-seeds its "already created in FRR" trackers (bgp_intf_nbr, bgp_peer_group) from CONFIG_DB, so the one-time create commands are skipped on a cold boot and the following 'neighbor <ifname> remote-as ...' is rejected by FRR with '% Create the peer-group or interface first'. It is masked on in-service switches because FRR loads its persisted config before frrcfgd starts.
Fixes #27723. Related to #26960 (same capability gap in bgpcfgd).
Work item tracking
How I did it
Do not treat "present in CONFIG_DB" as "already created in FRR":
- Do not pre-seed bgp_intf_nbr from CONFIG_DB; the apply path creates and records each interface neighbor.
- Gate peer-group creation on a new bgp_pg_created set (empty at init) instead of the bgp_peer_group model dict, so the create runs while the model dict and its ref_nbrs stay intact.
The create commands ('neighbor <ifname> interface', 'neighbor <name> peer-group') are idempotent in FRR.
How to verify it
On a switch with frr_mgmt_framework_config: true, configure an unnumbered neighbor purely via CONFIG_DB (no pre-existing frr.conf):
- Without this change: 'show ip bgp summary' shows no working neighbor; syslog has 'failed running FRR command: neighbor PortChannel1 remote-as ...'.
- With this change: 'neighbor PortChannel1 interface peer-group PG_V6' is present and the neighbor is programmed.
Regression checks performed:
- Restarting frrcfgd repeatedly (which now re-issues the idempotent create commands) did not reset peer uptime or change received-prefix counts. Re-creating existing interface peers / peer-groups does not flap sessions.
- 'neighbor ... interface' / 'neighbor ... peer-group' created; 'no neighbor ...' on delete).
Which release branch to backport
Tested branch (Please provide the tested image version)
frr_mgmt_framework daemon)
frrcfgd: create unnumbered (interface) BGP neighbors and peer-groups from CONFIG_DB on a fresh boot
A picture of a cute animal (not mandatory but encouraged)
https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg