
Commit 8efb1bc: BMC flows in SONiC

Signed-off-by: Yuanzhe, Liu <[email protected]>

Parent commit: 08b917a

5 files changed (+131, -0 lines)

doc/bmc/bmc_hld.md

# Support BMC flows in SONiC

## 1. BMC and Redfish

A Baseboard Management Controller (BMC) is a specialized microcontroller embedded on a motherboard. It manages the interface between system management software and the hardware, and provides out-of-band management capabilities that allow administrators to monitor and manage the hardware remotely.

OpenBMC is an open-source project that provides a Linux-based firmware stack for BMCs. It implements the Redfish standard, allowing standardized and secure remote management of server hardware. In essence, OpenBMC is the software that runs on the BMC hardware, and it exposes the Redfish API for hardware management.

Redfish is a DMTF standard for managing and interacting with hardware in a data center, designed to be simple, secure, and scalable. It defines a RESTful API for remote server management, which the BMC implements. Together, Redfish and the BMC enable efficient, standardized hardware management.

In summary, the NOS (SONiC) interacts with the BMC through the Redfish RESTful API.

## 2. BMC flows in SONiC

The implementation is straightforward: SONiC incorporates a Redfish client as the underlying infrastructure for BMC actions. SONiC itself implements this Redfish client object and initializes it at runtime.
![general flow](https://github.com/sonic-net/SONiC/blob/30d7b3524e1e1f25abb4679f7ffa777eabe9f499/images/bmc/bmc_overall_flow.png)
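
As an illustration of the Redfish client described above, the following is a minimal Python sketch of a session-based client using only the standard library. The session collection `/redfish/v1/SessionService/Sessions` and the `X-Auth-Token`/`Location` response headers come from the DMTF Redfish specification; the class name `RedfishClient`, its methods, and the 10-second timeout are hypothetical. TLS certificate handling (BMCs often use self-signed certificates) is deliberately left out.

```python
import json
import urllib.parse
import urllib.request

# Standard Redfish session collection (DMTF Redfish specification).
SESSIONS_PATH = "/redfish/v1/SessionService/Sessions"


class RedfishClient:
    """Hypothetical minimal Redfish client; not the SONiC implementation."""

    def __init__(self, host, username, password, timeout=10):
        self.base_url = f"https://{host}"
        self._username = username
        self._password = password
        self._timeout = timeout
        self._token = None        # X-Auth-Token value after login
        self._session_uri = None  # Location of the created session resource

    def _request(self, method, path, body=None):
        """Build (but do not send) a Redfish request."""
        url = urllib.parse.urljoin(self.base_url, path)
        data = None if body is None else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(url, data=data, method=method)
        req.add_header("Content-Type", "application/json")
        if self._token is not None:
            req.add_header("X-Auth-Token", self._token)
        return req

    def login(self):
        """Create a Redfish session and remember its token and URI."""
        req = self._request("POST", SESSIONS_PATH,
                            {"UserName": self._username, "Password": self._password})
        with urllib.request.urlopen(req, timeout=self._timeout) as resp:
            self._token = resp.headers["X-Auth-Token"]
            self._session_uri = resp.headers["Location"]

    def get(self, path):
        """GET a Redfish resource and return the decoded JSON body."""
        req = self._request("GET", path)
        with urllib.request.urlopen(req, timeout=self._timeout) as resp:
            return json.load(resp)

    def logout(self):
        """Delete the session so the BMC can reuse the session slot."""
        try:
            if self._session_uri is not None:
                req = self._request("DELETE", self._session_uri)
                with urllib.request.urlopen(req, timeout=self._timeout):
                    pass
        finally:
            self._token = None
            self._session_uri = None
```

In this sketch a command would call `login()`, issue its `get()` requests, and call `logout()` before exiting. The `host` argument would presumably be the `bmc_addr` value from `bmc.json`.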

## 3. BMC IP address initialization

The BMC IP address is configured as follows:

- `device/platform/bmc.json` contains `bmc_if_name`, `bmc_if_addr`, `bmc_addr`, and `bmc_net_mask`.
- `src/sonic-py-common/sonic_py_common/device_info.py::get_bmc_data` reads `bmc.json`.
- `src/sonic-config-engine/sonic-cfggen::main` calls `device_info.get_bmc_data()` and writes the result to `DEVICE_METADATA|bmc` (a new field added to `DEVICE_METADATA`).
- `files/image_config/interfaces/interfaces.j2` reads `DEVICE_METADATA|bmc` and writes the following to `/etc/network/interfaces`:

```
auto usb0
iface usb0 inet static
    address <address>
    netmask <netmask>
```

![ip address init flow](https://github.com/sonic-net/SONiC/blob/30d7b3524e1e1f25abb4679f7ffa777eabe9f499/images/bmc/bmc_ip_set_flow.png)
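
The data path above can be sketched in Python: load `bmc.json`, check the four documented keys, and render the `/etc/network/interfaces` stanza. In SONiC the reading is done by `device_info.get_bmc_data()` and the rendering by the `interfaces.j2` Jinja2 template; the helpers `load_bmc_data()` and `render_bmc_interface()` below are hypothetical stand-ins, and the addresses are placeholders from the RFC 5737 documentation range.

```python
import json

# Keys that bmc.json is documented to contain.
REQUIRED_KEYS = {"bmc_if_name", "bmc_if_addr", "bmc_addr", "bmc_net_mask"}


def load_bmc_data(path):
    """Hypothetical stand-in for device_info.get_bmc_data().

    Returns the parsed bmc.json as a dict, or None when the platform
    ships no bmc.json (i.e. it has no BMC).
    """
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return None
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"bmc.json is missing keys: {sorted(missing)}")
    return data


def render_bmc_interface(bmc):
    """Render the static stanza that interfaces.j2 would produce."""
    ifname = bmc["bmc_if_name"]
    return (
        f"auto {ifname}\n"
        f"iface {ifname} inet static\n"
        f"    address {bmc['bmc_if_addr']}\n"
        f"    netmask {bmc['bmc_net_mask']}\n"
    )


# Placeholder values for illustration only.
example = {
    "bmc_if_name": "usb0",
    "bmc_if_addr": "192.0.2.2",
    "bmc_addr": "192.0.2.1",
    "bmc_net_mask": "255.255.255.0",
}
print(render_bmc_interface(example), end="")
```

Returning `None` for a missing file lets callers treat "no `bmc.json`" as "no BMC on this platform", the same check that section 6.3 relies on.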

## 3. BMC firmware upgrade flow

The upgrade requires adding a new `ComponentBMC` object to `component.py`.

![firmware upgrade flow](https://github.com/sonic-net/SONiC/blob/58f1fda2ea4e73d86a0f477f2901129a773d0439/images/bmc/bmc_firmware_upgrade_flow.png)
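
A sketch of what the `ComponentBMC` object could look like is shown below. It mirrors method names of SONiC's `ComponentBase` (`get_name`, `get_description`, `get_firmware_version`, `install_firmware`) but, to stay self-contained, does not import `sonic_platform_base`. Delegating to the BMC object's `get_version()` and `update_firmware()` APIs (section 4) is an assumption, and `_StubBmc` exists only for the usage example.

```python
import os


class ComponentBMC:
    """Hypothetical BMC component for fwutil-style firmware management.

    A real platform would derive this from
    sonic_platform_base.component_base.ComponentBase.
    """

    NAME = "BMC"
    DESCRIPTION = "BMC - Baseboard Management Controller"

    def __init__(self, bmc):
        # 'bmc' is the platform BMC object exposing the section 4 APIs.
        self._bmc = bmc

    def get_name(self):
        return self.NAME

    def get_description(self):
        return self.DESCRIPTION

    def get_firmware_version(self):
        # Delegates to the BMC API, which would query Redfish.
        return self._bmc.get_version()

    def install_firmware(self, image_path):
        """Return True if the BMC accepted the image, False otherwise."""
        if not os.path.isfile(image_path):
            return False
        try:
            return bool(self._bmc.update_firmware(image_path))
        except Exception:
            # The BMC API raises with the failure reason; a real
            # implementation would log it before returning False.
            return False


class _StubBmc:
    """Illustration-only stand-in for the platform BMC object."""

    def get_version(self):
        return "stub-1.0"

    def update_firmware(self, fw_image):
        return True


component = ComponentBMC(_StubBmc())
print(component.get_name(), component.get_firmware_version())
```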

## 4. BMC platform common API scope

In SONiC, each command runs as a separate process, so nothing is shared between two commands and each one needs its own BMC Redfish session. To avoid exhausting the BMC's limited number of Redfish sessions, each command logs out after it finishes.

Therefore, each API function is wrapped in a Python decorator that handles both login and logout.
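
A minimal sketch of such a decorator follows. The name `with_bmc_session`, the `rf_client` attribute, and its `login()`/`logout()` methods are hypothetical; the point is that `logout()` runs in a `finally` block, so the session is released even when the wrapped API raises.

```python
import functools


def with_bmc_session(api):
    """Wrap a BMC API method with Redfish login/logout (hypothetical)."""
    @functools.wraps(api)
    def wrapper(self, *args, **kwargs):
        self.rf_client.login()
        try:
            return api(self, *args, **kwargs)
        finally:
            # Always release the session, even if the API call raised.
            self.rf_client.logout()
    return wrapper


class RecordingClient:
    """Illustration-only client that records session calls."""

    def __init__(self):
        self.calls = []

    def login(self):
        self.calls.append("login")

    def logout(self):
        self.calls.append("logout")


class DemoBmc:
    def __init__(self, rf_client):
        self.rf_client = rf_client

    @with_bmc_session
    def get_version(self):
        return "1.0"

    @with_bmc_session
    def reset_root_password(self):
        raise RuntimeError("BMC rejected the request")


bmc = DemoBmc(RecordingClient())
print(bmc.get_version())  # prints 1.0
try:
    bmc.reset_root_password()
except RuntimeError as exc:
    print("failed:", exc)  # prints failed: BMC rejected the request
print(bmc.rf_client.calls)  # ['login', 'logout', 'login', 'logout']
```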

```
APIs inherited from DeviceBase
    get_name()
    get_presence()
    get_model()
    get_serial()
    get_revision()
    get_status()
    is_replaceable()

BMC general APIs
    # Returns a dict (Manufacturer, Model, PartNumber, PowerState, SerialNumber)
    # with the EEPROM info, or raises an exception with the failure reason.
    get_eeprom()

    # Returns a string with the firmware version, or raises an exception
    # with the failure reason.
    get_version()

    # Returns True on success, or raises an exception with the failure reason.
    reset_root_password()

    # Returns True on success, or raises an exception with the failure reason.
    trigger_bmc_debug_log_dump()

    # Returns the destination path of the dump file, or raises an exception
    # with the failure reason.
    get_bmc_debug_log_dump(task_id, filename, path)

    # param fw_image: string, path of the firmware image
    # Returns True on success, or raises an exception with the failure reason.
    update_firmware(fw_image)
```
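
One plausible way to back `get_eeprom()` and `get_version()` is the Redfish Manager resource, whose schema defines `Manufacturer`, `Model`, `PartNumber`, `PowerState`, `SerialNumber`, and `FirmwareVersion` properties. The sketch below only maps an already-fetched Manager JSON document to the return values listed above; the Manager path (`/redfish/v1/Managers/bmc` on OpenBMC) should be verified per platform, the helper names are hypothetical, and the sample values are placeholders.

```python
EEPROM_FIELDS = ("Manufacturer", "Model", "PartNumber", "PowerState", "SerialNumber")


def eeprom_from_manager(manager):
    """Map a Redfish Manager resource to the get_eeprom() dict.

    Raises ValueError naming the missing properties, matching the
    "exception with the failure reason" contract.
    """
    missing = [field for field in EEPROM_FIELDS if field not in manager]
    if missing:
        raise ValueError(f"Manager resource lacks: {', '.join(missing)}")
    return {field: manager[field] for field in EEPROM_FIELDS}


def version_from_manager(manager):
    """Return the BMC firmware version string for get_version()."""
    version = manager.get("FirmwareVersion")
    if version is None:
        raise ValueError("Manager resource lacks: FirmwareVersion")
    return version


# Placeholder Manager document, trimmed to the relevant properties.
sample_manager = {
    "@odata.id": "/redfish/v1/Managers/bmc",
    "Manufacturer": "ExampleVendor",
    "Model": "ExampleModel",
    "PartNumber": "PN-0000",
    "PowerState": "On",
    "SerialNumber": "SN-0000",
    "FirmwareVersion": "0.0.1",
}
print(eeprom_from_manager(sample_manager))
print(version_from_manager(sample_manager))
```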

## 5. CLI commands

```
show platform bmc summary
---------------------------
Manufacturer: XXXXX
Model: XXXXX
PartNumber: XXXXX
SerialNumber: XXXXX
PowerState: XXXXX
FirmwareVersion: XXXXX

show platform firmware status
Component  Version                    Description
---------  -------------------------  ----------------------------------------
ONIE       XXXXXXXXXXXXXXXXXXXXXXXXX  ONIE - Open Network Install Environment
SSD        XXXXXXXXXXXXXXXXXXXXXXXXX  SSD - Solid-State Drive
BIOS       XXXXXXXXXXXXXXXXXXXXXXXXX  BIOS - Basic Input/Output System
CPLD1      XXXXXXXXXXXXXXXXXXXXXXXXX  CPLD - Complex Programmable Logic Device
CPLD2      XXXXXXXXXXXXXXXXXXXXXXXXX  CPLD - Complex Programmable Logic Device
CPLD3      XXXXXXXXXXXXXXXXXXXXXXXXX  CPLD - Complex Programmable Logic Device
BMC        XXXXXXXXXXXXXXXXXXXXXXXXX  BMC - Baseboard Management Controller

show platform bmc eeprom
---------------------------
Manufacturer: XXXXX
Model: XXXXX
PartNumber: XXXXX
PowerState: XXXXX
SerialNumber: XXXXX

config platform firmware install component BMC fw -y ${BMC_IMAGE}
```
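
The `show platform bmc summary` output above can be produced by merging the `get_eeprom()` dictionary with the `get_version()` string. The helper below is a hypothetical sketch of that formatting step only; in sonic-utilities the command itself would be a click command, which is omitted here, and the `N/A` fallback for missing fields is an assumption.

```python
SUMMARY_FIELDS = ("Manufacturer", "Model", "PartNumber", "SerialNumber", "PowerState")


def format_bmc_summary(eeprom, firmware_version):
    """Build the 'show platform bmc summary' text from the BMC API results."""
    lines = ["-" * 27]  # separator line under the command header
    for key in SUMMARY_FIELDS:
        lines.append(f"{key}: {eeprom.get(key, 'N/A')}")
    lines.append(f"FirmwareVersion: {firmware_version}")
    return "\n".join(lines)


print(format_bmc_summary(
    {"Manufacturer": "ExampleVendor", "Model": "ExampleModel",
     "PartNumber": "PN-0000", "SerialNumber": "SN-0000", "PowerState": "On"},
    "0.0.1",
))
```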

## 6. show techsupport

The BMC dump is included in `show techsupport`: the `generate_dump` script calls `trigger_bmc_debug_log_dump()` and `get_bmc_debug_log_dump()`.

### 6.1 Overview

The `show techsupport` command is extended to collect BMC dump logs via the Redfish API.

This integration is non-blocking and asynchronous: it triggers a BMC dump task at the start of the script and then continues with regular system data collection. Before the script finishes, it collects the dump from the BMC using the task ID received earlier.

The design ensures that BMC issues (timeouts, failures, unsupported platforms) do not block or interrupt the standard dump flow.

### 6.2 High-Level Diagram

![show techsupport flow](https://github.com/sonic-net/SONiC/blob/30d7b3524e1e1f25abb4679f7ffa777eabe9f499/images/bmc/show_techsupport_flow.png)

### 6.3 Error handling

- `generate_dump` checks whether a BMC is supported (via the `bmc.json` file). If not, the BMC logic is skipped.
- Errors in the BMC initialization, trigger, or collect phases are caught and logged.
- The techsupport script sets a 60-second timeout for `collect_bmc_dump`. In practice, the dump is usually ready before collection begins: the full SONiC techsupport run already takes at least 1m20s, so the BMC dump has typically completed by the time the collect stage is reached. If it has not, the script waits for it up to the 60-second timeout, a fallback that is rarely needed.
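
The 60-second fallback wait amounts to a bounded polling loop. The sketch below assumes a hypothetical `is_ready()` callable (for example, one that checks the Redfish task state) and a hypothetical 2-second poll interval; it uses `time.monotonic()` so that wall-clock adjustments cannot stretch or shorten the timeout.

```python
import time


def wait_for_bmc_dump(is_ready, timeout=60.0, interval=2.0):
    """Poll is_ready() until it returns True or `timeout` seconds pass.

    Returns True if the dump became ready in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True  # common case: the dump finished during collection
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # give up; techsupport continues without the dump
        time.sleep(min(interval, remaining))
```

Because readiness is checked before the first sleep, the common case where the dump is already complete returns immediately without any delay.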

## 7. Fast/Warm/Cold boot and SONiC upgrade flow

In general, these flows run on the host CPU and are independent of the BMC, so the BMC integration has no performance impact on them.