BMC flows in SONiC #2062
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| # Support BMC flows in SONiC | ||
|  | ||
| ## 1. BMC and Redfish | ||
| A Baseboard Management Controller (BMC) is a specialized microcontroller embedded on a motherboard. It manages the interface between system management software and hardware, and provides out-of-band management capabilities that allow administrators to monitor and manage hardware remotely. | ||
| OpenBMC is an open-source project that provides a Linux-based firmware stack for BMCs. It implements the Redfish standard, allowing for standardized and secure remote management of server hardware. In essence, OpenBMC serves as the software that runs on BMC hardware, utilizing the Redfish API to facilitate efficient hardware management. | ||
| Redfish is a standard for managing and interacting with hardware in a datacenter, designed to be simple, secure, and scalable. It works with BMC to provide a RESTful API for remote management of servers. Together, Redfish and BMC enable efficient and standardized hardware management. | ||
| In summary, NOS will deal with BMC through the Redfish RESTful API. | ||
|  | ||
|  | ||
| ## 2. BMC flows in SONiC | ||
| The implementation is straightforward: SONiC will incorporate a Redfish client as the underlying infrastructure to support the BMC action. This Redfish client object is implemented and initialized at runtime by SONiC itself. | ||
|  | ||
| Review comment: Is there a recommendation for the kind of authentication that the Redfish client in the NOS will use for the RESTful APIs? I'm assuming vendors will move away from default usernames and passwords for OpenBMC. Please ignore this comment if it's outside the scope of this document. | ||
| Reply: Redfish provides HTTP basic authentication using the Authorization header in an HTTPS request. In my opinion, the way of generating the password to replace the default one is vendor specific. | ||
|  | ||
| ## 3. BMC ip address initialization | ||
| Review comment: I assume this section is setting an interface on the host (running the NOS) with static information based on configuration in a JSON file. This interface will then be used to communicate with the BMC. If that is the case, do we assume the BMC stack provided by the vendor will be responsible for setting the corresponding interface to communicate with the host/NOS? | ||
| Reply: Correct, and yes, the vendor is responsible for setting up the interface between the BMC and the NOS. | ||
| This is the flow of the BMC IP address configuration: | ||
| - device/platform/bmc.json contains bmc_if_name, bmc_if_addr, bmc_addr, bmc_net_mask | ||
| - src/sonic-py-common/sonic_py_common/device_info.py::get_bmc_data reads bmc.json | ||
| - src/sonic-config-engine/sonic-cfggen::main calls device_info.get_bmc_data and writes the result to DEVICE_METADATA|bmc (this field will be added to DEVICE_METADATA) | ||
| - files/image_config/interfaces/interfaces.j2 reads DEVICE_METADATA|bmc and writes to /etc/network/interfaces: | ||
| ``` | ||
| auto usb0 | ||
| iface usb0 inet static | ||
| address <address> | ||
| netmask <netmask> | ||
| ``` | ||
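The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the actual SONiC code: the key names come from the bmc.json description above, while the sample values, the helper names `load_bmc_data` and `render_bmc_interface`, and the assumption that `bmc_if_addr` and `bmc_net_mask` map to `address` and `netmask` are hypothetical.

```python
import json

# Hypothetical bmc.json content; the keys are the ones listed above,
# the values are made up for illustration.
SAMPLE_BMC_JSON = """
{
    "bmc_if_name": "usb0",
    "bmc_if_addr": "169.254.0.2",
    "bmc_addr": "169.254.0.1",
    "bmc_net_mask": "255.255.255.252"
}
"""

REQUIRED_KEYS = ("bmc_if_name", "bmc_if_addr", "bmc_addr", "bmc_net_mask")


def load_bmc_data(text):
    """Parse bmc.json text and check that all expected keys are present."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError("bmc.json is missing keys: " + ", ".join(missing))
    return data


def render_bmc_interface(bmc):
    """Build the /etc/network/interfaces stanza for the host-side interface.

    Assumption: bmc_if_addr is the host-side address and bmc_net_mask its mask.
    """
    return "\n".join([
        "auto {}".format(bmc["bmc_if_name"]),
        "iface {} inet static".format(bmc["bmc_if_name"]),
        "    address {}".format(bmc["bmc_if_addr"]),
        "    netmask {}".format(bmc["bmc_net_mask"]),
    ])


if __name__ == "__main__":
    print(render_bmc_interface(load_bmc_data(SAMPLE_BMC_JSON)))
```

In the real flow the rendering step is done by the interfaces.j2 template from DEVICE_METADATA|bmc; the Python function here only mirrors that mapping so that it can be read end to end.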
|  | ||
|  | ||
|  | ||
|  | ||
| ## 3. BMC firmware upgrade flow | ||
|  | ||
| It requires a new ComponentBMC object to be added to component.py. | ||
| Reply: For 1, yes, I'm talking about an entire flash. | ||
| Review comment: By (2) I meant that, in certain cases, the hardware running OpenBMC comes with support for dual flash (Primary and Golden/Alternate), both of which are programmed with OpenBMC images. The image from the alternate flash boots when the primary gets corrupted or is unable to boot an image. Most boards that run OpenBMC (including eval boards like ast2600) support dual flash configurations as a standard. But this will be BMC hardware/vendor specific. | ||
| Reply: At this stage I suggest not including this functionality, to keep things simple. | ||
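As a rough illustration of the ComponentBMC object mentioned at the start of this section, the sketch below wires a component to a BMC object. It is an assumption-laden outline, not the PR's code: `ComponentBase` here is a local stand-in for the platform component base class, `FakeBmc` is a hypothetical object exposing the `get_version()` and `update_firmware(fw_image)` APIs listed in section 4.2, and the image path in the usage note is made up.

```python
class ComponentBase:
    """Local stand-in for the platform component base class (assumption)."""

    def get_name(self):
        raise NotImplementedError

    def get_description(self):
        raise NotImplementedError

    def get_firmware_version(self):
        raise NotImplementedError

    def install_firmware(self, image_path):
        raise NotImplementedError


class FakeBmc:
    """Hypothetical BMC object with the section 4.2 return conventions."""

    def get_version(self):
        return "1.2.3"

    def update_firmware(self, fw_image):
        # (ret, msg): ret == 0 means success.
        return 0, "Firmware update completed: {}".format(fw_image)


class ComponentBMC(ComponentBase):
    """Firmware component that delegates all work to the BMC object."""

    def __init__(self, bmc):
        self._bmc = bmc

    def get_name(self):
        return "BMC"

    def get_description(self):
        return "BMC - Board Management Controller"

    def get_firmware_version(self):
        return self._bmc.get_version()

    def install_firmware(self, image_path):
        ret, msg = self._bmc.update_firmware(image_path)
        if ret != 0:
            print("BMC firmware update failed: {}".format(msg))
        return ret == 0
```

With this shape, `ComponentBMC(FakeBmc()).install_firmware("/tmp/bmc_image.bin")` returns `True` when the underlying call reports a return code of 0, which is what a firmware CLI would need in order to report success or failure.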
|  | ||
|  | ||
|  | ||
| ## 4. Sonic-platform-common support for bmc | ||
|  | ||
| ### 4.1 BMC redfish client | ||
| The redfish_client.py module provides the RedfishClient class, which facilitates BMC access via cURL requests to Redfish APIs. This class serves as a cURL wrapper for executing various Redfish commands. The class uses callback functions to obtain user credentials securely and supports asynchronous task monitoring to handle long-running operations such as firmware updates and log dumps. | ||
|  | ||
| Key functionalities: | ||
| 1. Session Management: Handles login and logout operations, ensuring secure sessions with the BMC. It manages tokens and session IDs, and automates re-login if tokens expire. | ||
| 2. Firmware Management: Supports listing, updating, and querying firmware versions using Redfish APIs. | ||
| 3. BMC Operations: Enables BMC reset requests, password changes, and triggering/debugging log dumps. | ||
| 4. Error Handling: Maps cURL error codes to RedfishClient error codes, and includes comprehensive error handling and logging. | ||
| 5. Security: Obfuscates sensitive information such as tokens and passwords in logs and command outputs. | ||
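To make the cURL-wrapper idea more concrete, here is a minimal sketch of how such a client might build its commands and hide secrets in logs. It is not the RedfishClient from the PR: the class name `RedfishClientSketch`, its methods, the BMC address, and the choice of the `-k` flag are assumptions for illustration. The session endpoint and the `X-Auth-Token` header are the standard Redfish session mechanism.

```python
import json
import re

# Standard Redfish session collection; a POST here creates a session.
SESSIONS_URI = "/redfish/v1/SessionService/Sessions"


def obfuscate(text):
    """Mask passwords and session tokens before text is written to a log."""
    text = re.sub(r'("Password"\s*:\s*")(?:[^"\\]|\\.)*(")', r'\1******\2', text)
    text = re.sub(r'(X-Auth-Token:\s*)\S+', r'\1******', text)
    return text


class RedfishClientSketch:
    """Builds curl command lines; credentials come from callbacks."""

    def __init__(self, bmc_addr, user_cb, password_cb, timeout=30):
        self._base = "https://{}".format(bmc_addr)
        self._user_cb = user_cb
        self._password_cb = password_cb
        self._timeout = timeout
        self.token = None  # set from the X-Auth-Token response header after login

    def build_login_cmd(self):
        body = json.dumps({"UserName": self._user_cb(),
                           "Password": self._password_cb()})
        # -k skips certificate verification (assumption for a private host-BMC
        # link), -i keeps response headers so the token can be parsed.
        return ["curl", "-k", "-s", "-i", "-m", str(self._timeout),
                "-X", "POST", "-H", "Content-Type: application/json",
                "-d", body, self._base + SESSIONS_URI]

    def build_request_cmd(self, method, uri):
        if self.token is None:
            raise RuntimeError("not logged in")
        return ["curl", "-k", "-s", "-m", str(self._timeout),
                "-X", method, "-H", "X-Auth-Token: {}".format(self.token),
                self._base + uri]

    def loggable(self, cmd):
        """Return the command as a single string that is safe to log."""
        return obfuscate(" ".join(cmd))
```

A caller would pass the returned list to something like `subprocess.run(cmd, capture_output=True, text=True)` and map the process return code to a client error code. Passing the command as a list avoids shell quoting issues with the JSON body.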
|  | ||
| ### 4.2 BMC api scope | ||
| In SONiC, each command is executed as a separate process, so nothing is shared between two commands. Each command therefore needs its own BMC Redfish (RF) session, and to avoid exhausting the number of available sessions, a logout call is made after each command is executed. | ||
| To enforce this, a Python decorator that performs both login and logout is applied to each API/function. | ||
| ``` | ||
| APIs inherited from Device Base | ||
| get_name() | ||
| get_presence() | ||
| get_model() | ||
| get_serial() | ||
| get_revision() | ||
| get_status() | ||
| is_replaceable() | ||
|  | ||
| BMC general APIs | ||
| Return dictionary (Manufacturer, Model, PartNumber, PowerState, SerialNumber) to show the eeprom info or exception with the failure reason | ||
| Returns an empty dictionary {} if EEPROM information cannot be retrieved | ||
| get_eeprom() | ||
|  | ||
| Return string to show the firmware version or exception with the failure reason | ||
| Returns 'N/A' if the BMC firmware version cannot be retrieved | ||
| get_version() | ||
|  | ||
|  | ||
| Returns: A tuple (ret, msg) where: | ||
| ret: An integer return code indicating success (0) or failure | ||
| msg: A string containing success message or error description | ||
| reset_root_password() | ||
|  | ||
|  | ||
| Returns: A tuple (ret, (task_id, err_msg)) where: | ||
| ret: An integer return code indicating success (0) or failure | ||
| task_id: A string containing the Redfish task ID for monitoring | ||
| the debug log dump operation. Returns '-1' on failure. | ||
| err_msg: A string containing error message if operation failed, | ||
| None if successful | ||
| trigger_bmc_debug_log_dump() | ||
|  | ||
| Returns: A tuple (ret, err_msg) where: | ||
| ret: An integer return code indicating success (0) or failure | ||
| err_msg: A string containing error message if operation failed | ||
| get_bmc_debug_log_dump(task_id, filename, path) | ||
|  | ||
| param fw_image: string to indicate the path of the firmware image | ||
| Returns: A tuple (ret, msg) where: | ||
| ret: An integer return code indicating success (0) or failure | ||
| msg: A string containing status message about the firmware update | ||
|  | ||
| update_firmware(fw_image) | ||
|  | ||
| ``` | ||
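The login/logout decorator described at the start of this section could look roughly like the sketch below. The names `with_session`, `FakeRedfishClient`, and the `rf_client` attribute are hypothetical; only the idea (log in, run the API, always log out) and the 'N/A' fallback of `get_version()` come from the text above.

```python
import functools


def with_session(func):
    """Log in before the wrapped API call and always log out afterwards."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.rf_client.login()
        try:
            return func(self, *args, **kwargs)
        finally:
            # Runs on success and on exception, so the BMC session is released.
            self.rf_client.logout()
    return wrapper


class FakeRedfishClient:
    """Hypothetical client that records calls instead of talking to a BMC."""

    def __init__(self):
        self.calls = []

    def login(self):
        self.calls.append("login")

    def logout(self):
        self.calls.append("logout")

    def get_firmware_version(self):
        self.calls.append("get_firmware_version")
        return "1.2.3"


class Bmc:
    def __init__(self, rf_client):
        self.rf_client = rf_client

    @with_session
    def get_version(self):
        try:
            return self.rf_client.get_firmware_version()
        except Exception:
            # Matches the documented fallback when the version is unavailable.
            return "N/A"
```

The `try`/`finally` is the important part of the design: because each CLI command is a separate process, a session that is not released on an error path would stay allocated on the BMC until it expires.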
|  | ||
| ## 5. CLI commands | ||
| ``` | ||
| show platform bmc summary | ||
| --------------------------- | ||
| Manufacturer: XXXXX | ||
| Model: XXXXX | ||
| PartNumber: XXXXX | ||
| SerialNumber: XXXXX | ||
| PowerState: XXXXX | ||
| FirmwareVersion: XXXXX | ||
|  | ||
| show platform firmware status | ||
| Component Version Description | ||
| ----------- ------------------------- ---------------------------------------- | ||
| ONIE XXXXXXXXXXXXXXXXXXXXXXXXX ONIE - Open Network Install Environment | ||
| SSD XXXXXXXXXXXXXXXXXXXXXXXXX SSD - Solid-State Drive | ||
| BIOS XXXXXXXXXXXXXXXXXXXXXXXXX BIOS - Basic Input/Output System | ||
| CPLD1 XXXXXXXXXXXXXXXXXXXXXXXXX CPLD - Complex Programmable Logic Device | ||
| CPLD2 XXXXXXXXXXXXXXXXXXXXXXXXX CPLD - Complex Programmable Logic Device | ||
| CPLD3 XXXXXXXXXXXXXXXXXXXXXXXXX CPLD - Complex Programmable Logic Device | ||
| BMC XXXXXXXXXXXXXXXXXXXXXXXXX BMC - Board Management Controller | ||
|  | ||
| show platform bmc eeprom | ||
| --------------------------- | ||
| Manufacturer: XXXXX | ||
| Model: XXXXX | ||
| PartNumber: XXXXX | ||
| PowerState: XXXXX | ||
| SerialNumber: XXXXX | ||
|  | ||
| config platform firmware install chassis component BMC fw -y ${BMC_IMAGE} | ||
|  | ||
| ``` | ||
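As a small illustration of how the summary output above might be assembled from the platform APIs, the sketch below combines the dictionary returned by `get_eeprom()` with the string returned by `get_version()`. The function name `format_bmc_summary` and the use of 'N/A' for missing EEPROM fields are assumptions, not part of the design.

```python
# Field order follows the "show platform bmc summary" example above.
SUMMARY_FIELDS = ("Manufacturer", "Model", "PartNumber", "SerialNumber", "PowerState")


def format_bmc_summary(eeprom, version):
    """Format EEPROM fields plus firmware version as 'Key: value' lines.

    eeprom may be an empty dict when the information cannot be retrieved;
    missing fields are shown as 'N/A' (assumption).
    """
    lines = ["{}: {}".format(field, eeprom.get(field, "N/A"))
             for field in SUMMARY_FIELDS]
    lines.append("FirmwareVersion: {}".format(version))
    return "\n".join(lines)
```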
|  | ||
| ## 6. show techsupport | ||
| The BMC dump will be included in show techsupport: trigger_bmc_debug_log_dump() and get_bmc_debug_log_dump() shall be called by the generate_dump script. | ||
|  | ||
| ### 6.1. Overview | ||
| The 'show techsupport' command is extended to collect BMC dump logs via Redfish API. | ||
| This integration is non-blocking and asynchronous: | ||
| It triggers a BMC dump task at the start of the script, then continues with regular | ||
| system data collection. Before the script finishes, it collects the dump from BMC | ||
| using the task ID previously received. | ||
| The design ensures that BMC issues (timeouts, failures, unsupported platforms) | ||
| do not block or interrupt the standard dump flow. | ||
|  | ||
| ### 6.2 High-Level Diagram | ||
|  | ||
|  | ||
| ### 6.3 Error Handling | ||
| - generate_dump checks whether BMC is supported (via the bmc.json file). If not, the BMC logic is skipped. | ||
| - Errors in the BMC initialization, trigger, or collect phases are caught and logged. | ||
| - The timeout in the techsupport script for collect_bmc_dump is set to 60 seconds. | ||
| In practice, the dump is typically ready before collection begins: | ||
| SONiC’s full techsupport script already takes ≥ 1m20s, | ||
| so the BMC dump is often complete before the collect stage is reached. | ||
| If it is not ready yet, the script waits for it with the 60-second timeout (a fallback that is rarely used). | ||
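The trigger-early, collect-late behavior and the error handling above can be sketched as follows. This is only a Python outline of the logic under stated assumptions: the function names, the `FakeBmc` test double, the dump filename and path, and the thread-based timeout are all hypothetical, and passing `bmc=None` stands in for a platform without bmc.json.

```python
import threading
import time

COLLECT_TIMEOUT_SEC = 60  # collect-phase timeout from the text above


def call_with_timeout(func, timeout, *args):
    """Run func(*args) in a worker thread; give up after `timeout` seconds."""
    result = {}

    def target():
        try:
            result["value"] = func(*args)
        except Exception as exc:  # BMC errors must not escape
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False, "timed out after {}s".format(timeout)
    if "error" in result:
        return False, str(result["error"])
    return True, result["value"]


def run_techsupport(bmc, collect_system_data, timeout=COLLECT_TIMEOUT_SEC, log=print):
    """Return True if a BMC dump was collected; BMC failures never raise."""
    task_id = None
    if bmc is not None:  # None: BMC not supported on this platform
        try:
            ret, (task_id, err_msg) = bmc.trigger_bmc_debug_log_dump()
            if ret != 0:
                log("BMC dump trigger failed: {}".format(err_msg))
                task_id = None
        except Exception as exc:
            log("BMC dump trigger error: {}".format(exc))
            task_id = None

    collect_system_data()  # regular dump flow runs while the BMC works

    if task_id is None:
        return False
    ok, value = call_with_timeout(bmc.get_bmc_debug_log_dump, timeout,
                                  task_id, "bmc_dump.tar.xz", "/tmp")
    if not ok:
        log("BMC dump collection skipped: {}".format(value))
        return False
    ret, err_msg = value
    if ret != 0:
        log("BMC dump collection failed: {}".format(err_msg))
        return False
    return True


class FakeBmc:
    """Hypothetical BMC double following the section 4.2 return conventions."""

    def __init__(self, delay=0.0):
        self.delay = delay

    def trigger_bmc_debug_log_dump(self):
        return 0, ("task-1", None)

    def get_bmc_debug_log_dump(self, task_id, filename, path):
        time.sleep(self.delay)
        return 0, ""
```

Whatever happens on the BMC side, `collect_system_data()` runs exactly once and the function only reports whether the BMC dump was added, which mirrors the requirement that BMC issues do not block or interrupt the standard dump flow.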
|  | ||
| ## 7. Fast/Warm/Cold boot and SONiC upgrade flow | ||
| In general, these flows are driven by the CPU, so they are independent of the BMC and have no performance impact. | ||
|  | ||
| ## 8. Further enhancement | ||
|  | ||
| After community review, there are two improvements that will be made in the 202605 branch: | ||
|  | ||
| 1. The Redfish client will be added to the platform common API, providing support for these APIs, and it will be easier to extend for vendor-specific use. | ||
| Review comment: We need to have this as a common library, not a vendor-specific library. | ||
| Reply: We will provide a basic Redfish client common library (https://github.com/sonic-net/SONiC/pull/2062/files#diff-3eea7993a54f0e54d1c8982663cdfdd3a124dcfa768ab67cc2419e6db71e92f1R38), which vendors can enhance further to support additional APIs if needed. | ||
| 2. The IP address for BMC will be assigned automatically via DHCP, saving the effort to manage a separate configuration file. | ||
Review comment: By "NOS will deal with BMC", did you mean NOS managing the below responsibilities? Mainly asking in the context of ensuring the use cases being addressed in this doc.
Reply: For 1, the upgrade of the BMC is controlled by the NOS, but whether it will go through FPD is arguable. For 2 and 3, yes.