-
Notifications
You must be signed in to change notification settings - Fork 12
[SIG-CLOUD-9] rebase custom changes to 5.14.0-570.39.1.el9 #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
PlaidCat
merged 21 commits into
sig-cloud-9/5.14.0-570.39.1.el9_6
from
jmaple_sig-cloud-9/5.14.0-570.39.1.el9_6
Sep 8, 2025
Merged
[SIG-CLOUD-9] rebase custom changes to 5.14.0-570.39.1.el9 #555
PlaidCat
merged 21 commits into
sig-cloud-9/5.14.0-570.39.1.el9_6
from
jmaple_sig-cloud-9/5.14.0-570.39.1.el9_6
Sep 8, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jira SECO-170 In Rocky9 if you run ./run_vmtests.sh -t hmm it will fail and cause an infinite loop on ASSERTs in FIXTURE_TEARDOWN() This temporary fix is based on the discussion here https://patchwork.kernel.org/project/linux-kselftest/patch/[email protected]/#25046055 We will investigate further kselftest updates that will resolve the root causes of this. Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3208 feature net_mana commit-author Haiyang Zhang <[email protected]> commit 290e5d3 To support Multi Vports on Bare metal, increase the device config response version. And, skip the register HW vport, and register filter steps, when the Bare metal hostmode is set. Signed-off-by: Haiyang Zhang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> (cherry picked from commit 290e5d3) Signed-off-by: Jonathan Maple <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3207 feature tools_hv commit-author Shradha Gupta <[email protected]> commit a9c0b33 Allow the KVP daemon to log the KVP updates triggered in the VM with a new debug flag(-d). When the daemon is started with this flag, it logs updates and debug information in syslog with loglevel LOG_DEBUG. This information comes in handy for debugging issues where the key-value pairs for certain pools show mismatch/incorrect values. The distro-vendors can further consume these changes and modify the respective service files to redirect the logs to specific files as needed. Signed-off-by: Shradha Gupta <[email protected]> Reviewed-by: Naman Jain <[email protected]> Reviewed-by: Dexuan Cui <[email protected]> Link: https://lore.kernel.org/r/1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Wei Liu <[email protected]> Message-ID: <1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com> (cherry picked from commit a9c0b33) Signed-off-by: Jonathan Maple <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
…l page jira LE-3813 commit-author Long Li <[email protected]> commit 4a3b99b When mapping doorbell page from user-mode, the driver should use the system page size as this memory is allocated via mmap() from user-mode. Cc: [email protected] Fixes: 0266a17 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Long Li <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> (cherry picked from commit 4a3b99b) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
… size jira LE-3813 commit-author Long Li <[email protected]> commit 9e517a8 MANA hardware uses 4k page size. When calculating the page table index, it should use the hardware page size, not the system page size. Cc: [email protected] Fixes: 0266a17 ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Long Li <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> (cherry picked from commit 9e517a8) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3545 commit-author Dexuan Cui <[email protected]> commit b2f9665 Currently storvsc_timeout is only used in storvsc_sdev_configure(), and 5s and 10s are used elsewhere. It turns out that rarely the 5s is not enough on Azure, so let's use storvsc_timeout everywhere. In case a timeout happens and storvsc_channel_init() returns an error, close the VMBus channel so that any host-to-guest messages in the channel's ringbuffer, which might come late, can be safely ignored. Add a "const" to storvsc_timeout. Cc: [email protected] Signed-off-by: Dexuan Cui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Long Li <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> (cherry picked from commit b2f9665) Signed-off-by: Sultan Alsawaf <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3554 commit-author Michael Kelley <[email protected]> commit 380b75d vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver and is hardcoded to create a single GPA range. To allow it to also be used by the netvsc driver to create multiple GPA ranges, no longer hardcode as having a single GPA range. Allow the calling driver to specify the rangecount in the supplied descriptor. Update the storvsc driver to reflect this new approach. Cc: <[email protected]> # 6.1.x Signed-off-by: Michael Kelley <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 380b75d) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3554 commit-author Michael Kelley <[email protected]> commit 4f98616 netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus messages. This function creates a series of GPA ranges, each of which contains a single PFN. However, if the rndis header in the VMBus message crosses a page boundary, the netvsc protocol with the host requires that both PFNs for the rndis header must be in a single "GPA range" data structure, which isn't possible with vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a new function netvsc_build_mpb_array() to build a VMBus message with multiple GPA ranges, each of which may contain multiple PFNs. Use vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. There's no functional change since higher levels of netvsc don't maintain or propagate knowledge of contiguous PFNs. Based on its input, netvsc_build_mpb_array() still produces a separate GPA range for each PFN and the behavior is the same as with vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a subsequent patch to provide the necessary grouping. Cc: <[email protected]> # 6.1.x Signed-off-by: Michael Kelley <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 4f98616) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3554 commit-author Michael Kelley <[email protected]> commit 41a6328 Starting with commit dca5161 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two rndis header PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost. Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array. This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single() but on larger chunks of memory, which should reduce contention on the swiotlb. Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503 Cc: <[email protected]> # 6.1.x Signed-off-by: Michael Kelley <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 41a6328) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3554 commit-author Michael Kelley <[email protected]> commit 5bbc644 init_page_array() now always creates a single page buffer array entry for the rndis message, even if the rndis message crosses a page boundary. As such, the number of page buffer array entries used for the rndis message must no longer be tracked -- it is always just 1. Remove the rmsg_pgcnt field and use "1" where the value is needed. Cc: <[email protected]> # 6.1.x Signed-off-by: Michael Kelley <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 5bbc644) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3554 commit-author Michael Kelley <[email protected]> commit 45a442f With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it. Cc: <[email protected]> # 6.1.x Signed-off-by: Michael Kelley <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 45a442f) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3885 commit-author Shradha Gupta <[email protected]> commit 6859209 On Azure, increasing VF's gso/gro packet size to up-to GSO_MAX_SIZE is not possible without allowing the same for netvsc NIC (as the NICs are bonded together). For bonded NICs, the min of the max aggregated pkt size of the members is propagated in the stack. Therefore, we use netif_set_tso_max_size() to set max aggregated pkt size to VF's packet size for netvsc too, when the data path is switched over to the VF Tested on azure env with Accelerated Networking enabled and disabled. Signed-off-by: Shradha Gupta <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit 6859209) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3885 commit-author Shradha Gupta <[email protected]> commit 2731583 Allow the max aggregated pkt size to go up-to GSO_MAX_SIZE for MANA NIC. This patch only increases the max allowable gso/gro pkt size for MANA devices and does not change the defaults. Following are the perf benefits by increasing the pkt aggregate size from legacy gso_max_size value(64K) to newer one(up-to 511K IPv4 tests for i in {1..10}; do netperf -t TCP_RR -H 10.0.0.5 -p50000 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done min p90 p99 Throughput gso_max_size 93 171 194 6594.25 97 154 180 7183.74 95 165 189 6927.86 96 165 188 6976.04 93 154 185 7338.05 64K 93 168 189 6938.03 94 169 189 6784.93 92 166 189 7117.56 94 179 191 6678.44 95 157 183 7277.81 min p90 p99 Throughput 93 134 146 8448.75 95 134 140 8396.54 94 137 148 8204.12 94 137 148 8244.41 94 128 139 8666.52 80K 94 141 153 8116.86 94 138 149 8163.92 92 135 142 8362.72 92 134 142 8497.57 93 136 148 8393.23 IPv6 Tests for i in {1..10}; do netperf -t TCP_RR -H fd00:9013:cadd::4 -p50000 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done min p90 p99 Throughput gso_max_size 108 165 170 6673.2 101 169 189 6451.69 101 165 169 6737.65 102 167 175 6614.64 101 178 189 6247.13 64K 107 163 169 6678.63 106 176 187 6350.86 100 164 169 6617.36 102 163 170 6849.21 102 168 175 6605.7 min p90 p99 Throughput 108 155 166 7183 110 154 163 7268.87 109 152 159 7434.35 107 145 157 7569.15 107 149 164 7496.17 80K 110 154 159 7245.85 108 156 162 7266.24 109 145 158 7526.66 106 145 151 7785.75 111 148 157 7246.65 Tested on azure env with Accelerated Networking enabled and disabled. Signed-off-by: Shradha Gupta <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit 2731583) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3889 commit-author Erni Sri Satya Vennela <[email protected]> commit 47dfd7a Add more logs to assist in debugging and monitoring driver behaviour, making it easier to identify potential issues during development and testing. Signed-off-by: Erni Sri Satya Vennela <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 47dfd7a) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3893 commit-author Long Li <[email protected]> commit a8445cf Change mana_get_primary_netdev_rcu() to mana_get_primary_netdev(), and return the ndev with refcount held. The caller is responsible for dropping the refcount. Also drop the check for IFF_SLAVE as it is not necessary if the upper device is present. Signed-off-by: Long Li <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> (cherry picked from commit a8445cf) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3893 commit-author Long Li <[email protected]> commit bee35b7 upstream-diff There were conflicts when applying this patch due to the following missing commits :- 79bccd7 ("RDMA/mana_ib: Add port statistics support") df91c47 ("RDMA/mana_ib: create/destroy AH") When running under Hyper-V, the master device to the RDMA device is always bonded to this RDMA device. This is not user-configurable. The master device can be unbind/bind from the kernel. During those events, the RDMA device should set to the current netdev to reflect the change of master device from those events. Signed-off-by: Long Li <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> (cherry picked from commit bee35b7) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3903 commit-author Haiyang Zhang <[email protected]> commit 2fc8a34 According to GDMA protocol, holes (zeros) are allowed at the beginning or middle of the gdma_list_devices_resp message. The existing code cannot properly handle this, and may miss some devices in the list. To fix, scan the entire list until the num_of_devs are found, or until the end of the list. Cc: [email protected] Fixes: ca9c54d ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Long Li <[email protected]> Reviewed-by: Shradha Gupta <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> (cherry picked from commit 2fc8a34) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3907 commit-author Haiyang Zhang <[email protected]> commit fa37a88 Frag allocators, such as netdev_alloc_frag(), were not designed to work for fragsz > PAGE_SIZE. So, switch to page pool for jumbo frames instead of using page frag allocators. This driver is using page pool for smaller MTUs already. Cc: [email protected] Fixes: 80f6215 ("net: mana: Add support for jumbo frame") Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Long Li <[email protected]> Reviewed-by: Shradha Gupta <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit fa37a88) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
…htool. jira LE-3915 commit-author Dipayaan Roy <[email protected]> commit c09ef59 Add support for reporting additional hardware counters for drop and TC using the ethtool -S interface. These counters include: - Aggregate Rx/Tx drop counters - Per-TC Rx/Tx packet counters - Per-TC Rx/Tx byte counters - Per-TC Rx/Tx pause frame counters The counters are exposed using ethtool_ops->get_ethtool_stats and ethtool_ops->get_strings. This feature/counters are not available to all versions of hardware. Signed-off-by: Dipayaan Roy <[email protected]> Reviewed-by: Subbaraya Sundeep <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Link: https://patch.msgid.link/20250609100103.GA7102@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit c09ef59) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3919 commit-author Haiyang Zhang <[email protected]> commit 7768c5f To collaborate with hardware servicing events, upon receiving the special EQE notification from the HW channel, remove the devices on this bus. Then, after a waiting period based on the device specs, rescan the parent bus to recover the devices. Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Shradha Gupta <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 7768c5f) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
jira LE-3923 commit-author Haiyang Zhang <[email protected]> commit fbe346c upstream-diff There were conflicts seen when applying this patch due to the following missing commits :- ca8ac48 ("net: mana: Handle unsupported HWC commands") 505cc26 ("net: mana: Add support for auxiliary device servicing events") Upon receiving the Reset Request, pause the connection and clean up queues, wait for the specified period, then resume the NIC. In the cleanup phase, the HWC is no longer responding, so set hwc_timeout to zero to skip waiting on the response. Signed-off-by: Haiyang Zhang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit fbe346c) Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen
approved these changes
Sep 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥌
jdieter
approved these changes
Sep 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
shreeya-patel98
approved these changes
Sep 8, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Update process (This kernel CentOS base for
5.14.0-570
)src.rpm
s hosted by RESFsig-cloud-9/5.14.0-570.X.1.el8_10
branchel
release.Removed Commits
None
Forward Port Process
BUILD
KselfTests