
Commit ab6358b

shreeya-patel98 authored and PlaidCat committed
hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
jira LE-3555
commit-author Michael Kelley <[email protected]>
commit 41a6328

Starting with commit dca5161 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet, so there is no functional error, but performance is slightly impacted. Commit dca5161 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux.

The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that the two rndis header PFNs be in a single "GPA range" data structure, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error.

The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre-allocated send buffer slot, because the pre-allocated slots don't have page crossing issues.

After commit 14ad6ed ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed are perfectly valid -- they just had the side effect of making the netvsc bug more prominent.

Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost.

Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array.

This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single(), but on larger chunks of memory, which should reduce contention on the swiotlb.

Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKB_FRAGS has the default value of 17.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503
Cc: <[email protected]> # 6.1.x
Signed-off-by: Michael Kelley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 41a6328)
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: Jonathan Maple <[email protected]>
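For readers unfamiliar with the failure mode, the following is a minimal standalone C sketch of the boundary condition described above. This is not kernel code: the struct only mirrors the shape of struct hv_page_buffer, and the helper name and example values are hypothetical. An rndis header that starts near the end of a 4 KiB Hyper-V page spans two PFNs that are contiguous in GPA space; the old code emitted one single-PFN entry per page, while the fixed code keeps the whole span in one entry so it can become one GPA range.

#include <stdint.h>
#include <stdio.h>

#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)

/* Mirrors the shape of struct hv_page_buffer: one contiguous range. */
struct range {
        uint64_t pfn;
        uint32_t offset;        /* offset within the first page */
        uint32_t len;
};

/* How many contiguous PFNs a single range touches. */
static unsigned int pfns_spanned(const struct range *r)
{
        return (r->offset + r->len + HV_HYP_PAGE_SIZE - 1) >> HV_HYP_PAGE_SHIFT;
}

int main(void)
{
        /* Hypothetical rndis header: 100 bytes starting 40 bytes before a page boundary */
        struct range rndis = { .pfn = 0x1234, .offset = HV_HYP_PAGE_SIZE - 40, .len = 100 };

        /* Old code: one page buffer entry per PFN, i.e. two single-PFN GPA ranges,
         * which Hyper-V rejects. New code: one entry covering the whole range,
         * i.e. a single GPA range listing both contiguous PFNs.
         */
        printf("rndis header spans %u contiguous PFN(s)\n", pfns_spanned(&rndis));
        return 0;
}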
1 parent 5f8d33c commit ab6358b

File tree

3 files changed, +32 -67 lines changed


drivers/net/hyperv/hyperv_net.h

Lines changed: 12 additions & 0 deletions
@@ -893,6 +893,18 @@ struct nvsp_message {
 				 sizeof(struct nvsp_message))
 #define NETVSC_MIN_IN_MSG_SIZE sizeof(struct vmpacket_descriptor)
 
+/* Maximum # of contiguous data ranges that can make up a trasmitted packet.
+ * Typically it's the max SKB fragments plus 2 for the rndis packet and the
+ * linear portion of the SKB. But if MAX_SKB_FRAGS is large, the value may
+ * need to be limited to MAX_PAGE_BUFFER_COUNT, which is the max # of entries
+ * in a GPA direct packet sent to netvsp over VMBus.
+ */
+#if MAX_SKB_FRAGS + 2 < MAX_PAGE_BUFFER_COUNT
+#define MAX_DATA_RANGES (MAX_SKB_FRAGS + 2)
+#else
+#define MAX_DATA_RANGES MAX_PAGE_BUFFER_COUNT
+#endif
+
 /* Estimated requestor size:
  * out_ring_size/min_out_msg_size + in_ring_size/min_in_msg_size
  */

drivers/net/hyperv/netvsc_drv.c

Lines changed: 15 additions & 48 deletions
@@ -325,43 +325,10 @@ static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
 	return txq;
 }
 
-static u32 fill_pg_buf(unsigned long hvpfn, u32 offset, u32 len,
-		       struct hv_page_buffer *pb)
-{
-	int j = 0;
-
-	hvpfn += offset >> HV_HYP_PAGE_SHIFT;
-	offset = offset & ~HV_HYP_PAGE_MASK;
-
-	while (len > 0) {
-		unsigned long bytes;
-
-		bytes = HV_HYP_PAGE_SIZE - offset;
-		if (bytes > len)
-			bytes = len;
-		pb[j].pfn = hvpfn;
-		pb[j].offset = offset;
-		pb[j].len = bytes;
-
-		offset += bytes;
-		len -= bytes;
-
-		if (offset == HV_HYP_PAGE_SIZE && len) {
-			hvpfn++;
-			offset = 0;
-			j++;
-		}
-	}
-
-	return j + 1;
-}
-
 static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
 			   struct hv_netvsc_packet *packet,
 			   struct hv_page_buffer *pb)
 {
-	u32 slots_used = 0;
-	char *data = skb->data;
 	int frags = skb_shinfo(skb)->nr_frags;
 	int i;
 
@@ -370,28 +337,28 @@ static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
 	 * 2. skb linear data
 	 * 3. skb fragment data
 	 */
-	slots_used += fill_pg_buf(virt_to_hvpfn(hdr),
-				  offset_in_hvpage(hdr),
-				  len,
-				  &pb[slots_used]);
 
+	pb[0].offset = offset_in_hvpage(hdr);
+	pb[0].len = len;
+	pb[0].pfn = virt_to_hvpfn(hdr);
 	packet->rmsg_size = len;
-	packet->rmsg_pgcnt = slots_used;
+	packet->rmsg_pgcnt = 1;
 
-	slots_used += fill_pg_buf(virt_to_hvpfn(data),
-				  offset_in_hvpage(data),
-				  skb_headlen(skb),
-				  &pb[slots_used]);
+	pb[1].offset = offset_in_hvpage(skb->data);
+	pb[1].len = skb_headlen(skb);
+	pb[1].pfn = virt_to_hvpfn(skb->data);
 
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct hv_page_buffer *cur_pb = &pb[i + 2];
+		u64 pfn = page_to_hvpfn(skb_frag_page(frag));
+		u32 offset = skb_frag_off(frag);
 
-		slots_used += fill_pg_buf(page_to_hvpfn(skb_frag_page(frag)),
-					  skb_frag_off(frag),
-					  skb_frag_size(frag),
-					  &pb[slots_used]);
+		cur_pb->offset = offset_in_hvpage(offset);
+		cur_pb->len = skb_frag_size(frag);
+		cur_pb->pfn = pfn + (offset >> HV_HYP_PAGE_SHIFT);
 	}
-	return slots_used;
+	return frags + 2;
 }
 
 static int count_skb_frag_slots(struct sk_buff *skb)
@@ -482,7 +449,7 @@ static int netvsc_xmit(struct sk_buff *skb, struct net_device *net, bool xdp_tx)
 	struct net_device *vf_netdev;
 	u32 rndis_msg_size;
 	u32 hash;
-	struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT];
+	struct hv_page_buffer pb[MAX_DATA_RANGES];
 
 	/* If VF is present and up then redirect packets to it.
 	 * Skip the VF if it is marked down or has no carrier.
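The commit message notes that the contiguous ranges produced by the new init_page_array() allow netvsc_build_mpb_array() to emit GPA ranges containing multiple PFNs. That function is not part of this diff, so the following standalone sketch only illustrates the kind of expansion involved, using illustrative struct and function names rather than the kernel's: one {pfn, offset, len} entry becomes one GPA range whose PFN list covers every page the range touches.

#include <stdint.h>
#include <stdio.h>

#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)

struct page_buffer { uint64_t pfn; uint32_t offset; uint32_t len; };

struct gpa_range {
        uint32_t byte_count;
        uint32_t byte_offset;
        uint32_t pfn_count;
        uint64_t pfn_array[8];  /* large enough for this example */
};

/* One contiguous range becomes one GPA range listing all of its PFNs,
 * instead of one single-PFN GPA range per page.
 */
static void range_to_gpa(const struct page_buffer *pb, struct gpa_range *gpa)
{
        uint32_t i;

        gpa->byte_offset = pb->offset;
        gpa->byte_count = pb->len;
        gpa->pfn_count = (pb->offset + pb->len + HV_HYP_PAGE_SIZE - 1) >>
                         HV_HYP_PAGE_SHIFT;
        for (i = 0; i < gpa->pfn_count; i++)
                gpa->pfn_array[i] = pb->pfn + i;
}

int main(void)
{
        /* Hypothetical rndis header crossing a page boundary */
        struct page_buffer pb = { .pfn = 0x1000, .offset = 4000, .len = 200 };
        struct gpa_range gpa;

        range_to_gpa(&pb, &gpa);
        printf("one GPA range with %u PFN(s)\n", (unsigned int)gpa.pfn_count);
        return 0;
}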

drivers/net/hyperv/rndis_filter.c

Lines changed: 5 additions & 19 deletions
@@ -225,8 +225,7 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 				  struct rndis_request *req)
 {
 	struct hv_netvsc_packet *packet;
-	struct hv_page_buffer page_buf[2];
-	struct hv_page_buffer *pb = page_buf;
+	struct hv_page_buffer pb;
 	int ret;
 
 	/* Setup the packet to send it */
@@ -235,27 +234,14 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 	packet->total_data_buflen = req->request_msg.msg_len;
 	packet->page_buf_cnt = 1;
 
-	pb[0].pfn = virt_to_phys(&req->request_msg) >>
-		HV_HYP_PAGE_SHIFT;
-	pb[0].len = req->request_msg.msg_len;
-	pb[0].offset = offset_in_hvpage(&req->request_msg);
-
-	/* Add one page_buf when request_msg crossing page boundary */
-	if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
-		packet->page_buf_cnt++;
-		pb[0].len = HV_HYP_PAGE_SIZE -
-			pb[0].offset;
-		pb[1].pfn = virt_to_phys((void *)&req->request_msg
-			+ pb[0].len) >> HV_HYP_PAGE_SHIFT;
-		pb[1].offset = 0;
-		pb[1].len = req->request_msg.msg_len -
-			pb[0].len;
-	}
+	pb.pfn = virt_to_phys(&req->request_msg) >> HV_HYP_PAGE_SHIFT;
+	pb.len = req->request_msg.msg_len;
+	pb.offset = offset_in_hvpage(&req->request_msg);
 
 	trace_rndis_send(dev->ndev, 0, &req->request_msg);
 
 	rcu_read_lock_bh();
-	ret = netvsc_send(dev->ndev, packet, NULL, pb, NULL, false);
+	ret = netvsc_send(dev->ndev, packet, NULL, &pb, NULL, false);
 
 	rcu_read_unlock_bh();
 
 	return ret;
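The special case deleted above added a second page buffer entry whenever the request message crossed a Hyper-V page boundary. The request message sits in physically contiguous kernel memory (the old code already relied on this when it derived the second PFN from virt_to_phys() of an offset pointer), so a single contiguous-range entry can now describe the whole message, and the page crossing is resolved later when the GPA range is built. A worked example with hypothetical values, assuming 4 KiB Hyper-V pages and a 120-byte RNDIS request starting 64 bytes before a page boundary, with its first page at PFN N:

  old code: two single-page entries, i.e. two GPA ranges
    pb[0] = { .pfn = N,     .offset = 4032, .len = 64 };
    pb[1] = { .pfn = N + 1, .offset = 0,    .len = 56 };

  new code: one contiguous-range entry; both PFNs later land in a single GPA range
    pb    = { .pfn = N,     .offset = 4032, .len = 120 };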
