Commit d1ceef5
net: Handle napi_schedule() calls from non-interrupt
[ Upstream commit 77e4514 ]
napi_schedule() is expected to be called either:
* From an interrupt, where raised softirqs are handled on IRQ exit
* From a softirq disabled section, where raised softirqs are handled on
the next call to local_bh_enable().
* From a softirq handler, where raised softirqs are handled on the next
round in do_softirq(), or further deferred to a dedicated kthread.
Other bare task contexts may end up ignoring the raised NET_RX vector
until the next random softirq handling opportunity, which may not
happen for a while if the CPU goes idle afterwards with the tick
stopped.
Such "misuses" have been detected in several places thanks to messages
of the kind:
"NOHZ tick-stop error: local softirq work is pending, handler #8!!!"
For example:
__raise_softirq_irqoff
__napi_schedule
rtl8152_runtime_resume.isra.0
rtl8152_resume
usb_resume_interface.isra.0
usb_resume_both
__rpm_callback
rpm_callback
rpm_resume
__pm_runtime_resume
usb_autoresume_device
usb_remote_wakeup
hub_event
process_one_work
worker_thread
kthread
ret_from_fork
ret_from_fork_asm
And also:
* drivers/net/usb/r8152.c::rtl_work_func_t
* drivers/net/netdevsim/netdev.c::nsim_start_xmit
There is a long history of issues of this kind:
019edd0 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
3300685 ("idpf: disable local BH when scheduling napi for marker packets")
e3d5d70 ("net: lan78xx: fix "softirq work is pending" error")
e55c27e ("mt76: mt7615: add missing bh-disable around rx napi schedule")
c0182aa ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
970be1d ("mt76: disable BH around napi_schedule() calls")
30bfec4 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt")
e63052a ("mlx5e: add add missing BH locking around napi_schdule()")
83a0c6e ("i40e: Invoke softirqs after napi_reschedule")
bd4ce94 ("mlx4: Invoke softirqs after napi_reschedule")
8cf699e ("mlx4: do not call napi_schedule() without care")
ec13ee8 ("virtio_net: invoke softirqs after __napi_schedule")
This shows that relying on the caller to arrange a proper context for
the softirqs to be handled while calling napi_schedule() is very fragile
and error prone. Fixing the callers can also prove challenging when a
caller may run in different kinds of contexts.
Therefore fix this in napi_schedule() itself by waking up ksoftirqd
when softirqs are raised from task contexts.
Reported-by: Paul Menzel <[email protected]>
Reported-by: Jakub Kicinski <[email protected]>
Reported-by: Francois Romieu <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Cc: Breno Leitao <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Parent commit: 1cf295a
1 file changed, 1 insertion(+), 1 deletion(-)
Diff: original line 4611 removed and replaced by new line 4611; context lines 4608-4610 and 4612-4614 unchanged. (Line contents were not captured.)