Commit 10e8a2d
bpf: Call free_htab_elem() after htab_unlock_bucket()
[ Upstream commit b9e9ed9 ]
For htab of maps, when the map is removed from the htab, it may hold the
last reference of the map. bpf_map_fd_put_ptr() will invoke
bpf_map_free_id() to free the id of the removed map element. However,
bpf_map_fd_put_ptr() is invoked while holding a bucket lock
(raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock
(spinlock_t), triggering the following lockdep warning:
=============================
[ BUG: Invalid wait context ]
6.11.0-rc4+ #49 Not tainted
-----------------------------
test_maps/4881 is trying to lock:
ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70
other info that might help us debug this:
context-{5:5}
2 locks held by test_maps/4881:
#0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270
#1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80
stack backtrace:
CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xb0
dump_stack+0x10/0x20
__lock_acquire+0x73e/0x36c0
lock_acquire+0x182/0x450
_raw_spin_lock_irqsave+0x43/0x70
bpf_map_free_id.part.0+0x21/0x70
bpf_map_put+0xcf/0x110
bpf_map_fd_put_ptr+0x9a/0xb0
free_htab_elem+0x69/0xe0
htab_map_update_elem+0x50f/0xa80
bpf_fd_htab_map_update_elem+0x131/0x270
htab_map_update_elem+0x50f/0xa80
bpf_fd_htab_map_update_elem+0x131/0x270
bpf_map_update_value+0x266/0x380
__sys_bpf+0x21bb/0x36b0
__x64_sys_bpf+0x45/0x60
x64_sys_call+0x1b2a/0x20d0
do_syscall_64+0x5d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
One way to fix the lockdep warning is using raw_spinlock_t for
map_idr_lock as well. However, bpf_map_alloc_id() invokes
idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a
similar lockdep warning because the slab's lock (s->cpu_slab->lock) is
still a spinlock.
Instead of changing map_idr_lock's type, fix the issue by invoking
htab_put_fd_value() after htab_unlock_bucket(). However, only deferring
the invocation of htab_put_fd_value() is not enough, because the old map
pointers in htab of maps can not be saved during batched deletion.
Therefore, also defer the invocation of free_htab_elem(), so these
to-be-freed elements could be linked together similar to lru map.
There are four callers for ->map_fd_put_ptr:
(1) alloc_htab_elem() (through htab_put_fd_value())
It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of
htab_put_fd_value() can not simply move after htab_unlock_bucket(),
because the old element has already been stashed in htab->extra_elems.
It may be reused immediately after htab_unlock_bucket() and the
invocation of htab_put_fd_value() after htab_unlock_bucket() may release
the newly-added element incorrectly. Therefore, saving the map pointer
of the old element for htab of maps before unlocking the bucket and
releasing the map_ptr after unlock. Beside the map pointer in the old
element, should do the same thing for the special fields in the old
element as well.
(2) free_htab_elem() (through htab_put_fd_value())
Its caller includes __htab_map_lookup_and_delete_elem(),
htab_map_delete_elem() and __htab_map_lookup_and_delete_batch().
For htab_map_delete_elem(), simply invoke free_htab_elem() after
htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just
like lru map, linking the to-be-freed element into node_to_free list
and invoking free_htab_elem() for these element after unlock. It is safe
to reuse batch_flink as the link for node_to_free, because these
elements have been removed from the hash llist.
Because htab of maps doesn't support lookup_and_delete operation,
__htab_map_lookup_and_delete_elem() doesn't have the problem, so kept
it as is.
(3) fd_htab_map_free()
It invokes ->map_fd_put_ptr without raw_spinlock_t.
(4) bpf_fd_htab_map_update_elem()
It invokes ->map_fd_put_ptr without raw_spinlock_t.
After moving free_htab_elem() outside htab bucket lock scope, using
pcpu_freelist_push() instead of __pcpu_freelist_push() to disable
the irq before freeing elements, and protecting the invocations of
bpf_mem_cache_free() with migrate_{disable|enable} pair.
Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>1 parent 07c020c commit 10e8a2d
1 file changed
+39
-17
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
892 | 892 | | |
893 | 893 | | |
894 | 894 | | |
| 895 | + | |
| 896 | + | |
895 | 897 | | |
896 | 898 | | |
897 | 899 | | |
| 900 | + | |
898 | 901 | | |
899 | 902 | | |
900 | 903 | | |
| |||
944 | 947 | | |
945 | 948 | | |
946 | 949 | | |
947 | | - | |
| 950 | + | |
948 | 951 | | |
949 | 952 | | |
950 | 953 | | |
| |||
1014 | 1017 | | |
1015 | 1018 | | |
1016 | 1019 | | |
1017 | | - | |
1018 | 1020 | | |
1019 | 1021 | | |
1020 | 1022 | | |
| |||
1100 | 1102 | | |
1101 | 1103 | | |
1102 | 1104 | | |
| 1105 | + | |
1103 | 1106 | | |
1104 | 1107 | | |
1105 | 1108 | | |
| |||
1178 | 1181 | | |
1179 | 1182 | | |
1180 | 1183 | | |
| 1184 | + | |
| 1185 | + | |
| 1186 | + | |
| 1187 | + | |
| 1188 | + | |
| 1189 | + | |
| 1190 | + | |
| 1191 | + | |
| 1192 | + | |
| 1193 | + | |
| 1194 | + | |
| 1195 | + | |
| 1196 | + | |
| 1197 | + | |
| 1198 | + | |
| 1199 | + | |
| 1200 | + | |
1181 | 1201 | | |
1182 | 1202 | | |
1183 | | - | |
1184 | | - | |
1185 | 1203 | | |
1186 | | - | |
| 1204 | + | |
1187 | 1205 | | |
1188 | 1206 | | |
1189 | 1207 | | |
| |||
1427 | 1445 | | |
1428 | 1446 | | |
1429 | 1447 | | |
1430 | | - | |
1431 | | - | |
| 1448 | + | |
1432 | 1449 | | |
1433 | | - | |
1434 | | - | |
| 1450 | + | |
1435 | 1451 | | |
1436 | | - | |
1437 | 1452 | | |
1438 | 1453 | | |
| 1454 | + | |
| 1455 | + | |
| 1456 | + | |
1439 | 1457 | | |
1440 | 1458 | | |
1441 | 1459 | | |
| |||
1842 | 1860 | | |
1843 | 1861 | | |
1844 | 1862 | | |
| 1863 | + | |
| 1864 | + | |
| 1865 | + | |
| 1866 | + | |
| 1867 | + | |
1845 | 1868 | | |
1846 | | - | |
1847 | | - | |
1848 | | - | |
1849 | | - | |
1850 | | - | |
1851 | | - | |
| 1869 | + | |
| 1870 | + | |
1852 | 1871 | | |
1853 | 1872 | | |
1854 | 1873 | | |
| |||
1860 | 1879 | | |
1861 | 1880 | | |
1862 | 1881 | | |
1863 | | - | |
| 1882 | + | |
| 1883 | + | |
| 1884 | + | |
| 1885 | + | |
1864 | 1886 | | |
1865 | 1887 | | |
1866 | 1888 | | |
| |||
0 commit comments