Skip to content

Commit 1fc70ed

Browse files
TaeheeYoodavem330
authored andcommitted
net: core: add nested_level variable in net_device
This patch is to add a new variable 'nested_level' into the net_device structure. This variable will be used as a parameter of spin_lock_nested() of dev->addr_list_lock. netif_addr_lock() can be called recursively so spin_lock_nested() is used instead of spin_lock() and dev->lower_level is used as a parameter of spin_lock_nested(). But, dev->lower_level value can be updated while it is being used. So, lockdep would warn a possible deadlock scenario. When a stacked interface is deleted, netif_{uc | mc}_sync() is called recursively. So, spin_lock_nested() is called recursively too. At this moment, the dev->lower_level variable is used as a parameter of it. dev->lower_level value is updated when interfaces are being unlinked/linked immediately. Thus, After unlinking, dev->lower_level shouldn't be a parameter of spin_lock_nested(). A (macvlan) | B (vlan) | C (bridge) | D (macvlan) | E (vlan) | F (bridge) A->lower_level : 6 B->lower_level : 5 C->lower_level : 4 D->lower_level : 3 E->lower_level : 2 F->lower_level : 1 When an interface 'A' is removed, it releases resources. At this moment, netif_addr_lock() would be called. Then, netdev_upper_dev_unlink() is called recursively. Then dev->lower_level is updated. There is no problem. But, when the bridge module is removed, 'C' and 'F' interfaces are removed at once. If 'F' is removed first, a lower_level value is like below. A->lower_level : 5 B->lower_level : 4 C->lower_level : 3 D->lower_level : 2 E->lower_level : 1 F->lower_level : 1 Then, 'C' is removed. at this moment, netif_addr_lock() is called recursively. The ordering is like this. C(3)->D(2)->E(1)->F(1) At this moment, the lower_level value of 'E' and 'F' are the same. So, lockdep warns a possible deadlock scenario. In order to avoid this problem, a new variable 'nested_level' is added. This value is the same as dev->lower_level - 1. But this value is updated in rtnl_unlock(). So, this variable can be used as a parameter of spin_lock_nested() safely in the rtnl context. Test commands: ip link add br0 type bridge vlan_filtering 1 ip link add vlan1 link br0 type vlan id 10 ip link add macvlan2 link vlan1 type macvlan ip link add br3 type bridge vlan_filtering 1 ip link set macvlan2 master br3 ip link add vlan4 link br3 type vlan id 10 ip link add macvlan5 link vlan4 type macvlan ip link add br6 type bridge vlan_filtering 1 ip link set macvlan5 master br6 ip link add vlan7 link br6 type vlan id 10 ip link add macvlan8 link vlan7 type macvlan ip link set br0 up ip link set vlan1 up ip link set macvlan2 up ip link set br3 up ip link set vlan4 up ip link set macvlan5 up ip link set br6 up ip link set vlan7 up ip link set macvlan8 up modprobe -rv bridge Splat looks like: [ 36.057436][ T744] WARNING: possible recursive locking detected [ 36.058848][ T744] 5.9.0-rc6+ #728 Not tainted [ 36.059959][ T744] -------------------------------------------- [ 36.061391][ T744] ip/744 is trying to acquire lock: [ 36.062590][ T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30 [ 36.064922][ T744] [ 36.064922][ T744] but task is already holding lock: [ 36.066626][ T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.068851][ T744] [ 36.068851][ T744] other info that might help us debug this: [ 36.070731][ T744] Possible unsafe locking scenario: [ 36.070731][ T744] [ 36.072497][ T744] CPU0 [ 36.073238][ T744] ---- [ 36.074007][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.075290][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.076590][ T744] [ 36.076590][ T744] *** DEADLOCK *** [ 36.076590][ T744] [ 36.078515][ T744] May be due to missing lock nesting notation [ 36.078515][ T744] [ 36.080491][ T744] 3 locks held by ip/744: [ 36.081471][ T744] #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490 [ 36.083614][ T744] #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.085942][ T744] #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80 [ 36.088400][ T744] [ 36.088400][ T744] stack backtrace: [ 36.089772][ T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728 [ 36.091364][ T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 36.093630][ T744] Call Trace: [ 36.094416][ T744] dump_stack+0x77/0x9b [ 36.095385][ T744] __lock_acquire+0xbc3/0x1f40 [ 36.096522][ T744] lock_acquire+0xb4/0x3b0 [ 36.097540][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.098657][ T744] ? rtmsg_ifinfo+0x1f/0x30 [ 36.099711][ T744] ? __dev_notify_flags+0xa5/0xf0 [ 36.100874][ T744] ? rtnl_is_locked+0x11/0x20 [ 36.101967][ T744] ? __dev_set_promiscuity+0x7b/0x1a0 [ 36.103230][ T744] _raw_spin_lock_bh+0x38/0x70 [ 36.104348][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.105461][ T744] dev_set_rx_mode+0x19/0x30 [ 36.106532][ T744] dev_set_promiscuity+0x36/0x50 [ 36.107692][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.108929][ T744] dev_set_promiscuity+0x1e/0x50 [ 36.110093][ T744] br_port_set_promisc+0x1f/0x40 [bridge] [ 36.111415][ T744] br_manage_promisc+0x8b/0xe0 [bridge] [ 36.112728][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.113967][ T744] ? __hw_addr_sync_one+0x23/0x50 [ 36.115135][ T744] __dev_set_rx_mode+0x68/0x90 [ 36.116249][ T744] dev_uc_sync+0x70/0x80 [ 36.117244][ T744] dev_uc_add+0x50/0x60 [ 36.118223][ T744] macvlan_open+0x18e/0x1f0 [macvlan] [ 36.119470][ T744] __dev_open+0xd6/0x170 [ 36.120470][ T744] __dev_change_flags+0x181/0x1d0 [ 36.121644][ T744] dev_change_flags+0x23/0x60 [ 36.122741][ T744] do_setlink+0x30a/0x11e0 [ 36.123778][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.124929][ T744] ? __nla_validate_parse.part.6+0x45/0x8e0 [ 36.126309][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.127457][ T744] __rtnl_newlink+0x546/0x8e0 [ 36.128560][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.129623][ T744] ? deactivate_slab.isra.85+0x6a1/0x850 [ 36.130946][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.132102][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.133176][ T744] ? is_bpf_text_address+0x5/0xe0 [ 36.134364][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.135445][ T744] ? rcu_read_lock_sched_held+0x32/0x60 [ 36.136771][ T744] ? kmem_cache_alloc_trace+0x2d8/0x380 [ 36.138070][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.139164][ T744] rtnl_newlink+0x47/0x70 [ ... ] Fixes: 845e0eb ("net: change addr_list_lock back to static key") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
1 parent eff7423 commit 1fc70ed

File tree

3 files changed

+122
-27
lines changed

3 files changed

+122
-27
lines changed

include/linux/netdevice.h

+44-8
Original file line numberDiff line numberDiff line change
@@ -1955,6 +1955,7 @@ struct net_device {
19551955
unsigned short type;
19561956
unsigned short hard_header_len;
19571957
unsigned char min_header_len;
1958+
unsigned char name_assign_type;
19581959

19591960
unsigned short needed_headroom;
19601961
unsigned short needed_tailroom;
@@ -1965,21 +1966,28 @@ struct net_device {
19651966
unsigned char addr_len;
19661967
unsigned char upper_level;
19671968
unsigned char lower_level;
1969+
19681970
unsigned short neigh_priv_len;
19691971
unsigned short dev_id;
19701972
unsigned short dev_port;
19711973
spinlock_t addr_list_lock;
1972-
unsigned char name_assign_type;
1973-
bool uc_promisc;
1974+
19741975
struct netdev_hw_addr_list uc;
19751976
struct netdev_hw_addr_list mc;
19761977
struct netdev_hw_addr_list dev_addrs;
19771978

19781979
#ifdef CONFIG_SYSFS
19791980
struct kset *queues_kset;
1981+
#endif
1982+
#ifdef CONFIG_LOCKDEP
1983+
struct list_head unlink_list;
19801984
#endif
19811985
unsigned int promiscuity;
19821986
unsigned int allmulti;
1987+
bool uc_promisc;
1988+
#ifdef CONFIG_LOCKDEP
1989+
unsigned char nested_level;
1990+
#endif
19831991

19841992

19851993
/* Protocol-specific pointers */
@@ -4260,17 +4268,23 @@ static inline void netif_tx_disable(struct net_device *dev)
42604268

42614269
static inline void netif_addr_lock(struct net_device *dev)
42624270
{
4263-
spin_lock(&dev->addr_list_lock);
4264-
}
4271+
unsigned char nest_level = 0;
42654272

4266-
static inline void netif_addr_lock_nested(struct net_device *dev)
4267-
{
4268-
spin_lock_nested(&dev->addr_list_lock, dev->lower_level);
4273+
#ifdef CONFIG_LOCKDEP
4274+
nest_level = dev->nested_level;
4275+
#endif
4276+
spin_lock_nested(&dev->addr_list_lock, nest_level);
42694277
}
42704278

42714279
static inline void netif_addr_lock_bh(struct net_device *dev)
42724280
{
4273-
spin_lock_bh(&dev->addr_list_lock);
4281+
unsigned char nest_level = 0;
4282+
4283+
#ifdef CONFIG_LOCKDEP
4284+
nest_level = dev->nested_level;
4285+
#endif
4286+
local_bh_disable();
4287+
spin_lock_nested(&dev->addr_list_lock, nest_level);
42744288
}
42754289

42764290
static inline void netif_addr_unlock(struct net_device *dev)
@@ -4455,7 +4469,19 @@ extern int dev_rx_weight;
44554469
extern int dev_tx_weight;
44564470
extern int gro_normal_batch;
44574471

4472+
enum {
4473+
NESTED_SYNC_IMM_BIT,
4474+
NESTED_SYNC_TODO_BIT,
4475+
};
4476+
4477+
#define __NESTED_SYNC_BIT(bit) ((u32)1 << (bit))
4478+
#define __NESTED_SYNC(name) __NESTED_SYNC_BIT(NESTED_SYNC_ ## name ## _BIT)
4479+
4480+
#define NESTED_SYNC_IMM __NESTED_SYNC(IMM)
4481+
#define NESTED_SYNC_TODO __NESTED_SYNC(TODO)
4482+
44584483
struct netdev_nested_priv {
4484+
unsigned char flags;
44594485
void *data;
44604486
};
44614487

@@ -4465,6 +4491,16 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
44654491
struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
44664492
struct list_head **iter);
44674493

4494+
#ifdef CONFIG_LOCKDEP
4495+
static LIST_HEAD(net_unlink_list);
4496+
4497+
static inline void net_unlink_todo(struct net_device *dev)
4498+
{
4499+
if (list_empty(&dev->unlink_list))
4500+
list_add_tail(&dev->unlink_list, &net_unlink_list);
4501+
}
4502+
#endif
4503+
44684504
/* iterate through upper list, must be called under RCU read lock */
44694505
#define netdev_for_each_upper_dev_rcu(dev, updev, iter) \
44704506
for (iter = &(dev)->adj_list.upper, \

net/core/dev.c

+72-13
Original file line numberDiff line numberDiff line change
@@ -7104,6 +7104,7 @@ static bool __netdev_has_upper_dev(struct net_device *dev,
71047104
struct net_device *upper_dev)
71057105
{
71067106
struct netdev_nested_priv priv = {
7107+
.flags = 0,
71077108
.data = (void *)upper_dev,
71087109
};
71097110

@@ -7385,9 +7386,19 @@ static int __netdev_update_upper_level(struct net_device *dev,
73857386
}
73867387

73877388
static int __netdev_update_lower_level(struct net_device *dev,
7388-
struct netdev_nested_priv *__unused)
7389+
struct netdev_nested_priv *priv)
73897390
{
73907391
dev->lower_level = __netdev_lower_depth(dev) + 1;
7392+
7393+
#ifdef CONFIG_LOCKDEP
7394+
if (!priv)
7395+
return 0;
7396+
7397+
if (priv->flags & NESTED_SYNC_IMM)
7398+
dev->nested_level = dev->lower_level - 1;
7399+
if (priv->flags & NESTED_SYNC_TODO)
7400+
net_unlink_todo(dev);
7401+
#endif
73917402
return 0;
73927403
}
73937404

@@ -7665,6 +7676,7 @@ static void __netdev_adjacent_dev_unlink_neighbour(struct net_device *dev,
76657676
static int __netdev_upper_dev_link(struct net_device *dev,
76667677
struct net_device *upper_dev, bool master,
76677678
void *upper_priv, void *upper_info,
7679+
struct netdev_nested_priv *priv,
76687680
struct netlink_ext_ack *extack)
76697681
{
76707682
struct netdev_notifier_changeupper_info changeupper_info = {
@@ -7721,9 +7733,9 @@ static int __netdev_upper_dev_link(struct net_device *dev,
77217733
__netdev_update_upper_level(dev, NULL);
77227734
__netdev_walk_all_lower_dev(dev, __netdev_update_upper_level, NULL);
77237735

7724-
__netdev_update_lower_level(upper_dev, NULL);
7736+
__netdev_update_lower_level(upper_dev, priv);
77257737
__netdev_walk_all_upper_dev(upper_dev, __netdev_update_lower_level,
7726-
NULL);
7738+
priv);
77277739

77287740
return 0;
77297741

@@ -7748,8 +7760,13 @@ int netdev_upper_dev_link(struct net_device *dev,
77487760
struct net_device *upper_dev,
77497761
struct netlink_ext_ack *extack)
77507762
{
7763+
struct netdev_nested_priv priv = {
7764+
.flags = NESTED_SYNC_IMM | NESTED_SYNC_TODO,
7765+
.data = NULL,
7766+
};
7767+
77517768
return __netdev_upper_dev_link(dev, upper_dev, false,
7752-
NULL, NULL, extack);
7769+
NULL, NULL, &priv, extack);
77537770
}
77547771
EXPORT_SYMBOL(netdev_upper_dev_link);
77557772

@@ -7772,13 +7789,19 @@ int netdev_master_upper_dev_link(struct net_device *dev,
77727789
void *upper_priv, void *upper_info,
77737790
struct netlink_ext_ack *extack)
77747791
{
7792+
struct netdev_nested_priv priv = {
7793+
.flags = NESTED_SYNC_IMM | NESTED_SYNC_TODO,
7794+
.data = NULL,
7795+
};
7796+
77757797
return __netdev_upper_dev_link(dev, upper_dev, true,
7776-
upper_priv, upper_info, extack);
7798+
upper_priv, upper_info, &priv, extack);
77777799
}
77787800
EXPORT_SYMBOL(netdev_master_upper_dev_link);
77797801

77807802
static void __netdev_upper_dev_unlink(struct net_device *dev,
7781-
struct net_device *upper_dev)
7803+
struct net_device *upper_dev,
7804+
struct netdev_nested_priv *priv)
77827805
{
77837806
struct netdev_notifier_changeupper_info changeupper_info = {
77847807
.info = {
@@ -7803,9 +7826,9 @@ static void __netdev_upper_dev_unlink(struct net_device *dev,
78037826
__netdev_update_upper_level(dev, NULL);
78047827
__netdev_walk_all_lower_dev(dev, __netdev_update_upper_level, NULL);
78057828

7806-
__netdev_update_lower_level(upper_dev, NULL);
7829+
__netdev_update_lower_level(upper_dev, priv);
78077830
__netdev_walk_all_upper_dev(upper_dev, __netdev_update_lower_level,
7808-
NULL);
7831+
priv);
78097832
}
78107833

78117834
/**
@@ -7819,7 +7842,12 @@ static void __netdev_upper_dev_unlink(struct net_device *dev,
78197842
void netdev_upper_dev_unlink(struct net_device *dev,
78207843
struct net_device *upper_dev)
78217844
{
7822-
__netdev_upper_dev_unlink(dev, upper_dev);
7845+
struct netdev_nested_priv priv = {
7846+
.flags = NESTED_SYNC_TODO,
7847+
.data = NULL,
7848+
};
7849+
7850+
__netdev_upper_dev_unlink(dev, upper_dev, &priv);
78237851
}
78247852
EXPORT_SYMBOL(netdev_upper_dev_unlink);
78257853

@@ -7855,15 +7883,19 @@ int netdev_adjacent_change_prepare(struct net_device *old_dev,
78557883
struct net_device *dev,
78567884
struct netlink_ext_ack *extack)
78577885
{
7886+
struct netdev_nested_priv priv = {
7887+
.flags = 0,
7888+
.data = NULL,
7889+
};
78587890
int err;
78597891

78607892
if (!new_dev)
78617893
return 0;
78627894

78637895
if (old_dev && new_dev != old_dev)
78647896
netdev_adjacent_dev_disable(dev, old_dev);
7865-
7866-
err = netdev_upper_dev_link(new_dev, dev, extack);
7897+
err = __netdev_upper_dev_link(new_dev, dev, false, NULL, NULL, &priv,
7898+
extack);
78677899
if (err) {
78687900
if (old_dev && new_dev != old_dev)
78697901
netdev_adjacent_dev_enable(dev, old_dev);
@@ -7878,28 +7910,38 @@ void netdev_adjacent_change_commit(struct net_device *old_dev,
78787910
struct net_device *new_dev,
78797911
struct net_device *dev)
78807912
{
7913+
struct netdev_nested_priv priv = {
7914+
.flags = NESTED_SYNC_IMM | NESTED_SYNC_TODO,
7915+
.data = NULL,
7916+
};
7917+
78817918
if (!new_dev || !old_dev)
78827919
return;
78837920

78847921
if (new_dev == old_dev)
78857922
return;
78867923

78877924
netdev_adjacent_dev_enable(dev, old_dev);
7888-
netdev_upper_dev_unlink(old_dev, dev);
7925+
__netdev_upper_dev_unlink(old_dev, dev, &priv);
78897926
}
78907927
EXPORT_SYMBOL(netdev_adjacent_change_commit);
78917928

78927929
void netdev_adjacent_change_abort(struct net_device *old_dev,
78937930
struct net_device *new_dev,
78947931
struct net_device *dev)
78957932
{
7933+
struct netdev_nested_priv priv = {
7934+
.flags = 0,
7935+
.data = NULL,
7936+
};
7937+
78967938
if (!new_dev)
78977939
return;
78987940

78997941
if (old_dev && new_dev != old_dev)
79007942
netdev_adjacent_dev_enable(dev, old_dev);
79017943

7902-
netdev_upper_dev_unlink(new_dev, dev);
7944+
__netdev_upper_dev_unlink(new_dev, dev, &priv);
79037945
}
79047946
EXPORT_SYMBOL(netdev_adjacent_change_abort);
79057947

@@ -10083,6 +10125,19 @@ static void netdev_wait_allrefs(struct net_device *dev)
1008310125
void netdev_run_todo(void)
1008410126
{
1008510127
struct list_head list;
10128+
#ifdef CONFIG_LOCKDEP
10129+
struct list_head unlink_list;
10130+
10131+
list_replace_init(&net_unlink_list, &unlink_list);
10132+
10133+
while (!list_empty(&unlink_list)) {
10134+
struct net_device *dev = list_first_entry(&unlink_list,
10135+
struct net_device,
10136+
unlink_list);
10137+
list_del(&dev->unlink_list);
10138+
dev->nested_level = dev->lower_level - 1;
10139+
}
10140+
#endif
1008610141

1008710142
/* Snapshot list, allow later requests */
1008810143
list_replace_init(&net_todo_list, &list);
@@ -10295,6 +10350,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
1029510350
dev->gso_max_segs = GSO_MAX_SEGS;
1029610351
dev->upper_level = 1;
1029710352
dev->lower_level = 1;
10353+
#ifdef CONFIG_LOCKDEP
10354+
dev->nested_level = 0;
10355+
INIT_LIST_HEAD(&dev->unlink_list);
10356+
#endif
1029810357

1029910358
INIT_LIST_HEAD(&dev->napi_list);
1030010359
INIT_LIST_HEAD(&dev->unreg_list);

net/core/dev_addr_lists.c

+6-6
Original file line numberDiff line numberDiff line change
@@ -637,7 +637,7 @@ int dev_uc_sync(struct net_device *to, struct net_device *from)
637637
if (to->addr_len != from->addr_len)
638638
return -EINVAL;
639639

640-
netif_addr_lock_nested(to);
640+
netif_addr_lock(to);
641641
err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len);
642642
if (!err)
643643
__dev_set_rx_mode(to);
@@ -667,7 +667,7 @@ int dev_uc_sync_multiple(struct net_device *to, struct net_device *from)
667667
if (to->addr_len != from->addr_len)
668668
return -EINVAL;
669669

670-
netif_addr_lock_nested(to);
670+
netif_addr_lock(to);
671671
err = __hw_addr_sync_multiple(&to->uc, &from->uc, to->addr_len);
672672
if (!err)
673673
__dev_set_rx_mode(to);
@@ -700,7 +700,7 @@ void dev_uc_unsync(struct net_device *to, struct net_device *from)
700700
* larger.
701701
*/
702702
netif_addr_lock_bh(from);
703-
netif_addr_lock_nested(to);
703+
netif_addr_lock(to);
704704
__hw_addr_unsync(&to->uc, &from->uc, to->addr_len);
705705
__dev_set_rx_mode(to);
706706
netif_addr_unlock(to);
@@ -867,7 +867,7 @@ int dev_mc_sync(struct net_device *to, struct net_device *from)
867867
if (to->addr_len != from->addr_len)
868868
return -EINVAL;
869869

870-
netif_addr_lock_nested(to);
870+
netif_addr_lock(to);
871871
err = __hw_addr_sync(&to->mc, &from->mc, to->addr_len);
872872
if (!err)
873873
__dev_set_rx_mode(to);
@@ -897,7 +897,7 @@ int dev_mc_sync_multiple(struct net_device *to, struct net_device *from)
897897
if (to->addr_len != from->addr_len)
898898
return -EINVAL;
899899

900-
netif_addr_lock_nested(to);
900+
netif_addr_lock(to);
901901
err = __hw_addr_sync_multiple(&to->mc, &from->mc, to->addr_len);
902902
if (!err)
903903
__dev_set_rx_mode(to);
@@ -922,7 +922,7 @@ void dev_mc_unsync(struct net_device *to, struct net_device *from)
922922

923923
/* See the above comments inside dev_uc_unsync(). */
924924
netif_addr_lock_bh(from);
925-
netif_addr_lock_nested(to);
925+
netif_addr_lock(to);
926926
__hw_addr_unsync(&to->mc, &from->mc, to->addr_len);
927927
__dev_set_rx_mode(to);
928928
netif_addr_unlock(to);

0 commit comments

Comments
 (0)