Commit e180a33

Merge branch 'add-layer-2-miss-indication-and-filtering'
Ido Schimmel says:

====================
Add layer 2 miss indication and filtering

tl;dr
=====

This patchset adds a single bit to the tc skb extension to indicate that
a packet encountered a layer 2 miss in the bridge and extends flower to
match on this metadata. This is required for non-DF (Designated
Forwarder) filtering in EVPN multi-homing which prevents decapsulated
BUM packets from being forwarded multiple times to the same multi-homed
host.

Background
==========

In a typical EVPN multi-homing setup each host is multi-homed using a
set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf
switches in a rack. These switches act as VTEPs and are not directly
connected (as opposed to MLAG), but can communicate with each other (as
well as with VTEPs in remote racks) via spine switches over L3.

When a host sends a BUM packet over ES1 to VTEP1, the VTEP will flood it
to other VTEPs in the network, including those connected to the host
over ES1. The receiving VTEPs must drop the packet and not forward it
back to the host. This is called "split-horizon filtering" (SPH) [1].

FRR configures SPH filtering using two tc filters. The first, an ingress
filter that matches on packets received from VTEP1 and marks them using
a fwmark (firewall mark). The second, an egress filter configured on the
LAG interface connected to the host that matches on the fwmark and drops
the packets. Example:

 # tc filter add dev vxlan0 ingress pref 1 proto all flower enc_src_ip $VTEP1_IP action skbedit mark 101
 # tc filter add dev bond0 egress pref 1 handle 101 fw action drop

Motivation
==========

For each ES, only one VTEP is elected by the control plane as the DF.
The DF is responsible for forwarding decapsulated BUM traffic to the
host over the ES. The non-DF VTEPs must drop such traffic as otherwise
the host will receive multiple copies of BUM traffic. This is called
"non-DF filtering" [2].

Filtering of multicast and broadcast traffic can be achieved using the
following flower filter:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop

Unlike broadcast and multicast traffic, it is not currently possible to
filter unknown unicast traffic. The classification into unknown unicast
is performed by the bridge driver, but is not visible to other layers.

Implementation
==============

The proposed solution is to add a single bit to the tc skb extension
that is set by the bridge for packets that encountered an FDB or MDB
miss. The flower classifier is extended to be able to match on this new
metadata bit in a similar fashion to existing metadata options such as
'indev'.

A bit that is set for every flooded packet would also work, but it does
not allow us to differentiate between registered and unregistered
multicast traffic which might be useful in the future.

A relatively generic name is chosen for this bit - 'l2_miss' - to allow
its use to be extended to other layer 2 devices such as VXLAN, should a
use case arise.

With the above, the control plane can implement a non-DF filter using
the following tc filters:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
 # tc filter add dev bond0 egress pref 2 proto all flower indev vxlan0 l2_miss true action drop

The first drops broadcast and multicast traffic and the second drops
unknown unicast traffic.

Testing
=======

A test exercising the different permutations of the 'l2_miss' bit is
added in patch #8.

Patchset overview
=================

Patch #1 adds the new bit to the tc skb extension and sets it in the
bridge driver for packets that encountered a miss. The marking of the
packets and the use of this extension is protected by the
'tc_skb_ext_tc' static key in order to keep performance impact to a
minimum when the feature is not in use.

Patch #2 extends the flow dissector to dissect this information from the
tc skb extension into the 'FLOW_DISSECTOR_KEY_META' key.

Patch #3 extends the flower classifier to be able to match on the new
layer 2 miss metadata. The classifier enables the 'tc_skb_ext_tc' static
key upon the installation of the first filter that matches on 'l2_miss'
and disables the key upon the removal of the last filter that matches on
it.

Patch #4 rejects matching on the new metadata in drivers that already
support the 'FLOW_DISSECTOR_KEY_META' key.

Patches #5-#6 are small preparations in mlxsw.

Patch #7 extends mlxsw to be able to match on layer 2 miss.

Patch #8 adds a selftest.

iproute2 patches can be found here [3].

[1] https://datatracker.ietf.org/doc/html/rfc7432#section-8.3
[2] https://datatracker.ietf.org/doc/html/rfc7432#section-8.5
[3] https://github.com/idosch/iproute2/tree/submit/non_df_filter_v1
[4] https://lore.kernel.org/netdev/[email protected]/
[5] https://lore.kernel.org/netdev/[email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
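The two non-DF filters from the cover letter can be read as a simple per-packet decision. The following is a minimal user-space sketch of that decision, not kernel code: `struct pkt_meta` and the three helpers are hypothetical names introduced only for illustration, and `from_vxlan` stands in for the `indev vxlan0` match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the metadata the two egress filters
 * inspect; these are not kernel structures. */
struct pkt_meta {
	uint8_t dst_mac[6];
	bool from_vxlan;	/* models "indev vxlan0" */
	bool l2_miss;		/* models the new tc skb extension bit */
};

/* pref 1: dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 matches every
 * address with the group bit set, i.e. broadcast and multicast. */
static bool matches_bcast_mcast(const struct pkt_meta *m)
{
	return m->from_vxlan && (m->dst_mac[0] & 0x01) != 0;
}

/* pref 2: "l2_miss true". Packets with the group bit set were already
 * dropped by pref 1, so this effectively catches unknown unicast. */
static bool matches_l2_miss(const struct pkt_meta *m)
{
	return m->from_vxlan && m->l2_miss;
}

/* Returns true if a non-DF VTEP would drop the packet on egress. */
static bool non_df_drop(const struct pkt_meta *m)
{
	return matches_bcast_mcast(m) || matches_l2_miss(m);
}
```

In this model a known unicast packet decapsulated from the VXLAN device (group bit clear, no miss) is the only VXLAN-originated traffic that still reaches the host, which is the behavior the cover letter asks of a non-DF VTEP.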
2 parents 2e246bc + 8c33266 commit e180a33

18 files changed

+485
-16
lines changed


drivers/net/ethernet/marvell/prestera/prestera_flower.c

Lines changed: 6 additions & 0 deletions
@@ -148,6 +148,12 @@ static int prestera_flower_parse_meta(struct prestera_acl_rule *rule,
 	__be16 key, mask;

 	flow_rule_match_meta(f_rule, &match);
+
+	if (match.mask->l2_miss) {
+		NL_SET_ERR_MSG_MOD(f->common.extack, "Can't match on \"l2_miss\"");
+		return -EOPNOTSUPP;
+	}
+
 	if (match.mask->ingress_ifindex != 0xFFFFFFFF) {
 		NL_SET_ERR_MSG_MOD(f->common.extack,
 				   "Unsupported ingress ifindex mask");

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c

Lines changed: 6 additions & 0 deletions
@@ -2587,6 +2587,12 @@ static int mlx5e_flower_parse_meta(struct net_device *filter_dev,
 		return 0;

 	flow_rule_match_meta(rule, &match);
+
+	if (match.mask->l2_miss) {
+		NL_SET_ERR_MSG_MOD(f->common.extack, "Can't match on \"l2_miss\"");
+		return -EOPNOTSUPP;
+	}
+
 	if (!match.mask->ingress_ifindex)
 		return 0;

drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ static const struct mlxsw_afk_element_info mlxsw_afk_element_infos[] = {
 	MLXSW_AFK_ELEMENT_INFO_BUF(DST_IP_64_95, 0x34, 4),
 	MLXSW_AFK_ELEMENT_INFO_BUF(DST_IP_32_63, 0x38, 4),
 	MLXSW_AFK_ELEMENT_INFO_BUF(DST_IP_0_31, 0x3C, 4),
+	MLXSW_AFK_ELEMENT_INFO_U32(FDB_MISS, 0x40, 0, 1),
 };

 struct mlxsw_afk {

drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h

Lines changed: 2 additions & 1 deletion
@@ -35,6 +35,7 @@ enum mlxsw_afk_element {
 	MLXSW_AFK_ELEMENT_IP_DSCP,
 	MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB,
 	MLXSW_AFK_ELEMENT_VIRT_ROUTER_LSB,
+	MLXSW_AFK_ELEMENT_FDB_MISS,
 	MLXSW_AFK_ELEMENT_MAX,
 };

@@ -69,7 +70,7 @@ struct mlxsw_afk_element_info {
 	MLXSW_AFK_ELEMENT_INFO(MLXSW_AFK_ELEMENT_TYPE_BUF, \
 			       _element, _offset, 0, _size)

-#define MLXSW_AFK_ELEMENT_STORAGE_SIZE 0x40
+#define MLXSW_AFK_ELEMENT_STORAGE_SIZE 0x44

 struct mlxsw_afk_element_inst { /* element instance in actual block */
 	enum mlxsw_afk_element element;
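
The storage size bump from 0x40 to 0x44 appears to follow from the new FDB_MISS element being declared as a u32 at offset 0x40 in core_acl_flex_keys.c. The sketch below only checks that arithmetic; the constants are copied from the diff, while `u32_element_fits()` is a hypothetical helper and not part of mlxsw.

```c
#include <stdint.h>

/* Values copied from the diff; the helper below is illustrative only. */
#define FDB_MISS_OFFSET		0x40	/* offset passed to MLXSW_AFK_ELEMENT_INFO_U32 */
#define OLD_STORAGE_SIZE	0x40
#define NEW_STORAGE_SIZE	0x44

/* Returns nonzero if a 32-bit element starting at 'offset' lies entirely
 * within a storage area of 'storage_size' bytes. */
static int u32_element_fits(unsigned int offset, unsigned int storage_size)
{
	return offset + sizeof(uint32_t) <= storage_size;
}
```

With the old size the element would start exactly at the end of the storage area; with 0x44 it fits with no slack, which matches a four-byte growth for one additional u32.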

drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c

Lines changed: 2 additions & 0 deletions
@@ -123,10 +123,12 @@ const struct mlxsw_afk_ops mlxsw_sp1_afk_ops = {
 };

 static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_mac_0[] = {
+	MLXSW_AFK_ELEMENT_INST_U32(FDB_MISS, 0x00, 3, 1),
 	MLXSW_AFK_ELEMENT_INST_BUF(DMAC_0_31, 0x04, 4),
 };

 static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_mac_1[] = {
+	MLXSW_AFK_ELEMENT_INST_U32(FDB_MISS, 0x00, 3, 1),
 	MLXSW_AFK_ELEMENT_INST_BUF(SMAC_0_31, 0x04, 4),
 };
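
Assuming the last two arguments of `MLXSW_AFK_ELEMENT_INST_U32(FDB_MISS, 0x00, 3, 1)` are a bit shift and a size in bits, the new instance describes a 1-bit field at bit 3 of the 32-bit word at block offset 0x00. The helpers below are a hypothetical user-space sketch of that kind of field access; they ignore the byte order of the hardware key and are not the mlxsw item accessors.

```c
#include <stdint.h>

/* Assumed layout of the FDB_MISS instance: shift 3, size 1 bit. */
#define FDB_MISS_SHIFT	3
#define FDB_MISS_SIZE	1

/* Mask covering 'size' bits starting at bit 'shift' of a 32-bit word. */
static uint32_t field_mask(unsigned int shift, unsigned int size)
{
	uint32_t low_bits = (size >= 32) ? 0xFFFFFFFFu : ((1u << size) - 1u);

	return low_bits << shift;
}

/* Returns 'word' with the field replaced by 'value'. */
static uint32_t field_set(uint32_t word, unsigned int shift,
			  unsigned int size, uint32_t value)
{
	uint32_t mask = field_mask(shift, size);

	return (word & ~mask) | ((value << shift) & mask);
}

/* Extracts the field from 'word'. */
static uint32_t field_get(uint32_t word, unsigned int shift, unsigned int size)
{
	return (word & field_mask(shift, size)) >> shift;
}
```

Under this assumption, setting FDB_MISS touches only bit 3 of the first word, so it shares the word at 0x00 with nothing else in these two blocks and leaves the DMAC/SMAC bytes at 0x04 alone.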

drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c

Lines changed: 32 additions & 13 deletions
@@ -281,49 +281,68 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp,
 	return 0;
 }

-static int mlxsw_sp_flower_parse_meta(struct mlxsw_sp_acl_rule_info *rulei,
-				      struct flow_cls_offload *f,
-				      struct mlxsw_sp_flow_block *block)
+static int
+mlxsw_sp_flower_parse_meta_iif(struct mlxsw_sp_acl_rule_info *rulei,
+			       const struct mlxsw_sp_flow_block *block,
+			       const struct flow_match_meta *match,
+			       struct netlink_ext_ack *extack)
 {
-	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
 	struct mlxsw_sp_port *mlxsw_sp_port;
 	struct net_device *ingress_dev;
-	struct flow_match_meta match;

-	if (!flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_META))
+	if (!match->mask->ingress_ifindex)
 		return 0;

-	flow_rule_match_meta(rule, &match);
-	if (match.mask->ingress_ifindex != 0xFFFFFFFF) {
-		NL_SET_ERR_MSG_MOD(f->common.extack, "Unsupported ingress ifindex mask");
+	if (match->mask->ingress_ifindex != 0xFFFFFFFF) {
+		NL_SET_ERR_MSG_MOD(extack, "Unsupported ingress ifindex mask");
 		return -EINVAL;
 	}

 	ingress_dev = __dev_get_by_index(block->net,
-					 match.key->ingress_ifindex);
+					 match->key->ingress_ifindex);
 	if (!ingress_dev) {
-		NL_SET_ERR_MSG_MOD(f->common.extack, "Can't find specified ingress port to match on");
+		NL_SET_ERR_MSG_MOD(extack, "Can't find specified ingress port to match on");
 		return -EINVAL;
 	}

 	if (!mlxsw_sp_port_dev_check(ingress_dev)) {
-		NL_SET_ERR_MSG_MOD(f->common.extack, "Can't match on non-mlxsw ingress port");
+		NL_SET_ERR_MSG_MOD(extack, "Can't match on non-mlxsw ingress port");
 		return -EINVAL;
 	}

 	mlxsw_sp_port = netdev_priv(ingress_dev);
 	if (mlxsw_sp_port->mlxsw_sp != block->mlxsw_sp) {
-		NL_SET_ERR_MSG_MOD(f->common.extack, "Can't match on a port from different device");
+		NL_SET_ERR_MSG_MOD(extack, "Can't match on a port from different device");
 		return -EINVAL;
 	}

 	mlxsw_sp_acl_rulei_keymask_u32(rulei,
 				       MLXSW_AFK_ELEMENT_SRC_SYS_PORT,
 				       mlxsw_sp_port->local_port,
 				       0xFFFFFFFF);
+
 	return 0;
 }

+static int mlxsw_sp_flower_parse_meta(struct mlxsw_sp_acl_rule_info *rulei,
+				      struct flow_cls_offload *f,
+				      struct mlxsw_sp_flow_block *block)
+{
+	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+	struct flow_match_meta match;
+
+	if (!flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_META))
+		return 0;
+
+	flow_rule_match_meta(rule, &match);
+
+	mlxsw_sp_acl_rulei_keymask_u32(rulei, MLXSW_AFK_ELEMENT_FDB_MISS,
+				       match.key->l2_miss, match.mask->l2_miss);
+
+	return mlxsw_sp_flower_parse_meta_iif(rulei, block, &match,
+					      f->common.extack);
+}
+
 static void mlxsw_sp_flower_parse_ipv4(struct mlxsw_sp_acl_rule_info *rulei,
 				       struct flow_cls_offload *f)
 {
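
The new `mlxsw_sp_flower_parse_meta()` passes `l2_miss` as a key/mask pair without first testing whether the filter asked for it. A plausible reading is that the usual ternary-match convention applies, where a zero mask means "don't care". The sketch below models that convention in user space; `keymask_match()` is a hypothetical helper and says nothing about how the hardware or `mlxsw_sp_acl_rulei_keymask_u32()` is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ternary match: only the bits selected by 'mask' are compared.
 * A zero mask therefore matches every packet value. */
static bool keymask_match(uint32_t value, uint32_t key, uint32_t mask)
{
	return (value & mask) == (key & mask);
}
```

Under this convention a filter without `l2_miss` (mask 0) matches packets regardless of the bit, `l2_miss true` (key 1, mask 1) matches only packets that missed, and `l2_miss false` (key 0, mask 1) matches only packets that hit.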

drivers/net/ethernet/mscc/ocelot_flower.c

Lines changed: 10 additions & 0 deletions
@@ -592,6 +592,16 @@ ocelot_flower_parse_key(struct ocelot *ocelot, int port, bool ingress,
 		return -EOPNOTSUPP;
 	}

+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_META)) {
+		struct flow_match_meta match;
+
+		flow_rule_match_meta(rule, &match);
+		if (match.mask->l2_miss) {
+			NL_SET_ERR_MSG_MOD(extack, "Can't match on \"l2_miss\"");
+			return -EOPNOTSUPP;
+		}
+	}
+
 	/* For VCAP ES0 (egress rewriter) we can match on the ingress port */
 	if (!ingress) {
 		ret = ocelot_flower_parse_indev(ocelot, port, f, filter);
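
The prestera, mlx5 and ocelot hunks share one pattern: a driver that already parses `FLOW_DISSECTOR_KEY_META` but cannot offload `l2_miss` rejects the filter when the mask for that field is non-zero. A minimal user-space sketch of the check follows; `struct meta_fields` and `check_meta_supported()` are hypothetical stand-ins, and the error string is returned through a plain pointer instead of the kernel's extack.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the fields of struct
 * flow_dissector_key_meta after this merge. */
struct meta_fields {
	int ingress_ifindex;
	uint16_t ingress_iftype;
	uint8_t l2_miss;
};

/* Rejects any filter whose mask selects l2_miss. The mask is tested,
 * not the key, so "l2_miss false" is rejected as well. */
static int check_meta_supported(const struct meta_fields *mask,
				const char **errmsg)
{
	if (mask->l2_miss) {
		*errmsg = "Can't match on \"l2_miss\"";
		return -EOPNOTSUPP;
	}

	*errmsg = NULL;
	return 0;
}
```

Testing the mask rather than the key matters: a filter that asks for packets without a miss still depends on metadata the hardware cannot see, so accepting it silently would offload a rule with different semantics.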

include/linux/skbuff.h

Lines changed: 1 addition & 0 deletions
@@ -330,6 +330,7 @@ struct tc_skb_ext {
 	u8 post_ct_snat:1;
 	u8 post_ct_dnat:1;
 	u8 act_miss:1; /* Set if act_miss_cookie is used */
+	u8 l2_miss:1; /* Set by bridge upon FDB or MDB miss */
 };
 #endif

include/net/flow_dissector.h

Lines changed: 2 additions & 0 deletions
@@ -243,10 +243,12 @@ struct flow_dissector_key_ip {
  * struct flow_dissector_key_meta:
  * @ingress_ifindex: ingress ifindex
  * @ingress_iftype: ingress interface type
+ * @l2_miss: packet did not match an L2 entry during forwarding
  */
 struct flow_dissector_key_meta {
 	int ingress_ifindex;
 	u16 ingress_iftype;
+	u8 l2_miss;
 };

 /**
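
Patch #2 is described as dissecting the bit from the tc skb extension into the `FLOW_DISSECTOR_KEY_META` key. The sketch below models that copy in user space. Both structs are simplified stand-ins, `dissect_l2_miss()` is a hypothetical name, and treating a missing extension as "no miss" is an assumption rather than something shown in this diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the relevant bits of struct tc_skb_ext. */
struct ext_model {
	unsigned int act_miss:1;
	unsigned int l2_miss:1;	/* set by the bridge upon FDB or MDB miss */
};

/* Simplified stand-in for struct flow_dissector_key_meta. */
struct meta_key_model {
	int ingress_ifindex;
	uint16_t ingress_iftype;
	uint8_t l2_miss;
};

/* Copies the miss indication into the meta key. A NULL 'ext' models a
 * packet that carries no extension (assumed to mean "no miss"). */
static void dissect_l2_miss(const struct ext_model *ext,
			    struct meta_key_model *key)
{
	key->l2_miss = ext ? ext->l2_miss : 0;
}
```

Widening the 1-bit flag into a full `u8` in the key keeps the field byte-addressable, so a classifier can compare it with an ordinary key/mask pair like the other meta fields.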

include/uapi/linux/pkt_cls.h

Lines changed: 2 additions & 0 deletions
@@ -594,6 +594,8 @@ enum {

 	TCA_FLOWER_KEY_L2TPV3_SID,	/* be32 */

+	TCA_FLOWER_L2_MISS,		/* u8 */
+
 	__TCA_FLOWER_MAX,
 };
