Skip to content

Commit bf2650d

Browse files
committed
Merge branch 'bridge-allow-keeping-local-fdb-entries-only-on-vlan-0'
Petr Machata says: ==================== bridge: Allow keeping local FDB entries only on VLAN 0 The bridge FDB contains one local entry per port per VLAN, for the MAC of the port in question, and likewise for the bridge itself. This allows bridge to locally receive and punt "up" any packets whose destination MAC address matches that of one of the bridge interfaces or of the bridge itself. The number of these local "service" FDB entries grows linearly with number of bridge-global VLAN memberships, but that in turn will tend to grow quadratically with number of ports and per-port VLAN memberships. While that does not cause issues during forwarding lookups, it does make dumps impractically slow. As an example, with 100 interfaces, each on 4K VLANs, a full dump of FDB that just contains these 400K local entries, takes 6.5s. That's _without_ considering iproute2 formatting overhead, this is just how long it takes to walk the FDB (repeatedly), serialize it into netlink messages, and parse the messages back in userspace. This is to illustrate that with growing number of ports and VLANs, the time required to dump this repetitive information blows up. Arguably 4K VLANs per interface is not a very realistic configuration, but then modern switches can instead have several hundred interfaces, and we have fielded requests for >1K VLAN memberships per port among customers. FDB entries are currently all kept on a single linked list, and then dumping uses this linked list to walk all entries and dump them in order. When the message buffer is full, the iteration is cut short, and later restarted. Of course, to restart the iteration, it's first necessary to walk the already-dumped front part of the list before starting dumping again. So one possibility is to organize the FDB entries in different structure more amenable to walk restarts. One option is to walk directly the hash table. The advantage is that no auxiliary structure needs to be introduced. With a rough sketch of this approach, the above scenario gets dumped in not quite 3 s, saving over 50 % of time. However hash table iteration requires maintaining an active cursor that must be collected when the dump is aborted. It looks like that would require changes in the NDO protocol to allow to run this cleanup. Moreover, on hash table resize the iteration is simply restarted. FDB dumps are currently not guaranteed to correspond to any one particular state: entries can be missed, or be duplicated. But with hash table iteration we would get that plus the much less graceful resize behavior, where swaths of FDB are duplicated. Another option is to maintain the FDB entries in a red-black tree. We have a PoC of this approach on hand, and the above scenario is dumped in about 2.5 s. Still not as snappy as we'd like it, but better than the hash table. However the savings come at the expense of a more expensive insertion, and require locking during dumps, which blocks insertion. The upside of these approaches is that they provide benefits whatever the FDB contents. But it does not seem like either of these is workable. However we intend to clean up the RB tree PoC and present it for consideration later on in case the trade-offs are considered acceptable. Yet another option might be to use in-kernel FDB filtering, and to filter the local entries when dumping. Unfortunately, this does not help all that much either, because the linked-list walk still needs to happen. Also, with the obvious filtering interface built around ndm_flags / ndm_state filtering, one can't just exclude pure local entries in one query. One needs to dump all non-local entries first, and then to get permanent entries in another run filter local & added_by_user. I.e. one needs to pay the iteration overhead twice, and then integrate the result in userspace. To get significant savings, one would need a very specific knob like "dump, but skip/only include local entries". But if we are adding a local-specific knobs, maybe let's have an option to just not duplicate them in the first place. All this FDB duplication is there merely to make things snappy during forwarding. But high-radix switches with thousands of VLANs typically do not process much traffic in the SW datapath at all, but rather offload vast majority of it. So we could exchange some of the runtime performance for a neater FDB. To that end, in this patchset, introduce a new bridge option, BR_BOOLOPT_FDB_LOCAL_VLAN_0, which when enabled, has local FDB entries installed only on VLAN 0, instead of duplicating them across all VLANs. Then to maintain the local termination behavior, on FDB miss, the bridge does a second lookup on VLAN 0. Enabling this option changes the bridge behavior in expected ways. Since the entries are only kept on VLAN 0, FDB get, flush and dump will not perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects forwarding on all VLANs. This patchset is loosely based on a privately circulated patch by Nikolay Aleksandrov. The patchset progresses as follows: - Patch kernel-patches#1 introduces a bridge option to enable the above feature. Then patches kernel-patches#2 to kernel-patches#5 gradually patch the bridge to do the right thing when the option is enabled. Finally patch kernel-patches#6 adds the UAPI knob and the code for when the feature is enabled or disabled. - Patches kernel-patches#7, kernel-patches#8 and kernel-patches#9 contain fixes and improvements to selftest libraries - Patch kernel-patches#10 contains a new selftest ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2 parents 1828210 + dbd9134 commit bf2650d

File tree

10 files changed

+559
-28
lines changed

10 files changed

+559
-28
lines changed

include/uapi/linux/if_bridge.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -823,6 +823,8 @@ struct br_mcast_stats {
823823
/* bridge boolean options
824824
* BR_BOOLOPT_NO_LL_LEARN - disable learning from link-local packets
825825
* BR_BOOLOPT_MCAST_VLAN_SNOOPING - control vlan multicast snooping
826+
* BR_BOOLOPT_FDB_LOCAL_VLAN_0 - local FDB entries installed by the bridge
827+
* driver itself should only be added on VLAN 0
826828
*
827829
* IMPORTANT: if adding a new option do not forget to handle
828830
* it in br_boolopt_toggle/get and bridge sysfs
@@ -832,6 +834,7 @@ enum br_boolopt_id {
832834
BR_BOOLOPT_MCAST_VLAN_SNOOPING,
833835
BR_BOOLOPT_MST_ENABLE,
834836
BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION,
837+
BR_BOOLOPT_FDB_LOCAL_VLAN_0,
835838
BR_BOOLOPT_MAX
836839
};
837840

net/bridge/br.c

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -259,6 +259,23 @@ static struct notifier_block br_switchdev_blocking_notifier = {
259259
.notifier_call = br_switchdev_blocking_event,
260260
};
261261

262+
static int
263+
br_toggle_fdb_local_vlan_0(struct net_bridge *br, bool on,
264+
struct netlink_ext_ack *extack)
265+
{
266+
int err;
267+
268+
if (br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0) == on)
269+
return 0;
270+
271+
err = br_fdb_toggle_local_vlan_0(br, on, extack);
272+
if (err)
273+
return err;
274+
275+
br_opt_toggle(br, BROPT_FDB_LOCAL_VLAN_0, on);
276+
return 0;
277+
}
278+
262279
/* br_boolopt_toggle - change user-controlled boolean option
263280
*
264281
* @br: bridge device
@@ -287,6 +304,9 @@ int br_boolopt_toggle(struct net_bridge *br, enum br_boolopt_id opt, bool on,
287304
case BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION:
288305
br_opt_toggle(br, BROPT_MDB_OFFLOAD_FAIL_NOTIFICATION, on);
289306
break;
307+
case BR_BOOLOPT_FDB_LOCAL_VLAN_0:
308+
err = br_toggle_fdb_local_vlan_0(br, on, extack);
309+
break;
290310
default:
291311
/* shouldn't be called with unsupported options */
292312
WARN_ON(1);
@@ -307,6 +327,8 @@ int br_boolopt_get(const struct net_bridge *br, enum br_boolopt_id opt)
307327
return br_opt_get(br, BROPT_MST_ENABLED);
308328
case BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION:
309329
return br_opt_get(br, BROPT_MDB_OFFLOAD_FAIL_NOTIFICATION);
330+
case BR_BOOLOPT_FDB_LOCAL_VLAN_0:
331+
return br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0);
310332
default:
311333
/* shouldn't be called with unsupported options */
312334
WARN_ON(1);

net/bridge/br_fdb.c

Lines changed: 108 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -459,6 +459,9 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
459459
struct net_bridge_fdb_entry *f;
460460
struct net_bridge *br = p->br;
461461
struct net_bridge_vlan *v;
462+
bool local_vlan_0;
463+
464+
local_vlan_0 = br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0);
462465

463466
spin_lock_bh(&br->hash_lock);
464467
vg = nbp_vlan_group(p);
@@ -468,11 +471,11 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
468471
/* delete old one */
469472
fdb_delete_local(br, p, f);
470473

471-
/* if this port has no vlan information
472-
* configured, we can safely be done at
473-
* this point.
474+
/* if this port has no vlan information configured, or
475+
* local entries are only kept on VLAN 0, we can safely
476+
* be done at this point.
474477
*/
475-
if (!vg || !vg->num_vlans)
478+
if (!vg || !vg->num_vlans || local_vlan_0)
476479
goto insert;
477480
}
478481
}
@@ -481,7 +484,7 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
481484
/* insert new address, may fail if invalid address or dup. */
482485
fdb_add_local(br, p, newaddr, 0);
483486

484-
if (!vg || !vg->num_vlans)
487+
if (!vg || !vg->num_vlans || local_vlan_0)
485488
goto done;
486489

487490
/* Now add entries for every VLAN configured on the port.
@@ -500,6 +503,9 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
500503
struct net_bridge_vlan_group *vg;
501504
struct net_bridge_fdb_entry *f;
502505
struct net_bridge_vlan *v;
506+
bool local_vlan_0;
507+
508+
local_vlan_0 = br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0);
503509

504510
spin_lock_bh(&br->hash_lock);
505511

@@ -511,7 +517,7 @@ void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr)
511517

512518
fdb_add_local(br, NULL, newaddr, 0);
513519
vg = br_vlan_group(br);
514-
if (!vg || !vg->num_vlans)
520+
if (!vg || !vg->num_vlans || local_vlan_0)
515521
goto out;
516522
/* Now remove and add entries for every VLAN configured on the
517523
* bridge. This function runs under RTNL so the bitmap will not
@@ -576,6 +582,102 @@ void br_fdb_cleanup(struct work_struct *work)
576582
mod_delayed_work(system_long_wq, &br->gc_work, work_delay);
577583
}
578584

585+
static void br_fdb_delete_locals_per_vlan_port(struct net_bridge *br,
586+
struct net_bridge_port *p)
587+
{
588+
struct net_bridge_vlan_group *vg;
589+
struct net_bridge_vlan *v;
590+
struct net_device *dev;
591+
592+
if (p) {
593+
vg = nbp_vlan_group(p);
594+
dev = p->dev;
595+
} else {
596+
vg = br_vlan_group(br);
597+
dev = br->dev;
598+
}
599+
600+
list_for_each_entry(v, &vg->vlan_list, vlist)
601+
br_fdb_find_delete_local(br, p, dev->dev_addr, v->vid);
602+
}
603+
604+
static void br_fdb_delete_locals_per_vlan(struct net_bridge *br)
605+
{
606+
struct net_bridge_port *p;
607+
608+
ASSERT_RTNL();
609+
610+
list_for_each_entry(p, &br->port_list, list)
611+
br_fdb_delete_locals_per_vlan_port(br, p);
612+
613+
br_fdb_delete_locals_per_vlan_port(br, NULL);
614+
}
615+
616+
static int br_fdb_insert_locals_per_vlan_port(struct net_bridge *br,
617+
struct net_bridge_port *p,
618+
struct netlink_ext_ack *extack)
619+
{
620+
struct net_bridge_vlan_group *vg;
621+
struct net_bridge_vlan *v;
622+
struct net_device *dev;
623+
int err;
624+
625+
if (p) {
626+
vg = nbp_vlan_group(p);
627+
dev = p->dev;
628+
} else {
629+
vg = br_vlan_group(br);
630+
dev = br->dev;
631+
}
632+
633+
list_for_each_entry(v, &vg->vlan_list, vlist) {
634+
if (!br_vlan_should_use(v))
635+
continue;
636+
637+
err = br_fdb_add_local(br, p, dev->dev_addr, v->vid);
638+
if (err)
639+
return err;
640+
}
641+
642+
return 0;
643+
}
644+
645+
static int br_fdb_insert_locals_per_vlan(struct net_bridge *br,
646+
struct netlink_ext_ack *extack)
647+
{
648+
struct net_bridge_port *p;
649+
int err;
650+
651+
ASSERT_RTNL();
652+
653+
list_for_each_entry(p, &br->port_list, list) {
654+
err = br_fdb_insert_locals_per_vlan_port(br, p, extack);
655+
if (err)
656+
goto rollback;
657+
}
658+
659+
err = br_fdb_insert_locals_per_vlan_port(br, NULL, extack);
660+
if (err)
661+
goto rollback;
662+
663+
return 0;
664+
665+
rollback:
666+
NL_SET_ERR_MSG_MOD(extack, "fdb_local_vlan_0 toggle: FDB entry insertion failed");
667+
br_fdb_delete_locals_per_vlan(br);
668+
return err;
669+
}
670+
671+
int br_fdb_toggle_local_vlan_0(struct net_bridge *br, bool on,
672+
struct netlink_ext_ack *extack)
673+
{
674+
if (!on)
675+
return br_fdb_insert_locals_per_vlan(br, extack);
676+
677+
br_fdb_delete_locals_per_vlan(br);
678+
return 0;
679+
}
680+
579681
static bool __fdb_flush_matches(const struct net_bridge *br,
580682
const struct net_bridge_fdb_entry *f,
581683
const struct net_bridge_fdb_flush_desc *desc)

net/bridge/br_input.c

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,14 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
202202
break;
203203
case BR_PKT_UNICAST:
204204
dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
205+
if (unlikely(!dst && vid &&
206+
br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0))) {
207+
dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, 0);
208+
if (dst &&
209+
(!test_bit(BR_FDB_LOCAL, &dst->flags) ||
210+
test_bit(BR_FDB_ADDED_BY_USER, &dst->flags)))
211+
dst = NULL;
212+
}
205213
break;
206214
default:
207215
break;

net/bridge/br_private.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -487,6 +487,7 @@ enum net_bridge_opts {
487487
BROPT_MCAST_VLAN_SNOOPING_ENABLED,
488488
BROPT_MST_ENABLED,
489489
BROPT_MDB_OFFLOAD_FAIL_NOTIFICATION,
490+
BROPT_FDB_LOCAL_VLAN_0,
490491
};
491492

492493
struct net_bridge {
@@ -843,6 +844,8 @@ void br_fdb_find_delete_local(struct net_bridge *br,
843844
void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr);
844845
void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr);
845846
void br_fdb_cleanup(struct work_struct *work);
847+
int br_fdb_toggle_local_vlan_0(struct net_bridge *br, bool on,
848+
struct netlink_ext_ack *extack);
846849
void br_fdb_delete_by_port(struct net_bridge *br,
847850
const struct net_bridge_port *p, u16 vid, int do_all);
848851
struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br,

net/bridge/br_vlan.c

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -331,10 +331,12 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags,
331331

332332
/* Add the dev mac and count the vlan only if it's usable */
333333
if (br_vlan_should_use(v)) {
334-
err = br_fdb_add_local(br, p, dev->dev_addr, v->vid);
335-
if (err) {
336-
br_err(br, "failed insert local address into bridge forwarding table\n");
337-
goto out_filt;
334+
if (!br_opt_get(br, BROPT_FDB_LOCAL_VLAN_0)) {
335+
err = br_fdb_add_local(br, p, dev->dev_addr, v->vid);
336+
if (err) {
337+
br_err(br, "failed insert local address into bridge forwarding table\n");
338+
goto out_filt;
339+
}
338340
}
339341
vg->num_vlans++;
340342
}

tools/testing/selftests/net/forwarding/Makefile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ TEST_PROGS = \
55
bridge_fdb_learning_limit.sh \
66
bridge_igmp.sh \
77
bridge_locked_port.sh \
8+
bridge_fdb_local_vlan_0.sh \
89
bridge_mdb.sh \
910
bridge_mdb_host.sh \
1011
bridge_mdb_max.sh \

0 commit comments

Comments
 (0)