Skip to content

Commit 293b147

Browse files
vladimirolteanNipaLocal
authored and
NipaLocal
committed
net: mscc: ocelot: be resilient to loss of PTP packets during transmission
The Felix DSA driver presents unique challenges that make the simplistic ocelot PTP TX timestamping procedure unreliable: any transmitted packet may be lost in hardware before it ever leaves our local system. This may happen because there is congestion on the DSA conduit, the switch CPU port or even user port (Qdiscs like taprio may delay packets indefinitely by design). The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(), runs out of timestamp IDs eventually, because it never detects that packets are lost, and keeps the IDs of the lost packets on hold indefinitely. The manifestation of the issue once the entire timestamp ID range becomes busy looks like this in dmesg: mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp At the surface level, we need a timeout timer so that the kernel knows a timestamp ID is available again. But there is a deeper problem with the implementation, which is the monotonically increasing ocelot_port->ts_id. In the presence of packet loss, it will be impossible to detect that and reuse one of the holes created in the range of free timestamp IDs. What we actually need is a bitmap of 63 timestamp IDs tracking which one is available. That is able to use up holes caused by packet loss, but also gives us a unique opportunity to not implement an actual timer_list for the timeout timer (very complicated in terms of locking). We could only declare a timestamp ID stale on demand (lazily), aka when there's no other timestamp ID available. There are pros and cons to this approach: the implementation is much more simple than per-packet timers would be, but most of the stale packets would be quasi-leaked - not really leaked, but blocked in driver memory, since this algorithm sees no reason to free them. An improved technique would be to check for stale timestamp IDs every time we allocate a new one. Assuming a constant flux of PTP packets, this avoids stale packets being blocked in memory, but of course, packets lost at the end of the flux are still blocked until the flux resumes (nobody left to kick them out). Since implementing per-packet timers is way too complicated, this should be good enough. Testing procedure: Persistently block traffic class 5 and try to run PTP on it: $ tc qdisc replace dev swp3 parent root taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0xdf 100000 flags 0x2 [ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS $ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3 ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[70.406]: timed out while polling for tx timestamp ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[70.406]: port 1 (swp3): send peer delay response failed ptp4l[70.407]: port 1 (swp3): clearing fault immediately ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1 ptp4l[71.400]: timed out while polling for tx timestamp ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[71.401]: port 1 (swp3): send peer delay response failed ptp4l[71.401]: port 1 (swp3): clearing fault immediately [ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2 ptp4l[72.401]: timed out while polling for tx timestamp ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[72.402]: port 1 (swp3): send peer delay response failed ptp4l[72.402]: port 1 (swp3): clearing fault immediately ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3 ptp4l[73.400]: timed out while polling for tx timestamp ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[73.400]: port 1 (swp3): send peer delay response failed ptp4l[73.400]: port 1 (swp3): clearing fault immediately [ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4 ptp4l[74.400]: timed out while polling for tx timestamp ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[74.401]: port 1 (swp3): send peer delay response failed ptp4l[74.401]: port 1 (swp3): clearing fault immediately ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[75.410]: timed out while polling for tx timestamp ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[75.411]: port 1 (swp3): send peer delay response failed ptp4l[75.411]: port 1 (swp3): clearing fault immediately (...) Remove the blocking condition and see that the port recovers: $ same tc command as above, but use "sched-entry S 0xff" instead $ same ptp4l command as above ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost [ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost [ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost [ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost [ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d ptp4l[104.966]: port 1 (swp3): assuming the grand master role ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER [ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0 (...) Notice that in this new usage pattern, a non-congested port should basically use timestamp ID 0 all the time, progressing to higher numbers only if there are unacknowledged timestamps in flight. Compare this to the old usage, where the timestamp ID used to monotonically increase modulo OCELOT_MAX_PTP_ID. In terms of implementation, this simplifies the bookkeeping of the ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse the list of two-step timestampable skbs for each new packet anyway, the information can already be computed and does not need to be stored. Also, ocelot_port->tx_skbs is always accessed under the switch-wide ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's lock and can use the unlocked primitives safely. This problem was actually detected using the tc-taprio offload, and is causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959) supports but Ocelot (VSC7514) does not. Thus, I've selected the commit to blame as the one adding initial timestamping support for the Felix switch. Fixes: c0bcf53 ("net: dsa: ocelot: add hardware timestamping support for Felix") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: NipaLocal <nipa@local>
1 parent fde5ac6 commit 293b147

File tree

3 files changed

+83
-60
lines changed

3 files changed

+83
-60
lines changed

drivers/net/ethernet/mscc/ocelot_ptp.c

Lines changed: 82 additions & 58 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,8 @@
1414
#include <soc/mscc/ocelot.h>
1515
#include "ocelot.h"
1616

17+
#define OCELOT_PTP_TX_TSTAMP_TIMEOUT (5 * HZ)
18+
1719
int ocelot_ptp_gettime64(struct ptp_clock_info *ptp, struct timespec64 *ts)
1820
{
1921
struct ocelot *ocelot = container_of(ptp, struct ocelot, ptp_info);
@@ -603,35 +605,89 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port,
603605
}
604606
EXPORT_SYMBOL(ocelot_get_ts_info);
605607

606-
static int ocelot_port_add_txtstamp_skb(struct ocelot *ocelot, int port,
608+
static struct sk_buff *ocelot_port_dequeue_ptp_tx_skb(struct ocelot *ocelot,
609+
int port, u8 ts_id,
610+
u32 seqid)
611+
{
612+
struct ocelot_port *ocelot_port = ocelot->ports[port];
613+
struct sk_buff *skb, *skb_tmp, *skb_match = NULL;
614+
struct ptp_header *hdr;
615+
616+
spin_lock(&ocelot->ts_id_lock);
617+
618+
skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) {
619+
if (OCELOT_SKB_CB(skb)->ts_id != ts_id)
620+
continue;
621+
622+
/* Check that the timestamp ID is for the expected PTP
623+
* sequenceId. We don't have to test ptp_parse_header() against
624+
* NULL, because we've pre-validated the packet's ptp_class.
625+
*/
626+
hdr = ptp_parse_header(skb, OCELOT_SKB_CB(skb)->ptp_class);
627+
if (seqid != ntohs(hdr->sequence_id))
628+
continue;
629+
630+
__skb_unlink(skb, &ocelot_port->tx_skbs);
631+
ocelot->ptp_skbs_in_flight--;
632+
skb_match = skb;
633+
break;
634+
}
635+
636+
spin_unlock(&ocelot->ts_id_lock);
637+
638+
return skb_match;
639+
}
640+
641+
static int ocelot_port_queue_ptp_tx_skb(struct ocelot *ocelot, int port,
607642
struct sk_buff *clone)
608643
{
609644
struct ocelot_port *ocelot_port = ocelot->ports[port];
645+
DECLARE_BITMAP(ts_id_in_flight, OCELOT_MAX_PTP_ID);
646+
struct sk_buff *skb, *skb_tmp;
610647
unsigned long flags;
648+
unsigned long n;
611649

612650
spin_lock_irqsave(&ocelot->ts_id_lock, flags);
613651

614-
if (ocelot_port->ptp_skbs_in_flight == OCELOT_MAX_PTP_ID ||
615-
ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) {
652+
/* To get a better chance of acquiring a timestamp ID, first flush the
653+
* stale packets still waiting in the TX timestamping queue. They are
654+
* probably lost.
655+
*/
656+
skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) {
657+
if (time_before(OCELOT_SKB_CB(skb)->ptp_tx_time +
658+
OCELOT_PTP_TX_TSTAMP_TIMEOUT, jiffies)) {
659+
dev_warn_ratelimited(ocelot->dev,
660+
"port %d invalidating stale timestamp ID %u which seems lost\n",
661+
port, OCELOT_SKB_CB(skb)->ts_id);
662+
__skb_unlink(skb, &ocelot_port->tx_skbs);
663+
dev_kfree_skb_any(skb);
664+
ocelot->ptp_skbs_in_flight--;
665+
} else {
666+
__set_bit(OCELOT_SKB_CB(skb)->ts_id, ts_id_in_flight);
667+
}
668+
}
669+
670+
if (ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) {
616671
spin_unlock_irqrestore(&ocelot->ts_id_lock, flags);
617672
return -EBUSY;
618673
}
619674

620-
skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS;
621-
/* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */
622-
OCELOT_SKB_CB(clone)->ts_id = ocelot_port->ts_id;
623-
624-
ocelot_port->ts_id++;
625-
if (ocelot_port->ts_id == OCELOT_MAX_PTP_ID)
626-
ocelot_port->ts_id = 0;
675+
n = find_first_zero_bit(ts_id_in_flight, OCELOT_MAX_PTP_ID);
676+
if (n == OCELOT_MAX_PTP_ID) {
677+
spin_unlock_irqrestore(&ocelot->ts_id_lock, flags);
678+
return -EBUSY;
679+
}
627680

628-
ocelot_port->ptp_skbs_in_flight++;
681+
/* Found an available timestamp ID, use it */
682+
OCELOT_SKB_CB(clone)->ts_id = n;
683+
OCELOT_SKB_CB(clone)->ptp_tx_time = jiffies;
629684
ocelot->ptp_skbs_in_flight++;
630-
631-
skb_queue_tail(&ocelot_port->tx_skbs, clone);
685+
__skb_queue_tail(&ocelot_port->tx_skbs, clone);
632686

633687
spin_unlock_irqrestore(&ocelot->ts_id_lock, flags);
634688

689+
dev_dbg_ratelimited(ocelot->dev, "port %d timestamp id %lu\n", port, n);
690+
635691
return 0;
636692
}
637693

@@ -687,10 +743,12 @@ int ocelot_port_txtstamp_request(struct ocelot *ocelot, int port,
687743
if (!(*clone))
688744
return -ENOMEM;
689745

690-
err = ocelot_port_add_txtstamp_skb(ocelot, port, *clone);
746+
/* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */
747+
err = ocelot_port_queue_ptp_tx_skb(ocelot, port, *clone);
691748
if (err)
692749
return err;
693750

751+
skb_shinfo(*clone)->tx_flags |= SKBTX_IN_PROGRESS;
694752
OCELOT_SKB_CB(skb)->ptp_cmd = ptp_cmd;
695753
OCELOT_SKB_CB(*clone)->ptp_class = ptp_class;
696754
}
@@ -726,28 +784,15 @@ static void ocelot_get_hwtimestamp(struct ocelot *ocelot,
726784
spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags);
727785
}
728786

729-
static bool ocelot_validate_ptp_skb(struct sk_buff *clone, u16 seqid)
730-
{
731-
struct ptp_header *hdr;
732-
733-
hdr = ptp_parse_header(clone, OCELOT_SKB_CB(clone)->ptp_class);
734-
if (WARN_ON(!hdr))
735-
return false;
736-
737-
return seqid == ntohs(hdr->sequence_id);
738-
}
739-
740787
void ocelot_get_txtstamp(struct ocelot *ocelot)
741788
{
742789
int budget = OCELOT_PTP_QUEUE_SZ;
743790

744791
while (budget--) {
745-
struct sk_buff *skb, *skb_tmp, *skb_match = NULL;
746792
struct skb_shared_hwtstamps shhwtstamps;
747793
u32 val, id, seqid, txport;
748-
struct ocelot_port *port;
794+
struct sk_buff *skb_match;
749795
struct timespec64 ts;
750-
unsigned long flags;
751796

752797
val = ocelot_read(ocelot, SYS_PTP_STATUS);
753798

@@ -762,36 +807,15 @@ void ocelot_get_txtstamp(struct ocelot *ocelot)
762807
txport = SYS_PTP_STATUS_PTP_MESS_TXPORT_X(val);
763808
seqid = SYS_PTP_STATUS_PTP_MESS_SEQ_ID(val);
764809

765-
port = ocelot->ports[txport];
766-
767-
spin_lock(&ocelot->ts_id_lock);
768-
port->ptp_skbs_in_flight--;
769-
ocelot->ptp_skbs_in_flight--;
770-
spin_unlock(&ocelot->ts_id_lock);
771-
772810
/* Retrieve its associated skb */
773-
try_again:
774-
spin_lock_irqsave(&port->tx_skbs.lock, flags);
775-
776-
skb_queue_walk_safe(&port->tx_skbs, skb, skb_tmp) {
777-
if (OCELOT_SKB_CB(skb)->ts_id != id)
778-
continue;
779-
__skb_unlink(skb, &port->tx_skbs);
780-
skb_match = skb;
781-
break;
782-
}
783-
784-
spin_unlock_irqrestore(&port->tx_skbs.lock, flags);
785-
786-
if (WARN_ON(!skb_match))
787-
continue;
788-
789-
if (!ocelot_validate_ptp_skb(skb_match, seqid)) {
790-
dev_err_ratelimited(ocelot->dev,
791-
"port %d received stale TX timestamp for seqid %d, discarding\n",
792-
txport, seqid);
793-
dev_kfree_skb_any(skb);
794-
goto try_again;
811+
skb_match = ocelot_port_dequeue_ptp_tx_skb(ocelot, txport, id,
812+
seqid);
813+
if (!skb_match) {
814+
dev_warn_ratelimited(ocelot->dev,
815+
"port %d received TX timestamp (seqid %d, ts id %u) for packet previously declared stale\n",
816+
txport, seqid, id);
817+
dev_kfree_skb_any(skb_match);
818+
goto next_ts;
795819
}
796820

797821
/* Get the h/w timestamp */
@@ -802,7 +826,7 @@ void ocelot_get_txtstamp(struct ocelot *ocelot)
802826
shhwtstamps.hwtstamp = ktime_set(ts.tv_sec, ts.tv_nsec);
803827
skb_complete_tx_timestamp(skb_match, &shhwtstamps);
804828

805-
/* Next ts */
829+
next_ts:
806830
ocelot_write(ocelot, SYS_PTP_NXT_PTP_NXT, SYS_PTP_NXT);
807831
}
808832
}

include/linux/dsa/ocelot.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
struct ocelot_skb_cb {
1616
struct sk_buff *clone;
1717
unsigned int ptp_class; /* valid only for clones */
18+
unsigned long ptp_tx_time; /* valid only for clones */
1819
u32 tstamp_lo;
1920
u8 ptp_cmd;
2021
u8 ts_id;

include/soc/mscc/ocelot.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -778,15 +778,13 @@ struct ocelot_port {
778778

779779
phy_interface_t phy_mode;
780780

781-
unsigned int ptp_skbs_in_flight;
782781
struct sk_buff_head tx_skbs;
783782

784783
unsigned int trap_proto;
785784

786785
u16 mrp_ring_id;
787786

788787
u8 ptp_cmd;
789-
u8 ts_id;
790788

791789
u8 index;
792790

0 commit comments

Comments
 (0)