ttyAMA0 (PL011) driver data corruption #4453

Open
DodoB opened this issue Jul 13, 2021 · 19 comments

@DodoB

DodoB commented Jul 13, 2021

The Bug

Data received on ttyAMA0 (on the Raspberry Pi 4B GPIO header) is corrupted. The OS is Raspbian with kernel version 5.4.72-v7l+ (Linux raspberrypi 5.4.72-v7l+ #1356 SMP Thu Oct 22 13:57:51 BST 2020 armv7l GNU/Linux). I am seeing this at high data rates (3Mb/s) but I am sure it is not a hardware issue (see details below). The corruption is not a matter of changed values caused by flipped bits or the like. It involves missing and added bytes.

To reproduce

Send data to the serial port. This may be done from an external device or simply by looping the port back to itself. Compare the received data against the original data. I have verified the signal quality with an oscilloscope and do not see any issues. The signal is very clean. I also use a USB serial adapter to simultaneously receive from the same line. The USB adapter (FTDI based) receives the data clean, while ttyAMA0 shows the problem. Note that the occurrence is unpredictable. Sometimes it shows up after only a few transmissions, sometimes it requires well over 100k transmissions before it shows.

The occurrence seems to be much higher with a high transmission frequency (every few ms) and data packets of varying sizes (random data). But this might simply be a time thing. Other activity on the machine also seems to have an influence: the more activity, the more likely a data corruption occurs. It is my impression that a race condition in the driver may be involved.

In situations where transmissions are single byte or very short, this might look like the transmission was not received or the data byte was corrupted and might be mistaken by an observer for a hardware issue.

I am attaching C source code for an application that can run the test, including simultaneous reception from a second serial port. If started with -s on the command line it will automatically stop upon encountering the problem and produce a dump of the faulty data. To run the test, jumper the serial port on the GPIO adapter to loop it back to itself. To verify good data reception, connect a 3.3V logic level serial/USB adapter to the Raspberry Pi and list the path for that adapter (probably /dev/ttyUSB0) on the command line.

Expected behavior

Data is received exactly as sent.

Actual behavior

The received data may be corrupted in several ways: there may be more data than sent (quite rare), less data than sent with changed bytes (quite frequent), or the same amount of data with changed bytes.

A pattern like the following is quite frequent.

  • T denotes the transmitted packet of data
  • P denotes the data received on the primary channel (the one under test, i.e. ttyAMA0)
  • R denotes the reference channel (the serial/USB adapter)
  • ^ indicates the location where the (first) error was detected.

T |44|AE|93|2E|F7|49|47|90|1C|48|DD|BE|06|40|82|48|85|43|BF|16|DD|27|79|53|9F|08|6D|F4|9D|03|74|61|30|87|0E|A6|CF|54|B5|6A|
P |44|AE|93|2E|F7|49|47|90|1C|48|DD|BE|06|40|82|48|85|43|BF|16|DD|27|79|53|9F|08|6D|F4|9D|03|74|61|87|00|0E|A6|CF|54|B5|6A|
R |44|AE|93|2E|F7|49|47|90|1C|48|DD|BE|06|40|82|48|85|43|BF|16|DD|27|79|53|9F|08|6D|F4|9D|03|74|61|30|87|0E|A6|CF|54|B5|6A|
^

Note that 30h is missing, 87h is correct, but shows in place of the missing 30h, followed by a 00h that does not exist in the original data. Then the data stream goes on without error. The reference data received from the USB adapter matches the transmitted packet. The missing byte and the next byte followed by 00h shows up a lot, also in connection with other types of corruption, such as fewer total bytes received.
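
The comparison described above (find the first position where the primary channel departs from the transmitted packet) can be sketched as follows. This is a minimal illustration, not the logic of the attached UARTTest.c; the function name `first_mismatch` is hypothetical.

```c
#include <stddef.h>

/*
 * Return the index of the first byte at which rx differs from tx, or -1 if
 * rx is an exact copy of tx. If one buffer is a strict prefix of the other,
 * the mismatch is reported at the end of the shorter buffer.
 */
long first_mismatch(const unsigned char *tx, size_t tx_len,
		    const unsigned char *rx, size_t rx_len)
{
	size_t n = tx_len < rx_len ? tx_len : rx_len;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tx[i] != rx[i])
			return (long)i;
	}

	return tx_len == rx_len ? -1 : (long)n;
}
```

Applied to the T and P rows above, the returned index is the position where 87h appears in place of 30h.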

Logs

The following logs show a few select scenarios.


Error showed after 119845 packet transmissions. In this case data was received from the tty driver in two fragments.

Transmission count = 119845
Error count = 1
Max skip count = 16
Error type = 5
Bytes transmitted = 45
Primary received = 44
Reference received = 45
Data:
T |91|D8|C6|4B|F2|8E|50|0A|19|B0|F2|5B|D0|C8|7C|50|B4|74|30|8F|03|39|83|6E|13|8E|97|FA|E8|4F|A4|F8|A6|E9|C3|99|F6|92|22|10|42|14|6A|92|5B|
P |91|D8|C6|4B|F2|8E|50|0A|19|B0|F2|5B|D0|C8|7C|50|B4|74|30|8F|03|39|83|6E|13|8E|97|FA|E8|4F|A4|F8|C3|00|99|F6|92|22|10|42|14|6A|92|5B|
R |91|D8|C6|4B|F2|8E|50|0A|19|B0|F2|5B|D0|C8|7C|50|B4|74|30|8F|03|39|83|6E|13|8E|97|FA|E8|4F|A4|F8|A6|E9|C3|99|F6|92|22|10|42|14|6A|92|5B|
^
Primary channel fragments
|91|D8|C6|4B|F2|8E|50|0A|19|B0|F2|5B|D0|C8|7C|50|B4|74|30|8F|03|39|83|6E|13|8E|97|FA|E8|4F|A4|F8|C3|00|99|
|F6|92|22|10|42|14|6A|92|5B|


Transmission count = 118443
Error count = 1
Max skip count = 16
Error type = 5
Bytes transmitted = 59
Primary received = 57
Reference received = 59
Data:
T |E8|8B|82|B6|24|D9|4E|2C|38|C3|B8|DE|C1|7C|57|43|0B|50|BF|2A|E5|2E|B9|BD|BE|15|1B|39|08|39|23|EF|43|A4|26|67|FD|F2|12|35|B5|C9|93|76|C4|E9|B8|CF|39|F6|F8|1F|25|31|5B|62|46|75|1A|
P |E8|8B|82|B6|24|D9|4E|2C|38|C3|B8|DE|C1|7C|57|43|0B|50|BF|2A|E5|2E|B9|BD|BE|15|1B|39|08|39|23|EF|43|A4|26|67|FD|F2|12|35|B5|C9|93|76|C4|E9|B8|CF|39|F6|F8|5B|00|62|46|75|1A|
R |E8|8B|82|B6|24|D9|4E|2C|38|C3|B8|DE|C1|7C|57|43|0B|50|BF|2A|E5|2E|B9|BD|BE|15|1B|39|08|39|23|EF|43|A4|26|67|FD|F2|12|35|B5|C9|93|76|C4|E9|B8|CF|39|F6|F8|1F|25|31|5B|62|46|75|1A|
^
Primary channel fragments
|E8|8B|82|B6|24|D9|4E|2C|38|C3|B8|DE|C1|7C|57|43|0B|50|BF|
|2A|E5|2E|B9|BD|BE|15|1B|39|08|39|23|EF|43|A4|26|67|FD|F2|12|35|B5|C9|93|76|C4|E9|B8|CF|39|F6|F8|5B|00|62|46|75|1A|


Transmission count = 1
Error count = 1
Max skip count = 0
Error type = 1
Bytes transmitted = 29
Primary received = 30
Reference received = 29
Data:
T |98|A3|56|54|BF|F2|FD|FA|7A|6C|53|15|14|EA|E3|2E|52|8F|20|57|09|58|28|A8|06|D5|D1|53|83|
P |98|00|A3|56|54|BF|F2|FD|FA|7A|6C|53|15|14|EA|E3|2E|52|8F|20|57|09|58|28|A8|06|D5|D1|53|83|
R |98|A3|56|54|BF|F2|FD|FA|7A|6C|53|15|14|EA|E3|2E|52|8F|20|57|09|58|28|A8|06|D5|D1|53|83|
^
UARTTest.c.txt

@pelwell
Contributor

pelwell commented Jul 14, 2021

Your results are consistent with FIFO overflow - when data is received while the UART RX FIFO is full the UART sets the overflow flag and drops the data. The inserted 00 is the point where the overflow is detected by the UART driver. Whether you appear to lose, gain or simply change data bytes depends on the number of contiguous overflowed bytes and the alignment with respect to the start of your test blocks.

Overflow is always going to be a possibility unless you connect the RTS and CTS lines and enable flow control on the port (CRTSCTS).
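
For reference, enabling CRTSCTS from C might look like the sketch below. It only sets the termios flag; it assumes the RTS and CTS lines are physically connected and routed to the UART, which is outside the scope of this snippet. The helper name `enable_rtscts` is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* makes glibc's <termios.h> expose CRTSCTS */
#endif
#include <termios.h>

/*
 * Turn on RTS/CTS hardware flow control in a termios structure. The caller
 * is expected to obtain the structure with tcgetattr() and apply it
 * afterwards with tcsetattr().
 */
void enable_rtscts(struct termios *tio)
{
	tio->c_cflag |= CRTSCTS;
}
```

In a program that already configures the port, the call would sit between the existing tcgetattr() and tcsetattr() calls.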

@DodoB
Author

DodoB commented Jul 14, 2021

Thanks for your comment pelwell. That would make sense in terms of the situations when this is more/less likely to occur. Is there some way to influence that? i.e. can I compile the driver with a setting that increases the FIFO space? I have no clue where in the driver code I would start looking for the details and how to find out what to change and how.

Ultimately this should not be a show stopper for me, since I am working on my own driver for the HAT I have developed. Using the tty driver is just a stop gap measure until I have implemented direct UART access in the driver proper. Handshake lines are no solution in this case, as data will fall on the floor one way or the other (I can't stop or otherwise throttle my ultimate data sources). The only upside would be that I might be able to avoid packet destruction. But my data stream is resistant to that and will automatically re-synchronize.

@pelwell
Contributor

pelwell commented Jul 14, 2021

The FIFO is in the UART hardware, so there is no possibility to increase its size. I'm not convinced that writing your own driver is going to help significantly - the issue here is interrupt latency. At 3Mbaud it takes approximately 100us to fill the FIFO. Once one of the cores is in the interrupt handler it will be able to drain the FIFO pretty quickly, but if it misses that 100us window then it's game over. You might be able to use the interrupt affinity mechanism to allocate the UART interrupt (and only the UART interrupt) to one of the ARM cores; take a look at /proc/interrupts and /proc/irq//smp_affinity*:

pi@raspberrypi:~$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  9:          0          0          0          0     GICv2  25 Level     vgic
 12:       4861       3619       5069      12070     GICv2  30 Level     arch_timer
 13:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
 19:          0          0          0          0     GICv2 107 Level     fe004000.txp
 20:        693          0          0          0     GICv2  65 Level     fe00b880.mailbox
 23:         69          0          0          0     GICv2 153 Level     uart-pl011
...
pi@raspberrypi:~$ cat /proc/irq/23/smp_affinity
f
pi@raspberrypi:~$ cat /proc/irq/23/smp_affinity_list
0-3
pi@raspberrypi:~$ cat /proc/irq/23/smp_affinity
f
pi@raspberrypi:~$ cat /proc/irq/23/smp_affinity_list
0-3
pi@raspberrypi:~$ sudo sh -c "echo 3 > /proc/irq/23/smp_affinity_list"
pi@raspberrypi:~$ cat /proc/irq/23/smp_affinity_list
3
pi@raspberrypi:~$ cat /proc/irq/23/smp_affinity
8
pi@raspberrypi:~$ grep 23: /proc/interrupts
 23:        626          0          0         93     GICv2 153 Level     uart-pl011

Notice how core 3 has started to service UART interrupts.
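
The two files in the transcript show the same setting in different notations: smp_affinity_list holds CPU numbers ("3", "0-3") and smp_affinity holds the equivalent hexadecimal bitmask ("8", "f"). A small sketch of that conversion, with hypothetical function names and assuming fewer than 32 CPUs:

```c
/* Bitmask for a single CPU, i.e. the value smp_affinity shows in hex. */
unsigned int cpu_to_affinity_mask(unsigned int cpu)
{
	return 1u << cpu;
}

/* Bitmask for an inclusive CPU range such as "0-3" in smp_affinity_list. */
unsigned int cpu_range_to_affinity_mask(unsigned int first, unsigned int last)
{
	unsigned int mask = 0;
	unsigned int cpu;

	for (cpu = first; cpu <= last; cpu++)
		mask |= 1u << cpu;

	return mask;
}
```

CPU 3 maps to 0x8 and the range 0-3 maps to 0xf, matching the output above.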

I suggest you write a short script that searches through /proc/irq/*/smp_affinity_list, replacing "0-3" with "0-2", then changes smp_affinity_list of the uart-pl011 interrupt to 3.

@DodoB
Author

DodoB commented Jul 14, 2021

Thanks for the suggestion. I'll have a look at that. At least at this point I know what I am up against.

@pelwell
Contributor

pelwell commented Jul 14, 2021

Note that flow control might help you if the sending device has some additional buffering beyond whatever UART FIFOs it might have - the problem for the receiving Pi isn't that it can't keep up with the overall data rate but that it is occasionally slightly too late in responding.

@pelwell
Contributor

pelwell commented Jul 14, 2021

In fact it's better than that - the latency tolerance of a flow controlled system is based on the total size of the FIFOs on both sides (for the direction in question, i.e. source TX FIFO and sink RX FIFO), so even a small TX FIFO should help provided the source side is writing in small chunks rather than a whole FIFO at a time.
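
The latency budget being discussed can be estimated with simple arithmetic: the time available before data is lost is roughly the time the link needs to fill the available FIFO entries. The sketch below is only that arithmetic. The FIFO depths are parameters because the thread does not state them, and the function name is hypothetical.

```c
/*
 * Microseconds needed to receive `entries` characters at `baud` bits per
 * second when each character occupies `bits_per_char` bit times on the wire
 * (10 for 8N1: one start bit, eight data bits, one stop bit).
 */
double fifo_fill_time_us(unsigned int baud, unsigned int bits_per_char,
			 unsigned int entries)
{
	return (double)entries * bits_per_char * 1e6 / baud;
}
```

As an illustration with assumed depths: 32 entries at 3 Mbaud and 8N1 gives about 107 us, the same order as the figure quoted earlier. With flow control and a further 16 entries of TX FIFO on the sending side, passing 48 entries gives 160 us.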

@DodoB
Author

DodoB commented Jul 14, 2021

Yes, I hear you. As almost always, the real world is a bit more involved though. First of all, I am writing the driver not because I think I can do UART better than someone else. There are numerous technical reasons why I need that driver. For one thing, one of my ultimate data sources/sinks on the HAT is CAN. I need to present that on the Pi as SocketCAN. That alone demands that I implement my interface as a driver. Besides, the driver is written and working. Using the tty driver is just a stop gap measure until I have the UART interface fully integrated in the driver. At the moment it's actually quite awkward routing all the data through user space. In fact, I suspect that is part of the FIFO problem. Knowing now what the problem is, perhaps I can improve on that. But mostly this path is a bit of a time sink.

I am actually surprised that the tty driver is not using DMA to clear out the FIFO. So far, in my mind, the FIFO was in RAM. That's why I was assuming it could be enlarged. The processing on my HAT is implemented on an M3 core. I am not using interrupts there, it's all handled with DMA. And yes, I can make that a bit larger. I had in mind to use DMA in interfacing with the UART.

One thing I was thinking of, was implementing a sort of XON/XOFF software handshake using my exchange protocol. But that feels more like a last resort. It might still end up dropping data. The only up side would be that it would drop complete transmission packets, instead of randomly destroying them.

@DodoB
Author

DodoB commented Jul 14, 2021

In fact, I am just thinking I might be able to tune the way I am transmitting data on the HAT to make the job a bit easier for the Pi by modifying how I do bursts. I have to see...

@ktgoto

ktgoto commented Mar 10, 2022

I am experiencing a similar issue on the Raspberry Pi 3B+ board running Raspberry Pi OS Lite with kernel 5.4.83-v7+.
The issue is quite similar in that data corruption occurs on ttyAMA0 port when receiving data, but it differs in the following points:

  1. Framing errors rather than overrun errors are detected when data corruption occurs. Overrun errors are sometimes detected too, but less often.
  2. It occurs at low data speeds such as 9600 bps and 19200 bps.
  3. Although it occurs on a board receiving data from another board connected through the serial port, it does not occur on a board whose serial port is looped back to itself.

It looks like the baud rate or the UART clock rate of ttyAMA0 becomes unstable. Is there any workaround? Any information would be appreciated.

Test programs used for reproducing the issue

Sender program:

/*
 * test_send.c
 *
 * Description
 *	test_send sends 256-byte data to ttyAMA0 every 200 milliseconds, where the
 *	data only contain printable and null characters.
 *
 * Compile with
 *	gcc -o test_send test_send.c
 */

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

#define PORT_NAME "/dev/ttyAMA0"

static sig_atomic_t end;

static void signal_handler(int sig)
{
	(void)sig;

	end = 1;
}

int main()
{
	struct termios newtio;
	struct termios curtio;
	char buf[256];
	int fd;

	if (signal(SIGINT, signal_handler) == SIG_ERR)
		err(1, "signal() error");
	if (signal(SIGTERM, signal_handler) == SIG_ERR)
		err(1, "signal() error");

	fd = open(PORT_NAME , O_RDWR | O_NOCTTY);
	if (fd < 0)
		err(1, "open() error at line %d", __LINE__);

	if (tcgetattr(fd, &curtio) < 0)
		err(1, "tcgetattr() error at line %d", __LINE__);

	if (tcgetattr(fd, &newtio) < 0)
		err(1, "tcgetattr() error at line %d", __LINE__);

	/* set terminal to raw mode */
	newtio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
		       | INLCR | IGNCR | ICRNL | IXON);
	newtio.c_oflag &= ~OPOST;
	newtio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	newtio.c_cflag &= ~(CSIZE | PARENB);
	newtio.c_cflag |= CS8;
	newtio.c_cc[VMIN] = 0;
	newtio.c_cc[VTIME] = 0;

	cfsetspeed(&newtio, B9600);

	if (tcsetattr(fd, TCSAFLUSH, &newtio) < 0)
		err(1, "tcsetattr() error at line %d", __LINE__);

	memset(buf, 0, sizeof(buf));
	snprintf(buf, sizeof(buf), "%s",
		 "0123456789abcdefghijklmnopqrstuvwxy"
		 "0123456789abcdefghijklmnopqrstuvwxy"
		 "0123456789abcdefghijklmnopqrstuvwxy"
		 "0123456789abcdefghijklmnopqrstuvwxy");

	while (!end) {
		write(fd, buf, sizeof(buf));
		tcdrain(fd);
		usleep(200 * 1000);
	}

	/* restore old terminal settings */
	if (tcsetattr(fd, TCSAFLUSH, &curtio) < 0)
		err(1, "tcsetattr() error at line %d", __LINE__);

	close(fd);

	return 0;
}

Receiver program:

/*
 * test_recv.c
 *
 * Description
 *	test_recv repeatedly opens ttyAMA0, sleeps for 3 seconds, and then reads
 *	the data received in the meantime. It shows the numbers of framing errors,
 *	overrun errors, and parity errors detected during the 3-second sleep using
 *	the TIOCGICOUNT ioctl command. It also reports when it has received a
 *	character other than printable or null characters.
 *
 * Compile with
 *	gcc -o test_recv test_recv.c
 */

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/serial.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

#include <err.h>
#include <ctype.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PORT_NAME "/dev/ttyAMA0"

static sig_atomic_t end;

static void signal_handler(int sig)
{
	(void)sig;

	end = 1;
}

static bool check_corruption(const char *data, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		if (data[i] == '\0')
			continue;
		/* isprint() requires a value representable as unsigned char */
		if (!isprint((unsigned char)data[i]))
			return false;
	}

	return true;
}

static void recv_data(void)
{
	struct termios curtio;
	struct termios newtio;
	struct serial_icounter_struct ic[2];
	char buf[256];
	ssize_t n;
	int fd;

	fd  = open(PORT_NAME , O_RDWR | O_NOCTTY);
	if (fd < 0) {
		warn("open() error at line %d", __LINE__);
		return;
	}

	if (tcgetattr(fd, &curtio) < 0) {
		warn("tcgetattr() error at line %d", __LINE__);
		goto close_and_exit;
	}
	if (tcgetattr(fd, &newtio) < 0) {
		warn("tcgetattr() error at line %d", __LINE__);
		goto close_and_exit;
	}

	/* set terminal to raw mode */
	newtio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
		       | INLCR | IGNCR | ICRNL | IXON);
	newtio.c_oflag &= ~OPOST;
	newtio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	newtio.c_cflag &= ~(CSIZE | PARENB);
	newtio.c_cflag |= CS8;
	newtio.c_cc[VMIN] = 0;
	newtio.c_cc[VTIME] = 0;

	cfsetspeed(&newtio, B9600);

	if (tcsetattr(fd, TCSAFLUSH, &newtio) < 0) {
		warn("tcsetattr() error at line %d", __LINE__);
		goto close_and_exit;
	}

	memset(ic, 0, sizeof(ic));
	if (ioctl(fd, TIOCGICOUNT, &ic[0]) < 0)
		warn("ioctl() error at line %d", __LINE__);

	sleep(3);

	if (ioctl(fd, TIOCGICOUNT, &ic[1]) < 0)
		warn("ioctl() error at line %d", __LINE__);

	/* Show errors occurred during sleep */
	if (ic[1].frame != ic[0].frame || ic[1].overrun != ic[0].overrun ||
	    ic[1].parity != ic[0].parity) {
		printf("Framing errors=%d, overrun errors=%d, parity errors=%d\n",
		       ic[1].frame - ic[0].frame, ic[1].overrun - ic[0].overrun,
		       ic[1].parity - ic[0].parity);
	}

	memset(buf, 0, sizeof(buf));
	n = read(fd, buf, sizeof(buf) - 1);
	if (n < 0)
		warn("read() error at line %d", __LINE__);
	else {
		if (!check_corruption(buf, n))
			printf("Data corruption occurred\n");
	}

	if (tcflush(fd, TCIOFLUSH) < 0)
		warn("tcflush() error at line %d", __LINE__);

	if (tcsetattr(fd, TCSAFLUSH, &curtio) < 0)
		warn("tcsetattr() error at line %d", __LINE__);

close_and_exit:
	close(fd);
}

int main() 
{
	int i;

	if (signal(SIGINT, signal_handler) == SIG_ERR)
		err(1, "signal() error");
	if (signal(SIGTERM, signal_handler) == SIG_ERR)
		err(1, "signal() error");

	for (i = 0; i < 1000 && !end; i++) {
		printf("-- %dth try\n", i + 1);
		recv_data();
		sleep(1);
	}

	return 0;
}

Log from the receiver program when the issue was reproduced:

-- 1th try
-- 2th try
Framing errors=149, overrun errors=0, parity errors=0
Data corruption occurred
-- 3th try
Framing errors=1, overrun errors=0, parity errors=0
-- 4th try
-- 5th try
Framing errors=149, overrun errors=0, parity errors=0
Data corruption occurred
-- 6th try
Framing errors=1, overrun errors=0, parity errors=0
-- 7th try
-- 8th try
Framing errors=149, overrun errors=0, parity errors=0
Data corruption occurred
-- 9th try
Framing errors=1, overrun errors=0, parity errors=0
-- 10th try
-- 11th try
Framing errors=148, overrun errors=0, parity errors=0
Data corruption occurred
-- 12th try

@pelwell
Contributor

pelwell commented Mar 10, 2022

ttyAMA0 has a dedicated clock, so doesn't suffer from variable baud rates.

What is the sending device? Do you have a ground connection between the two?

@ktgoto

ktgoto commented Mar 10, 2022

Thanks for your comment pelwell. The sending device is also an RPi 3B+ board, and ground is connected between the two boards.

I just tested the same programs on a CM3 I/O board where ttyAMA0 (pin14, pin15) and ttyS0 (pin32, pin33) are wired together. Data corruption occurred when sending data from ttyS0 to ttyAMA0 but did not occur when the direction was reversed.

@pelwell
Contributor

pelwell commented Mar 10, 2022

The clock of UART1 (which appears as ttyS0) is dependent on the VPU core clock. The firmware should prevent the core clock from changing if UART1 is enabled in the Device Tree (.dtb) file, but if for some reason that isn't working you can lock it to a particular frequency by setting core_freq and core_freq_min to the same value, e.g.:

core_freq=250
core_freq_min=250

Add those lines to the config.txt file on the devices where ttyS0 is being used.
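
To illustrate why a changing core clock matters for ttyS0: the BCM2835 peripherals datasheet gives the mini UART baud rate as the core clock divided by 8 × (baud register + 1). The sketch below evaluates that formula. The function name is hypothetical, and the clock values in the usage note are illustrative rather than measurements from this thread.

```c
/*
 * Baud rate produced by the mini UART (UART1) for a given core clock and
 * baud-rate register value:
 *     baud = core_clock / (8 * (baud_reg + 1))
 * Integer division is used, so the result is truncated.
 */
unsigned long mini_uart_baud(unsigned long core_clock_hz,
			     unsigned long baud_reg)
{
	return core_clock_hz / (8UL * (baud_reg + 1UL));
}
```

With a register value of 3254, a 250 MHz core clock yields 9600 baud. If the clock were to rise to 400 MHz without the register being reprogrammed, the same value would yield 15360 baud, which a 9600 baud receiver would see as framing errors.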

@ktgoto

ktgoto commented Mar 11, 2022

Thanks for your suggestion about locking the clock of ttyS0, but that did not solve the problem. It occurs even when ttyS0 is not involved, i.e. when sending data from an external RPi through ttyAMA0 or ttyUSB0 (a USB serial adapter). Ground is connected, of course.

It would be appreciated if you could try the test programs and see what's happening.

@jclark

jclark commented Sep 7, 2023

I'm seeing a problem similar to what @ktgoto reported in #4453 (comment). This is with kernel 6.1.21-v8+ on a Raspberry Pi CM4.

My scenario is that I have the UART output of a GPS board connected to /dev/ttyAMA0. The GPS is producing output continuously while the CM4 is powered on. When I open the device with

(stty 9600 -echo -icrnl; cat) </dev/ttyAMA0

the output starts with some corrupted data. The amount of corrupted data varies from time to time: it can be anything from 0 to hundreds of bytes. But it always eventually recovers and starts delivering correct data. If I do

sudo cat /proc/tty/driver/ttyAMA

before and after, I can see the number of framing errors has increased when there is corruption.
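
The before/after comparison could be automated by extracting the framing-error count from the proc line. The sketch below assumes the line carries a field of the form `fe:<count>`; that format is an assumption here, not something shown in this thread, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the framing-error count from one line of text, assuming it
 * contains a field written as " fe:<count>". Returns -1 when no such
 * field is present or it cannot be parsed.
 */
long framing_errors_from_line(const char *line)
{
	const char *field = strstr(line, " fe:");
	long count;

	if (field == NULL || sscanf(field, " fe:%ld", &count) != 1)
		return -1;

	return count;
}
```

Reading the file once before and once after the `cat` run and subtracting the two counts gives the number of framing errors for that run. The TIOCGICOUNT ioctl used in test_recv.c above is an alternative that avoids parsing text.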

The problem occurs when I have dtoverlay=disable-bt in my config.txt. If I remove this line, so I'm using /dev/ttyS0 instead of /dev/ttyAMA0 the problem does not occur.

The board I'm seeing it on right now is the TimeBeat LEA-M8F module: https://store.timebeat.app/products/gnss-raspberry-pi-cm4-module?variant=42280855699627 (from @lasselj)

I have connected lots of other modules without seeing this problem. But the problem is not specific to this board. I've seen it with a cheap $10 GPS also at a speed of 9600.

If I change the speed to 38400, the problem happens but significantly less frequently.

Here's an example of corrupted output (with od -c):

0000000 261 301 301 325 331 305 341 271 301 301 261 005 261 005 251 335
0000020 031   5   ) 221 035   9   i 005 261 301 301 325 331 305 341 271
0000040 301 301 261 301 335 261 301 345 261 311 301 222 232   b 202 202
0000060   b 202 202   R 272   2   5   ) 377   $   G   N   R   M   C   ,

The $GNRMC shows the point at which it recovers.

@pelwell
Contributor

pelwell commented Sep 7, 2023

(stty 9600 -echo -icrnl; cat) </dev/ttyAMA0 is a bit tricksy for my liking. The shell is going to open /dev/ttyAMA0, then stty modifies its baudrate, then cat uses the configured UART. Try performing the configuration before opening the data stream:

$ sudo stty -F /dev/ttyAMA0 9600 -echo -icrnl
$ cat /dev/ttyAMA0

@jclark

jclark commented Sep 7, 2023

@pelwell I tried doing as you suggest, and I get the same problem.

When doing one stty command, then multiple cat commands, each of the cat commands can produce corrupted output at the start.

@pelwell
Contributor

pelwell commented Sep 7, 2023

I wonder if something else is holding the UART open. Try this:

$ sudo apt install lsof
$ sudo stty -F /dev/ttyAMA0 9600 -echo -icrnl
$ lsof /dev/ttyAMA0

@jclark

jclark commented Sep 7, 2023

lsof /dev/ttyAMA0 (with and without sudo) shows nothing.

@pelwell
Contributor

pelwell commented Sep 7, 2023

It's strange that ttyS0 is better than ttyAMA0 - in my experience the break detection (and therefore synchronisation) is better on ttyAMA0. It's also strange that you are the first person with this problem. A number of Timebeat devs have been active here - @lasselj, @chronosfin - and I'm sure they would have reported this issue if they'd come across it.

Have you looked at the signal on a scope or logic analyser? It should be echoed to pins 8 & 10 on the 40-pin header.

I was going to suggest starting the cat (or other form of read) before connecting the GPS unit, but that's not going to be possible in this form factor. What you could do though is use raspi-gpio to make the pins GPIO inputs, which will disconnect the internal UART, then start the read, then change the pin muxing back:

$ raspi-gpio set 14-15 ip
$ cat /dev/ttyAMA0
# Does anything appear?
# Ctrl-C
# Start reader, e.g. $ cat /dev/ttyAMA0 &
$ raspi-gpio set 14-15 a0

Is there any corruption when invoked this way? Feel free to change the way you schedule the various tasks - multiple parallel shells, background jobs etc. - as long as the order is the same.

One difference running at 38400 baud is going to be the larger gaps between the characters (or groups of characters). The larger the gap, the easier it is for the UART to detect a break and resynchronise, and I'm wondering how much downtime there is in the 9600 baud case - it may only be between lines (NMEA sentences). I don't think it's a coincidence that the first clear character is the $ at the start of a sentence - notice that the preceding byte is 0377 (= 0xff, 0b11111111) as UARTs hold the line high in any gap between characters.

I suspect you are expected to drain everything up to the first $ (or ! for encapsulation sentences). This is required anyway because even without the lack of byte synchronisation you are likely to begin receiving in the middle of a sentence. The fact that the characters before that first $ are garbage should be of no significance, but you can confirm that the length of the corruption is always less than the length of valid sentences.
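
Draining up to the first sentence delimiter could look like the sketch below. It only locates the first `$` or `!` in a buffer. The function name is hypothetical, and real code would probably also validate the sentence that follows, since a garbage byte can happen to equal one of the delimiters.

```c
#include <stddef.h>

/*
 * Return the offset of the first NMEA start delimiter ('$' or '!') in buf,
 * or len if none is present. The caller can discard the bytes before the
 * returned offset and start parsing from there.
 */
size_t nmea_resync_offset(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] == '$' || buf[i] == '!')
			return i;
	}

	return len;
}
```

For the last row of the od dump above, the returned offset points at the `$` of `$GNRMC`, immediately after the 0377 byte.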
