conn.go panics if TLS connection drops #644

banks · 2017-08-07T13:01:09Z

I have started hitting a panic in production in a service that monitors many thousands of database instances, hundreds of which are PostgreSQL and connect using this driver.

The panic comes from: https://github.com/lib/pq/blob/master/conn.go#L1030

For convenience, the code there is:

	b := cn.scratch[:1]
	_, err := io.ReadFull(cn.c, b)
	if err != nil {
		panic(err)
	}

I don't understand how panicking on an I/O error is ever an acceptable way to handle errors in a DB driver. Surely connection errors are an expected occurrence, and one the driver should handle gracefully, if not transparently?

I didn't see any other open or closed issues that seemed to document this. Is panicking a design choice that consumers are expected to recover from during normal operation of their DB?

In my case the logs for my service look like:

panic: EOF

goroutine 13745112 [running]:
[my service path]/vendor/github.com/lib/pq.(*conn).ssl(0xc420868b00, 0xc4214cbe90)
	/go/src/[my service path]/vendor/github.com/lib/pq/conn.go:1030 +0x2a5
[my service path]/vendor/github.com/lib/pq.(*conn).cancel(0xc428d61600, 0x0, 0x0)
	/go/src/[my service path]/vendor/github.com/lib/pq/conn_go18.go:111 +0x164
[my service path]/vendor/github.com/lib/pq.(*conn).watchCancel.func1(0xc42b25f860, 0xc428d61600, 0xc42b682540)
	/go/src/[my service path]/vendor/github.com/lib/pq/conn_go18.go:85 +0xa7
created by [my service path]/vendor/github.com/lib/pq.(*conn).watchCancel
	/go/src/[my service path]/vendor/github.com/lib/pq/conn_go18.go:89 +0x9c

For the record, the vendored package is at git SHA dd1fe20, which is master at the time of posting.


banks · 2017-08-07T14:05:34Z

A panic cannot be recovered by a different goroutine.

On closer inspection, I think it's impossible to avoid this issue as a consumer of this library, since it occurs in the internal watchCancel goroutine.
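The Go rule behind this: `recover` only stops a panic when it is called from a deferred function in the same goroutine that panicked. The sketch below uses a hypothetical helper, `goSafe` (not part of lib/pq), to show where such a `defer` has to live. Because pq starts the watchCancel goroutine itself, a caller has no place to install one.

```go
package main

import "fmt"

// goSafe is a hypothetical helper: it runs fn in a new goroutine and turns a
// panic into an error. The deferred recover must sit inside the goroutine
// body; a recover in the caller's goroutine would never see this panic.
func goSafe(fn func()) <-chan error {
	errc := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				errc <- fmt.Errorf("recovered panic: %v", r)
			}
			close(errc) // a nil receive means fn returned normally
		}()
		fn()
	}()
	return errc
}

func main() {
	err := <-goSafe(func() { panic("EOF") })
	fmt.Println(err) // recovered panic: EOF
}
```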

I will see if I can find a simple way to reproduce it. I only see it after hours of running in production against hundreds of TLS-enabled Postgres instances, and I don't have good enough tracing to identify which error condition causes the EOF.

Just from reading the code, though, I suspect it requires a race between the TCP connection dropping and the Context being cancelled. That makes it sound similar to #614, but I believe the error and root cause are entirely different despite the same effect.

It is also somewhat related to my other bug report, #620, where using Context cancellation violates timeouts on errors because of the same cancellation procedure that causes the crash here.

It may be relevant to reproduction that I'm using https://github.com/Kount/pq-timeouts to wrap pq in a custom Dialer/Conn that supports read/write timeouts, as a workaround for #620.

I'll post an update if I find out more about how to reproduce it, but it seems to me an inherent bug for networking code to panic on I/O errors in a normal read path. If there is a good rationale I'm missing, please let me know!

hexdigest · 2017-10-23T20:13:04Z

Hi, @banks

Did you find any solution to this problem?

banks · 2017-10-23T21:39:53Z

@hexdigest Yes. Sadly, the solution was to switch to https://github.com/jackc/pgx

ainar-g · 2018-03-27T08:23:02Z

@banks @hexdigest #734 might have fixed the issue.

banks · 2018-03-27T08:48:08Z

Looks like it would fix this issue, so feel free to close this when that lands. I've moved on from the codebase and the company at this point, so I can't confirm that the use case now works flawlessly, but the issue here was that panicking is bad, and the PR seems to fix that.

ainar-g · 2018-03-27T08:49:55Z

@mjibson Should we do something else, or can this issue be closed?

maddyblue closed this as completed Mar 27, 2018
