-
Notifications
You must be signed in to change notification settings - Fork 923
conn.go panics if TLS connection drops #644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
After closer inspection, I think it's impossible to avoid this issue as a consumer of this library since it occurs in the internal I will see if I can find a simple way to reproduce. I only see it after hours of running in production against hundreds of TLS-enabled Postgres instances and I don't have very good tracing to identify what specific error condition causes the EOF. Just from reading the code though, I suspect it might require a race between TCP connection dropping and the Context being cancelled. That makes is sound similar to #614 but I believe the error and root-cause are entirely different despite same effect. It also interrelates somewhat with my other bug report #620 where using Context cancelation violates timeouts on errors due to this same cancellation procedure which is causing the crash here. It may be relevant to the reproduction to note that I'm using https://github.com/Kount/pq-timeouts to wrap I'll update if I manage to find more info on how to reproduce, but it seems to me like an inherent bug for networking code to ever panic on IO errors in a normal read path. If there is a good rationale I'm missing please let me know though! |
Hi, @banks Did you find any solution to this problem? |
@hexdigest yes. Sadly it was to switch to https://github.com/jackc/pgx |
@banks @hexdigest #734 might have fixed the issue. |
Looks like it would fix this issue so Fee free to close when that lands. I’ve moved on from the codebase and company at this point so I can’t confirm that use-case now works flawlesly although the issue here was panics being bad and the PR seems to fix that. |
@mjibson Should we do something else, or can this issue be closed? |
I have started hitting a panic in production where a service is monitoring many thousand database instances, hundreds of which are PostgreSQL and using this driver to connect.
The panic comes from: https://github.com/lib/pq/blob/master/conn.go#L1030
which is for convenience:
I don't understand how panicing on a IO error is ever an acceptable way to handle errors in a DB driver? Connection errors must surely be an expected occurrence and one that the driver should handle gracefully if not transparently?
I didn't see any other issues closed or open that seemed to document this, is panicing a design choice that consumers are expected to rescue from for normal operation of their DB?
In my case the logs for my service look like:
The git sha of the vendored package for the record is dd1fe20 which is master at time of posting.
The text was updated successfully, but these errors were encountered: