Description
Proposal
Use case
Zalando's postgres-operator supports configuring sidecar containers, e.g. postgres_exporter, to enable observability. After a Postgresql resource is instantiated, the operator injects the sidecar into the Pod spec, and both containers start simultaneously.
When looking at the metrics endpoint provided by the exporter (http://localhost:9187/metrics), we noticed that (in some cases) metrics are missing: in our setup, pg_stat_database_xact_commit was affected (along with many other pg_stat_database_* metrics).
After some experimentation to narrow down the root cause, we found that simply restarting the exporter by issuing
/ $ kill 1
in the sidecar container's shell fixed the problem: all metrics became visible.
This suggests that the exporter started while the postgres instance was still in its startup process and therefore could not prepare/return all metrics.
We'd appreciate a feature that delays the start of postgres_exporter until the postgres instance is ready. Most likely this could be provided by an environment variable DATA_SOURCE_WAIT_UNTIL_READY and, additionally, a timeout setting in case the database does not come up in time: DATA_SOURCE_WAIT_UNTIL_READY_TIMEOUT.
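To illustrate, the proposed behaviour could be sketched as a wrapper around the exporter's entrypoint. This is a sketch only: the DATA_SOURCE_WAIT_UNTIL_READY / DATA_SOURCE_WAIT_UNTIL_READY_TIMEOUT variable names come from the proposal above and are not understood by postgres_exporter itself, and the exporter binary path is an assumption.

```shell
#!/bin/sh
# Poll a readiness check (passed as arguments) until it succeeds or the
# timeout from DATA_SOURCE_WAIT_UNTIL_READY_TIMEOUT (seconds) expires.
wait_until_ready() {
  timeout="${DATA_SOURCE_WAIT_UNTIL_READY_TIMEOUT:-60}"
  elapsed=0
  until "$@"; do   # "$@" is the readiness check, e.g. pg_isready
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "database not ready after ${timeout}s, giving up" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# A hypothetical entrypoint would then do something like:
# if [ "${DATA_SOURCE_WAIT_UNTIL_READY:-false}" = "true" ]; then
#   wait_until_ready pg_isready -h 127.0.0.1 -p 5432 -q || exit 1
# fi
# exec /bin/postgres_exporter "$@"
```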
Thanks for comments/help/criticism :)
Cheers!
Activity
bsv798 commented on Aug 11, 2023
I'm experiencing exactly the same problem now.
Had to modify the exporter Docker image to include a script that waits until postgres is ready to accept connections.
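A wait script of the kind described in this comment might look like the following. This is a sketch, not the actual script from the modified image; pg_isready ships with the postgres client tools, and the DB_HOST/DB_PORT defaults and retry interval are assumptions.

```shell
#!/bin/sh
# Block until postgres accepts connections, retrying every 2 seconds.
wait_for_postgres() {
  until pg_isready -h "${DB_HOST:-127.0.0.1}" -p "${DB_PORT:-5432}" -q; do
    echo "postgres not ready yet, retrying..." >&2
    sleep 2
  done
}

# The image's entrypoint would then hand off to the exporter once ready:
# wait_for_postgres && exec /bin/postgres_exporter "$@"
```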
kyleli666 commented on Aug 16, 2023
Same issue here. Had to manually run kill 1.
YannickDevos commented on Sep 8, 2023
Same issue here, relying on an extraContainers entry running CloudSQL_proxy to access a managed DB instance. Looking at the prometheus-postgres-exporter container logs, I can only find this single error line (no other errors pop up):
ts=2023-09-08T08:26:04.059Z caller=main.go:142 level=warn msg="Failed to create PostgresCollector" err="dial tcp 127.0.0.1:5432: connect: connection refused"
My assumption is that the cloudsql-proxy container is not ready to accept connections. Manually running kill 1 solves the issue.
sysadmind commented on Sep 8, 2023
This should be resolved by #882. We have changed the logic so that the exporter connects to the database during metrics collection, meaning each scrape is a new attempt to connect. Of note, #902 fixes a connection leak that #882 introduced. If this doesn't solve your use case, feel free to re-open with a description of the use case so we can discuss.
Hope this helps!