-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Expected Behavior
When restarting the Redis server, due to the default of MaxRetries: 3
, I'd expect go-redis to attempt to reconnect to the server and retry the query.
Current Behavior
If there are enough old connections in the connection pool, all retry attempts are done on broken connections and an error is returned without even attempting to reconnect.
Possible Solution
Some options:
- Perform a ping on the connection before reusing it (adds latency if the connection is fine)
- Don't count retries on connections from the pool towards the retry limit (might lead to a huge number of retries for a single query)
- Clear the pool on errors (might be overly pessimistic)
- Always use a fresh connection for retries
Steps to Reproduce
The following code reproduces the issue. Instead of restarting the server, it sets a dialer that simply closes each connection after 5 seconds.
package main
import (
"context"
"github.com/go-redis/redis/v8"
"log"
"net"
"time"
)
func main() {
ctx := context.Background()
// Redis client with a dialer that kills connections after 5 seconds (to simulate a server restart).
client := redis.NewClient(&redis.Options{
Dialer: func(ctx context.Context, network, addr string) (net.Conn, error) {
log.Print("Dialer called")
conn, err := net.Dial(network, addr)
if err == nil {
go func() {
time.Sleep(5 * time.Second)
conn.Close()
}()
}
return conn, err
},
})
// Function to ping the server and log errors if they occur.
ping := func() {
_, err := client.Ping(ctx).Result()
if err != nil {
log.Print(err)
}
}
// Perform some pings to fill the connection pool.
for i := 0; i < 10; i++ {
go ping()
}
// Wait for connections to die.
time.Sleep(10 * time.Second)
// Perform another ping and log pool stats before and after.
log.Printf("%#v", client.PoolStats())
ping()
log.Printf("%#v", client.PoolStats())
}
Example output (note that the dialer is not called for the last ping and the hit count in the pool stats increases by 4, i.e. the initial attempt and all 3 retries were done using stale connections from the pool):
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:40:50 Dialer called
2021/04/23 13:41:00 &redis.PoolStats{Hits:0x0, Misses:0xa, Timeouts:0x0, TotalConns:0xa, IdleConns:0xa, StaleConns:0x0}
2021/04/23 13:41:00 set tcp 127.0.0.1:45526: use of closed network connection
2021/04/23 13:41:00 &redis.PoolStats{Hits:0x4, Misses:0xa, Timeouts:0x0, TotalConns:0x6, IdleConns:0x6, StaleConns:0x0}
You get similar behavior when you restart Redis at the right time instead of using this dialer. In this case, the error will be EOF instead.
When adding MaxRetries: 20
to the redis.Options
, the last 3 lines of the output look like this instead (note that there were 10 hits and then 1 miss that called the dialer):
2021/04/23 13:48:46 &redis.PoolStats{Hits:0x0, Misses:0xa, Timeouts:0x0, TotalConns:0xa, IdleConns:0xa, StaleConns:0x0}
2021/04/23 13:48:49 Dialer called
2021/04/23 13:48:49 &redis.PoolStats{Hits:0xa, Misses:0xb, Timeouts:0x0, TotalConns:0x1, IdleConns:0x1, StaleConns:0x0}
Context (Environment)
I want my application to gracefully handle Redis starts and had hoped that this was handled automatically by go-redis due to the default of MaxRetries: 3
.