Skip to content

make error message more reasonable #2591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

make error message more reasonable #2591

wants to merge 3 commits into from

Conversation

ethanvc
Copy link

@ethanvc ethanvc commented May 12, 2023

Look at the following code snippet:

func main() {
	rdb := redis.NewClient(&redis.Options{
		Addr:     "3.3.3.3:10000", // make connection io timeout
		PoolSize: 1, // make connection pool timeout
	})
	for i := 0; i < 3; i++ {
		go get(rdb, "0")
	}
	time.Sleep(10000 * time.Hour)
}

func get(rdb *redis.Client, key string) {
	c, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	v := rdb.Get(c, key)
	fmt.Println(v)
}

output for the code snippet with this pr:

get 0: redis: connection pool timeout
get 0: redis: connection pool timeout
get 0: dial tcp 3.3.3.3:10000: i/o timeout

When QPS become very high, rdb.Get may return error with the old logic:
context deadline exceeded
which is very confusing and hard to lookup the real problem.

In this pr, i referenced the implementation of context.WithTimeout, to use the smaller duration of configed timeout and timeout in current context. by doing this, the error will be redis: connection pool timeout, so users will know why timeout occurred.

@ethanvc
Copy link
Author

ethanvc commented May 12, 2023

@vmihailenco please help to cr this code, this pr will be real helpful to let developer know what the timeout reason if they missed config the PoolSize parameter.

@monkey92t
Copy link
Collaborator

I'm sorry, I didn't understand specifically what issue you are describing....

@ethanvc
Copy link
Author

ethanvc commented May 15, 2023

I'm sorry, I didn't understand specifically what issue you are describing....

shortly speaking, when you set timeout smaller then the PoolTimeout, redis may report context deadline exceeded error message. This message is not that useful.

But if redis library report redis: connection pool timeout error message, user will know maybe it's the time to increase the PoolSize.

@monkey92t

@monkey92t
Copy link
Collaborator

monkey92t commented May 15, 2023

But if redis library report redis: connection pool timeout error message, user will know maybe it's the time to increase the PoolSize.

I don't believe modifying the error message would be beneficial. in fact, it could potentially mislead users.

@ethanvc
Copy link
Author

ethanvc commented May 16, 2023

@monkey92t maybe you can reference the implementation of context.WitDeadline, the related doc is:

WithDeadline returns a copy of the parent context with the deadline adjusted to be no later than d. If the parent's deadline is already earlier than d, WithDeadline(parent, d) is semantically equivalent to parent. The returned context's Done channel is closed when the deadline expires, when the returned cancel function is called, or when the parent context's Done channel is closed, whichever happens first.

In our case, return context deadline exceeded does not give any information about the reason why redis cost time unexpected.

And for my case, return connection pool timeout is very helpful to indicate the need to increase pool size.

Or maybe we can study the sql package's method to expose the time wait problem to users:

type DBStats struct {
	MaxOpenConnections [int](https://pkg.go.dev/builtin#int) // Maximum number of open connections to the database.

	// Pool Status
	OpenConnections [int](https://pkg.go.dev/builtin#int) // The number of established connections both in use and idle.
	InUse           [int](https://pkg.go.dev/builtin#int) // The number of connections currently in use.
	Idle            [int](https://pkg.go.dev/builtin#int) // The number of idle connections.

	// Counters
	WaitCount         [int64](https://pkg.go.dev/builtin#int64)         // The total number of connections waited for.
	WaitDuration      [time](https://pkg.go.dev/time).[Duration](https://pkg.go.dev/time#Duration) // The total time blocked waiting for a new connection.
	MaxIdleClosed     [int64](https://pkg.go.dev/builtin#int64)         // The total number of connections closed due to SetMaxIdleConns.
	MaxIdleTimeClosed [int64](https://pkg.go.dev/builtin#int64)         // The total number of connections closed due to SetConnMaxIdleTime.
	MaxLifetimeClosed [int64](https://pkg.go.dev/builtin#int64)         // The total number of connections closed due to SetConnMaxLifetime.
}

The WaitDuration is the total wait time for connections, user can report the metrics to prometheus to detect connection count problems.

@monkey92t
Copy link
Collaborator

We have recently encountered other issues related to timeouts, and it is possible that we may change our logic for timeouts and command reading/writing.

#2586

@ethanvc ethanvc closed this May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants