Issue
There is a subtle bug in the interaction between the settings empty_pool_response_code_503 and empty_pool_timeout.
If both are enabled, a route whose pool has become empty keeps responding with 503 Service Temporarily Unavailable for the empty_pool_timeout duration. The empty pool is then reaped in the next pruning cycle after that.
The Bug
The logic in MatchUri (registry/container/trie.go) does not take empty pools into account when it searches for a match for a given URI. As a result, a user can unmap a route with a specific path, such as a.b.com/foo, expecting gorouter to fall back to the less specific route a.b.com, and instead receive a 503.
This happens because MatchUri treats empty pools as valid matches during path traversal, so registry.Lookup returns the empty pool for a.b.com/foo instead of the pool for a.b.com.
The problem did not exist before these two flags were introduced: empty pools were reaped immediately, so the traversal could never encounter one.
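To make the failure mode concrete, here is a minimal Go sketch of a path-trie lookup that, like the MatchUri behavior described above, accepts an empty pool as a match. This is not gorouter's trie.go code. Pool, Node, segments, insert and matchURI are hypothetical, simplified stand-ins: the host root is registered at "/", and the unmapped /foo route is left behind as an empty pool.

```go
package main

import (
	"fmt"
	"strings"
)

// Pool is a hypothetical stand-in for a gorouter route pool.
type Pool struct {
	Endpoints []string
}

// Node is a hypothetical path-trie node; Pool is nil when no route is registered here.
type Node struct {
	Pool     *Pool
	Children map[string]*Node
}

// segments splits "/a/b" into ["a", "b"] and "/" into no segments.
func segments(path string) []string {
	trimmed := strings.Trim(path, "/")
	if trimmed == "" {
		return nil
	}
	return strings.Split(trimmed, "/")
}

// insert registers pool at the node for path, creating intermediate nodes.
func (n *Node) insert(path string, pool *Pool) {
	cur := n
	for _, seg := range segments(path) {
		if cur.Children == nil {
			cur.Children = map[string]*Node{}
		}
		next, ok := cur.Children[seg]
		if !ok {
			next = &Node{}
			cur.Children[seg] = next
		}
		cur = next
	}
	cur.Pool = pool
}

// matchURI returns the pool of the most specific node on the path that has a pool,
// without checking whether that pool still has endpoints.
func matchURI(root *Node, path string) *Pool {
	cur := root
	match := root.Pool
	for _, seg := range segments(path) {
		next, ok := cur.Children[seg]
		if !ok {
			break
		}
		cur = next
		if cur.Pool != nil {
			match = cur.Pool // an empty pool is accepted just like a populated one
		}
	}
	return match
}

func main() {
	root := &Node{}
	root.insert("/", &Pool{Endpoints: []string{"10.0.0.1:8080"}}) // blue.cf-app.com
	root.insert("/foo", &Pool{})                                  // /foo unmapped, pool not yet reaped

	p := matchURI(root, "/foo")
	fmt.Println(len(p.Endpoints)) // 0: the empty /foo pool wins, so the request gets a 503
}
```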
Context
A customer uses a blue/green (B/G) deployment scenario with a route service for rate limiting. They bind the route service to blue.cf-app.com under a specific path, e.g. blue.cf-app.com/foo. When they switch from blue to green, they first unmap the route service from blue and then map it to green. Before routing-release 0.232.0, requests to blue.cf-app.com/foo were still handled even though the /foo route was no longer mapped: the request fell back to blue.cf-app.com.
However, after we rolled out 0.232.0, they reported a time window in which their users received a 503 instead of the fallback.
Steps to Reproduce
cf map-route blue.cf-app.com --hostname blue --path foo
cf unmap-route blue.cf-app.com --hostname blue --path foo
curl https://blue.cf-app.com/foo
Expected result
200 OK (from blue.cf-app.com fallback)
Current result
503 Service Unavailable: Requested route ('blue.cf-app.com') has no available endpoints.
Possible Fix
During path traversal, MatchUri should prefer pools with endpoints over empty pools whenever such pools exist (avoiding the 503).
If only empty pools exist, an empty pool may be returned (producing a 503).
If no pools exist, nil may be returned (producing 404).
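The proposed traversal rule can be sketched as follows. This is a hypothetical, self-contained model, not the code from the fix PR. Pool, Node, segments, insert and matchURIPreferNonEmpty are illustrative names, and returning the most specific empty pool when no populated pool matches is an assumption about which empty pool to pick.

```go
package main

import (
	"fmt"
	"strings"
)

// Pool is a hypothetical stand-in for a gorouter route pool.
type Pool struct {
	Endpoints []string
}

// Node is a hypothetical path-trie node; Pool is nil when no route is registered here.
type Node struct {
	Pool     *Pool
	Children map[string]*Node
}

// segments splits "/a/b" into ["a", "b"] and "/" into no segments.
func segments(path string) []string {
	trimmed := strings.Trim(path, "/")
	if trimmed == "" {
		return nil
	}
	return strings.Split(trimmed, "/")
}

// insert registers pool at the node for path, creating intermediate nodes.
func (n *Node) insert(path string, pool *Pool) {
	cur := n
	for _, seg := range segments(path) {
		if cur.Children == nil {
			cur.Children = map[string]*Node{}
		}
		next, ok := cur.Children[seg]
		if !ok {
			next = &Node{}
			cur.Children[seg] = next
		}
		cur = next
	}
	cur.Pool = pool
}

// matchURIPreferNonEmpty tracks the most specific populated pool and the most
// specific empty pool separately during a single walk down the path.
func matchURIPreferNonEmpty(root *Node, path string) *Pool {
	var populated, empty *Pool
	consider := func(p *Pool) {
		switch {
		case p == nil:
			// no route registered at this node
		case len(p.Endpoints) > 0:
			populated = p
		default:
			empty = p
		}
	}
	cur := root
	consider(cur.Pool)
	for _, seg := range segments(path) {
		next, ok := cur.Children[seg]
		if !ok {
			break
		}
		cur = next
		consider(cur.Pool)
	}
	if populated != nil {
		return populated // fall back to a less specific route instead of a 503
	}
	return empty // an empty pool (503), or nil when nothing matched (404)
}

func main() {
	root := &Node{}
	root.insert("/", &Pool{Endpoints: []string{"10.0.0.1:8080"}}) // blue.cf-app.com
	root.insert("/foo", &Pool{})                                  // /foo unmapped, pool not yet reaped

	p := matchURIPreferNonEmpty(root, "/foo")
	fmt.Println(len(p.Endpoints)) // 1: the request falls back to blue.cf-app.com
}
```

Because both candidates are collected in one pass, the lookup stays a single walk down the trie; the only extra cost is one length check per visited pool.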
Affected Versions
https://github.com/cloudfoundry/routing-release/releases/tag/0.232.0 and up
https://github.com/cloudfoundry/cf-deployment/releases/tag/v20.2.0 and up

I've provided a fix PR.

Thank you for submitting this issue and the accompanying PR. Unfortunately, I am not able to reproduce this issue on a cf-deployment v21.5.0 environment with routing-release 0.235.0: the app continues to return HTTP status code 200. Is there a more specific version of cf-deployment/routing-release that I should try?

@jrussett have you tried turning on empty_pool_response_code_503 and setting empty_pool_timeout to a non-zero value, e.g. 30s?
Then it should show this behavior. empty_pool_timeout defaults to 0s, so the behavior does not appear unless it is changed.