
Route a.b.com/foo should fall back to a.b.com instead of returning 503 #279


Closed
domdom82 opened this issue Jul 11, 2022 · 2 comments · Fixed by cloudfoundry/gorouter#321

@domdom82
Contributor

domdom82 commented Jul 11, 2022

Issue

There is a subtle bug in how the settings empty_pool_response_code_503 and empty_pool_timeout interact.
If both are enabled, gorouter returns a 503 Service Temporarily Unavailable for a route whose pool has become empty, keeps doing so for the empty_pool_timeout duration, and reaps the empty pool in the next pruning cycle after that.

The Bug
The logic in registry/container/trie.go.MatchUri doesn't take empty pools into account when trying to find a match for a given URI. As a result, when a user unmaps a route with a specific path such as a.b.com/foo, expecting gorouter to fall back to the less specific route a.b.com, requests receive a 503 instead.

This is because the MatchUri function treats empty pools as valid matches during the path traversal and returns them during registry.Lookup.

The problem didn't exist before the two flags were introduced: empty pools were reaped immediately, so the algorithm could never run into one during traversal.
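The behaviour can be reproduced with a simplified model. The following Go sketch is an illustration only and does not reproduce gorouter's trie: the `node` and `pool` types and the `matchNaive` function are hypothetical. It walks the URI segment by segment and remembers the deepest pool it sees, whether or not that pool has endpoints.

```go
package main

import (
	"fmt"
	"strings"
)

// pool is a hypothetical stand-in for a route's endpoint pool.
type pool struct {
	name      string
	endpoints int
}

// node is one segment in a simplified routing trie.
type node struct {
	pool     *pool
	children map[string]*node
}

func newNode() *node { return &node{children: map[string]*node{}} }

// insert registers a pool under a URI such as "a.b.com/foo".
func (n *node) insert(uri string, p *pool) {
	cur := n
	for _, seg := range strings.Split(strings.Trim(uri, "/"), "/") {
		child, ok := cur.children[seg]
		if !ok {
			child = newNode()
			cur.children[seg] = child
		}
		cur = child
	}
	cur.pool = p
}

// matchNaive returns the deepest pool on the path, even if it is empty.
func (n *node) matchNaive(uri string) *pool {
	var last *pool
	cur := n
	for _, seg := range strings.Split(strings.Trim(uri, "/"), "/") {
		child, ok := cur.children[seg]
		if !ok {
			break
		}
		cur = child
		if cur.pool != nil {
			last = cur.pool // an empty pool overrides the less specific match
		}
	}
	return last
}

func main() {
	root := newNode()
	root.insert("a.b.com", &pool{name: "a.b.com", endpoints: 1})
	root.insert("a.b.com/foo", &pool{name: "a.b.com/foo", endpoints: 0}) // unmapped, not yet reaped

	p := root.matchNaive("a.b.com/foo")
	fmt.Println(p.name, p.endpoints) // a.b.com/foo 0
}
```

In this model the lookup returns the empty a.b.com/foo pool although a.b.com still has an endpoint, which corresponds to the 503 described above.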

Affected Versions

https://github.com/cloudfoundry/routing-release/releases/tag/0.232.0 and up
https://github.com/cloudfoundry/cf-deployment/releases/tag/v20.2.0 and up

Context

A customer uses a blue/green (B/G) scenario with a route service for rate limiting. They bind the route service to blue.cf-app.com under a specific path, e.g. blue.cf-app.com/foo. When they switch from blue to green, they first unmap the route service from blue and then remap it to green. They noticed that before routing-release 0.232.0, any request to blue.cf-app.com/foo would still be handled even though the route to /foo was no longer mapped: the request would fall back to blue.cf-app.com.

However, after we rolled out 0.232.0 they complained about a time window where customers would receive a 503 instead of the fallback kicking in.

Steps to Reproduce

cf map-route blue.cf-app.com --hostname blue --path foo

cf unmap-route blue.cf-app.com --hostname blue --path foo

curl https://blue.cf-app.com/foo

Expected result

200 OK (from blue.cf-app.com fallback)

Current result

503 Service Unavailable: Requested route ('blue.cf-app.com') has no available endpoints.

Possible Fix

During path traversal, MatchUri should prefer pools with endpoints over empty pools if such pools exist (avoiding a 503).
If only empty pools exist, an empty pool may be returned (producing a 503).
If no pools exist, nil may be returned (producing a 404).
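The three rules above can be sketched in Go as a selection over the pools found along the path. This is a hedged illustration, not the code from the fix PR: `pool`, `pickPool`, and `statusFor` are hypothetical names, and the slice is assumed to be ordered from least to most specific route.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for a route's endpoint pool.
type pool struct {
	name      string
	endpoints int
}

// pickPool chooses among the pools met during traversal, ordered from
// least to most specific. It prefers the most specific non-empty pool,
// then the most specific empty pool, then nil.
func pickPool(onPath []*pool) *pool {
	var lastAny, lastNonEmpty *pool
	for _, p := range onPath {
		if p == nil {
			continue
		}
		lastAny = p
		if p.endpoints > 0 {
			lastNonEmpty = p
		}
	}
	if lastNonEmpty != nil {
		return lastNonEmpty
	}
	return lastAny
}

// statusFor maps the chosen pool to the outcome named in the issue.
// 200 here stands for "request is forwarded to an endpoint".
func statusFor(p *pool) int {
	switch {
	case p == nil:
		return 404
	case p.endpoints == 0:
		return 503
	default:
		return 200
	}
}

func main() {
	host := &pool{name: "a.b.com", endpoints: 1}
	foo := &pool{name: "a.b.com/foo", endpoints: 0}

	fmt.Println(statusFor(pickPool([]*pool{host, foo}))) // 200: falls back to a.b.com
	fmt.Println(statusFor(pickPool([]*pool{foo})))       // 503: only an empty pool exists
	fmt.Println(statusFor(pickPool(nil)))                // 404: no pool at all
}
```

Tracking two candidates in one pass keeps the traversal cost unchanged while still allowing the empty pool to be returned when nothing better exists.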

I've provided a PR with a fix.

@jrussett
Contributor

Hi @domdom82

Thank you for submitting this issue and accompanying PR. Unfortunately, I am not able to reproduce this issue on a cf-deployment v21.5.0 env with routing-release 0.235.0 as the app continues to return HTTP Status Code 200. Is there a more specific version of cf-deployment/routing release that I should try?

Thanks
@jrussett

@domdom82
Contributor Author

domdom82 commented Sep 8, 2022

@jrussett have you tried turning on empty_pool_response_code_503 and setting empty_pool_timeout to a non-zero value, e.g. 30s?
That should trigger this behavior. empty_pool_timeout defaults to 0s, so the issue won't appear unless the value is changed.
