fix: grpc connections #2386

Closed
wants to merge 8 commits into from
2 changes: 1 addition & 1 deletion go.mod
@@ -38,7 +38,7 @@ require (
 	github.com/stretchr/testify v1.7.0
 	golang.org/x/net v0.0.0-20210520170846-37e1c6afe023
 	golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac
-	google.golang.org/grpc v1.38.0
+	google.golang.org/grpc v1.41.0
 	gopkg.in/yaml.v2 v2.4.0
 	helm.sh/helm/v3 v3.6.1
 	k8s.io/api v0.22.0
5 changes: 4 additions & 1 deletion go.sum
@@ -179,6 +179,7 @@ github.com/clbanning/x2j v0.0.0-20191024224557-825249438eec/go.mod h1:jMjuTZXRI4
 github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
 github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
 github.com/cncf/udpa/go v0.0.0-20201120205902-5459f2c99403/go.mod h1:WmhPx2Nbnhtbo57+VJT5O0JRkEi1Wbu0z5j0R8u5Hbk=
+github.com/cncf/xds/go v0.0.0-20210805033703-aa0b78936158/go.mod h1:eXthEFrGJvWHgFFCl3hGmgk+/aYT6PnTQLykKQRLhEs=
 github.com/cockroachdb/apd v1.1.0/go.mod h1:8Sl8LxpKi29FqWXR16WEFZRNSz3SoPzUzeMeY4+DwBQ=
 github.com/cockroachdb/cockroach-go v0.0.0-20181001143604-e0a95dfd547c/go.mod h1:XGLbWH/ujMcbPbhZq52Nv6UrCghb1yGn//133kEsvDk=
 github.com/cockroachdb/datadriven v0.0.0-20190809214429-80d97fb3cbaa/go.mod h1:zn76sxSg3SzpJ0PPJaLDCu+Bu0Lg3sKTORVIj19EIF8=
@@ -309,6 +310,7 @@ github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.m
 github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98=
 github.com/envoyproxy/go-control-plane v0.9.9-0.20201210154907-fd9021fe5dad/go.mod h1:cXg6YxExXjJnVBQHBLXeUAgxn2UodCpnH306RInaBQk=
 github.com/envoyproxy/go-control-plane v0.9.9-0.20210217033140-668b12f5399d/go.mod h1:cXg6YxExXjJnVBQHBLXeUAgxn2UodCpnH306RInaBQk=
+github.com/envoyproxy/go-control-plane v0.9.10-0.20210907150352-cf90f659a021/go.mod h1:AFq3mo9L8Lqqiid3OhADV3RfLJnjiw63cSpi+fDTRC0=
 github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c=
 github.com/evanphx/json-patch v0.5.2/go.mod h1:ZWS5hhDbVDyob71nXKNL0+PWn6ToqBHMikGIFbs31qQ=
 github.com/evanphx/json-patch v4.2.0+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLiYLvXMP4fmwYFNcr97nuDLSk=
@@ -1533,8 +1535,9 @@ google.golang.org/grpc v1.30.0/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM
 google.golang.org/grpc v1.33.1/go.mod h1:fr5YgcSWrqhRRxogOsw7RzIpsmvOZ6IcH4kBYTpR3n0=
 google.golang.org/grpc v1.36.0/go.mod h1:qjiiYl8FncCW8feJPdyg3v6XW24KsRHe+dy9BAGRRjU=
 google.golang.org/grpc v1.37.0/go.mod h1:NREThFqKR1f3iQ6oBuvc5LadQuXVGo9rkm5ZGrQdJfM=
-google.golang.org/grpc v1.38.0 h1:/9BgsAsa5nWe26HqOlvlgJnqBuktYOLCgjCPqsa56W0=
 google.golang.org/grpc v1.38.0/go.mod h1:NREThFqKR1f3iQ6oBuvc5LadQuXVGo9rkm5ZGrQdJfM=
+google.golang.org/grpc v1.41.0 h1:f+PlOh7QV4iIJkPrx5NQ7qaNGFQ3OTse67yaDHfju4E=
+google.golang.org/grpc v1.41.0/go.mod h1:U3l9uK9J0sini8mHphKoXyaqDA/8VyGnDee1zzIUK6k=
 google.golang.org/grpc/cmd/protoc-gen-go-grpc v0.0.0-20200709232328-d8193ee9cc3e/go.mod h1:6Kw0yEErY5E/yWrBtf03jp27GLLJujG4z/JK95pnjjw=
 google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8=
 google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0=
6 changes: 5 additions & 1 deletion pkg/controller/operators/catalog/operator.go
@@ -698,9 +698,13 @@ func (o *Operator) syncRegistryServer(logger *logrus.Entry, in *v1alpha1.Catalog

 	logger.Debugf("check registry server healthy: %t", healthy)

+	// if the pod isn't healthy, don't check the connection
+	// checking the connection before the dns is ready may lead dns to cache the miss
+	// (pod readiness is used as a hint that dns should be ready to avoid coupling this to dns)
+	continueSync = healthy
Comment on lines +701 to +704
Member

This makes sense to me - I feel like this is what I've been seeing locally, and delaying this check until the pod is reporting a health state seems like the most logical fix 👍


 	if healthy && in.Status.RegistryServiceStatus != nil {
 		logger.Debug("registry state good")
-		continueSync = true
 		// return here if catalog does not have polling enabled
 		if !out.Poll() {
 			return
14 changes: 13 additions & 1 deletion pkg/controller/registry/grpc/source.go
@@ -9,11 +9,11 @@ import (
 	"time"

 	"github.com/operator-framework/operator-registry/pkg/client"
-
 	"github.com/sirupsen/logrus"
 	"golang.org/x/net/http/httpproxy"
 	"golang.org/x/net/proxy"
 	"google.golang.org/grpc"
+	"google.golang.org/grpc/backoff"
 	"google.golang.org/grpc/connectivity"

 	"github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry"
@@ -153,6 +153,18 @@ func grpcConnection(address string) (*grpc.ClientConn, error) {
 		return nil, err
 	}

+	// add aggressive connection backoff for improved gRPC connection times
+	dialOptions = append(dialOptions, grpc.WithConnectParams(grpc.ConnectParams{
+		Backoff: backoff.Config{
+			BaseDelay:  500 * time.Millisecond,
+			Multiplier: 1.2,
+			Jitter:     0.2,
+			MaxDelay:   5 * time.Second,
+		},
+		MinConnectTimeout: 1 * time.Second,
+	},
+	))
+
 	if proxyURL != nil {
 		dialOptions = append(dialOptions, grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
 			dialer, err := proxy.FromURL(proxyURL, &net.Dialer{})
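
To get a feel for what the backoff values in the source.go hunk mean in practice, here is a minimal sketch that prints the nominal retry delays. It assumes the usual exponential scheme (base delay multiplied by the multiplier on each retry, capped at the maximum delay) and ignores jitter. `backoffConfig`, `diffConfig` and `nominalDelay` are hypothetical names for this illustration; none of them is part of grpc-go or of this PR.

```go
package main

import (
	"fmt"
	"time"
)

// backoffConfig is a local stand-in that mirrors the four fields set in the
// diff; it is not the grpc backoff.Config type.
type backoffConfig struct {
	BaseDelay  time.Duration
	Multiplier float64
	Jitter     float64
	MaxDelay   time.Duration
}

// diffConfig copies the values from the source.go hunk.
var diffConfig = backoffConfig{
	BaseDelay:  500 * time.Millisecond,
	Multiplier: 1.2,
	Jitter:     0.2,
	MaxDelay:   5 * time.Second,
}

// nominalDelay returns the delay before the given retry with jitter ignored:
// BaseDelay * Multiplier^retries, capped at MaxDelay.
func nominalDelay(c backoffConfig, retries int) time.Duration {
	delay := float64(c.BaseDelay)
	limit := float64(c.MaxDelay)
	for i := 0; i < retries && delay < limit; i++ {
		delay *= c.Multiplier
	}
	if delay > limit {
		delay = limit
	}
	return time.Duration(delay)
}

func main() {
	for retries := 0; retries <= 14; retries += 2 {
		d := nominalDelay(diffConfig, retries).Round(time.Millisecond)
		fmt.Printf("retry %2d: about %v\n", retries, d)
	}
}
```

Under this formula the un-jittered delay starts at 500ms and reaches the 5s cap on the 13th retry. With a jitter of 0.2, each real delay would be expected to vary by up to 20% around these values.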
8 changes: 6 additions & 2 deletions pkg/controller/registry/reconciler/configmap.go
@@ -81,6 +81,7 @@ func (s *configMapCatalogSourceDecorator) Service() *v1.Service {
 			Namespace: s.GetNamespace(),
 		},
 		Spec: v1.ServiceSpec{
+			ClusterIP: "None",
 			Ports: []v1.ServicePort{
 				{
 					Name: "grpc",
@@ -419,15 +420,18 @@ func (c *ConfigMapRegistryReconciler) CheckRegistryServer(catalogSource *v1alpha
 	// Check on registry resources
 	// TODO: more complex checks for resources
 	// TODO: add gRPC health check
+	pods := c.currentPods(source, c.Image)
+
 	if c.currentServiceAccount(source) == nil ||
 		c.currentRole(source) == nil ||
 		c.currentRoleBinding(source) == nil ||
 		c.currentService(source) == nil ||
-		len(c.currentPods(source, c.Image)) < 1 {
+		len(pods) < 1 ||
+		len(pods[0].Status.ContainerStatuses) < 1 ||
Contributor

Will this panic if there are not any pods?

Member

If I'm reading this right, in that scenario `len(pods) < 1` should evaluate to `true` and the expression should short-circuit before `pods[0]` is evaluated

Member

Would it short circuit though if the expression is OR-ed rather than AND-ed?

Member Author

Yes, I think this is prone to a panic

Contributor

Thinking on this again, I believe this may not actually be a problem as Nick mentioned. Thoughts on this?

https://play.golang.org/p/yPXWTIrT2Y6

Member

I think I misunderstood the check when I had initially reviewed - if we're essentially just checking for whether the registry-server container is reporting a "healthy" state, then I don't think this kind of check is problematic (albeit difficult to read) due to the short circuiting mentioned earlier.

Member

Something else I just noticed - if we're firing off Services with spec.ClusterIP: None, then presumably we cannot also check whether the status.ClusterIP has been populated to determine whether the service DNS has been established yet. Do we need to also query for the Endpoint object and ensure that value has been populated as another condition check for healthiness?

Member Author

That sounds reasonable -- based on the fact the endpoint has the same name as the service it should be straightforward to query and check that the endpoints.subsets field is non-nil.

+		!pods[0].Status.ContainerStatuses[0].Ready {
 		healthy = false
 		return
 	}

 	healthy = true
 	return
 }
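
The short-circuit question raised in the review thread on this file can be checked in isolation. The following is a minimal sketch, not the PR's code: `pod` and `containerStatus` are hypothetical stand-ins for the corev1 types, and `unhealthy` only mirrors the shape of the OR-ed condition. In Go, `||` evaluates its right operand only when the left operand is false, so `pods[0]` is never indexed when the slice is empty.

```go
package main

import "fmt"

// containerStatus and pod are minimal stand-ins for the corev1 types;
// they exist only for this illustration.
type containerStatus struct{ Ready bool }

type pod struct{ ContainerStatuses []containerStatus }

// unhealthy mirrors the shape of the OR-ed condition in the diff. Each
// operand is evaluated only if every operand before it was false.
func unhealthy(pods []pod) bool {
	return len(pods) < 1 ||
		len(pods[0].ContainerStatuses) < 1 ||
		!pods[0].ContainerStatuses[0].Ready
}

func main() {
	notReady := []pod{{ContainerStatuses: []containerStatus{{Ready: false}}}}
	ready := []pod{{ContainerStatuses: []containerStatus{{Ready: true}}}}

	fmt.Println(unhealthy(nil))       // true, and pods[0] is never touched
	fmt.Println(unhealthy([]pod{{}})) // true, no container statuses yet
	fmt.Println(unhealthy(notReady))  // true
	fmt.Println(unhealthy(ready))     // false
}
```

The order of the operands matters here: moving the `pods[0]` checks ahead of the length check would reintroduce the index-out-of-range panic.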
6 changes: 5 additions & 1 deletion pkg/controller/registry/reconciler/grpc.go
@@ -74,6 +74,7 @@ func (s *grpcCatalogSourceDecorator) Service() *corev1.Service {
 			Namespace: s.GetNamespace(),
 		},
 		Spec: corev1.ServiceSpec{
+			ClusterIP: "None",
 			Ports: []corev1.ServicePort{
 				{
 					Name: "grpc",
@@ -425,7 +426,10 @@ func (c *GrpcRegistryReconciler) CheckRegistryServer(catalogSource *v1alpha1.Cat
 	source := grpcCatalogSourceDecorator{catalogSource}
 	// Check on registry resources
 	// TODO: add gRPC health check
-	if len(c.currentPodsWithCorrectImageAndSpec(source, source.ServiceAccount().GetName())) < 1 ||
+	pods := c.currentPodsWithCorrectImageAndSpec(source, source.ServiceAccount().GetName())
+	if len(pods) < 1 ||
+		len(pods[0].Status.ContainerStatuses) < 1 ||
+		!pods[0].Status.ContainerStatuses[0].Ready ||
Member Author

We would update the service check here as well. In reality, configmap-based catalogs are deprecated (and, from what I can tell, only used in our e2e tests now), so maybe we should add more to this health check.

 		c.currentService(source) == nil {
 		healthy = false
 		return
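
The Endpoints check suggested in the configmap.go thread is not part of this diff. As a rough sketch of what such a condition might look like, the example below uses hypothetical stand-in types (`endpoints`, `endpointSubset`, `endpointAddress`) that loosely follow the shape of the Kubernetes Endpoints object. `hasReadyAddress` is an invented helper. It also assumes a slightly stricter test than the "subsets is non-nil" check mentioned in the thread, namely that at least one subset lists an address.

```go
package main

import "fmt"

// These types are simplified stand-ins loosely modeled on the Kubernetes
// Endpoints object; they are assumptions for this sketch only.
type endpointAddress struct{ IP string }

type endpointSubset struct{ Addresses []endpointAddress }

type endpoints struct{ Subsets []endpointSubset }

// hasReadyAddress reports whether the (possibly missing) Endpoints object
// lists at least one address in any subset.
func hasReadyAddress(ep *endpoints) bool {
	if ep == nil {
		return false
	}
	for _, subset := range ep.Subsets {
		if len(subset.Addresses) > 0 {
			return true
		}
	}
	return false
}

func main() {
	populated := &endpoints{
		Subsets: []endpointSubset{{Addresses: []endpointAddress{{IP: "10.0.0.7"}}}},
	}

	fmt.Println(hasReadyAddress(nil))          // false: object not found
	fmt.Println(hasReadyAddress(&endpoints{})) // false: no subsets yet
	fmt.Println(hasReadyAddress(populated))    // true
}
```

In the reconciler, a check like this would presumably be OR-ed into the existing unhealthy condition next to `c.currentService(source) == nil`; how the Endpoints object is fetched is left open here.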
26 changes: 24 additions & 2 deletions pkg/controller/registry/reconciler/grpc_test.go
@@ -50,6 +50,16 @@ func grpcCatalogSourceWithSecret(secretNames []string) *v1alpha1.CatalogSource {
 	}
 }

+func injectPodReadiness(objs []runtime.Object) []runtime.Object {
+	pod, ok := objs[0].(*corev1.Pod)
+	if !ok {
+		return nil
+	}
+	pod.Status.ContainerStatuses = append(pod.Status.ContainerStatuses, corev1.ContainerStatus{Name: "ready", Ready: true})
+	objs[0] = pod
+	return objs
+}
+
 func grpcCatalogSourceWithAnnotations(annotations map[string]string) *v1alpha1.CatalogSource {
 	catsrc := validGrpcCatalogSource("image", "")
 	catsrc.ObjectMeta.Annotations = annotations
@@ -428,14 +438,26 @@ func TestGrpcRegistryChecker(t *testing.T) {
 			testName: "Grpc/ExistingRegistry/Image/Healthy",
 			in: in{
 				cluster: cluster{
-					k8sObjs: objectsForCatalogSource(validGrpcCatalogSource("test-img", "")),
+					k8sObjs: injectPodReadiness(objectsForCatalogSource(validGrpcCatalogSource("test-img", ""))),
 				},
 				catsrc: validGrpcCatalogSource("test-img", ""),
 			},
 			out: out{
 				healthy: true,
 			},
 		},
+		{
+			testName: "Grpc/ExistingRegistry/Image/NotHealthy/PodNotReportingReady",
+			in: in{
+				cluster: cluster{
+					k8sObjs: objectsForCatalogSource(validGrpcCatalogSource("test-img", "")),
+				},
+				catsrc: validGrpcCatalogSource("test-img", ""),
+			},
+			out: out{
+				healthy: false,
+			},
+		},
 		{
 			testName: "Grpc/NoExistingRegistry/Image/NotHealthy",
 			in: in{
@@ -494,7 +516,7 @@ func TestGrpcRegistryChecker(t *testing.T) {
 			testName: "Grpc/ExistingRegistry/AddressAndImage/Healthy",
 			in: in{
 				cluster: cluster{
-					k8sObjs: objectsForCatalogSource(validGrpcCatalogSource("img-catalog", "catalog.svc.cluster.local:50001")),
+					k8sObjs: injectPodReadiness(objectsForCatalogSource(validGrpcCatalogSource("img-catalog", "catalog.svc.cluster.local:50001"))),
 				},
 				catsrc: validGrpcCatalogSource("img-catalog", "catalog.svc.cluster.local:50001"),
 			},
5 changes: 4 additions & 1 deletion pkg/package-server/provider/registry.go
@@ -178,7 +178,10 @@ func (p *RegistryProvider) syncCatalogSource(obj interface{}) (syncError error)
 		"namespace": source.GetNamespace(),
 	})

-	if source.Status.RegistryServiceStatus == nil {
+	// packageserver seems to compete with catalog operator for the initial connection
+	// wait for the catalog operator to be connected properly before connecting here
+	notReady := source.Status.RegistryServiceStatus == nil || source.Status.GRPCConnectionState == nil
+	if notReady {
 		logger.Debug("registry service is not ready for grpc connection")
 		return
 	}
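
The registry.go change gates the package server on two optional status fields. A small sketch of that gate is shown below, under the assumption that both fields are pointers that stay nil until the catalog operator has populated them. The struct types are simplified stand-ins for the v1alpha1 status types, and `readyForConnection` is a hypothetical helper, not a function from the PR.

```go
package main

import "fmt"

// Simplified stand-ins for the catalog source status types; the field sets
// are assumptions made for this sketch.
type registryServiceStatus struct{ ServiceName string }

type grpcConnectionState struct{ LastObservedState string }

type catalogSourceStatus struct {
	RegistryServiceStatus *registryServiceStatus
	GRPCConnectionState   *grpcConnectionState
}

// readyForConnection is the inverse of the notReady expression in the diff:
// both pointers must be set before the package server dials the registry.
func readyForConnection(s catalogSourceStatus) bool {
	return s.RegistryServiceStatus != nil && s.GRPCConnectionState != nil
}

func main() {
	serviceOnly := catalogSourceStatus{
		RegistryServiceStatus: &registryServiceStatus{ServiceName: "catalog"},
	}
	connected := catalogSourceStatus{
		RegistryServiceStatus: &registryServiceStatus{ServiceName: "catalog"},
		GRPCConnectionState:   &grpcConnectionState{LastObservedState: "READY"},
	}

	fmt.Println(readyForConnection(catalogSourceStatus{})) // false
	fmt.Println(readyForConnection(serviceOnly))           // false: operator not connected yet
	fmt.Println(readyForConnection(connected))             // true
}
```

The gate only checks for presence. It does not inspect the observed connection state, which matches the diff as written.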
2 changes: 1 addition & 1 deletion test/e2e/subscription_e2e_test.go
@@ -2916,7 +2916,7 @@ func updateInternalCatalog(t GinkgoTInterface, c operatorclient.ClientInterface,
 		before := fetchedInitialCatalog.Status.ConfigMapResource
 		after := catalog.Status.ConfigMapResource
 		if after != nil && after.LastUpdateTime.After(before.LastUpdateTime.Time) && after.ResourceVersion != before.ResourceVersion &&
-			catalog.Status.GRPCConnectionState.LastConnectTime.After(after.LastUpdateTime.Time) && catalog.Status.GRPCConnectionState.LastObservedState == "READY" {
+			catalog.Status.GRPCConnectionState.LastConnectTime.After(after.LastUpdateTime.Time) {
 			fmt.Println("catalog updated")
 			return true
 		}
2 changes: 1 addition & 1 deletion test/e2e/util_test.go
@@ -313,7 +313,7 @@ func registryPodHealthy(address string) bool {
 func catalogSourceRegistryPodSynced(catalog *v1alpha1.CatalogSource) bool {
 	registry := catalog.Status.RegistryServiceStatus
 	connState := catalog.Status.GRPCConnectionState
-	if registry != nil && connState != nil && !connState.LastConnectTime.IsZero() && connState.LastObservedState == "READY" {
+	if registry != nil && connState != nil && !connState.LastConnectTime.IsZero() {
 		fmt.Printf("catalog %s pod with address %s\n", catalog.GetName(), registry.Address())
 		return registryPodHealthy(registry.Address())
 	}
42 changes: 0 additions & 42 deletions vendor/google.golang.org/grpc/.travis.yml

This file was deleted.

5 changes: 3 additions & 2 deletions vendor/google.golang.org/grpc/MAINTAINERS.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 0 additions & 2 deletions vendor/google.golang.org/grpc/Makefile

13 changes: 13 additions & 0 deletions vendor/google.golang.org/grpc/NOTICE.txt

2 changes: 1 addition & 1 deletion vendor/google.golang.org/grpc/README.md
