Clarify Identity Service vs Probe request #171
The primary utility of the Probe calls is deployment verification. This can be a one-time or periodic check to ensure the volume plugin is healthy (and has all dependencies available). This information can be used, for example, to monitor the health of the plugin and redeploy it (or take other automated measures) when it becomes unhealthy. On Kubernetes, for example, this could be hooked into the Liveness/Readiness mechanism on plugin deployment. That said, since the purpose of this call is to check the health of the plugin, we don't really need separate ControllerProbe and NodeProbe calls. Additionally, if we consolidate [...]
Thanks @saad-ali.
We had a big discussion about consolidating the probe calls. @akutz brought up that this can't be done because volume plugins may not know what mode they are running in (controller, node, or both), since the spec does not dictate that plugins should know, nor does it offer any mechanism for them to find out in a standard way. A naive plugin, therefore, could depend entirely on the CO to make the appropriate calls, and would not know how to handle a generic Probe.

I agree that this is an issue. I don't think this mode of operation should be supported. We should modify the spec to dictate that a plugin must know (somehow) what mode it is running in (controller, node, or both). This will let us have a cleaner API, and reduce the matrix of possible deployment configurations slightly. That said, we should not dictate the exact mechanism for how the plugin discovers its mode, as that is an implementation detail for the plugin author: a plugin may use whatever it likes: an environment variable, a binary flag, etc.

Once we can make the assumption that an instance of a plugin knows what mode it is operating in, it makes sense to consolidate ControllerProbe and NodeProbe. I'm also open to @chakri-nelluri's proposal to collapse Probe into the Identity service.

CC'ing some of the people who seemed interested in the discussion: @childsb @chakri-nelluri @akutz @jieyu @jdef @julian-hj @lpabon. Apologies if I missed anyone.
Hi @saad-ali, We should not collapse these calls. We cannot act as if CSI is not already in the wild. It is. It's not prudent to make drastic changes anymore. It may not yet be 1.0.0, but we should act like it unless there's an enormous reason to ignore that behavior.
Hi @saad-ali,

So here's the thing. The spec already defines `CSI_ENDPOINT` as a global environment variable that SPs should reference. The fact is that `CSI_ENDPOINT` is set/configured by *something*. Whether that's an admin or a CO config file that deploys SPs, it's defined in the spec, set, and used by the SP. The spec defines `CSI_ENDPOINT` because it's a piece of information that all SPs need to use.

How is this *any* different than, let's say, `CSI_MODE`? Yes, it's an implementation detail, but so is `CSI_ENDPOINT`! The CSI spec should absolutely define `CSI_MODE` as a standard environment variable that all SPs should use in order to determine the desired mode of operation. That way there is an agreed upon mechanic, just like `CSI_ENDPOINT`, that COs/admins can use to configure an SP and an SP looks for to determine its desired mode of operation.

CSI_MODE

The environment variable `CSI_MODE` defines the configured service model of the SP process:

| Value | Behavior |
|---|---|
| `<not_set>` | Controller+Node+Identity |
| `<set_and_empty>` | Controller+Node+Identity |
| `controller` | Controller+Identity |
| `node` | Node+Identity |
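For illustration only (this is not part of the spec, and the helper names are hypothetical), a minimal Go sketch of how an SP could honor a `CSI_MODE`-style variable along the lines of the table above when deciding which gRPC services to register:

```go
package spmode

import (
	"fmt"
	"os"
	"strings"

	"google.golang.org/grpc"
)

// Hypothetical hooks: a real SP would call the generated
// csi.RegisterIdentityServer / RegisterControllerServer / RegisterNodeServer
// with its own service implementations here.
func registerIdentity(s *grpc.Server)   {}
func registerController(s *grpc.Server) {}
func registerNode(s *grpc.Server)       {}

// registerServices wires up the gRPC services this SP instance exposes,
// following the CSI_MODE table above: unset or empty means all services.
func registerServices(s *grpc.Server) error {
	registerIdentity(s) // the Identity service is always served

	switch strings.ToLower(os.Getenv("CSI_MODE")) {
	case "": // <not_set> or <set_and_empty>
		registerController(s)
		registerNode(s)
	case "controller":
		registerController(s)
	case "node":
		registerNode(s)
	default:
		return fmt.Errorf("unrecognized CSI_MODE value: %q", os.Getenv("CSI_MODE"))
	}
	return nil
}
```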
Hi @saad-ali, Re: Probe... I had a long chat with @codenrhoden, and he and I both agree, unless [...]. The value of [...]
The example above demonstrates why [...]. Other possibilities: [...]
The [...] Additionally, [...]
After the CSI meeting on Wednesday, I had some thoughts on this that I wanted to share. I agree with @saad-ali's comment above that:
Whether it's through an environment variable or some other mechanism, a plugin should know what mode it is running in.

However, during the call on Tuesday someone asked "do we expect that a probe is required after an SP restarts?". And the answer was no. I contend that if that is the case, Probe is useless.

Right now the spec says of Probe: [...]. No matter how you look at it, the SP is required per the spec to reject all incoming RPC requests to the Node or Controller service if it has not been probed. If a Controller request comes in prior to a Probe, the CO has violated the spec, and it is appropriate for the SP to return an error.

I think this applies no matter what the lifecycle of the SP is. Whether it is starting up for the first time, the 30th time, or starting anew after an upgrade, the SP cannot accept requests until it has been probed, so that it can verify dependencies, configuration, etc. I believe that is what Probe is for. Every time an SP is restarted, system state (e.g. kernel modules, binaries) or configuration could have changed, so a new Probe is necessary to revalidate that the SP is "good to go". So to me, saying that the CO doesn't have to re-Probe after a restart makes no sense.

I contend that all RPCs in the Node and Controller services should have an error code that indicates that a Probe is required. I have, in fact, already implemented this sort of gate.
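To make the gate @codenrhoden describes concrete, here is a rough sketch, under the assumption of a single consolidated `Probe` RPC on the Identity service, of a gRPC interceptor that rejects Controller/Node RPCs with `FAILED_PRECONDITION` until a `Probe` has succeeded. This is only one possible implementation, not something the spec prescribes:

```go
package probegate

import (
	"context"
	"strings"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// probeGate remembers whether this SP instance has been successfully probed
// since it started; a process restart naturally resets it to "not probed".
type probeGate struct{ probed atomic.Bool }

// Unary is a grpc.UnaryServerInterceptor that gates Controller/Node RPCs
// behind a successful Probe.
func (g *probeGate) Unary(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	// Identity RPCs (including Probe itself) always pass through.
	if strings.Contains(info.FullMethod, "Identity") {
		resp, err := handler(ctx, req)
		if err == nil && strings.HasSuffix(info.FullMethod, "Probe") {
			g.probed.Store(true) // a Probe succeeded; open the gate
		}
		return resp, err
	}
	// Everything else is rejected until a Probe has succeeded, giving the CO
	// a well-defined signal that it must (re)issue a Probe.
	if !g.probed.Load() {
		return nil, status.Error(codes.FailedPrecondition, "plugin has not been probed")
	}
	return handler(ctx, req)
}

// NewServer returns a gRPC server with the probe gate installed.
func NewServer() *grpc.Server {
	g := &probeGate{}
	return grpc.NewServer(grpc.UnaryInterceptor(g.Unary))
}
```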
We used to have CSI_MODE and got rid of it. The spec focuses on protocol over everything else, intentionally. Requiring a CSI_ENDPOINT envvar was a compromise. I strongly disagree that we should include any other envvars as CSI standards.
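For context, a minimal sketch of what the `CSI_ENDPOINT` compromise looks like for a Go SP today; the `--endpoint` fallback flag is purely hypothetical and is shown only to illustrate that the debate is about where the socket path comes from, not how the socket is served:

```go
package main

import (
	"flag"
	"log"
	"net"
	"os"
	"strings"

	"google.golang.org/grpc"
)

func main() {
	// Per the spec as discussed here, the endpoint arrives via CSI_ENDPOINT,
	// e.g. "unix:///var/lib/csi/csi.sock". The --endpoint flag is a purely
	// hypothetical alternative an SP could offer if the env var were not
	// mandated by the spec.
	endpoint := flag.String("endpoint", os.Getenv("CSI_ENDPOINT"), "CSI endpoint")
	flag.Parse()

	path := strings.TrimPrefix(*endpoint, "unix://")
	if path == "" {
		log.Fatal("no CSI endpoint configured")
	}
	_ = os.Remove(path) // remove a stale socket left over from a previous run

	lis, err := net.Listen("unix", path)
	if err != nil {
		log.Fatalf("listen on %s: %v", path, err)
	}

	srv := grpc.NewServer()
	// ... register the Identity/Controller/Node services here ...
	log.Fatal(srv.Serve(lis))
}
```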
I know this is ugly... But the common way to solve most of the issues discussed here is by including the API version in the function names, such as: ControllerProbeV1(...), ControllerProbeV2(...)
Hi @childsb, You're right, it's ugly :) The issue is that Go prevents multiple gRPC bindings, and I don't think the version in the RPC name is a tenable solution either.
Hi @jdef, Then either remove `CSI_ENDPOINT` from the spec as well, or standardize `CSI_MODE` alongside it.
How does multiple RPC bindings play into this? If the endpoint is versioned to the spec, it would bind to the endpoint which it supports.
This is a deployment/packaging issue. While it could be standardized, deployment & packaging is listed as out-of-scope for the CSI spec.
Hi @childsb,

The reason multiple bindings are an issue has to do with the way Go's gRPC implementation registers types (the relevant CSI type registration). What could happen is, if we include the API version in the proto's defined namespace, all of the CSI proto messages would include the version in the proto type namespace. For example, here is what it is now:

```proto
syntax = "proto3";
package csi;
```

We can change it to:

```proto
syntax = "proto3";
package csi.v020;
option go_package="csi";
option java_package="csi";
option csharp_namespace="csi";
```

This way the CSI message types would be registered under the versioned namespace:

```go
proto.RegisterType((*GetSupportedVersionsRequest)(nil), "csi.v020.GetSupportedVersionsRequest")
proto.RegisterType((*GetSupportedVersionsResponse)(nil), "csi.v020.GetSupportedVersionsResponse")
proto.RegisterType((*Version)(nil), "csi.v020.Version")
proto.RegisterType((*GetPluginInfoRequest)(nil), "csi.v020.GetPluginInfoRequest")
proto.RegisterType((*GetPluginInfoResponse)(nil), "csi.v020.GetPluginInfoResponse")
proto.RegisterType((*CreateVolumeRequest)(nil), "csi.v020.CreateVolumeRequest")
proto.RegisterType((*CreateVolumeResponse)(nil), "csi.v020.CreateVolumeResponse")
```

Anyway, I'm on board with the idea of including the API version in the package namespace so that types registered with protobuf are isolated in their own versioned namespace.
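To illustrate the payoff of that isolation, a sketch of a single Go process serving two spec versions side by side. The import paths and constructors are hypothetical and assume each spec version is generated into its own versioned proto package as proposed above:

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"

	// Hypothetical generated packages: because each proto package (csi.v010,
	// csi.v020) registers its message types under its own namespace, both can
	// be linked into one binary without protobuf registration conflicts.
	csiv010 "example.com/sp/gen/csi/v010"
	csiv020 "example.com/sp/gen/csi/v020"
)

func main() {
	lis, err := net.Listen("unix", "/tmp/csi.sock")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()

	// One plugin process answering both old and new COs. newV010Identity and
	// newV020Identity are hypothetical constructors for this SP's
	// version-specific service implementations.
	csiv010.RegisterIdentityServer(srv, newV010Identity())
	csiv020.RegisterIdentityServer(srv, newV020Identity())

	log.Fatal(srv.Serve(lis))
}
```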
We have to be able to make changes to the spec that are going to be breaking.
Hi @cpuguy83, I agree. I disagree that many of these changes are necessary or even useful. It seems many of them are due to preference, not necessity.
Hi @cpuguy83,
I'm also confused about how you arrived at this remark, when my previous comment was aligned with how to do just what you suggested in a safe manner: by versioning the CSI protobuf message namespace.
@akutz It's a huge complication in order to support 0.1 plugins going forward.
Hi @cpuguy83, That's fine, except the spec also calls out multiple supported versions. I'm suggesting an approach to be flexible. You're suggesting dropping backwards support. I don't care if we do, but I don't see what one has to do with the other. I'm just trying to propose a potential means of supporting multiple API versions in the bindings.
@akutz Aren't the protos backwards compatible by themselves (once we hit stable and stop moving things around)?
I'm not familiar enough with gRPC to know if @akutz's suggestion is standard practice, but it seems legit... I do like the idea of allowing for breaking changes AND keeping the early API.
@cpuguy83 Wasn't your earlier point that we will never be stable and should expect to always move things around?
Typically you'd version the RPC service itself, but versioning the types tends to be problematic.
Hi @cpuguy83,
I don't have faith that things will stop moving around sufficiently for it to ever be backwards compatible, due to the way Go and gRPC handle message type registration. Working with Go means there is no room for any error if the bindings share a type namespace. My proposal is simple, causes no real issues, and enables support for multiple CSI API bindings in a single Go process.
Hi @cpuguy83,
Is that accurate though? SemVer allows for breaking changes between major versions. So it's perfectly reasonable for you or anyone to recommend a breaking change between 1.0 and 2.0. All I'm suggesting is to make it possible for Go devs to handle that much more gracefully than requiring separate binaries that support different, major CSI versions.
How about instead of versioning the API namespace for each iteration of MAJOR, MINOR, or PATCH, we version it only for major changes? For example, the namespace would stay at `csi.v1` across minor and patch releases and only move to `csi.v2` on a breaking change.
That would reduce the number of namespaces and also enable in-proc support for backwards compatibility across major versions.
Hi @childsb,
I am not either. I do know enough about gRPC to know that breaking changes break backwards compatibility if you change message names or field order. How many times have message names changed already? I am just trying to open people up to a way of allowing for that without breaking support across the board for languages like Go that make supporting multiple API versions problematic.
@akutz Sometimes there are common types; in that case it ends up not really making sense to version them, since they are consumed by multiple services.
Hi @cpuguy83, Okay then. Can we then reach a compromise where the proto namespace includes only the major version?
Given this version scheme, does it make sense to consider the current spec work a true V2, or should it be a V1.1?
Hi @childsb, Until the spec is [...]
Thanks guys. A couple of quick thoughts I had when I opened the issue: Probe - What is the role of the probe? 1st Argument: [...] 2nd Argument: [...]
If we really want the CO to provide an init call to the plugin, I think we should call it init(...) or something that implies such (though packaging and deployment is a NON-goal in the spec)... Probe to me is always a status check and not a bootstrap call to action.
Hi @childsb, I disagree. I think probe can be both a liveness and a bootstrap check. An SP should be able to, via any RPC, indicate that a probe is required. That would enable both scenarios. Today the spec does say explicitly that probe calls should be invoked ahead of any other RPCs:
That must mean "after an SP process starts", or as written it is useless. Thus today the spec does indicate that probe is both a liveness check and, in effect, an init call. I disagree with that language, however, as the SP isn't being given any initial configuration. It's just indicating to the process to prepare itself.
Sure, we can conflate init and probe, but what's the benefit @akutz?
Hi @childsb, Because I believe @codenrhoden and I outlined all of this in our comments above. Please take another look at those comments for detailed reasons related to your question.
I think the point I was trying to make is that probe is a readiness check from the API POV, and that's it. If a volume plugin uses it for something else (such as init), it's at its own peril (and I'm not saying let's add init to the spec). Why the distinction? If the CO acknowledges it as a bootstrap process, it implies some sort of recovery handling from the CO. If it's only a readiness check, the responsibility is on the CSI driver to resolve its own state/readiness if it does use probe(..) to init.
I never took [...]
Is there agreement on that?
Many plugins will probably choose probe(...) to start bootstrap, but dictating that behavior in the API is problematic.
Sure.
Sure.
Sure. However, the spec requires that a probe comes first. All we're asking for is the ability for an SP to reliably communicate to the CO that a probe hasn't happened yet, so that it can send one.
Sorry, I'm slow... Why does the CO HAVE to re-probe on request? Can't the plugin probe itself if it needs it?
Hi @childsb, Because otherwise the SP can either fail to start (with the error as to why), or probe itself ahead of every RPC and return some error to the CO that the spec does not define and the CO may not understand.
If a plugin is going to probe itself, why have the RPC defined at all? From the spec: [...] Could the plugin probe itself? Yes, it could. But then how do you inform the CO what is wrong? I would think the CO would be interested in the error details returned from a Probe call. Unless the language of the spec changes to no longer say that a CO calling this RPC MUST happen first, I don't understand the pushback here. How do you suggest a CO go about meeting its requirement to first call Probe?
I'm not sure that I agree (with the spec's assertion) that probe should be called before any other request. Let me go back over it.
This is a good argument. I agree we should leave Probe in.
The spec specifically avoided specifying packaging, except in this case. And now that I reflect on it, defining the exact mechanism by which a plugin would discover initialization information (like the socket path) in the spec is unnecessary, and we probably shouldn't have done it. The spec should dictate that communication must occur via a unix domain socket, but it doesn't need to dictate that the socket path must be set via a CSI_ENDPOINT environment variable.
Agreed, we should modify the spec to dictate that a plugin must know (somehow) what mode it is running in (controller, node, or both). I disagree that the exact mechanism for how the plugin discovers its mode should be dictated. As stated above, I think we should loosen the existing env var requirements to allow the plugin to decide what mechanism it will choose to fulfill this requirement.
@chakri-nelluri and I had this thought, which led him to open this issue. Upon further discussion we realized that the value of the Probe calls is deployment verification, as described above.
Agreed. An SP should fail with a non-0 exit code if pre-conditions are not met at initialization, and spit out information to stderr. This should be guidance (not a requirement, and probably not part of the spec).
Yes!
Yes!
I'd prefer to just clarify the definition instead of (or in addition to) renaming it.
No, this is unnecessary. The purpose is to check the health of this specific instance (deployment) of the plugin only. The only action should be whatever "rectification" logic makes sense and is defined by the deployment (e.g. if the plugin is reported unhealthy, restart it).
For the reasons @akutz listed above, it makes sense to not combine Probe with GetPluginInfo.
No, initialization configuration information should be passed in via packaging and deployment mechanisms, not via the interface.
Yes!
I think this wording is incorrect and part of what led to the confusion around the purpose of probe.
Yes! Yes! Yes!
That was a mistake. The spec should not require this. Probe should not be used as an "initialization signal". Let's summarize: there are four distinct duties, and it is worth clarifying what component is responsible for them:
This is a neat idea. It reminds me of the way the Kubernetes API does versioning, and it allows for alpha and beta releases in the future:
We have had lots of discussion; I want to make sure we come to a conclusion and actually get something done. Concrete proposal:
The code freeze for Kubernetes is coming up on Feb 26, and I'd like to get these changes in. So please chime in ASAP if you disagree.
Agreed with @saad-ali's proposal.
This patch removes all of the SP implementation information previously detailed in the specification. With respect to issue container-storage-interface#171 and comments made by @saad-ali, @childsb, @jdef, and others, it has become apparent that the specification should not supply any suggestions or restrictions on matters related to an SP's packaging, deployment, or runtime configuration. Removed information includes, but is not limited to:

* `CSI_ENDPOINT` - SPs are now responsible for defining the manner by which they grok the gRPC endpoint on which they serve the CSI services. CO admins are now responsible for deploying and configuring SPs using that SP's documentation.
* Env var names - The spec no longer claims ownership of the `CSI_` prefix for environment variables.
* Permissions - The spec no longer asserts any recommendations with respect to container permissions that an SP might require.
* Logging - All logging details are the responsibility of the SP and may vary SP to SP.

The removed information may very well be suitable to live in a separate document that provides guidelines for SP creation and configuration, such as common environment variable names, logging and packaging details, etc.

cc @codenrhoden
I spoke with @jieyu about the proposal above. While he agrees with not specifying the mechanism by which a plugin will discover its mode (proposal 1), he explained that Mesos needs [...]. To accommodate that use case we agreed on a revised proposal:
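The exact text of the revised proposal is not reproduced above, but as a hedged sketch of the consolidated direction (a single `Probe` plus a capabilities query on the Identity service), a CO-side check might look roughly like this; the `csi` identifiers reflect that assumption rather than the 0.1 spec:

```go
package co

import (
	"context"
	"fmt"

	// Assumed import path for bindings generated from a spec revision in which
	// the Identity service carries Probe and a plugin-capabilities query.
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// controllerRPCsAllowed sketches how a CO might use a consolidated Identity
// service: probe the plugin first, then ask whether it provides the
// Controller service before issuing any Controller RPCs.
func controllerRPCsAllowed(ctx context.Context, id csi.IdentityClient) (bool, error) {
	if _, err := id.Probe(ctx, &csi.ProbeRequest{}); err != nil {
		return false, fmt.Errorf("plugin not ready: %w", err)
	}
	caps, err := id.GetPluginCapabilities(ctx, &csi.GetPluginCapabilitiesRequest{})
	if err != nil {
		return false, err
	}
	for _, c := range caps.GetCapabilities() {
		if c.GetService().GetType() == csi.PluginCapability_Service_CONTROLLER_SERVICE {
			return true, nil
		}
	}
	return false, nil
}
```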
@saad-ali Shall we use the standard gRPC health checking model? https://github.com/grpc/grpc/blob/master/doc/health-checking.md
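For reference, a small sketch of what adopting the standard gRPC health-checking service could look like for a Go SP, using the stock grpc-go health server; the per-service names are illustrative only:

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	lis, err := net.Listen("unix", "/tmp/csi.sock")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()

	// Stock health service from grpc-go; a CO could call
	// grpc.health.v1.Health/Check rather than a CSI-specific Probe.
	hs := health.NewServer()
	healthpb.RegisterHealthServer(srv, hs)

	// "" is the conventional whole-server status. The per-service entry below
	// uses an illustrative name and shows the finer granularity that the
	// optional service field allows (see the later comment on this point).
	hs.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
	hs.SetServingStatus("csi.Controller", healthpb.HealthCheckResponse_NOT_SERVING)

	log.Fatal(srv.Serve(lis))
}
```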
@akutz @saad-ali @childsb Is the versioning going to be at a service level or at an API level inside the same service? Let's clarify. I have seen some people doing it at the service level rather than at the API level. Ex: https://developers.google.com/assistant/sdk/reference/rpc/
Oh nice. So far we've not imported any external protos.
Please review #195 and add comments there.
@chakri-nelluri @saad-ali Something that I find interesting about the gRPC health check protocol is that there's an OPTIONAL service name in the request, which allows for (optionally) more granular, per-service level health checks. For example, if a plugin instance implemented controller and node services, and the Probe RPC supported a similar, optional service name in the request, then a CO could be configured to probe the health of the controller and node services independently.

In the end, for CSI, I'm not convinced that this would be a huge win (yet) because the only thing a CO can do with an unhealthy plugin is to respawn it; that strategy is independent of which specific gRPC service is failing. Or am I missing something obvious? Anyway, since we've simplified the API, we can always add an optional service name to the Probe request at some later point, if needed, without breaking the API.
Today, we have the Identity service, which establishes the identity and discovers the list of supported versions.
Also, a service can crash and spin back up anytime, and the error codes defined in the probe can ideally be returned for any RPC call.
Given that, do we really need a Probe call? Can we just use the Identity service to establish the identity and readiness of the plugin?