-
Notifications
You must be signed in to change notification settings - Fork 575
proto structs for routing rules and proxy config #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
|
||
message matchSpec { | ||
message dstMatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The routing rule proposal in https://docs.google.com/document/d/1nEVaJanPoI0PsEVy11xgWla2bG6jsh9Y0UcI4_D5rjo/edit#heading=h.1mrqkg688g3g moves the match information into the selector expression, so we wouldn't need any of the match types defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not label selector matches. These are matching based on type of protocol, URI, etc. The label selector list has been abstracted out and appears as a separate entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A selector in the config spec is more general than a label selector. From the config doc: "A selector is a boolean expression that evaluates a set of attributes to produce a boolean result.". I was attempting to align the istio routing spec with this istio mixer config, so the whole match portion for a routing rule is represented in a selector expression. Here's an example of a routing rule selector:
- selector: dst.proto == "http" &&
dst.host == "serviceB.cluster.local" &&
hdrs.cookie == ".*?user=tester1"
Whether or not this is the best approach is open for discussion, but it is the current proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the selector would just be a simple string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. But now I'm not clear on where and how the format of that string is described and how it is validated. But this is the same design approach that mixer config is using
@@ -0,0 +1,183 @@ | |||
// Copyright 2016 IBM Corporation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but I think by convention this file should be named cfg.proto
} | ||
|
||
message httpHeader { | ||
string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all proto types supposed to start with an upper case letter (by convention)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Yes I agree. Except I couldn't find a selector description in the existing
specs in API repo, one that aligned with Sven's proposal for extracting
request attributes (which among other things included pod labels as well).
If it's already there, could you please point me to it? Or if you wish,
feel free to edit the PR (my repo directly).
On Thu, Jan 12, 2017 at 10:48 AM Frank Budinsky ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In proxy/v1/config/routing.proto <#21>:
> + }
+ message delay {
+ string type = 1;
+ oneof delay_type {
+ httpFixedDelay fixeddelay=2;
+ httpExponentialDelay expdelay=3;
+ tcpBandwidthThrottle tcpdelay=4;
+ }
+ }
+ repeated httpHeader headers = 1;
+ abort abortSpec = 2;
+ delay delaySpec = 3;
+}
+
+message matchSpec {
+ message dstMatch {
A selector in the config spec is more general than a label selector. From
the config doc: "A selector is a boolean expression that evaluates a set of
attributes to produce a boolean result.". I was attempting to align the
istio routing spec with this istio mixer config, so the whole match portion
for a routing rule is represented in a selector expression. Here's an
example of a routing rule selector:
- selector: dst.proto == "http" &&
dst.host == "serviceB.cluster.local" &&
hdrs.cookie == ".*?user=tester1"
Whether or not this is the best approach is open for discussion, but it is
the current proposal.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#21>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AH0qd4bymbe4Z8BUYg1OmXwk2_-X600jks5rRksvgaJpZM4Lhafj>
.
--
~shriram
|
So, I think there are two parts to this. 1) how to define the selector expression syntax corresponding to the match part of the routing rules. I don't know how or where that is done. 2) the route and action parts need to be converted form JSON schema to proto, and put that in istio.io/api (maybe that is just a subset of what you have already done?). I'll take a look and then take a pass at updating what you've done for #2 and also align it with what I think are the proper conventions, etc. |
// - key: target_service_name | ||
// Inputs: | ||
// Attr.target_service_name: target_service_name || target_service_id | ||
message Aspect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a bit odd to use aspects here and their utility in the context of routing seems far from obvious. Strip for the moment and add back when there is a need
|
||
// Encapsulates the routing related configurations for the proxy | ||
message RoutingConfig { | ||
string subject = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define subject
|
||
// A single routing rule | ||
message RoutingRule { | ||
string selector = 1; // expression based on attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we selecting here
// A single routing rule | ||
message RoutingRule { | ||
string selector = 1; // expression based on attributes | ||
Aspect aspect = 2; // Aspect.kind == "routing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
repeated AspectRule rules = 3; | ||
} | ||
|
||
message httpHeader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move leaf message types below root message types for readability
string value = 2; | ||
} | ||
|
||
message dstServiceSpec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/dst/Destination
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. |
message dstServiceSpec { | ||
message lbPolicy { | ||
string name = 1; | ||
google.protobuf.Any customAttributes=2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the use of 'Any' to allow for an extensible/vendor-specific LB policy mechanism.
Consider renaming field to 'policyImpl'
} | ||
|
||
message connectTimeout { | ||
uint32 duration = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning? rename to duration_ms or should I switch to using the duration type?
|
||
message connectTimeout { | ||
uint32 duration = 1; | ||
string overrideHeaderName = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for injection or receiving the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... also you have Connect/Read/Write here and below. How would a header be present for 'connect' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the connect timeout is for outbound connections from the proxy to another service, in a scenario such as A->proxy->B
. A
tells the proxy that it should use a connect timeout of Xms.
So, these timeouts are a bit too fine grained. Nginx for example supports setting connect/read/write timeouts. Envoy on the other hand provides a single coarse grained timeout field that includes all retries or per attempt.
What do you think about sticking to one coarse grained timeout, with the intent that its timeout per request. For a config, where there are N retries per request, the total timeout would be N*timeout_per_req
. For proxies that support finer timeouts, we could set connect/read/write timeout to be the same.
string overrideHeaderName = 2; | ||
} | ||
|
||
message readTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean Connect/Read/Write seem semantically different from typical request timeouts. Do we have a real use-cases for timing out writes to a slow network?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only use case I can think of is when using non request oriented protocols. Say, using kafka, mysql, etc. Depending on what language framework the user is using, it may not even matter to most. See comment above for more detail.
string overrideHeaderName = 2; | ||
} | ||
|
||
message retriesSpec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we could probably be more concrete here and use a oneof to handle adding more types over time. This seems a little different to LB where the possibility of variation may be higher
} | ||
|
||
message circuitBreakerSpec { | ||
uint32 successThreshold = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Units?
tcpBandwidthThrottle tcpdelay=4; | ||
} | ||
} | ||
repeated httpHeader headers = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are injected headers? clarify name
|
||
message matchSpec { | ||
message dstMatch { | ||
string proto = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would adopt the nomenclature of H2 for these fields. scheme, port, authority, path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would confine this to HTTP, which is good.
For other protocols (e.g., mongo, mysql, redis - that envoy supports), we might need a separate routing spec.
CLAs look good, thanks! |
// attributes, etc. | ||
// TODO: Merge with the proposal for deriving request metadata in K8S. | ||
MatchCondition match = 1; | ||
// 1 or more upstream clusters for this route. Each cluster represents a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good idea to have a simple glossary in the heading to clearly define 'upstream', 'downstream' etc.
Ambiguity about the meaning of cluster here with that in k8s could be resolve in a glossary too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
// N.B. Tags can be arbitrary strings (includes key=value pairs) | ||
repeated string tags = 3; | ||
|
||
message HTTPAttributes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this me called HttpMatchCondition.
Also we could eliminate the enum if we moved this into a oneof {HttpMatchCondition, TCPMatchCondition, ....}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per @enricoschiattarella 's comments, I thought it might be okay for the meantime to keep both http and tcp based matching. See newer format.
} | ||
Protocol protocol = 1; | ||
uint32 port = 2; | ||
// One or more tags that identify the calling service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we call the calling service 'downstream' or the called service (nit = I've always hated how nginx refers to things)
Either way we should use a consistent term here in the field name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I hate that as well. I tried to avoid using those terms completely (except for the typo in the first part). So, I thought I would stick with caller / target. (No need for a glossary either).
If you think that is convoluted, I can switch to the uniform nginx style convention, add a glossary on top. Thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can live with upstream and downstream as long as we're clear about it.
I think Caddy has nicer docs than nginx and they use the same terminology so I'm not totally averse.
https://caddyserver.com/docs/proxy
So our permutations are:
(source, client, downstream, caller) X (destination, service, server, upstream, receiver, target)
The H2 spec mostly uses 'server' though it also refers to 'upstream sever' and I don't particularly like client-server, client-service, source-dest, caller-receiver etc.
I think I've convinced myself that I like downstream/upstream more than anything else :/
|
||
message HTTPAttributes { | ||
string scheme = 1; //http|https | ||
string authority = 2; // host|authority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
host is an authority, do you mean ip|host|... in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it there as a reminder for myself :), to make sure I handle "host" header (or have an option for it) when generating configs for proxies that don't do http2 proxying (e.g. Nginx).
I'll remove it. Seems to just cause confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely LGTM.
Terminology looks pretty solid.
// and downstream service invoking the API call. The choice of the specific | ||
// upstream cluster will be determined by the routing rule. | ||
message RouteRule { | ||
oneof rule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit:
@wora is the recommended style for oneof's to be so verbose on the nested fields. For example when referencing a route_rule from a proxy config you would logically have
some_proxyconfig_obj.route_rules[0].rule.http_route_rule
any reason not to eliminate the trailing _route_rule so we get
some_proxyconfig_obj.route_rules[0].rule.http
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the JSON/YAML mapping the "rule" part doesn't appear at all:
route_rules:
- http_route_rule:
match:
although for this example the short nested field name might be nicer anyway:
route_rules:
- http:
match:
// Set of Http request level match attributes | ||
string scheme = 3; | ||
|
||
oneof amatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wora - Seems like this could become a type
message StringMatch {
enum MatchType {
EXACT, PREFIX, REGEX
}
MatchType match_type
string match_string;
}
and then be re-used by having
...
StringMatch authority;
StringMatch uri
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also be used for header matches later
IP_HASH = 2; | ||
RANDOM = 3; | ||
} | ||
oneof lbpolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wora isn't it snake_case for oneof field names. Also same comment here about redundancy in field names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good. There are few places missing documentation. Please give example as much as possible.
double throttle_for_seconds = 6; | ||
} | ||
|
||
message Terminate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not documented.
// Proxy level global configurations go here | ||
message ProxyConfig { | ||
// config specification version. defaults to 1.0 | ||
string version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proto package name already has version. This should be "int32 revision" to indicate minor changes. We can't have breaking change within this proto, so version "2.0" would not be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, would it be okay to just remove the version from here? if the proto package for istio already has version field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep int32 revision here and default to 0. In some cases, you do need revision to indicate some behavior change. proto3 can not transfer 0 over wire, so it adds no cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the changes. Thanks for such a major work.
One general issue is I am not sure how multiple percent based policies work together. It needs to be documented clearly.
I had a few design pattern suggestions, such as URI for cluster identifier. We found they work generally well.
// Proxy level global configurations go here | ||
message ProxyConfig { | ||
// config specification version. | ||
int32 version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version => revision
The proto package name contains the version. Using revision here to avoid conflict. See https://github.com/googleapis/googleapis/blob/master/google/type/postal_address.proto#L46
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should rename the package to indicate that this is 'alpha' quality
|
||
// Abruptly reset (terminate) the Tcp connection after it has been | ||
// established, emulating remote server crash or link failure. | ||
message Terminate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message does not say when to drop the connection, do we need reset_after_seconds etc?
My recommendation is we should terminate N% of connections after M seconds of delay.
uint32 http_status = 4; | ||
} | ||
// Specify abort code as part of Http request | ||
string override_header_name = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can only specifies http_status, right? If so, document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be http_status/gRPC code or HTTP2_ERROR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics or syntax of the header are unspecified right now. Probably best to mark this and the others as a TODO.
// delay duration in seconds.nanoseconds | ||
double fixed_delay_seconds = 2; | ||
// Specify delay duration as part of Http request | ||
string override_header_name = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should become istio attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? The mixer is not involved in any of this. The proxy parses the header and identifies the delay parameter.
At the moment, the routing and mixer are separate. No overlaps. Any collusion will need lot more thinking and probably be in istio v2.
PREFIX = 1; | ||
// posix style regex match. Throw error if user's chosen proxy does not | ||
// support posix regex. | ||
REGEX = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to do
StringMatch {
oneof ... {
string exact;
string prefix;
string suffix;
string regexp;
}
}
When you write config, you only need to set 1 field instead of 2 fields, which is a big difference.
// tag string could be parsed into a pod label (key:value)) | ||
message ClusterIdentifier { | ||
string name = 1; | ||
repeated string tags = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which system uses this kind of identifier? Why don't we use URL, such as k8s://service?label=1&label=2. This would be more generic and standard. These days, you can use URL for pretty much anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more for how the user specifies the configuration in YAML/JSON, and how the proxy interprets it. Secondly, the routing has to be platform agnostic, else the promise of cross platform portability breaks down.
The format you have boils down to clunky specs on the end user front.
dst_cluster: k8s://service?environment=production&backend=stripe&version=v2&az=us-east-1&foo=bar
On the other hand, a spec of the form below is nicer and easier for user consumption, and its portable across platforms. For e.g., if I want to run the same istio config over a mesos deployment, I do not have to change my config at all.
dst_cluster:
name: service
tags:
- environment: production
- backend:stripe
- version:v2
- az:us-east-1
- foo:bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know URL can handle pretty much all resources and people are used to copy them around even it is ugly.
My question is how generic is the ClusterIdentifier
. For example, if IBM and Google have internal cluster management systems, do we know if ClusterIdentifier can handle them? How the mapping works?
The term "cluster" is also a bit strong. In k8s, this seems to be more like job or service than cluster. Can we ask k8s people to see what they prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what here covers the internal Google cases pretty well.
Any reason not to use map<string, string> for the tags so we're not forcing parsing all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason was because tags could be just plain strings ("foo", or be "foo: bar"). I vaguely remember using map<> and then dumped it. if you think map<string, string> can capture just plain foo and foo:bar, then we can change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, leave as is, Can always revisit.
|
||
// Stop throttling after the given duration. Set to 0 to throttle the | ||
// connection for its lifetime. | ||
double throttle_for_seconds = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally people use -1 to mean forever. 0 means 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would likely have the opposite effect people would intend if they forgot to set the field. Better to use a wrapper type here perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.... and by that I mean
https://github.com/google/protobuf/blob/master/src/google/protobuf/wrappers.proto#L51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, a proto newbie here. What does the wrapper buy us? It didn't seem to have any defaults/min/max in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapper types allow you to detect the presence or absence of a primitive field. This can be important when the traditional default values for a primitive field might have the opposite effect to user expectation.
In proto3 all primitive numeric fields have a default value of 0. Using 0 to represent things like infinity is a good way to introduce coding mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// the connection) and/or abruptly reset (terminate) the Tcp connection | ||
// after it has been established. | ||
Throttle throttle = 1; | ||
Terminate terminate = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is really hard to tell how the two condition are triggered. Do we do throttle first then terminate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throttle or terminate. will document and convert to oneof
// percentage of requests on which the delay will be injected | ||
float percent = 1; | ||
// mean delay needed to derive the exponential delay values | ||
double mean_delay_seconds = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this one does not support override?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move override_header_name up into 'Delay'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure. It kind of made no sense to me to provide an override for exponential delay, unless the caller's requests always ended up in teh same upstream cluster, and the caller went through the pain of providing the same header value for all requests. I am not averse to adding one though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we haven't actually said what the override header does I think it makes sense to promote it.
Most often I would expect the override to specify a fixed delay rather than attempt to override a property of the exponential delay config, largely because the caller probably won't know what delay type is actually being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// N.B. Both delay and abort can be specified simultaneously. In such | ||
// cases, a request would be first delayed and then aborted with an error | ||
// code. | ||
Abort abort = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 2 percent-based action, how do they interact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order but with independent evaluation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will document.
// destination port | ||
uint32 dst_port = 4; | ||
// protocol Tcp|Udp | ||
string protocol = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, presumably we're ignoring SCTP, at least for the time being?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this round - yes
// TODO: This should move into UpstreamCluster | ||
HttpFaultInjection fault = 3; | ||
|
||
// Other things? such as tracing, etc. ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably start with a repeated Any and promote to first-class over time as things clarify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// just a.b.c.d | ||
repeated string src_ip_subnet = 1; | ||
// source port | ||
uint32 src_port = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without using a wrapper type here we will end up with default values of 0 for src & dest port which seems less than desirable. Alternative would be to use a repeated field but I think the need for multiple ports seems unlikely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// Wait for X seconds after the connection is established, before | ||
// terminating the connection. Set to 0 to terminate immediately on | ||
// connection establishment. | ||
double terminate_after_seconds = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we would use the 'Delay' type here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is aborting a connection. What Delay type are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message Delay {
defined above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so, I don't think that fits well. With Delay, we hold request in proxy, and wait for N seconds before forwarding.
In the terminate_after_seconds case, it would allow data to traverse on both sides. After the set number of seconds, we drop the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 'Delay' became 'Duration' was just a generic time for expressing a unit of time then the semantic can be defined in the referring field name.
Duration terminate_after = 1;
and for http
Duration delay_request =
Might be overkill so I'm fine to defer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know where your comment about Duration went, but yes, I do think its overkill for the moment. But I am open to that option, once we get a chance to try out a few use cases and see how the configs pan out. So, I put a TODO there as a reminder to myself.
import "google/protobuf/any.proto"; | ||
import "google/protobuf/wrappers.proto"; | ||
|
||
package istio.proxy.v1.config; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to istio.proxy.v1alpha.config
@wora any objection, just want to avoid any stability connotations right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@louiscryan if there are no more comments, can we go ahead and get this merged so that we can proceed with rest of config/routing related stuff in manager. |
Agreed. LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for being late to the party on this one.
My complexity alarm bells are going off a bit with this API, and I'd like to make sure all the complexity is warranted.
As a strawman, here's a more minimalist take:
message ClientConfig {
int32 revision = 1;
repeated Route routes = 2;
repeated Destination destinations = 3;
}
message Route {
// Condition under which the route is used.
string condition = 1;
// Map from cluster id to weight to use with that id.
repeated map<string, int> destinations = 2;
}
message Destination {
string id = 1;
string health_check_endpoint = 2;
... // rest of existing UpstreamCluster
}
Something like that seems to have more power and generality and also be simpler to reason about. Why add so much complexity?
Also, generally speaking this file needs a ton of examples with use cases and how they would work, which would make it a lot easier to follow and understand what is trying to be accomplished.
// (either end). | ||
|
||
// TODO: This should be done on per-cluster basis. | ||
L4FaultInjection fault = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit strange to bake into the proxy configuration. I wonder if we're better having fault injection through a separate mechanism?
|
||
// Basic routing rule match criterion using Tcp attributes and downstream | ||
// cluster identifier | ||
message L4MatchCondition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using a different match syntax in the proxy config vs. the mixer config?
Can we have a unified syntax so our users only need to learn one model?
I could see the conditions just being replaced with a string, and the attribute match is just attributes listed in the condition, just like they are with aspect configuration.
Same with cluster identifier, that's just an attribute that should be matched like anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proxy matching features will be hardcoded into the proxy runtimes they won't be supported via a generic expression engine mechanism so I think things are a little different here.
Given that this is an internal representation for the moment I think it makes sense to explicitly spell out in schema what is or is not supported.
L4Protocol protocol = 5; | ||
} | ||
|
||
// A name and one or more tags uniquely identify an upstream or downstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty confusing to me. Why is the identifier something other than a unique string?
What I'd expect is that UpstreamCluster is where you define a mapping from some unique string to the actual things represented there, and then in the rules you just refer to that unique string.
Do we have cases where a cluster identifer is not mentioned in an UpstreamCluster?
// in the definition of the UpstreamClusterPolicy. | ||
ClusterIdentifier dst_cluster = 1; | ||
|
||
// The proportion of connections to be forwarded to the upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: relative weights vs. absolute, the problem with absolute is that you require a validation step. But that may also be a good thing. With relative you always have a valid config, whether you want it or not, so there is a user experience gain but also increased risk to the user. Hrm.
// indicate the time for the entire response to arrive. | ||
message TimeoutPolicy { | ||
message SimpleTimeoutPolicy { | ||
// timeout is per attempt, when retries are specified as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some interesting discussions about how timeouts + retries aren't a very good approach for distributed systems with lots of components, and instead you need a total deadline + a load budget. Otherwise you can easily get cascading failures, long delays, etc.
We should spend some time thinking about what the right way to handle these are in our configuration. And also how to propagate things like deadlines and load budget remaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that we should think through budgeting. I do think point-to-point features are still useful for a large number of cases.
Large deployments that utilize some generic context propagation mechanism are where budgets can provide the most value
// itself from the evolution of dependent services. | ||
|
||
// Proxy level global configurations go here | ||
message ProxyConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProxyConfig seems like the wrong name for what we're doing here.
We're configuring a bunch of information about how a service should send traffic to upstream services, what timeouts to use, how to do load balancing, etc. The fact that a proxy is doing that work isn't actually relevant, a grpc library could just as easily use this information to configure itself.
Ideally we'd have some name that represents what this information is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming suggestions?
RoutingMesh?
Oh and also, is this intended only for manager -> proxy, and is an internal API, or is this meant as the canonical configuration of routing, and for use by humans? |
Right now this is an internal representation only. Hopefully as it bakes it becomes suitable for use as an API but we're clearly not there yet |
@smawson @louiscryan I am clearly lost here. I was under the assumption that this proto is what one uses to generate config files. For example, https://github.com/istio/istio/blob/master/demos/config_examples/routing.yaml This is work in progress. But that PR was on the table for 10 days 😜. Over time, we could refine the proto. Frankly, the process seems backwards. I would expect to define a clean JSON or swagger schema at end user level and work my way down. This proto stuff goes in the other direction. Having arbitrary "string" fields and sticking a bunch of config elements there seems fraught with peril. I have to ensure I implement everything that I documented in "informal English". And there is no formal way to verify these things. |
@smawson, you are ‘really' late to the party 🙂. We started this 5 weeks ago (take a look at the proposed Routing Config section at the bottom of this doc: https://docs.google.com/document/d/1nEVaJanPoI0PsEVy11xgWla2bG6jsh9Y0UcI4_D5rjo/edit#heading=h.1mrqkg688g3g.), and we are running short of time, as this routing proto issue has been waiting for feedback in the doc for a month, and this PR has been going through 7-8 changes over the last 10 days. FYI, the first feedback was to drop the aspects/attributes business. Instead of thrashing around back and forth, I suggest we just forge ahead with one point of view, and refine things along the way, as it gets the wheels moving. I was also of the impression that this is not just an internal representation. The examples doc that @rshriram pointed to in his comment, above, is the YAML config data that I assume would be provided by the user. |
I'm going to take my comments off this PR (which is fine submitted as is) and I'll move them to issues. I'm fine moving forward with getting everything working with the current format, but would like to continue iterating on the design and not take this as the final state. I'm currently concerned about the following items:
Will go create issues for them now. |
message RouteRule { | ||
oneof route_rule { | ||
L4RouteRule layer4 = 1; | ||
HttpRouteRule http = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be called L7RouteRule for symmetry as well as if protocol changes - lets say you'll have gRPC rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think HTTP is OK here as it is the basis for protocols like GRPC
// in the definition of the UpstreamClusterPolicy. | ||
ClusterIdentifier dst_cluster = 1; | ||
|
||
// The proportion of connections to be forwarded to the upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrt " The proportion of connections to be forwarded to the upstream". This should be connections for L4 rules and requests for L7 rules?
ClusterIdentifier cluster = 1; | ||
|
||
// Should be either http://.. or tcp://.. | ||
string health_check_endpoint = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically you need to decide what does it mean to be healthy, i.e. number until healthy, number until unhealthy, timeout interval, ping interval - see examples of public cloud APIs
http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html
https://cloud.google.com/compute/docs/reference/latest/healthChecks
enum SimpleLBPolicy { | ||
// These four simple load balancing policies have literally no | ||
// additional configuration. | ||
ROUND_ROBIN = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about weighted round robin?
…iles (istio#21) * Echo service for testing common gRPC code * vendor the third party protobufs * Version the echo service, add a makefile using the vendored protos. Check in the generated files. * make the makefiles more similar, and generate all the files * fix the field type of body * finish up tests for the test grpc server, add some helpers to mem * address linter complaints in the files I touched Mirrored from https://github.com/tetrateio/tetrate @ 1fe8ed76658d682b22bd1d2eaea07f6c3b5e5211
This PR specs out the routing rules as discussed in the Better ingress proposal, augmenting it with details from the previous version of the routing doc (https://docs.google.com/document/d/1hUo_qjPf3zfFGH6yB0RLwWixfgVyJhktcKkzfsbqMSs/edit#)
This is a work in progress and serves a place holder. Do not merge yet. I am sure that I have missed a bunch of things, esp some things that @smawson pointed out in terms of how to derive attributes of a request (labels, et al).
I would appreciate feedback. It needs some refactoring, w.r.t. abstracting some common types out of mixer config as well.
cc @kyessenov @mandarjog @frankbu @ijsnellf @GregHanson