Skip to content

proto structs for routing rules and proxy config #21

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jan 26, 2017
Merged

proto structs for routing rules and proxy config #21

merged 16 commits into from
Jan 26, 2017

Conversation

rshriram
Copy link
Member

This PR specs out the routing rules as discussed in the Better ingress proposal, augmenting it with details from the previous version of the routing doc (https://docs.google.com/document/d/1hUo_qjPf3zfFGH6yB0RLwWixfgVyJhktcKkzfsbqMSs/edit#)

This is a work in progress and serves a place holder. Do not merge yet. I am sure that I have missed a bunch of things, esp some things that @smawson pointed out in terms of how to derive attributes of a request (labels, et al).

I would appreciate feedback. It needs some refactoring, w.r.t. abstracting some common types out of mixer config as well.

cc @kyessenov @mandarjog @frankbu @ijsnellf @GregHanson

}

message matchSpec {
message dstMatch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The routing rule proposal in https://docs.google.com/document/d/1nEVaJanPoI0PsEVy11xgWla2bG6jsh9Y0UcI4_D5rjo/edit#heading=h.1mrqkg688g3g moves the match information into the selector expression, so we wouldn't need any of the match types defined here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not label selector matches. These are matching based on type of protocol, URI, etc. The label selector list has been abstracted out and appears as a separate entry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A selector in the config spec is more general than a label selector. From the config doc: "A selector is a boolean expression that evaluates a set of attributes to produce a boolean result.". I was attempting to align the istio routing spec with this istio mixer config, so the whole match portion for a routing rule is represented in a selector expression. Here's an example of a routing rule selector:

 - selector: dst.proto == "http" && 
            dst.host == "serviceB.cluster.local" &&
            hdrs.cookie == ".*?user=tester1"

Whether or not this is the best approach is open for discussion, but it is the current proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case the selector would just be a simple string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. But now I'm not clear on where and how the format of that string is described and how it is validated. But this is the same design approach that mixer config is using

@@ -0,0 +1,183 @@
// Copyright 2016 IBM Corporation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I think by convention this file should be named cfg.proto

}

message httpHeader {
string name = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all proto types supposed to start with an upper case letter (by convention)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@rshriram
Copy link
Member Author

rshriram commented Jan 12, 2017 via email

@frankbu
Copy link
Contributor

frankbu commented Jan 12, 2017

So, I think there are two parts to this. 1) how to define the selector expression syntax corresponding to the match part of the routing rules. I don't know how or where that is done. 2) the route and action parts need to be converted form JSON schema to proto, and put that in istio.io/api (maybe that is just a subset of what you have already done?). I'll take a look and then take a pass at updating what you've done for #2 and also align it with what I think are the proper conventions, etc.

// - key: target_service_name
// Inputs:
// Attr.target_service_name: target_service_name || target_service_id
message Aspect {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a bit odd to use aspects here and their utility in the context of routing seems far from obvious. Strip for the moment and add back when there is a need


// Encapsulates the routing related configurations for the proxy
message RoutingConfig {
string subject = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define subject


// A single routing rule
message RoutingRule {
string selector = 1; // expression based on attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we selecting here

// A single routing rule
message RoutingRule {
string selector = 1; // expression based on attributes
Aspect aspect = 2; // Aspect.kind == "routing"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

repeated AspectRule rules = 3;
}

message httpHeader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move leaf message types below root message types for readability

string value = 2;
}

message dstServiceSpec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/dst/Destination

@googlebot
Copy link
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.

message dstServiceSpec {
message lbPolicy {
string name = 1;
google.protobuf.Any customAttributes=2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of 'Any' to allow for an extensible/vendor-specific LB policy mechanism.

Consider renaming field to 'policyImpl'

}

message connectTimeout {
uint32 duration = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Units

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning? rename to duration_ms or should I switch to using the duration type?


message connectTimeout {
uint32 duration = 1;
string overrideHeaderName = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for injection or receiving the value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... also you have Connect/Read/Write here and below. How would a header be present for 'connect' ?

Copy link
Member Author

@rshriram rshriram Jan 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the connect timeout is for outbound connections from the proxy to another service, in a scenario such as A->proxy->B. A tells the proxy that it should use a connect timeout of Xms.

So, these timeouts are a bit too fine grained. Nginx for example supports setting connect/read/write timeouts. Envoy on the other hand provides a single coarse grained timeout field that includes all retries or per attempt.

What do you think about sticking to one coarse grained timeout, with the intent that its timeout per request. For a config, where there are N retries per request, the total timeout would be N*timeout_per_req . For proxies that support finer timeouts, we could set connect/read/write timeout to be the same.

string overrideHeaderName = 2;
}

message readTimeout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean Connect/Read/Write seem semantically different from typical request timeouts. Do we have a real use-cases for timing out writes to a slow network?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only use case I can think of is when using non request oriented protocols. Say, using kafka, mysql, etc. Depending on what language framework the user is using, it may not even matter to most. See comment above for more detail.

string overrideHeaderName = 2;
}

message retriesSpec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we could probably be more concrete here and use a oneof to handle adding more types over time. This seems a little different to LB where the possibility of variation may be higher

}

message circuitBreakerSpec {
uint32 successThreshold = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Units?

tcpBandwidthThrottle tcpdelay=4;
}
}
repeated httpHeader headers = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are injected headers? clarify name


message matchSpec {
message dstMatch {
string proto = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would adopt the nomenclature of H2 for these fields. scheme, port, authority, path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would confine this to HTTP, which is good.

For other protocols (e.g., mongo, mysql, redis - that envoy supports), we might need a separate routing spec.

@googlebot
Copy link
Collaborator

CLAs look good, thanks!

// attributes, etc.
// TODO: Merge with the proposal for deriving request metadata in K8S.
MatchCondition match = 1;
// 1 or more upstream clusters for this route. Each cluster represents a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to have a simple glossary in the heading to clearly define 'upstream', 'downstream' etc.

Ambiguity about the meaning of cluster here with that in k8s could be resolve in a glossary too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

// N.B. Tags can be arbitrary strings (includes key=value pairs)
repeated string tags = 3;

message HTTPAttributes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this me called HttpMatchCondition.

Also we could eliminate the enum if we moved this into a oneof {HttpMatchCondition, TCPMatchCondition, ....}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per @enricoschiattarella 's comments, I thought it might be okay for the meantime to keep both http and tcp based matching. See newer format.

}
Protocol protocol = 1;
uint32 port = 2;
// One or more tags that identify the calling service.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we call the calling service 'downstream' or the called service (nit = I've always hated how nginx refers to things)
Either way we should use a consistent term here in the field name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I hate that as well. I tried to avoid using those terms completely (except for the typo in the first part). So, I thought I would stick with caller / target. (No need for a glossary either).

If you think that is convoluted, I can switch to the uniform nginx style convention, add a glossary on top. Thoughts ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can live with upstream and downstream as long as we're clear about it.

I think Caddy has nicer docs than nginx and they use the same terminology so I'm not totally averse.

https://caddyserver.com/docs/proxy

So our permutations are:
(source, client, downstream, caller) X (destination, service, server, upstream, receiver, target)

The H2 spec mostly uses 'server' though it also refers to 'upstream sever' and I don't particularly like client-server, client-service, source-dest, caller-receiver etc.

I think I've convinced myself that I like downstream/upstream more than anything else :/


message HTTPAttributes {
string scheme = 1; //http|https
string authority = 2; // host|authority
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host is an authority, do you mean ip|host|... in the comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it there as a reminder for myself :), to make sure I handle "host" header (or have an option for it) when generating configs for proxies that don't do http2 proxying (e.g. Nginx).
I'll remove it. Seems to just cause confusion.

Copy link
Contributor

@louiscryan louiscryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely LGTM.

Terminology looks pretty solid.

// and downstream service invoking the API call. The choice of the specific
// upstream cluster will be determined by the routing rule.
message RouteRule {
oneof rule {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit:

@wora is the recommended style for oneof's to be so verbose on the nested fields. For example when referencing a route_rule from a proxy config you would logically have

some_proxyconfig_obj.route_rules[0].rule.http_route_rule

any reason not to eliminate the trailing _route_rule so we get

some_proxyconfig_obj.route_rules[0].rule.http

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the JSON/YAML mapping the "rule" part doesn't appear at all:

route_rules:
- http_route_rule:
    match:

although for this example the short nested field name might be nicer anyway:

route_rules:
- http:
    match:

// Set of Http request level match attributes
string scheme = 3;

oneof amatch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wora - Seems like this could become a type

message StringMatch {
enum MatchType {
EXACT, PREFIX, REGEX
}
MatchType match_type
string match_string;
}

and then be re-used by having

...
StringMatch authority;
StringMatch uri
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be used for header matches later

IP_HASH = 2;
RANDOM = 3;
}
oneof lbpolicy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wora isn't it snake_case for oneof field names. Also same comment here about redundancy in field names

Copy link
Contributor

@wora wora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks pretty good. There are few places missing documentation. Please give example as much as possible.

double throttle_for_seconds = 6;
}

message Terminate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not documented.

// Proxy level global configurations go here
message ProxyConfig {
// config specification version. defaults to 1.0
string version = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proto package name already has version. This should be "int32 revision" to indicate minor changes. We can't have breaking change within this proto, so version "2.0" would not be possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, would it be okay to just remove the version from here? if the proto package for istio already has version field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep int32 revision here and default to 0. In some cases, you do need revision to indicate some behavior change. proto3 can not transfer 0 over wire, so it adds no cost.

Copy link
Contributor

@wora wora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the changes. Thanks for such a major work.

One general issue is I am not sure how multiple percent based policies work together. It needs to be documented clearly.

I had a few design pattern suggestions, such as URI for cluster identifier. We found they work generally well.

// Proxy level global configurations go here
message ProxyConfig {
// config specification version.
int32 version = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version => revision

The proto package name contains the version. Using revision here to avoid conflict. See https://github.com/googleapis/googleapis/blob/master/google/type/postal_address.proto#L46

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should rename the package to indicate that this is 'alpha' quality


// Abruptly reset (terminate) the Tcp connection after it has been
// established, emulating remote server crash or link failure.
message Terminate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message does not say when to drop the connection, do we need reset_after_seconds etc?

My recommendation is we should terminate N% of connections after M seconds of delay.

uint32 http_status = 4;
}
// Specify abort code as part of Http request
string override_header_name = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can only specifies http_status, right? If so, document it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be http_status/gRPC code or HTTP2_ERROR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics or syntax of the header are unspecified right now. Probably best to mark this and the others as a TODO.

// delay duration in seconds.nanoseconds
double fixed_delay_seconds = 2;
// Specify delay duration as part of Http request
string override_header_name = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should become istio attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? The mixer is not involved in any of this. The proxy parses the header and identifies the delay parameter.

At the moment, the routing and mixer are separate. No overlaps. Any collusion will need lot more thinking and probably be in istio v2.

PREFIX = 1;
// posix style regex match. Throw error if user's chosen proxy does not
// support posix regex.
REGEX = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to do

StringMatch {
oneof ... {
string exact;
string prefix;
string suffix;
string regexp;
}
}

When you write config, you only need to set 1 field instead of 2 fields, which is a big difference.

// tag string could be parsed into a pod label (key:value))
message ClusterIdentifier {
string name = 1;
repeated string tags = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which system uses this kind of identifier? Why don't we use URL, such as k8s://service?label=1&label=2. This would be more generic and standard. These days, you can use URL for pretty much anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more for how the user specifies the configuration in YAML/JSON, and how the proxy interprets it. Secondly, the routing has to be platform agnostic, else the promise of cross platform portability breaks down.

The format you have boils down to clunky specs on the end user front.

dst_cluster: k8s://service?environment=production&backend=stripe&version=v2&az=us-east-1&foo=bar

On the other hand, a spec of the form below is nicer and easier for user consumption, and its portable across platforms. For e.g., if I want to run the same istio config over a mesos deployment, I do not have to change my config at all.

dst_cluster: 
 name: service
 tags:
   - environment: production
   - backend:stripe
   - version:v2
   - az:us-east-1
   - foo:bar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know URL can handle pretty much all resources and people are used to copy them around even it is ugly.

My question is how generic is the ClusterIdentifier. For example, if IBM and Google have internal cluster management systems, do we know if ClusterIdentifier can handle them? How the mapping works?

The term "cluster" is also a bit strong. In k8s, this seems to be more like job or service than cluster. Can we ask k8s people to see what they prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what here covers the internal Google cases pretty well.

Any reason not to use map<string, string> for the tags so we're not forcing parsing all the time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was because tags could be just plain strings ("foo", or be "foo: bar"). I vaguely remember using map<> and then dumped it. if you think map<string, string> can capture just plain foo and foo:bar, then we can change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, leave as is, Can always revisit.


// Stop throttling after the given duration. Set to 0 to throttle the
// connection for its lifetime.
double throttle_for_seconds = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally people use -1 to mean forever. 0 means 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would likely have the opposite effect people would intend if they forgot to set the field. Better to use a wrapper type here perhaps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, a proto newbie here. What does the wrapper buy us? It didn't seem to have any defaults/min/max in the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapper types allow you to detect the presence or absence of a primitive field. This can be important when the traditional default values for a primitive field might have the opposite effect to user expectation.

In proto3 all primitive numeric fields have a default value of 0. Using 0 to represent things like infinity is a good way to introduce coding mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// the connection) and/or abruptly reset (terminate) the Tcp connection
// after it has been established.
Throttle throttle = 1;
Terminate terminate = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is really hard to tell how the two condition are triggered. Do we do throttle first then terminate?

Copy link
Member Author

@rshriram rshriram Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throttle or terminate. will document and convert to oneof

// percentage of requests on which the delay will be injected
float percent = 1;
// mean delay needed to derive the exponential delay values
double mean_delay_seconds = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this one does not support override?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move override_header_name up into 'Delay'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. It kind of made no sense to me to provide an override for exponential delay, unless the caller's requests always ended up in teh same upstream cluster, and the caller went through the pain of providing the same header value for all requests. I am not averse to adding one though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we haven't actually said what the override header does I think it makes sense to promote it.

Most often I would expect the override to specify a fixed delay rather than attempt to override a property of the exponential delay config, largely because the caller probably won't know what delay type is actually being used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// N.B. Both delay and abort can be specified simultaneously. In such
// cases, a request would be first delayed and then aborted with an error
// code.
Abort abort = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 2 percent-based action, how do they interact?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order but with independent evaluation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will document.

// destination port
uint32 dst_port = 4;
// protocol Tcp|Udp
string protocol = 5;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not enum?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, presumably we're ignoring SCTP, at least for the time being?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this round - yes

// TODO: This should move into UpstreamCluster
HttpFaultInjection fault = 3;

// Other things? such as tracing, etc. ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably start with a repeated Any and promote to first-class over time as things clarify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// just a.b.c.d
repeated string src_ip_subnet = 1;
// source port
uint32 src_port = 2;
Copy link
Contributor

@louiscryan louiscryan Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without using a wrapper type here we will end up with default values of 0 for src & dest port which seems less than desirable. Alternative would be to use a repeated field but I think the need for multiple ports seems unlikely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Wait for X seconds after the connection is established, before
// terminating the connection. Set to 0 to terminate immediately on
// connection establishment.
double terminate_after_seconds = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we would use the 'Delay' type here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is aborting a connection. What Delay type are you referring to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message Delay {

defined above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so, I don't think that fits well. With Delay, we hold request in proxy, and wait for N seconds before forwarding.
In the terminate_after_seconds case, it would allow data to traverse on both sides. After the set number of seconds, we drop the connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 'Delay' became 'Duration' was just a generic time for expressing a unit of time then the semantic can be defined in the referring field name.

Duration terminate_after = 1;

and for http

Duration delay_request =

Might be overkill so I'm fine to defer

Copy link
Member Author

@rshriram rshriram Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know where your comment about Duration went, but yes, I do think its overkill for the moment. But I am open to that option, once we get a chance to try out a few use cases and see how the configs pan out. So, I put a TODO there as a reminder to myself.

import "google/protobuf/any.proto";
import "google/protobuf/wrappers.proto";

package istio.proxy.v1.config;
Copy link
Contributor

@louiscryan louiscryan Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to istio.proxy.v1alpha.config

@wora any objection, just want to avoid any stability connotations right now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@rshriram
Copy link
Member Author

@louiscryan if there are no more comments, can we go ahead and get this merged so that we can proceed with rest of config/routing related stuff in manager.

@louiscryan
Copy link
Contributor

Agreed.

LGTM

@louiscryan louiscryan closed this Jan 26, 2017
@louiscryan louiscryan reopened this Jan 26, 2017
@louiscryan louiscryan merged commit ef71ba2 into istio:master Jan 26, 2017
Copy link
Contributor

@smawson smawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for being late to the party on this one.

My complexity alarm bells are going off a bit with this API, and I'd like to make sure all the complexity is warranted.

As a strawman, here's a more minimalist take:

message ClientConfig {
int32 revision = 1;
repeated Route routes = 2;
repeated Destination destinations = 3;
}

message Route {
// Condition under which the route is used.
string condition = 1;
// Map from cluster id to weight to use with that id.
repeated map<string, int> destinations = 2;
}

message Destination {
string id = 1;

string health_check_endpoint = 2;
... // rest of existing UpstreamCluster
}

Something like that seems to have more power and generality and also be simpler to reason about. Why add so much complexity?

Also, generally speaking this file needs a ton of examples with use cases and how they would work, which would make it a lot easier to follow and understand what is trying to be accomplished.

// (either end).

// TODO: This should be done on per-cluster basis.
L4FaultInjection fault = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit strange to bake into the proxy configuration. I wonder if we're better having fault injection through a separate mechanism?


// Basic routing rule match criterion using Tcp attributes and downstream
// cluster identifier
message L4MatchCondition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using a different match syntax in the proxy config vs. the mixer config?

Can we have a unified syntax so our users only need to learn one model?

I could see the conditions just being replaced with a string, and the attribute match is just attributes listed in the condition, just like they are with aspect configuration.

Same with cluster identifier, that's just an attribute that should be matched like anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proxy matching features will be hardcoded into the proxy runtimes they won't be supported via a generic expression engine mechanism so I think things are a little different here.

Given that this is an internal representation for the moment I think it makes sense to explicitly spell out in schema what is or is not supported.

L4Protocol protocol = 5;
}

// A name and one or more tags uniquely identify an upstream or downstream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty confusing to me. Why is the identifier something other than a unique string?

What I'd expect is that UpstreamCluster is where you define a mapping from some unique string to the actual things represented there, and then in the rules you just refer to that unique string.

Do we have cases where a cluster identifer is not mentioned in an UpstreamCluster?

// in the definition of the UpstreamClusterPolicy.
ClusterIdentifier dst_cluster = 1;

// The proportion of connections to be forwarded to the upstream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: relative weights vs. absolute, the problem with absolute is that you require a validation step. But that may also be a good thing. With relative you always have a valid config, whether you want it or not, so there is a user experience gain but also increased risk to the user. Hrm.

// indicate the time for the entire response to arrive.
message TimeoutPolicy {
message SimpleTimeoutPolicy {
// timeout is per attempt, when retries are specified as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some interesting discussions about how timeouts + retries aren't a very good approach for distributed systems with lots of components, and instead you need a total deadline + a load budget. Otherwise you can easily get cascading failures, long delays, etc.

We should spend some time thinking about what the right way to handle these are in our configuration. And also how to propagate things like deadlines and load budget remaining.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we should think through budgeting. I do think point-to-point features are still useful for a large number of cases.

Large deployments that utilize some generic context propagation mechanism are where budgets can provide the most value

// itself from the evolution of dependent services.

// Proxy level global configurations go here
message ProxyConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProxyConfig seems like the wrong name for what we're doing here.

We're configuring a bunch of information about how a service should send traffic to upstream services, what timeouts to use, how to do load balancing, etc. The fact that a proxy is doing that work isn't actually relevant, a grpc library could just as easily use this information to configure itself.

Ideally we'd have some name that represents what this information is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming suggestions?

RoutingMesh?

@smawson
Copy link
Contributor

smawson commented Jan 26, 2017

Oh and also, is this intended only for manager -> proxy, and is an internal API, or is this meant as the canonical configuration of routing, and for use by humans?

@louiscryan
Copy link
Contributor

Right now this is an internal representation only. Hopefully as it bakes it becomes suitable for use as an API but we're clearly not there yet

@rshriram
Copy link
Member Author

@smawson @louiscryan I am clearly lost here. I was under the assumption that this proto is what one uses to generate config files. For example, https://github.com/istio/istio/blob/master/demos/config_examples/routing.yaml

This is work in progress. But that PR was on the table for 10 days 😜.

Over time, we could refine the proto. Frankly, the process seems backwards. I would expect to define a clean JSON or swagger schema at end user level and work my way down. This proto stuff goes in the other direction. Having arbitrary "string" fields and sticking a bunch of config elements there seems fraught with peril. I have to ensure I implement everything that I documented in "informal English". And there is no formal way to verify these things.

@frankbu
Copy link
Contributor

frankbu commented Jan 27, 2017

@smawson, you are ‘really' late to the party 🙂. We started this 5 weeks ago (take a look at the proposed Routing Config section at the bottom of this doc: https://docs.google.com/document/d/1nEVaJanPoI0PsEVy11xgWla2bG6jsh9Y0UcI4_D5rjo/edit#heading=h.1mrqkg688g3g.), and we are running short of time, as this routing proto issue has been waiting for feedback in the doc for a month, and this PR has been going through 7-8 changes over the last 10 days. FYI, the first feedback was to drop the aspects/attributes business. Instead of thrashing around back and forth, I suggest we just forge ahead with one point of view, and refine things along the way, as it gets the wheels moving.

I was also of the impression that this is not just an internal representation. The examples doc that @rshriram pointed to in his comment, above, is the YAML config data that I assume would be provided by the user.

@smawson
Copy link
Contributor

smawson commented Jan 27, 2017

I'm going to take my comments off this PR (which is fine submitted as is) and I'll move them to issues. I'm fine moving forward with getting everything working with the current format, but would like to continue iterating on the design and not take this as the final state.

I'm currently concerned about the following items:

  • Alignment with control plane config (selector, aspect model, etc.)
  • General API complexity (ClusterID representation, Cluster as a name at all)
  • Budgeting / Deadline propagation
  • L4 / HTTP specialization

Will go create issues for them now.

message RouteRule {
oneof route_rule {
L4RouteRule layer4 = 1;
HttpRouteRule http = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be called L7RouteRule for symmetry as well as if protocol changes - lets say you'll have gRPC rules?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think HTTP is OK here as it is the basis for protocols like GRPC

// in the definition of the UpstreamClusterPolicy.
ClusterIdentifier dst_cluster = 1;

// The proportion of connections to be forwarded to the upstream
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrt " The proportion of connections to be forwarded to the upstream". This should be connections for L4 rules and requests for L7 rules?

ClusterIdentifier cluster = 1;

// Should be either http://.. or tcp://..
string health_check_endpoint = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically you need to decide what does it mean to be healthy, i.e. number until healthy, number until unhealthy, timeout interval, ping interval - see examples of public cloud APIs

http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html
https://cloud.google.com/compute/docs/reference/latest/healthChecks

enum SimpleLBPolicy {
// These four simple load balancing policies have literally no
// additional configuration.
ROUND_ROBIN = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about weighted round robin?

incfly pushed a commit to incfly/api that referenced this pull request Jun 13, 2018
nacx pushed a commit to nacx/api that referenced this pull request Apr 15, 2020
…iles (istio#21)

* Echo service for testing common gRPC code

* vendor the third party protobufs

* Version the echo service, add a makefile using the vendored protos. Check in the generated files.

* make the makefiles more similar, and generate all the files

* fix the field type of body

* finish up tests for the test grpc server, add some helpers to mem

* address linter complaints in the files I touched

Mirrored from https://github.com/tetrateio/tetrate @ 1fe8ed76658d682b22bd1d2eaea07f6c3b5e5211
nacx pushed a commit to nacx/api that referenced this pull request Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.