Faster rolling upgrades. #325

Merged 5 commits on Mar 13, 2017

@tomwilkie tomwilkie commented Mar 9, 2017

Fixes #321

It works! And it's quick...

TODO

  • move ingester lifecycle code into ingester, work out the coupling a bit better
  • move ingester shutdown code into ingester (from main.go)
  • fall back to flushing if the transfer fails.
  • naming
  • check ring logic for new ingester states
  • ensure ingester flushing stops before transferring chunks - shouldn't do this, as we don't want chunks that won't flush to block the transfer.
  • add an ever-extending timeout on the receiving ingester; when it does time out, auto-join - going to leave this for now, can always kill ingesters if they get stuck, will fall back to flushing.

@tomwilkie tomwilkie self-assigned this Mar 9, 2017
@tomwilkie tomwilkie force-pushed the 321-faster-rolling-upgrade branch from f06ac7b to 2c1a25c Compare March 9, 2017 14:24
cortex.proto Outdated
// Claim allows leaving ingester to stream chunks directly to joining ingesters.
rpc ClaimStart(ClaimStartRequest) returns (ClaimStartResponse) {};
rpc ClaimStream(stream TimeSeriesChunk) returns (ClaimStreamResponse) {};
rpc ClaimFinish(ClaimFinishRequest) returns (ClaimFinishResponse) {};
Contributor Author

Rename to Transfer{Start,Stream,Finish}?

Also, can the {Start,Finish} methods be avoided?

Contributor Author

Done.

return &cortex.ClaimFinishResponse{}, nil
}

func toWireChunks(descs []*desc) ([]cortex.Chunk, error) {
Contributor Author

Move to util/compat.go

Contributor Author

Nah, uses the ingester/desc type which I want to keep internal.

@tomwilkie tomwilkie changed the title [WIP] Faster rolling upgrades. Faster rolling upgrades. Mar 10, 2017
@tomwilkie tomwilkie force-pushed the 321-faster-rolling-upgrade branch from 4f81de5 to 587f34a Compare March 10, 2017 14:16
@jml jml left a comment

Thanks. Really cool. There are enough changes & comments that I won't LGTM, but I don't anticipate problems.

cortex.proto Outdated
@@ -13,6 +13,9 @@ service Ingester {
rpc LabelValues(LabelValuesRequest) returns (LabelValuesResponse) {};
rpc UserStats(UserStatsRequest) returns (UserStatsResponse) {};
rpc MetricsForLabelMatchers(MetricsForLabelMatchersRequest) returns (MetricsForLabelMatchersResponse) {};

// TransferChunks allows leaving ingester to stream chunks directly to joining ingesters.
Contributor

Say which of the two is supposed to call this.

@@ -88,7 +80,7 @@ type Config struct {
IngestionBurstSize int

// for testing
-ingesterClientFactory func(string) cortex.IngesterClient
+ingesterClientFactory func(string, time.Duration) (cortex.IngesterClient, error)
Contributor

Please update the comment to say what string and time.Duration represent.

ringConfig ring.Config
userStatesConfig UserStatesConfig

// Config for the ingester lifecycle contol
Contributor

"control"

ChunkEncoding string

// For testing
Addr string
Contributor

Addr of what?

userStatesConfig UserStatesConfig

// Config for the ingester lifecycle contol
ListenPort *int
Contributor

Why is this a pointer?

Contributor Author

ring/model.go Outdated
@@ -56,3 +58,39 @@ func (d *Desc) removeIngester(id string) {
}
d.Tokens = output
}

// ClaimTokens transfters all the tokens from one ingester to another.
Contributor

"returning the claimed tokens"
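
As a rough illustration of what "transfer all tokens and return the claimed ones" means, here is a minimal sketch. The `tokenDesc` and `ringDesc` types are simplified, hypothetical stand-ins; the real definitions are in ring/model.go and may differ.

```go
package main

import "fmt"

// tokenDesc and ringDesc are simplified, hypothetical stand-ins for the
// ring's token and descriptor types.
type tokenDesc struct {
	Token    uint32
	Ingester string
}

type ringDesc struct {
	Tokens []tokenDesc
}

// claimTokens reassigns every token owned by `from` to `to` and
// returns the values of the tokens that were claimed.
func (d *ringDesc) claimTokens(from, to string) []uint32 {
	var claimed []uint32
	for i := range d.Tokens {
		if d.Tokens[i].Ingester == from {
			d.Tokens[i].Ingester = to
			claimed = append(claimed, d.Tokens[i].Token)
		}
	}
	return claimed
}

func main() {
	d := &ringDesc{Tokens: []tokenDesc{
		{Token: 10, Ingester: "old"},
		{Token: 20, Ingester: "other"},
		{Token: 30, Ingester: "old"},
	}}
	fmt.Println(d.claimTokens("old", "new")) // prints [10 30]
}
```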

ring/model.go Outdated
return result
}

// FindIngestersByState returns the list of ingesterd in the given state
Contributor

ingesters
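
A lookup by state could look something like the sketch below. The state names and the `ingesterDesc` type are assumptions made for illustration, not the PR's definitions; the result is sorted only because Go map iteration order is random.

```go
package main

import (
	"fmt"
	"sort"
)

// ingesterState and its values are hypothetical; the PR's actual states
// are defined in the ring package.
type ingesterState int

const (
	active ingesterState = iota
	leaving
	pending
	joining
)

type ingesterDesc struct {
	Addr  string
	State ingesterState
}

// findIngestersByState returns the IDs of all ingesters in the given
// state, sorted so the result is deterministic.
func findIngestersByState(ingesters map[string]ingesterDesc, state ingesterState) []string {
	var ids []string
	for id, ing := range ingesters {
		if ing.State == state {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	ring := map[string]ingesterDesc{
		"ingester-1": {Addr: "10.0.0.1:9095", State: leaving},
		"ingester-2": {Addr: "10.0.0.2:9095", State: pending},
	}
	fmt.Println(findIngestersByState(ring, pending))
}
```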

ingester/util.go Outdated
func (ts sortableUint32) Less(i, j int) bool { return ts[i] < ts[j] }

// getFirstAddressOf returns the first IPv4 address of the supplied interface name.
func getFirstAddressOf(name string) (string, error) {
Contributor

Please remove the implementation that's in alertmanager.

Contributor Author

They do different things - the alertmanager one gets the first non-empty address, this one gets the address for a given interface. I'll file a ticket to reconcile them later.

ingester, err := New(config, nil)
if err != nil {
t.Fatal(err)
}
Contributor

Can we use testify, so we can spell this require.NoError(t, err)?

Contributor

Thanks! If you're really keen, you can also import assert from testify and use its Equal function to replace some of the DeepEqual business.

}
}

func TestIngesterRestart(t *testing.T) {
Contributor

What about restart are we testing? (Please add comment)

@tomwilkie tomwilkie force-pushed the 321-faster-rolling-upgrade branch from 8113f83 to b14c4a5 Compare March 13, 2017 11:58
@tomwilkie tomwilkie removed their assignment Mar 13, 2017

// We can't send "extra" fields with a streaming call, so we repeat
// wireSeries.FromIngesterId and assume it is the same every time
// tound this loop.
Contributor

"round"
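
The code comment quoted above describes repeating the sender's ID on every streamed message. A receiving loop in that style might look like the sketch below. `timeSeriesChunk`, `chunkStream`, and `sliceStream` are hypothetical stand-ins for the generated gRPC types, and unlike the original comment (which only assumes the ID stays the same) this sketch rejects a mismatch.

```go
package main

import (
	"fmt"
	"io"
)

// timeSeriesChunk is a hypothetical stand-in for the streamed message.
type timeSeriesChunk struct {
	FromIngesterID string
	Chunks         [][]byte
}

// chunkStream mimics the Recv side of a client-streaming RPC.
type chunkStream interface {
	Recv() (*timeSeriesChunk, error)
}

// receiveChunks drains the stream and returns the sender's ID and the
// number of messages received.
func receiveChunks(stream chunkStream) (string, int, error) {
	from, n := "", 0
	for {
		msg, err := stream.Recv()
		if err == io.EOF {
			return from, n, nil // sender closed the stream cleanly
		}
		if err != nil {
			return "", n, err
		}
		if n == 0 {
			from = msg.FromIngesterID
		} else if msg.FromIngesterID != from {
			return "", n, fmt.Errorf("sender changed from %q to %q", from, msg.FromIngesterID)
		}
		n++
	}
}

// sliceStream is an in-memory fake used to exercise receiveChunks.
type sliceStream struct{ msgs []*timeSeriesChunk }

func (s *sliceStream) Recv() (*timeSeriesChunk, error) {
	if len(s.msgs) == 0 {
		return nil, io.EOF
	}
	msg := s.msgs[0]
	s.msgs = s.msgs[1:]
	return msg, nil
}

func main() {
	s := &sliceStream{msgs: []*timeSeriesChunk{
		{FromIngesterID: "ingester-1"},
		{FromIngesterID: "ingester-1"},
	}}
	fmt.Println(receiveChunks(s))
}
```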

return i.ready
}

// ChangeState of the ingester, for use off the loop() goroutine.
Contributor

FWIW, "by" or "from" is probably better than "off the", but it's such an incredibly minor point.
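
One common way to let other goroutines change state owned by a single loop goroutine is to send closures over a channel. The sketch below shows that pattern under assumed names (`ingester`, `actorChan`, `changeState`); it is not taken from the PR's code.

```go
package main

import "fmt"

type state int

const (
	pending state = iota
	joining
	active
)

// ingester is a hypothetical, stripped-down type. Its state field is
// only ever read or written on the loop() goroutine.
type ingester struct {
	state     state
	actorChan chan func()
	quit      chan struct{}
	done      chan struct{}
}

func newIngester() *ingester {
	i := &ingester{
		actorChan: make(chan func()),
		quit:      make(chan struct{}),
		done:      make(chan struct{}),
	}
	go i.loop()
	return i
}

// loop runs submitted closures one at a time until quit is closed.
func (i *ingester) loop() {
	defer close(i.done)
	for {
		select {
		case f := <-i.actorChan:
			f()
		case <-i.quit:
			return
		}
	}
}

// changeState asks loop() to apply the new state and waits until it has.
// It must not be called after quit is closed, or it will block forever.
func (i *ingester) changeState(s state) {
	applied := make(chan struct{})
	i.actorChan <- func() {
		i.state = s
		close(applied)
	}
	<-applied
}

// currentState reads the state via loop(), so no lock is needed.
func (i *ingester) currentState() state {
	ch := make(chan state, 1)
	i.actorChan <- func() { ch <- i.state }
	return <-ch
}

func main() {
	ing := newIngester()
	ing.changeState(joining)
	fmt.Println(ing.currentState() == joining)
	close(ing.quit)
	<-ing.done
}
```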

jml commented Mar 13, 2017 via email

@tomwilkie tomwilkie merged commit bdbd0f5 into master Mar 13, 2017
@tomwilkie tomwilkie deleted the 321-faster-rolling-upgrade branch March 13, 2017 14:58