
Add flush job mode to ingester. #1885


Closed
wants to merge 1 commit

Conversation

gouthamve
Contributor

This is my attempt to simplify things compared to #1747

/cc @jtlisi @codesome

Signed-off-by: Goutham Veeramachaneni [email protected]

What this PR does:

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Contributor

@codesome left a comment


Some thoughts:

  1. I would argue that flush mode "job" should not even register in the ring (even though the tokens are set to 0). This might create its own problem (I guess?); see the sketch below.
  2. There is a long wait before we start flushing even when we know it's a flush job (right?)

The conclusion I draw from this is that t.Stop() has to be initiated inside the cortex package, and it is hard to do that from the ingester package itself. I will check how I can combine ideas from this PR with the above 2 points.
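A rough, self-contained sketch of the branch point 1 asks for; everything here except the IsFlushJob flag (which this PR adds) is a hypothetical stand-in, not the real ingester/lifecycler API:

```go
package main

// Hypothetical stand-ins for the real ingester/lifecycler wiring.
type Config struct {
	IsFlushJob bool // the flag this PR adds as cfg.Ingester.IsFlushJob
}

type Ingester struct {
	cfg Config
}

func (i *Ingester) joinRing() error    { return nil } // normal mode: register tokens in the ring
func (i *Ingester) runFlushJob() error { return nil } // flush local chunks, then shut down

// start branches before any ring registration: a flush job never serves
// writes, so it has no reason to advertise itself in the ring, even with
// zero tokens.
func (i *Ingester) start() error {
	if i.cfg.IsFlushJob {
		return i.runFlushJob()
	}
	return i.joinRing()
}

func main() {
	_ = (&Ingester{cfg: Config{IsFlushJob: true}}).start()
}
```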

@@ -238,7 +241,20 @@ func (t *Cortex) initModule(cfg *Config, m moduleName) error {
 
 // Run starts Cortex running, and blocks until a signal is received.
 func (t *Cortex) Run() error {
-	return t.server.Run()
+	var err error
+	done := make(chan struct{})
Contributor


We are not waiting for this anywhere (I will create a follow-up PR to your branch, just a note to myself)
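For reference, a minimal sketch of what waiting on that channel could look like. The shape of Run() below is inferred from the diff hunk above (server started in a goroutine that closes done); it is an illustration, not the PR's exact code:

```go
func (t *Cortex) Run() error {
	var err error
	done := make(chan struct{})

	go func() {
		defer close(done)
		err = t.server.Run() // runs until the server is stopped
	}()

	// Initiate shutdown if it's a job to flush data.
	if t.cfg.Ingester.IsFlushJob {
		_ = t.Stop()
	}

	// Block until the server goroutine exits; without this, Run() can return
	// before err is set and the goroutine is silently abandoned.
	<-done
	return err
}
```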

@gouthamve
Contributor Author

This might create its own problem

Can you think of cases? I see this flush-job as something you run after the cluster has stabilised, and I don't see any issues with it. Even if the cluster is in flux, I don't see it causing any major issues.

There is a long wait before we start flushing even when we know it's a flush job

How / Where?

@codesome
Contributor

codesome commented Dec 6, 2019

you run after the cluster has stabilised

If we assume this, then it should be fine, but it is still an assumption. I am afraid it might interfere with the transfer otherwise. I prefer running this job without worrying about the state of the cluster, as that just adds another dependency for the job.

How / Where?

Apparently it does not, once TransferOut is disabled :)

Contributor

@codesome left a comment


I will test this out next week


// Initiate shutdown if it's a job to flush data.
if t.cfg.Ingester.IsFlushJob {
	_ = t.Stop()
Contributor


We need to make sure that calling t.Stop() twice is not a problem. Currently, main.go would also call this and try stopping the modules again.
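One conventional way to make a second call harmless is to guard the shutdown logic with a sync.Once. A self-contained sketch, where the stopOnce field and the stopModules helper are hypothetical (not part of this PR):

```go
package main

import (
	"fmt"
	"sync"
)

// Cortex is trimmed to what the sketch needs; the real struct has many more fields.
type Cortex struct {
	stopOnce sync.Once
}

// stopModules stands in for the existing per-module shutdown logic.
func (t *Cortex) stopModules() error {
	fmt.Println("stopping modules")
	return nil
}

// Stop is safe to call more than once: only the first call runs the shutdown
// logic; later calls (for example from main.go after Run returns) are no-ops.
func (t *Cortex) Stop() error {
	var err error
	t.stopOnce.Do(func() {
		err = t.stopModules()
	})
	return err
}

func main() {
	t := &Cortex{}
	_ = t.Stop() // prints "stopping modules"
	_ = t.Stop() // no-op; note the first call's error is not re-returned here
}
```

With this pattern the second caller does not see the first call's error, which is usually acceptable for shutdown paths but worth keeping in mind.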

@codesome
Contributor

I have rebased #1747 on top of this with a small fix. I think we can close this now.

@gouthamve
Contributor Author

Closing this as most of the changes have been incorporated in #1747

@gouthamve gouthamve closed this Jan 7, 2020
@gouthamve gouthamve deleted the flush-job-take2 branch January 7, 2020 18:19