Skip to content

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Jun 11, 2020

What this PR does: This is a proposal for adding some options for making switch of ingesters and querier between chunks and blocks possible.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. I left few nits.

pstibrany and others added 2 commits June 16, 2020 10:58
Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds super risky tbh, but I can't come up with better ideas.

- Ingesters using WAL don’t flush in-memory chunks to storage on shutdown.
- Rollout should be as automated as possible.

How do we handle ingesters with WAL? There are several possibilities, but the simplest option seems to be adding a new flag to ingesters to flush chunks on shutdown. This is trivial change to ingester, and allows us to do automated migration by:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or something simpler, just don't send any data for 30mins (idle-timeout) before shutting down :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, both the ingester deployments will share the same ring? That makes sense. Can we call out that we will be using the same statefulset and not going to create a new one for blocks?

We also have a shutdown endpoint on the chunks ingester which just flushes and shuts down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have a shutdown endpoint on the chunks ingester which just flushes and shuts down.

Yes we do, but I don't quite see how to use it for automation. On calling /shutdown, ingester flushes everything and stops. Kubernetes will just restart it. But we need restart with new configuration.

Copy link
Contributor

@gouthamve gouthamve Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it idles out, and doesn't stop iirc. @codesome ? But yes, I agree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gouthamve you are right. Kubernetes won't restart it. /shutdown endpoint will flush all the chunks and remove itself from the ring and stay idle.

@gouthamve
Copy link
Contributor

So somethings that are missing here but would provide more context would be:

  1. Mention that we are only tackling WAL (or statefulsets). Maybe a section on how it would work with deployments?
  2. Mention that we share the same ring and queriers will only see a single set of ingesters (chunks + blocks) at times.
  3. Mention that the queriers will be configured to read from blocks on object store + chunks on NoSQL store at the same time.
  4. Mention how the potential schema config will look like, specially in the case of a rollback. Currently we support splitting it by the UTC day boundary, but how will that work because the rollout won't align to that. I see that for a particular time-range we need to query both.

These are some concerns I see and we can iterate on the proposal again once these are addressed?

@codesome
Copy link
Contributor

Maybe a section on how it would work with deployments?

I think it will look similar to migration from deployments to WAL.

@pstibrany
Copy link
Contributor Author

I think it will look similar to migration from deployments to WAL.

I think it's actually bit easier, since we cannot transfer data from chunks ingesters to blocks ingesters, so they don't need to be scaled up and down in lock-step. But chunks ingesters must be reconfigured to avoid transfers.

Signed-off-by: Peter Štibraný <[email protected]>
@pstibrany
Copy link
Contributor Author

  1. Mention how the potential schema config will look like, specially in the case of a rollback. Currently we support splitting it by the UTC day boundary, but how will that work because the rollout won't align to that. I see that for a particular time-range we need to query both.

Schemas are low-level thing for configuring chunks store. We will not use schema for anything in this proposal.

Querier needs single timestamp to decide whether to query chunks store or not. Blocks store can always be queried, because querier knows whether to hit any block based on block metadata.

See my PR #2747 which implements this already.

I've updated "Querying" section of the document to better explain this.

Signed-off-by: Peter Štibraný <[email protected]>
@pstibrany
Copy link
Contributor Author

@gouthamve I've updated the document with your feedback. Please take a look again when you have time.

@pracucci
Copy link
Contributor

Let's merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants