Update roadmap for 2021 (#4036)

bboreham · pracucci · web-flow · commit d50ce221ccb6 · 2021-04-16T16:00:17.000+02:00
* Update roadmap for 2021

Implemented since last update:
 - Per-tenant retention,
 - Soft Multitenancy,
 - Prometheus metadata support,
 - Bulk loading Thanos data,
 - Alertmanager sharding

I also made some changes for better readability.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Remove bulk-loading from roadmap

It can be done using thanosconvert tool

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Fixed whitespace noise

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Co-authored-by: Marco Pracucci <marco@pracucci.com>
diff --git a/docs/roadmap.md b/docs/roadmap.md
@@ -5,7 +5,9 @@ weight: 10
 slug: roadmap
 ---
 
-The following is only a selection of some of the major features we plan to implement in the near future. To get a more complete overview of planned features and current work, see the issue trackers for the various repositories, for example, the [Cortex repo](https://github.com/cortexproject/cortex/issues). Note that these are not ordered by priority.
+This document highlights some ideas for major features we'd like to implement in the near future.
+To get a more complete overview of planned features and current work, see the [issue tracker](https://github.com/cortexproject/cortex/issues).
+Note that these are not ordered by priority.
 
 ## Helm charts and other packaging
 
@@ -27,22 +29,19 @@ tenants:
 
 We have all the metrics to track how many series, samples and queries each tenant is sending but don't have dashboards that help with this. We plan to have dashboards and UIs that will help operators monitor and control each tenants usage out of the box.
 
-## Downsampling and Per tenant/metric retention
+## Downsampling
+Downsampling means storing fewer samples, e.g. one per minute instead of one every 15 seconds.
+This makes queries over long periods more efficient. It can reduce storage space slightly if the full-detail data is discarded.
 
-Currently, we only support a single retention period for all metrics and tenants. For most operators, the ability to set per tenant retention and also custom retention for subsets of metrics is important. We will add support per tenant and metric retention policies. Also, we currently store all the samples we ingested, and there is no way to reduce the resolution for the metrics. We plan to add downsampling to allow users to store less data when needed.
+## Per-metric retention
 
-## Soft Multitenancy
+Cortex blocks storage supports deleting all data for a tenant after a time period (e.g. 3 months, 1 year), but we would also like to have custom retention for subsets of metrics (e.g. delete server metrics but retain business metrics).
 
-Currently our multitenancy allows a tenant to view _all_ their metrics and only their metrics. There is no way for an "admin" tenant to view all the metrics in the system but for particular teams to only view theirs. This is another feature we plan to add into Cortex.
-
-## Exemplar and Prometheus metadata support
-
-There is currently an ongoing effort in Prometheus to add [exemplar support](https://docs.google.com/document/d/1ymZlc9yuTj8GvZyKz1r3KDRrhaOjZ1W1qZVW_5Gj7gA/edit) and we should be an active stakeholder in the discussion. The plan is to propagate the exemplars through remote write and make them available for querying in Cortex. We currently have experimental metadata support for Prometheus but this is using the Grafana Cloud Agent. We should help move this [PR forward](https://github.com/prometheus/prometheus/pull/6815) and also add persistence of the metadata (right now it's only in-mem).
-
-## Bulk loading historical data
-
-This is another highly requested features. There is currently no way to backfill the existing data in local Prometheus to Cortex. The plan is to add an API for users to ship the TSDB blocks to Cortex and a side-car / command to do this.
+## Exemplar support
+[Exemplars](https://docs.google.com/document/d/1ymZlc9yuTj8GvZyKz1r3KDRrhaOjZ1W1qZVW_5Gj7gA/edit)
+let you link metric samples to other data, such as distributed tracing.
+As of early 2021 Prometheus will collect exemplars and send them via remote write, but Cortex needs to be extended to handle them.
 
 ## Scalability
 
-Scalability has always been a focus for the project, but there is a lot more work to be done. We can now scale to 100s of Millions of active series but 1 Billion active series is still an unknown. We also need to make the Alertmanager horizontally scalable with the number of users.
+Scalability has always been a focus for the project, but there is a lot more work to be done. We can now scale to 100s of Millions of active series but 1 Billion active series is still an unknown.