Commit ec2d25b
Experimental TSDB: Enable Azure Storage Backend (#2083)
* Adding Microsoft Azure backend support for the TSDB storage engine. Signed-off-by: Ken Haines <[email protected]>
* Correcting minor typo caught by linter. Signed-off-by: Ken Haines <[email protected]>
* Updating the vendored Thanos module to include the Azure files. Signed-off-by: Ken Haines <[email protected]>
* A few more doc tweaks for consistency. Signed-off-by: Ken Haines <[email protected]>
1 parent 21c7e24 commit ec2d25b

File tree

10 files changed: +514 −15 lines

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@
 * [ENHANCEMENT] Experimental TSDB: Export TSDB Syncer metrics from Compactor component, they are prefixed with `cortex_compactor_`. #2023
 * [ENHANCEMENT] Experimental TSDB: Added dedicated flag `-experimental.tsdb.bucket-store.tenant-sync-concurrency` to configure the maximum number of concurrent tenants for which blocks are synched. #2026
 * [ENHANCEMENT] Experimental TSDB: Expose metrics for objstore operations (prefixed with `cortex_<component>_thanos_objstore_`, component being one of `ingester`, `querier` and `compactor`). #2027
+* [ENHANCEMENT] Experimental TSDB: Added support for Azure Storage to be used for block storage, in addition to S3 and GCS. #2083
 * [ENHANCEMENT] Cassandra Storage: added `max_retries`, `retry_min_backoff` and `retry_max_backoff` configuration options to enable retrying recoverable errors. #2054
 * [ENHANCEMENT] Allow to configure HTTP and gRPC server listen address, maximum number of simultaneous connections and connection keepalive settings.
   * `-server.http-listen-address`
```

docs/architecture.md

Lines changed: 12 additions & 10 deletions
```diff
@@ -33,15 +33,16 @@ The chunks storage stores each single time series into a separate object called
 For this reason, the chunks storage consists of:
 
 * An index for the Chunks. This index can be backed by:
-  * [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
-  * [Google Bigtable](https://cloud.google.com/bigtable)
-  * [Apache Cassandra](https://cassandra.apache.org)
+  * [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
+  * [Google Bigtable](https://cloud.google.com/bigtable)
+  * [Apache Cassandra](https://cassandra.apache.org)
 * An object store for the Chunk data itself, which can be:
-  * [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
-  * [Google Bigtable](https://cloud.google.com/bigtable)
-  * [Apache Cassandra](https://cassandra.apache.org)
-  * [Amazon S3](https://aws.amazon.com/s3)
-  * [Google Cloud Storage](https://cloud.google.com/storage/)
+  * [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
+  * [Google Bigtable](https://cloud.google.com/bigtable)
+  * [Apache Cassandra](https://cassandra.apache.org)
+  * [Amazon S3](https://aws.amazon.com/s3)
+  * [Google Cloud Storage](https://cloud.google.com/storage/)
+  * [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)
 
 Internally, the access to the chunks storage relies on a unified interface called "chunks store". Unlike other Cortex components, the chunk store is not a separate service, but rather a library embedded in the services that need to access the long-term storage: [ingester](#ingester), [querier](#querier) and [ruler](#ruler).

@@ -59,6 +60,7 @@ The blocks storage doesn't require a dedicated storage backend for the index. Th
 * [Amazon S3](https://aws.amazon.com/s3)
 * [Google Cloud Storage](https://cloud.google.com/storage/)
+* [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)
 
 For more information, please check out the [Blocks storage](operations/blocks-storage.md) documentation.

@@ -142,7 +144,7 @@ We recommend randomly load balancing write requests across distributor instances
 ### Ingester
 
-The **ingester** service is responsible for writing incoming series to a [long-term storage backend](#storage) on the write path and returning in-memory series samples for queries on the read path.
+The **ingester** service is responsible for writing incoming series to a [long-term storage backend](#storage) on the write path and returning in-memory series samples for queries on the read path.
 
 Incoming series are not immediately written to the storage but kept in memory and periodically flushed to the storage (by default, 12 hours for the chunks storage and 2 hours for the experimental blocks storage). For this reason, the [queriers](#querier) may need to fetch samples both from ingesters and long-term storage while executing a query on the read path.

@@ -154,7 +156,7 @@ Ingesters contain a **lifecycler** which manages the lifecycle of an ingester an
 3. `ACTIVE` is an ingester's state when it is fully initialized. It may receive both write and read requests for tokens it owns.
 
-4. `LEAVING` is an ingester's state when it is shutting down. It cannot receive write requests anymore, while it could still receive read requests for series it has in memory. While in this state, the ingester may look for a `PENDING` ingester to start a hand-over process with, used to transfer the state from the `LEAVING` ingester to the `PENDING` one during a rolling update (the `PENDING` ingester moves to the `JOINING` state during the hand-over process). If there is no new ingester to accept the hand-over, the ingester in `LEAVING` state will flush its data to storage instead.
+4. `LEAVING` is an ingester's state when it is shutting down. It cannot receive write requests anymore, while it could still receive read requests for series it has in memory. While in this state, the ingester may look for a `PENDING` ingester to start a hand-over process with, used to transfer the state from the `LEAVING` ingester to the `PENDING` one during a rolling update (the `PENDING` ingester moves to the `JOINING` state during the hand-over process). If there is no new ingester to accept the hand-over, the ingester in `LEAVING` state will flush its data to storage instead.
 
 5. `UNHEALTHY` is an ingester's state when it has failed to heartbeat to the ring's KV Store. While in this state, distributors skip the ingester while building the replication set for incoming series and the ingester does not receive write or read requests.
```
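The hand-over behaviour described for the `LEAVING` state is a small state machine. As a rough illustration only — these types and names are a sketch, not Cortex's actual ring internals — the decision could be modelled like this:

```go
package main

import "fmt"

// State models the ingester lifecycle states described above.
// Illustrative sketch only; not Cortex's actual ring types.
type State int

const (
	Pending State = iota
	Joining
	Active
	Leaving
	Unhealthy
)

// shutdown sketches the LEAVING behaviour: hand over to a PENDING
// ingester if one exists, otherwise flush in-memory data to storage.
func shutdown(peers []State) string {
	for _, p := range peers {
		if p == Pending {
			return "hand-over to PENDING ingester (it moves to JOINING)"
		}
	}
	return "no PENDING ingester found: flush data to storage"
}

func main() {
	fmt.Println(shutdown([]State{Active, Pending})) // hand-over path
	fmt.Println(shutdown([]State{Active, Active}))  // flush path
}
```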

docs/operations/blocks-storage.md

Lines changed: 22 additions & 2 deletions
````diff
@@ -11,6 +11,7 @@ The supported backends for the blocks storage are:
 * [Amazon S3](https://aws.amazon.com/s3)
 * [Google Cloud Storage](https://cloud.google.com/storage/)
+* [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)
 
 _Internally, this storage engine is based on [Thanos](https://thanos.io), but no Thanos knowledge is required in order to run it._

@@ -28,7 +29,6 @@ When the blocks storage is used, each **ingester** creates a per-tenant TSDB and
 The in-memory samples are periodically flushed to disk - and the WAL truncated - when a new TSDB Block is cut, which by default occurs every 2 hours. Each new Block cut is then uploaded to the long-term storage and kept in the ingester for some more time, in order to give queriers enough time to discover the new Block from the storage and download its index header.
 
-
 In order to effectively use the **WAL** and be able to recover the in-memory series upon abrupt ingester termination, the WAL needs to be stored on a persistent local disk which can survive an ingester failure (e.g. an AWS EBS volume or a GCP persistent disk when running in the cloud). For example, if you're running the Cortex cluster in Kubernetes, you may use a StatefulSet with a persistent volume claim for the ingesters.
 
 ### The read path

@@ -138,7 +138,7 @@ tsdb:
     # storage. 0 disables the limit.
     # CLI flag: -experimental.tsdb.bucket-store.max-sample-count
     [max_sample_count: <int> | default = 0]
-
+
     # Max number of concurrent queries to execute against the long-term storage
     # on a per-tenant basis.
     # CLI flag: -experimental.tsdb.bucket-store.max-concurrent

@@ -189,6 +189,26 @@ tsdb:
     # Google SDK default logic.
     # CLI flag: -experimental.tsdb.gcs.service-account string
     [ service_account: <string>]
+
+  # Configures the Azure storage backend.
+  # Required only when the "azure" backend has been selected.
+  azure:
+    # Azure storage account name
+    # CLI flag: -experimental.tsdb.azure.account-name
+    account_name: <string>
+    # Azure storage account key
+    # CLI flag: -experimental.tsdb.azure.account-key
+    account_key: <string>
+    # Azure storage container name
+    # CLI flag: -experimental.tsdb.azure.container-name
+    container_name: <string>
+    # Azure storage endpoint suffix without schema.
+    # The account name will be prefixed to this value to create the FQDN.
+    # CLI flag: -experimental.tsdb.azure.endpoint-suffix
+    endpoint_suffix: <string>
+    # Number of retries for recoverable errors
+    # CLI flag: -experimental.tsdb.azure.max-retries
+    [ max_retries: <int> | default=20 ]
 ```
 
 ### `compactor_config`
````
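To make the mapping between these YAML keys and the Go config concrete, here is a small sketch (not part of the commit; the sample values are hypothetical) that unmarshals such an `azure` block with the same `gopkg.in/yaml.v2` package the backend itself uses, into a struct mirroring the `azure.Config` added below:

```go
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

// Config mirrors the azure.Config struct added by this commit.
type Config struct {
	StorageAccountName string `yaml:"account_name"`
	StorageAccountKey  string `yaml:"account_key"`
	ContainerName      string `yaml:"container_name"`
	Endpoint           string `yaml:"endpoint_suffix"`
	MaxRetries         int    `yaml:"max_retries"`
}

func main() {
	// A hypothetical config block matching the reference above.
	raw := `
account_name: mycortexaccount
account_key: bXktc2VjcmV0LWtleQ==
container_name: cortex-blocks
endpoint_suffix: blob.core.windows.net
max_retries: 20
`
	var cfg Config
	if err := yaml.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	// The account name is prefixed to the endpoint suffix to build the FQDN,
	// e.g. mycortexaccount.blob.core.windows.net.
	fmt.Printf("%s.%s (container %q)\n", cfg.StorageAccountName, cfg.Endpoint, cfg.ContainerName)
}
```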
pkg/storage/tsdb/backend/azure/bucket_client.go

Lines changed: 27 additions & 0 deletions

```go
package azure

import (
	"github.com/go-kit/kit/log"
	"github.com/thanos-io/thanos/pkg/objstore"
	"github.com/thanos-io/thanos/pkg/objstore/azure"
	yaml "gopkg.in/yaml.v2"
)

// NewBucketClient creates a Thanos objstore.Bucket client backed by Azure Blob Storage.
func NewBucketClient(cfg Config, name string, logger log.Logger) (objstore.Bucket, error) {
	bucketConfig := azure.Config{
		StorageAccountName: cfg.StorageAccountName,
		StorageAccountKey:  cfg.StorageAccountKey,
		ContainerName:      cfg.ContainerName,
		Endpoint:           cfg.Endpoint,
		MaxRetries:         cfg.MaxRetries,
	}

	// Thanos currently doesn't support passing the config as is, but expects a YAML,
	// so we're going to serialize it.
	serialized, err := yaml.Marshal(bucketConfig)
	if err != nil {
		return nil, err
	}

	return azure.NewBucket(logger, serialized, name)
}
```
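A hedged usage sketch (not from the commit; the account values are made up) of how a component could obtain a bucket client and exercise the returned `objstore.Bucket` interface:

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"

	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/azure"
	"github.com/go-kit/kit/log"
)

func main() {
	logger := log.NewLogfmtLogger(os.Stderr)

	// Hypothetical credentials; in Cortex these come from the
	// -experimental.tsdb.azure.* flags or the YAML config.
	cfg := azure.Config{
		StorageAccountName: "mycortexaccount",
		StorageAccountKey:  "bXktc2VjcmV0LWtleQ==",
		ContainerName:      "cortex-blocks",
		MaxRetries:         20,
	}

	bucket, err := azure.NewBucketClient(cfg, "ingester", logger)
	if err != nil {
		panic(err)
	}

	// objstore.Bucket is Thanos' generic object-store interface;
	// Upload and Exists are part of it.
	ctx := context.Background()
	if err := bucket.Upload(ctx, "debug/hello.txt", bytes.NewReader([]byte("hello"))); err != nil {
		panic(err)
	}
	ok, err := bucket.Exists(ctx, "debug/hello.txt")
	fmt.Println(ok, err)
}
```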
pkg/storage/tsdb/backend/azure/config.go

Lines changed: 23 additions & 0 deletions

```go
package azure

import (
	"flag"
)

// Config holds the config options for an Azure backend
type Config struct {
	StorageAccountName string `yaml:"account_name"`
	StorageAccountKey  string `yaml:"account_key"`
	ContainerName      string `yaml:"container_name"`
	Endpoint           string `yaml:"endpoint_suffix"`
	MaxRetries         int    `yaml:"max_retries"`
}

// RegisterFlags registers the flags for TSDB Azure storage
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	f.StringVar(&cfg.StorageAccountName, "experimental.tsdb.azure.account-name", "", "Azure storage account name")
	f.StringVar(&cfg.StorageAccountKey, "experimental.tsdb.azure.account-key", "", "Azure storage account key")
	f.StringVar(&cfg.ContainerName, "experimental.tsdb.azure.container-name", "", "Azure storage container name")
	f.StringVar(&cfg.Endpoint, "experimental.tsdb.azure.endpoint-suffix", "", "Azure storage endpoint suffix without schema. The account name will be prefixed to this value to create the FQDN")
	f.IntVar(&cfg.MaxRetries, "experimental.tsdb.azure.max-retries", 20, "Number of retries for recoverable errors")
}
```
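To illustrate the flag wiring (a standalone sketch with a hypothetical flag set, not how Cortex parses its CLI), `RegisterFlags` can be exercised like this:

```go
package main

import (
	"flag"
	"fmt"

	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/azure"
)

func main() {
	var cfg azure.Config
	fs := flag.NewFlagSet("example", flag.ExitOnError)
	cfg.RegisterFlags(fs)

	// Hypothetical command line; account-key is omitted so it stays "".
	_ = fs.Parse([]string{
		"-experimental.tsdb.azure.account-name=mycortexaccount",
		"-experimental.tsdb.azure.container-name=cortex-blocks",
	})

	// MaxRetries keeps its registered default of 20.
	fmt.Println(cfg.StorageAccountName, cfg.ContainerName, cfg.MaxRetries)
	// Output: mycortexaccount cortex-blocks 20
}
```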

pkg/storage/tsdb/bucket_client.go

Lines changed: 3 additions & 0 deletions
```diff
@@ -3,6 +3,7 @@ package tsdb
 import (
 	"context"
 
+	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/azure"
 	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/gcs"
 	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/s3"
 	"github.com/go-kit/kit/log"

@@ -16,6 +17,8 @@ func NewBucketClient(ctx context.Context, cfg Config, name string, logger log.Lo
 		return s3.NewBucketClient(cfg.S3, name, logger)
 	case BackendGCS:
 		return gcs.NewBucketClient(ctx, cfg.GCS, name, logger)
+	case BackendAzure:
+		return azure.NewBucketClient(cfg.Azure, name, logger)
 	default:
 		return nil, errUnsupportedBackend
 	}
```

pkg/storage/tsdb/config.go

Lines changed: 9 additions & 3 deletions
```diff
@@ -8,6 +8,7 @@ import (
 	"time"
 
 	"github.com/alecthomas/units"
+	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/azure"
 	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/gcs"
 	"github.com/cortexproject/cortex/pkg/storage/tsdb/backend/s3"
 )

@@ -19,6 +20,9 @@ const (
 	// BackendGCS is the value for the GCS storage backend
 	BackendGCS = "gcs"
 
+	// BackendAzure is the value for the Azure storage backend
+	BackendAzure = "azure"
+
 	// TenantIDExternalLabel is the external label set when shipping blocks to the storage
 	TenantIDExternalLabel = "__org_id__"
 )

@@ -43,8 +47,9 @@ type Config struct {
 	MaxTSDBOpeningConcurrencyOnStartup int `yaml:"max_tsdb_opening_concurrency_on_startup"`
 
 	// Backends
-	S3  s3.Config  `yaml:"s3"`
-	GCS gcs.Config `yaml:"gcs"`
+	S3    s3.Config    `yaml:"s3"`
+	GCS   gcs.Config   `yaml:"gcs"`
+	Azure azure.Config `yaml:"azure"`
 }
 
 // DurationList is the block ranges for a tsdb

@@ -88,6 +93,7 @@ func (d *DurationList) ToMilliseconds() []int64 {
 func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
 	cfg.S3.RegisterFlags(f)
 	cfg.GCS.RegisterFlags(f)
+	cfg.Azure.RegisterFlags(f)
 	cfg.BucketStore.RegisterFlags(f)
 
 	if len(cfg.BlockRanges) == 0 {

@@ -105,7 +111,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
 
 // Validate the config
 func (cfg *Config) Validate() error {
-	if cfg.Backend != BackendS3 && cfg.Backend != BackendGCS {
+	if cfg.Backend != BackendS3 && cfg.Backend != BackendGCS && cfg.Backend != BackendAzure {
 		return errUnsupportedBackend
 	}
```
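Putting the pieces together: a sketch (not from the commit; the values are hypothetical) of how the new backend constant flows through `Validate` and the `NewBucketClient` dispatch shown above:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/cortexproject/cortex/pkg/storage/tsdb"
	"github.com/go-kit/kit/log"
)

func main() {
	// An unsupported backend fails validation up front: the check shown
	// above returns errUnsupportedBackend before anything else runs.
	bad := tsdb.Config{Backend: "swift"} // hypothetical unsupported value
	fmt.Println(bad.Validate())          // prints the unsupported-backend error

	// With Backend set to BackendAzure, NewBucketClient dispatches to the
	// new azure.NewBucketClient added by this commit.
	cfg := tsdb.Config{Backend: tsdb.BackendAzure}
	cfg.Azure.StorageAccountName = "mycortexaccount" // hypothetical values
	cfg.Azure.StorageAccountKey = "bXktc2VjcmV0LWtleQ=="
	cfg.Azure.ContainerName = "cortex-blocks"
	cfg.Azure.MaxRetries = 20

	// With real credentials this returns a working bucket client; the fake
	// values here would fail once the Azure client validates them.
	bucket, err := tsdb.NewBucketClient(context.Background(), cfg, "querier", log.NewLogfmtLogger(os.Stderr))
	if err != nil {
		panic(err)
	}
	_ = bucket // used by ingester/querier/compactor for block operations
}
```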
