Commit 127ac66

Introduced firewall in the Alertmanager to block specific addresses in receiver integrations
Signed-off-by: Marco Pracucci <[email protected]>
1 parent d3068f9 commit 127ac66

File tree

13 files changed: +650 -15 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@
 * [ENHANCEMENT] Ruler: Added `-ruler.enabled-tenants` and `-ruler.disabled-tenants` to explicitly enable or disable rules processing for specific tenants. #4074
 * [ENHANCEMENT] Block Storage Ingester: `/flush` now accepts two new parameters: `tenant` to specify tenant to flush and `wait=true` to make call synchronous. Multiple tenants can be specified by repeating `tenant` parameter. If no `tenant` is specified, all tenants are flushed, as before. #4073
 * [ENHANCEMENT] Alertmanager: validate configured `-alertmanager.web.external-url` and fail if ends with `/`. #4081
+* [ENHANCEMENT] Alertmanager: added `-alertmanager.receivers-firewall.block.cidrs` and `-alertmanager.receivers-firewall.block.private-addresses` to block specific network addresses in HTTP-based Alertmanager receiver integrations. #4085
 * [ENHANCEMENT] Allow configuration of Cassandra's host selection policy. #4069
 * [ENHANCEMENT] Store-gateway: retry synching blocks if a per-tenant sync fails. #3975 #4088
 * [ENHANCEMENT] Add metric `cortex_tcp_connections` exposing the current number of accepted TCP connections. #4099

docs/blocks-storage/production-tips.md

Lines changed: 11 additions & 0 deletions
@@ -105,3 +105,14 @@ You can see that the initial migration is done by looking for the following mess
 The rule of thumb to ensure memcached is properly scaled is to make sure evictions happen infrequently. When that's not the case and they affect query performances, the suggestion is to scale out the memcached cluster adding more nodes or increasing the memory limit of existing ones.

 We also recommend to run a different memcached cluster for each cache type (metadata, index, chunks). It's not required, but suggested to not worry about the effect of memory pressure on a cache type against others.
+
+## Alertmanager
+
+### Ensure Alertmanager networking is hardened
+
+If the Alertmanager API is enabled and exposed to Cortex tenants, they can autonomously configure the Alertmanager, including receiver integrations (e.g. webhook) that issue network requests to the configured endpoint. If the Alertmanager network is not hardened, Cortex tenants may be able to issue network requests to any network endpoint, including services running in the local network.
+
+Since hardening the Alertmanager network is out of the scope of Cortex, we provide a basic built-in firewall to block connections created by Alertmanager receiver integrations:
+
+- `-alertmanager.receivers-firewall.block.cidrs`
+- `-alertmanager.receivers-firewall.block.private-addresses`
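
Reviewer note: the docs above do not say which ranges count as "private and local addresses". The following is a minimal sketch of such a check using only the Go standard library. The range list and the names `privateNetworks` and `isPrivateAddress` are assumptions made for illustration; the ranges Cortex actually blocks are defined in its `netutil` package, which is not part of this diff view.

```go
package main

import (
	"fmt"
	"net"
)

// privateNetworks is an ASSUMED list of "private and local" ranges, chosen
// for illustration only. It is not taken from the Cortex source.
var privateNetworks = mustParseCIDRs(
	"10.0.0.0/8",     // RFC 1918 private range
	"172.16.0.0/12",  // RFC 1918 private range
	"192.168.0.0/16", // RFC 1918 private range
	"127.0.0.0/8",    // IPv4 loopback
	"169.254.0.0/16", // IPv4 link-local
	"::1/128",        // IPv6 loopback
	"fc00::/7",       // IPv6 unique local
	"fe80::/10",      // IPv6 link-local
)

// mustParseCIDRs parses the given CIDRs and panics on a malformed entry.
func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		out = append(out, n)
	}
	return out
}

// isPrivateAddress reports whether addr, a literal IP address, falls inside
// one of the ranges listed above. Anything that is not a literal IP yields false.
func isPrivateAddress(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, n := range privateNetworks {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	for _, addr := range []string{"10.1.2.3", "127.0.0.1", "8.8.8.8", "fe80::1"} {
		fmt.Printf("%s private=%v\n", addr, isPrivateAddress(addr))
	}
}
```

A check like this only works on resolved IP addresses, so it has to run after DNS resolution; otherwise a tenant could point a public hostname at a private address.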

docs/configuration/config-file-reference.md

Lines changed: 12 additions & 0 deletions
@@ -1849,6 +1849,18 @@ The `alertmanager_config` configures the Cortex alertmanager.
 # CLI flag: -alertmanager.max-recv-msg-size
 [max_recv_msg_size: <int> | default = 16777216]

+receivers_firewall:
+  block:
+    # Comma-separated list of network CIDRs to block in Alertmanager receiver
+    # integrations.
+    # CLI flag: -alertmanager.receivers-firewall.block.cidrs
+    [cidrs: <string> | default = ""]
+
+    # True to block private and local addresses in Alertmanager receiver
+    # integrations.
+    # CLI flag: -alertmanager.receivers-firewall.block.private-addresses
+    [private_addresses: <boolean> | default = false]
+
 # Shard tenants across multiple alertmanager instances.
 # CLI flag: -alertmanager.sharding-enabled
 [sharding_enabled: <boolean> | default = false]
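
Reviewer note: the `cidrs` option takes a comma-separated list. As a rough illustration of how such a value can be parsed from a CLI flag, here is a sketch built on the standard `flag.Value` interface. The type `cidrListFlag` and the helper `parseBlockFlags` are hypothetical stand-ins; the commit itself uses `flagext.CIDRSliceCSV`, whose implementation is not shown here and may behave differently (for example in how it treats whitespace or empty input).

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net"
	"strings"
)

// cidrListFlag is a HYPOTHETICAL flag.Value holding a comma-separated CIDR list.
type cidrListFlag []*net.IPNet

// String renders the list back into its comma-separated form.
func (c *cidrListFlag) String() string {
	if c == nil {
		return ""
	}
	parts := make([]string, 0, len(*c))
	for _, n := range *c {
		parts = append(parts, n.String())
	}
	return strings.Join(parts, ",")
}

// Set parses a comma-separated list of CIDRs, rejecting any malformed entry.
func (c *cidrListFlag) Set(s string) error {
	var parsed []*net.IPNet
	if strings.TrimSpace(s) != "" {
		for _, part := range strings.Split(s, ",") {
			_, n, err := net.ParseCIDR(strings.TrimSpace(part))
			if err != nil {
				return fmt.Errorf("invalid CIDR %q: %w", part, err)
			}
			parsed = append(parsed, n)
		}
	}
	*c = parsed
	return nil
}

// parseBlockFlags registers the two flag names introduced by the commit on a
// throwaway FlagSet and returns the parsed values.
func parseBlockFlags(args []string) (cidrs string, private bool, err error) {
	var list cidrListFlag
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch
	fs.Var(&list, "alertmanager.receivers-firewall.block.cidrs", "CIDRs to block")
	p := fs.Bool("alertmanager.receivers-firewall.block.private-addresses", false, "block private addresses")
	if err := fs.Parse(args); err != nil {
		return "", false, err
	}
	return list.String(), *p, nil
}

func main() {
	cidrs, private, err := parseBlockFlags([]string{
		"-alertmanager.receivers-firewall.block.cidrs=10.0.0.0/8,192.168.1.0/24",
		"-alertmanager.receivers-firewall.block.private-addresses=true",
	})
	fmt.Println(cidrs, private, err)
}
```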

docs/configuration/v1-guarantees.md

Lines changed: 4 additions & 2 deletions
@@ -41,7 +41,10 @@ Currently experimental features are:
 - Azure blob storage.
 - Zone awareness based replication.
 - Ruler API (to PUT rules).
-- Alertmanager API
+- Alertmanager:
+  - API (enabled via `-experimental.alertmanager.enable-api`)
+  - Sharding of tenants across multiple instances (enabled via `-alertmanager.sharding-enabled`)
+  - Receiver integrations firewall (configured via `-alertmanager.receivers-firewall.*`)
 - Memcached client DNS-based service discovery.
 - Delete series APIs.
 - In-memory (FIFO) and Redis cache.
@@ -61,7 +64,6 @@ Currently experimental features are:
 - The bucket index support in the querier and store-gateway (enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true`) is experimental
 - The block deletion marks migration support in the compactor (`-compactor.block-deletion-marks-migration-enabled`) is temporarily and will be removed in future versions
 - Querier: tenant federation
-- Alertmanager: Sharding of tenants across multiple instances
 - The thanosconvert tool for converting Thanos block metadata to Cortex
 - HA Tracker: cleanup of old replicas from KV Store.
 - Flags for configuring whether blocks-ingester streams samples or chunks are temporary, and will be removed when feature is tested:

pkg/alertmanager/alertmanager.go

Lines changed: 9 additions & 6 deletions
@@ -59,12 +59,13 @@ const (

 // Config configures an Alertmanager.
 type Config struct {
-	UserID      string
-	Logger      log.Logger
-	Peer        *cluster.Peer
-	PeerTimeout time.Duration
-	Retention   time.Duration
-	ExternalURL *url.URL
+	UserID            string
+	Logger            log.Logger
+	Peer              *cluster.Peer
+	PeerTimeout       time.Duration
+	Retention         time.Duration
+	ExternalURL       *url.URL
+	ReceiversFirewall FirewallConfig

 	// Tenant-specific local directory where AM can store its state (notifications, silences, templates). When AM is stopped, entire dir is removed.
 	TenantDataDir string
@@ -279,6 +280,8 @@ func clusterWait(position func() int, timeout time.Duration) func() time.Duratio

 // ApplyConfig applies a new configuration to an Alertmanager.
 func (am *Alertmanager) ApplyConfig(userID string, conf *config.Config, rawCfg string) error {
+	conf = injectFirewallToAlertmanagerConfig(conf, am.cfg.ReceiversFirewall)
+
 	templateFiles := make([]string, len(conf.Templates))
 	if len(conf.Templates) > 0 {
 		for i, t := range conf.Templates {

pkg/alertmanager/firewall.go

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+package alertmanager
+
+import (
+	"flag"
+	"fmt"
+
+	"github.com/prometheus/alertmanager/config"
+	commoncfg "github.com/prometheus/common/config"
+
+	"github.com/cortexproject/cortex/pkg/util/flagext"
+	netutil "github.com/cortexproject/cortex/pkg/util/net"
+)
+
+type FirewallConfig struct {
+	Block FirewallHostsSpec `yaml:"block"`
+}
+
+func (cfg *FirewallConfig) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) {
+	cfg.Block.RegisterFlagsWithPrefix(prefix+".block", "block", f)
+}
+
+type FirewallHostsSpec struct {
+	CIDRs   flagext.CIDRSliceCSV `yaml:"cidrs"`
+	Private bool                 `yaml:"private_addresses"`
+}
+
+func (cfg *FirewallHostsSpec) RegisterFlagsWithPrefix(prefix, action string, f *flag.FlagSet) {
+	f.Var(&cfg.CIDRs, prefix+".cidrs", fmt.Sprintf("Comma-separated list of network CIDRs to %s in Alertmanager receiver integrations.", action))
+	f.BoolVar(&cfg.Private, prefix+".private-addresses", false, fmt.Sprintf("True to %s private and local addresses in Alertmanager receiver integrations.", action))
+}
+
+func injectFirewallToAlertmanagerConfig(conf *config.Config, firewallCfg FirewallConfig) *config.Config {
+	firewall := netutil.NewFirewallDialer(netutil.FirewallDialerConfig{
+		BlockCIDRs:   firewallCfg.Block.CIDRs,
+		BlockPrivate: firewallCfg.Block.Private,
+	})
+
+	conf.Global.HTTPConfig = injectFirewallToHTTPConfig(conf.Global.HTTPConfig, firewall)
+
+	for _, receiver := range conf.Receivers {
+		for _, rc := range receiver.WebhookConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+		for _, rc := range receiver.SlackConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+		for _, rc := range receiver.PushoverConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+		for _, rc := range receiver.PagerdutyConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+		for _, rc := range receiver.OpsGenieConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+		for _, rc := range receiver.WechatConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+		for _, rc := range receiver.VictorOpsConfigs {
+			rc.HTTPConfig = injectFirewallToHTTPConfig(rc.HTTPConfig, firewall)
+		}
+	}
+
+	return conf
+}
+
+func injectFirewallToHTTPConfig(conf *commoncfg.HTTPClientConfig, firewall *netutil.FirewallDialer) *commoncfg.HTTPClientConfig {
+	if conf == nil {
+		return &commoncfg.HTTPClientConfig{
+			DialContext: firewall.DialContext,
+		}
+	}
+
+	conf.DialContext = firewall.DialContext
+	return conf
+}
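
Reviewer note: `netutil.NewFirewallDialer` is referenced above but its implementation is not visible in this view. To make the idea concrete, here is a hedged sketch of one way a blocking dialer can be built with the standard library: a `net.Dialer` whose `Control` hook inspects the already-resolved IP before the connection is made. All names (`sketchFirewallDialer`, `errBlocked`, `dialLocalListener`, `isBlocked`) are hypothetical, and the real Cortex dialer may use a different mechanism.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// errBlocked is returned when the destination IP matches a blocked range.
var errBlocked = errors.New("address blocked by firewall")

// sketchFirewallDialer is a HYPOTHETICAL dialer that refuses blocked CIDRs.
type sketchFirewallDialer struct {
	blocked []*net.IPNet
	dialer  *net.Dialer
}

func newSketchFirewallDialer(cidrs ...string) (*sketchFirewallDialer, error) {
	d := &sketchFirewallDialer{}
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		d.blocked = append(d.blocked, n)
	}
	// Control runs after name resolution and before connect, so it sees the
	// literal IP that is about to be dialed.
	d.dialer = &net.Dialer{Timeout: 2 * time.Second, Control: d.control}
	return d, nil
}

func (d *sketchFirewallDialer) control(_, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// Fail closed on anything that is not a plain IP (e.g. zoned IPv6).
		return fmt.Errorf("cannot parse %q as an IP: %w", host, errBlocked)
	}
	for _, n := range d.blocked {
		if n.Contains(ip) {
			return errBlocked
		}
	}
	return nil
}

// DialContext has the signature expected by HTTP transports.
func (d *sketchFirewallDialer) DialContext(ctx context.Context, network, address string) (net.Conn, error) {
	return d.dialer.DialContext(ctx, network, address)
}

// isBlocked reports whether err was caused by the firewall check.
func isBlocked(err error) bool { return errors.Is(err, errBlocked) }

// dialLocalListener opens a throwaway listener on 127.0.0.1 and dials it
// through a dialer that blocks blockCIDR.
func dialLocalListener(blockCIDR string) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	d, err := newSketchFirewallDialer(blockCIDR)
	if err != nil {
		return err
	}
	conn, err := d.DialContext(context.Background(), "tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	fmt.Println("blocking 127.0.0.0/8:", dialLocalListener("127.0.0.0/8"))
	fmt.Println("blocking 10.0.0.0/8: ", dialLocalListener("10.0.0.0/8"))
}
```

Checking the address in the dial path rather than validating the configured URL means the decision is made on the IP actually being connected to, which also covers hostnames that resolve to blocked ranges.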

pkg/alertmanager/multitenant.go

Lines changed: 8 additions & 7 deletions
@@ -102,11 +102,12 @@ func init() {

 // MultitenantAlertmanagerConfig is the configuration for a multitenant Alertmanager.
 type MultitenantAlertmanagerConfig struct {
-	DataDir        string           `yaml:"data_dir"`
-	Retention      time.Duration    `yaml:"retention"`
-	ExternalURL    flagext.URLValue `yaml:"external_url"`
-	PollInterval   time.Duration    `yaml:"poll_interval"`
-	MaxRecvMsgSize int64            `yaml:"max_recv_msg_size"`
+	DataDir           string           `yaml:"data_dir"`
+	Retention         time.Duration    `yaml:"retention"`
+	ExternalURL       flagext.URLValue `yaml:"external_url"`
+	PollInterval      time.Duration    `yaml:"poll_interval"`
+	MaxRecvMsgSize    int64            `yaml:"max_recv_msg_size"`
+	ReceiversFirewall FirewallConfig   `yaml:"receivers_firewall"`

 	// Enable sharding for the Alertmanager
 	ShardingEnabled bool `yaml:"sharding_enabled"`
@@ -158,9 +159,8 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet) {
 	f.BoolVar(&cfg.ShardingEnabled, "alertmanager.sharding-enabled", false, "Shard tenants across multiple alertmanager instances.")

 	cfg.AlertmanagerClient.RegisterFlagsWithPrefix("alertmanager.alertmanager-client", f)
-
 	cfg.Persister.RegisterFlagsWithPrefix("alertmanager", f)
-
+	cfg.ReceiversFirewall.RegisterFlagsWithPrefix("alertmanager.receivers-firewall", f)
 	cfg.ShardingRing.RegisterFlags(f)
 	cfg.Store.RegisterFlags(f)
 	cfg.Cluster.RegisterFlags(f)
@@ -873,6 +873,7 @@ func (am *MultitenantAlertmanager) newAlertmanager(userID string, amConfig *amco
 		ReplicationFactor: am.cfg.ShardingRing.ReplicationFactor,
 		Store:             am.store,
 		PersisterConfig:   am.cfg.Persister,
+		ReceiversFirewall: am.cfg.ReceiversFirewall,
 	}, reg)
 	if err != nil {
 		return nil, fmt.Errorf("unable to start Alertmanager for user %v: %v", userID, err)

pkg/alertmanager/multitenant_test.go

Lines changed: 171 additions & 0 deletions
@@ -24,11 +24,13 @@ import (
 	"github.com/prometheus/alertmanager/types"
 	"github.com/prometheus/client_golang/prometheus"
 	"github.com/prometheus/client_golang/prometheus/testutil"
+	"github.com/prometheus/common/model"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
 	"github.com/thanos-io/thanos/pkg/objstore"
 	"github.com/weaveworks/common/httpgrpc"
 	"github.com/weaveworks/common/user"
+	"go.uber.org/atomic"
 	"google.golang.org/grpc"

 	"github.com/cortexproject/cortex/pkg/alertmanager/alertmanagerpb"
@@ -277,6 +279,175 @@ templates:
 	`), "cortex_alertmanager_config_last_reload_successful"))
 }

+func TestMultitenantAlertmanager_FirewallShouldBlockHTTPBasedReceiversWhenEnabled(t *testing.T) {
+	tests := map[string]struct {
+		getAlertmanagerConfig func(backendURL string) string
+	}{
+		"webhook": {
+			getAlertmanagerConfig: func(backendURL string) string {
+				return fmt.Sprintf(`
+route:
+  receiver: webhook
+  group_wait: 0s
+
+receivers:
+  - name: webhook
+    webhook_configs:
+      - url: %s
+`, backendURL)
+			},
+		},
+		"pagerduty": {
+			getAlertmanagerConfig: func(backendURL string) string {
+				return fmt.Sprintf(`
+route:
+  receiver: pagerduty
+  group_wait: 0s
+
+receivers:
+  - name: pagerduty
+    pagerduty_configs:
+      - url: %s
+        routing_key: secret
+`, backendURL)
+			},
+		},
+		"slack": {
+			getAlertmanagerConfig: func(backendURL string) string {
+				return fmt.Sprintf(`
+route:
+  receiver: slack
+  group_wait: 0s
+
+receivers:
+  - name: slack
+    slack_configs:
+      - api_url: %s
+        channel: test
+`, backendURL)
+			},
+		},
+		"opsgenie": {
+			getAlertmanagerConfig: func(backendURL string) string {
+				return fmt.Sprintf(`
+route:
+  receiver: opsgenie
+  group_wait: 0s
+
+receivers:
+  - name: opsgenie
+    opsgenie_configs:
+      - api_url: %s
+        api_key: secret
+`, backendURL)
+			},
+		},
+		"wechat": {
+			getAlertmanagerConfig: func(backendURL string) string {
+				return fmt.Sprintf(`
+route:
+  receiver: wechat
+  group_wait: 0s
+
+receivers:
+  - name: wechat
+    wechat_configs:
+      - api_url: %s
+        api_secret: secret
+        corp_id: babycorp
+`, backendURL)
+			},
+		},
+	}
+
+	for receiverName, testData := range tests {
+		for _, firewallEnabled := range []bool{true, false} {
+			t.Run(fmt.Sprintf("%s firewall: %v", receiverName, firewallEnabled), func(t *testing.T) {
+				ctx := context.Background()
+				userID := "user-1"
+				serverInvoked := atomic.NewBool(false)
+
+				// Create a local HTTP server to test whether the request is received.
+				server := httptest.NewServer(http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) {
+					serverInvoked.Store(true)
+					writer.WriteHeader(http.StatusOK)
+				}))
+				defer server.Close()
+
+				// Create the alertmanager config.
+				alertmanagerCfg := testData.getAlertmanagerConfig(fmt.Sprintf("http://%s", server.Listener.Addr().String()))
+
+				// Store the alertmanager config in the bucket.
+				store := prepareInMemoryAlertStore()
+				require.NoError(t, store.SetAlertConfig(ctx, alertspb.AlertConfigDesc{
+					User:      userID,
+					RawConfig: alertmanagerCfg,
+				}))
+
+				// Prepare the alertmanager config.
+				cfg := mockAlertmanagerConfig(t)
+				cfg.ReceiversFirewall.Block.Private = firewallEnabled
+
+				// Start the alertmanager.
+				reg := prometheus.NewPedanticRegistry()
+				am, err := createMultitenantAlertmanager(cfg, nil, nil, store, nil, log.NewNopLogger(), reg)
+				require.NoError(t, err)
+				require.NoError(t, services.StartAndAwaitRunning(ctx, am))
+				t.Cleanup(func() {
+					require.NoError(t, services.StopAndAwaitTerminated(ctx, am))
+				})
+
+				// Ensure the configs are synced correctly.
+				assert.NoError(t, testutil.GatherAndCompare(reg, bytes.NewBufferString(`
+					# HELP cortex_alertmanager_config_last_reload_successful Boolean set to 1 whenever the last configuration reload attempt was successful.
+					# TYPE cortex_alertmanager_config_last_reload_successful gauge
+					cortex_alertmanager_config_last_reload_successful{user="user-1"} 1
+				`), "cortex_alertmanager_config_last_reload_successful"))
+
+				// Create an alert to push.
+				alerts := types.Alerts(&types.Alert{
+					Alert: model.Alert{
+						Labels:   map[model.LabelName]model.LabelValue{model.AlertNameLabel: "test"},
+						StartsAt: time.Now().Add(-time.Minute),
+						EndsAt:   time.Now().Add(time.Minute),
+					},
+					UpdatedAt: time.Now(),
+					Timeout:   false,
+				})
+
+				alertsPayload, err := json.Marshal(alerts)
+				require.NoError(t, err)
+
+				// Push an alert.
+				req := httptest.NewRequest(http.MethodPost, cfg.ExternalURL.String()+"/api/v1/alerts", bytes.NewReader(alertsPayload))
+				req.Header.Set("content-type", "application/json")
+				reqCtx := user.InjectOrgID(req.Context(), userID)
+				{
+					w := httptest.NewRecorder()
+					am.ServeHTTP(w, req.WithContext(reqCtx))
+
+					resp := w.Result()
+					_, err := ioutil.ReadAll(resp.Body)
+					require.NoError(t, err)
+					assert.Equal(t, http.StatusOK, w.Code)
+				}
+
+				// Ensure the server endpoint has not been called if firewall is enabled. Since the alert is delivered
+				// asynchronously, we should poll it for a short period.
+				deadline := time.Now().Add(time.Second)
+				for {
+					if time.Now().After(deadline) || serverInvoked.Load() {
+						break
+					}
+					time.Sleep(100 * time.Millisecond)
+				}
+
+				assert.Equal(t, !firewallEnabled, serverInvoked.Load())
+			})
+		}
+	}
+}
+
 func TestMultitenantAlertmanager_migrateStateFilesToPerTenantDirectories(t *testing.T) {
 	ctx := context.Background()
