Describe the bug
When running Cortex in a Kubernetes environment, the ulimit settings for the Cortex process cannot be changed.
To Reproduce
On the node, set fs.file-max to 10485760 in /etc/sysctl.conf
On the node, set the following in /etc/security/limits.conf:
* soft nofile 10485760
* hard nofile 10485760
Run Cortex version 1.17.0 with Helm
Configure ulimit for the pod (I tried both an initContainer and a securityContext)
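The steps above mix two kinds of limits: fs.file-max is a system-wide cap on open file handles, while nofile is a per-process limit whose hard value the kernel caps at fs.nr_open. A minimal sketch for comparing them on the node, assuming Python 3 is available there; the helper names `read_kernel_int` and `fits_under_nr_open` are hypothetical, and the sketch does not explain where the limit seen inside the pod comes from:

```python
from pathlib import Path


def read_kernel_int(path):
    """Return the first integer stored in a /proc/sys file, or None if unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def fits_under_nr_open(requested_nofile, nr_open):
    """True when a per-process nofile value does not exceed fs.nr_open."""
    return requested_nofile <= nr_open


if __name__ == "__main__":
    requested = 10485760  # nofile value from the limits.conf lines above
    file_max = read_kernel_int("/proc/sys/fs/file-max")  # system-wide cap
    nr_open = read_kernel_int("/proc/sys/fs/nr_open")    # per-process ceiling
    print("fs.file-max:", file_max)
    print("fs.nr_open:", nr_open)
    if nr_open is not None:
        print("requested nofile fits under fs.nr_open:",
              fits_under_nr_open(requested, nr_open))
```

If the requested nofile value is larger than fs.nr_open, the kernel rejects it as a hard limit, so it is worth checking both values before looking at the container runtime.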
Inside Pod:
# ulimit -Ha
core file size (blocks) (-c) unlimited
data seg size (kb) (-d) unlimited
scheduling priority (-e) 0
file size (blocks) (-f) unlimited
pending signals (-i) 256726
max locked memory (kb) (-l) unlimited
max memory size (kb) (-m) unlimited
open files (-n) 1048576
POSIX message queues (bytes) (-q) 819200
real-time priority (-r) 0
stack size (kb) (-s) unlimited
cpu time (seconds) (-t) unlimited
max user processes (-u) unlimited
virtual memory (kb) (-v) unlimited
file locks (-x) unlimited
# ps | grep cortex
# cat /proc/1/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            65535                65535                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256726               256726               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
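The two outputs above disagree: the shell reports a hard limit of 1048576 open files, while PID 1 is limited to 65535. To make that comparison repeatable, here is a minimal sketch that parses the text of /proc/<pid>/limits into soft and hard values. The function name `parse_limits` and the sample text are illustrative assumptions; the regular expression only relies on each value being either `unlimited` or an integer.

```python
import re

# One row: limit name, soft value, hard value, optional units column.
_LIMIT_ROW = re.compile(
    r"^(.+?)\s+(unlimited|\d+)\s+(unlimited|\d+)(?:\s+(\S+))?\s*$"
)


def parse_limits(text):
    """Map each limit name to (soft, hard); None stands for 'unlimited'."""
    limits = {}
    for line in text.splitlines():
        match = _LIMIT_ROW.match(line.strip())
        if not match:
            continue  # skips the header row and blank lines
        name, soft, hard, _units = match.groups()
        limits[name] = tuple(
            None if value == "unlimited" else int(value)
            for value in (soft, hard)
        )
    return limits


# Sample rows copied from the output above.
SAMPLE = """Limit Soft Limit Hard Limit Units
Max stack size 8388608 unlimited bytes
Max open files 65535 65535 files
Max nice priority 0 0
"""

if __name__ == "__main__":
    soft, hard = parse_limits(SAMPLE)["Max open files"]
    print("open files: soft =", soft, "hard =", hard)
```

Inside the pod, the same function could be applied to the contents of /proc/1/limits to read the limit that actually applies to the Cortex process rather than to the interactive shell.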
When Cortex reaches approximately 65,000 open files, it crashes with "too many open files" errors in the logs. Here is the Store Gateway log:
caller=bucket_stores.go:161 level=warn msg="failed to synchronize TSDB blocks" err="445 errors: failed to synchronize TSDB blocks for user user1: read dir: open /data/tsdb-sync/user1: too many open files; failed to synchronize TSDB blocks for user user2: read dir: open /data/tsdb-sync/user2: too many open files; failed to synchronize TSDB blocks for user user3: read dir: open /data/tsdb-sync/user3: too many open files .....
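To watch how close a process is to this failure, the number of entries in /proc/<pid>/fd can be compared with the soft limit. The following is a minimal sketch, assuming a Linux /proc filesystem and permission to read the target's fd directory; `count_open_fds` and `fd_headroom` are hypothetical helper names. Note that `resource.getrlimit` reports the limit of the calling process, not of Cortex, so for PID 1 the soft limit should come from /proc/1/limits instead.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(open_fds, soft_limit):
    """Descriptors left before 'too many open files'; None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(soft_limit - open_fds, 0)


if __name__ == "__main__":
    if os.path.isdir("/proc/self/fd"):
        # Limits and descriptor count of this Python process itself.
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        used = count_open_fds()
        print("open fds:", used, "soft limit:", soft,
              "headroom:", fd_headroom(used, soft))
```

With the numbers from this report, a process holding 65,000 descriptors under a soft limit of 65535 has only 535 left.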
Expected behavior
The Cortex process running in Kubernetes pods should properly inherit and apply the increased file descriptor limits.
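For context on "inherit": a process receives its resource limits from the process that starts it, and /etc/security/limits.conf is applied by PAM to login sessions, so it does not by itself reach a container's PID 1. The sketch below only demonstrates the inheritance mechanism on a single machine; it is an illustration under those assumptions, not a diagnosis of where the 65535 value comes from, and `child_soft_nofile` is a hypothetical name.

```python
import resource
import subprocess
import sys

# Code run by the child: print its own soft RLIMIT_NOFILE.
CHILD_CODE = (
    "import resource; "
    "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
)


def child_soft_nofile(soft):
    """Start a child with the given soft nofile limit and return what it sees."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def lower_limit():
        # Runs in the child between fork and exec; the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=lower_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("child sees soft nofile limit:", child_soft_nofile(128))
```

The child reports the limit set by its launcher, which suggests that the effective limit for Cortex is decided by whatever starts the container process rather than by node-level login settings.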
Environment:
Infrastructure: Kubernetes (v1.29.5) with Containerd (v1.7.16)
Deployment tool: Helm
Server OS: Ubuntu 22.04.3 LTS
Additional Context