Unable to change ulimit when running on K8S environment #6709

Open
22fortisetliber opened this issue Apr 18, 2025 · 0 comments · May be fixed by #6707
Labels: type/production (Issues related to the production use of Cortex, inc. configuration, alerting and operating.)

22fortisetliber commented Apr 18, 2025

Describe the bug
When running Cortex in a Kubernetes environment, the ulimit settings for the Cortex process cannot be changed.

To Reproduce

  1. On the node, set fs.file-max to 10485760 in /etc/sysctl.conf.
  2. On the node, add the following to /etc/security/limits.conf:
       * soft nofile 10485760
       * hard nofile 10485760
  3. Run Cortex version 1.17.0 with Helm.
  4. Configure the ulimit for the pod (I tried both an initContainer and a SecurityContext).

Inside Pod:

# ulimit -Ha
core file size (blocks)         (-c) unlimited
data seg size (kb)              (-d) unlimited
scheduling priority             (-e) 0
file size (blocks)              (-f) unlimited
pending signals                 (-i) 256726
max locked memory (kb)          (-l) unlimited
max memory size (kb)            (-m) unlimited
open files                      (-n) 1048576
POSIX message queues (bytes)    (-q) 819200
real-time priority              (-r) 0
stack size (kb)                 (-s) unlimited
cpu time (seconds)              (-t) unlimited
max user processes              (-u) unlimited
virtual memory (kb)             (-v) unlimited
file locks                      (-x) unlimited 
# ps | grep cortex
# cat /proc/1/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            65535                65535                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256726               256726               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
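
The two outputs above disagree: the shell reports a hard limit of 1048576 open files, while PID 1 is capped at 65535. The per-process value in /proc/<pid>/limits is the one that applies to that process. As a minimal sketch of how to pull that value out programmatically, the hypothetical helper `parse_proc_limits` below (not part of Cortex) parses text in the format shown above; the embedded sample is copied from this report.

```python
import re


def parse_proc_limits(text):
    """Parse /proc/<pid>/limits text into {limit name: (soft, hard, units)}."""
    limits = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "Limit  Soft Limit  Hard Limit  Units" header.
        if not line or line.startswith("Limit"):
            continue
        # Columns are padded with runs of spaces; limit names contain single spaces.
        parts = re.split(r"\s{2,}", line)
        if len(parts) < 3:
            continue
        units = parts[3] if len(parts) > 3 else ""
        limits[parts[0]] = (parts[1], parts[2], units)
    return limits


# Two rows copied from the report; on a live system the text would come
# from reading /proc/<pid>/limits instead.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max open files            65535                65535                files
Max nice priority         0                    0
"""

soft, hard, units = parse_proc_limits(SAMPLE)["Max open files"]
print(f"open files: soft={soft} hard={hard} {units}")
```

Running the same parser against the limits file of the actual Cortex PID (rather than a shell started later in the container) shows which limit the process really received.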

When Cortex reaches approximately 65,000 open files, it crashes with "too many open files" errors in the logs. Here is the log from the Store Gateway:

caller=bucket_stores.go:161 level=warn msg="failed to synchronize TSDB blocks" err="445 errors: failed to synchronize TSDB blocks for user user1: read dir: open /data/tsdb-sync/user1: too many open files; failed to synchronize TSDB blocks for user user2: read dir: open /data/tsdb-sync/user2: too many open files; failed to synchronize TSDB blocks for user user3: read dir: open /data/tsdb-sync/user3: too many open files .....
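
To confirm that the process is actually running into the 65535 limit, the number of open descriptors can be counted from the entries under /proc/<pid>/fd. The sketch below assumes a Linux procfs layout; `count_open_fds` and its `proc_root` parameter are hypothetical names introduced only for illustration.

```python
import os


def count_open_fds(pid="self", proc_root="/proc"):
    """Count the file descriptors a process has open.

    Each entry in <proc_root>/<pid>/fd corresponds to one open descriptor.
    Reading another user's process usually requires elevated privileges.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))


def headroom(open_count, soft_limit):
    """Return how many more descriptors can be opened before hitting the limit."""
    return max(soft_limit - open_count, 0)
```

For example, `headroom(count_open_fds(1), 65535)` would report how close PID 1 is to the soft limit shown in /proc/1/limits.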

Expected behavior
The Cortex process running in Kubernetes pods should inherit and apply the increased file descriptor limits.
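
For context on what "applying" a limit involves, the sketch below shows the general mechanism by which a process can raise its own soft open-files limit, using Python's standard `resource` module. This is only an illustration of the system call semantics, not how Cortex (a Go program) handles limits; `raise_nofile_soft_limit` is a hypothetical helper. An unprivileged process can raise its soft limit only up to its hard limit, so with soft and hard both at 65535, as reported for PID 1, this alone would not help: the hard limit has to be raised by whatever launches the process.

```python
import resource


def raise_nofile_soft_limit(target):
    """Try to raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards. The soft limit is
    never lowered by this function.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The hard limit is the ceiling for an unprivileged process.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling `raise_nofile_soft_limit(1048576)` in a process whose hard limit is 65535 would leave the soft limit at 65535.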

Environment:

  • Infrastructure: Kubernetes (v1.29.5) with Containerd (v1.7.16)
  • Deployment tool: Helm
  • Server OS: Ubuntu 22.04.3 LTS

Additional Context

@22fortisetliber 22fortisetliber linked a pull request Apr 18, 2025 that will close this issue
3 tasks
@dosubot dosubot bot added the type/production Issues related to the production use of Cortex, inc. configuration, alerting and operating. label Apr 18, 2025
@friedrichg friedrichg self-assigned this Apr 18, 2025