
inotify file descriptor issue on large instances causes redpanda galaxy module to crash #211

@WesWWagner

When building a 15-node im4gn cluster with TLS and Prometheus monitoring enabled, Redpanda fails to start with the following error:

ubuntu@ip-172-31-16-44:~$ journalctl -f -u redpanda | grep -i error
Jan 23 01:13:08 ip-172-31-16-44 rpk[12253]: ERROR 2024-01-23 01:13:08,030 [shard 0] main - application.cc:388 - Failure during startup: std::__1::system_error (error system:24, could not create inotify instance: Too many open files)

ubuntu@ip-172-31-16-44:~$ ulimit -n
1024

I have not yet looked into the code for the galaxy component, but it appears that something is not raising the inotify and Linux security limits high enough before starting Redpanda for the first time on large instances (which start more threads because they have more cores).
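For anyone trying to reproduce this, a minimal sketch of how the relevant limits could be inspected on an affected node. Error 24 is EMFILE, which `inotify_init` returns when either the per-user inotify instance limit or the per-process open file limit is reached, so both are printed. `read_limit` is a hypothetical helper written for this example, and the count of inotify instances only covers processes the current user is allowed to inspect (it also assumes GNU `find`).

```shell
#!/bin/sh
# Print the limits that can make inotify_init() fail with EMFILE (errno 24).

# read_limit (hypothetical helper): print the contents of a /proc/sys file,
# or "unavailable" if the file cannot be read.
read_limit() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

# Per-process open file limit for the current shell.
echo "open files (ulimit -n): $(ulimit -n)"

# Kernel-wide per-user inotify limits.
echo "fs.inotify.max_user_instances: $(read_limit /proc/sys/fs/inotify/max_user_instances)"
echo "fs.inotify.max_user_watches: $(read_limit /proc/sys/fs/inotify/max_user_watches)"

# Rough count of inotify instances currently open, limited to the
# processes whose fd directories are readable by this user.
in_use=$(find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l)
echo "inotify instances visible to this user: $in_use"
```

If `max_user_instances` turns out to be the limit being hit, it can be raised temporarily with `sudo sysctl -w fs.inotify.max_user_instances=<value>`. Whether that alone resolves the startup failure here is an assumption that has not been verified.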

I tested this on 23.3.3 and 23.2.10 and saw the same behavior on both, so it is not a recent regression.
