Ubuntu user hits thread number limit preventing SSH login


Recently I was investigating quite an interesting issue: there is an Ubuntu-based VM our testers run their tests on, and they reported that they were unable to log into it.

After a brief investigation it became clear that the issue was related neither to the network nor to SSH keys.

These are the relevant records from the auth log:

/var/log/auth.log

Feb 28 20:21:39 test-instance sshd[21954]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
Feb 28 20:21:39 test-instance systemd-logind[756]: New session 75 of user ubuntu.
Feb 28 20:21:39 test-instance sshd[21954]: fatal: fork of unprivileged child failed
Feb 28 20:21:39 test-instance systemd-logind[756]: Removed session 75.

Quite an obscure error message, but it smells like a cgroup problem. Indeed:

journalctl -xe --no-pager | grep cgroup

Feb 28 20:21:39 test-instance kernel: cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/session-75.scope
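Note that the kernel names the cgroup of the task that tried to fork (here the session scope), while the limit that rejected the fork can belong to any ancestor cgroup; below it turns out to be user-1000.slice. If the journal holds many such messages, a small filter can pull out the cgroup paths. This is a minimal sketch; rejected_cgroups is a hypothetical helper of my own, not a standard tool:

```shell
# Hypothetical helper: read log lines on stdin and print the cgroup path from
# every "fork rejected by pids controller in <path>" message; other lines are dropped.
rejected_cgroups() {
  sed -n 's/.*fork rejected by pids controller in \([^[:space:]]*\).*/\1/p'
}

# Usage (journalctl -k restricts the output to kernel messages):
#   journalctl -k --no-pager | rejected_cgroups | sort | uniq -c
```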

The number of processes in the system was not that high, so naturally the next suspect was the number of threads.

The PIDs of the most thread-heavy processes can be found with the following one-liner:

for prc in $(ps -A -o pid); do grep -s Threads /proc/${prc}/status | awk -v prc="${prc}" '{print prc, $2}'; done | sort -n -r -k 2 | head
10925 5156
10971 5138
11193 506
764 11
831 7
802 4
854 3
821 3
853 2
19109 2

The first column is the PID, the second is the number of threads.
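If you also want to see which program owns all those threads, the same idea can be packed into a small function that prints the process name as a third column. A sketch, assuming a Linux /proc; top_threads is a name I made up for illustration:

```shell
# Hypothetical helper: print "pid threads name" for the N most thread-heavy
# processes, based on the Name: and Threads: fields of /proc/<pid>/status.
top_threads() {
  local n="${1:-10}" status pid
  for status in /proc/[0-9]*/status; do
    pid="${status#/proc/}"
    pid="${pid%/status}"
    # The process may exit between the glob and the read; awk's error is discarded.
    awk -v pid="$pid" '
      /^Name:/    { sub(/^Name:[ \t]*/, ""); name = $0 }
      /^Threads:/ { threads = $2 }
      END         { if (threads != "") print pid, threads, name }
    ' "$status" 2>/dev/null
  done | sort -k2,2nr | head -n "$n"
}
```

For example, top_threads 5 lists the five processes with the most threads.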

Next, I needed to find out which limit was being hit. Honestly, it was quite a discovery for me that the pids.max cgroup limit applies to threads as well: the pids controller counts tasks, and on Linux every thread is a separate task.
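Each thread shows up as its own entry (its thread ID) under /proc/<pid>/task, so counting those entries gives the number of tasks a process charges to the pids controller. A quick sketch; task_count is a hypothetical helper:

```shell
# Hypothetical helper: print the number of tasks (threads) of a process by
# counting the per-thread entries in /proc/<pid>/task. Fails if the PID is gone.
task_count() {
  [ -d "/proc/$1/task" ] || return 1
  ls "/proc/$1/task" | wc -l | tr -d ' '
}

# Usage: task_count 10925 should agree with the Threads: value in /proc/10925/status
```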

The limit is set in the following file:

cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
10813
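The odd-looking 10813 is probably not hand-picked. My guess (the investigation above does not confirm it) is that it is 33% of a 32768-task system maximum: 32768 is the usual default of kernel.pid_max, and as far as I know 33% is the default UserTasksMax= that systemd-logind applies to each user slice on systemd releases of that period. The arithmetic checks out:

```shell
# Integer percentage, truncated the way a task count has to be.
percent_of() {
  echo $(( $1 * $2 / 100 ))
}

percent_of 32768 33   # 32768 * 33 / 100 = 10813.44, truncated to 10813

# Compare with the kernel-side task limits on your machine:
cat /proc/sys/kernel/pid_max
cat /proc/sys/kernel/threads-max
```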

Current usage can be found here:

cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
10809
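Because a fork is rejected when any ancestor is at its limit, it can help to look at every level from the session scope upwards at once. A minimal sketch; pids_levels is a hypothetical helper, and it assumes each level exposes pids.current and pids.max (the literal value max means "no limit at this level"):

```shell
# Hypothetical helper: starting at a pids cgroup directory, print current usage,
# limit and remaining headroom for it and for every ancestor that has pids.max.
pids_levels() {
  local dir cur max
  dir="${1%/}"
  while [ -n "$dir" ] && [ -r "$dir/pids.max" ]; do
    cur="$(cat "$dir/pids.current")"
    max="$(cat "$dir/pids.max")"
    if [ "$max" = "max" ]; then
      printf '%s current=%s max=max headroom=unlimited\n' "$dir" "$cur"
    else
      printf '%s current=%s max=%s headroom=%s\n' "$dir" "$cur" "$max" "$((max - cur))"
    fi
    dir="${dir%/*}"   # step up to the parent cgroup
  done
}

# Usage: pids_levels /sys/fs/cgroup/pids/user.slice/user-1000.slice/session-75.scope
```

With the numbers above, the user-1000.slice line would show a headroom of 4, which makes it obvious where the wall is.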

Here user-1000 is the ubuntu user (UID 1000), as confirmed by running id ubuntu.
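Since systemd names the user slice after the numeric UID, the directory for any user can be derived with id -u. The base path depends on the cgroup layout: the paths in this post are cgroup v1, where the pids controller is mounted at /sys/fs/cgroup/pids, while on a cgroup v2 (unified) system the slices sit directly under /sys/fs/cgroup. A sketch; user_pids_dir is a hypothetical helper, and the v2 detection via /sys/fs/cgroup/cgroup.controllers is my assumption about the host layout:

```shell
# Hypothetical helper: print the pids cgroup directory of a user's slice.
# An optional second argument overrides the auto-detected base directory.
user_pids_dir() {
  local user="$1" base="$2" uid
  uid="$(id -u "$user")" || return 1
  if [ -z "$base" ]; then
    if [ -e /sys/fs/cgroup/cgroup.controllers ]; then
      base=/sys/fs/cgroup        # cgroup v2: unified hierarchy
    else
      base=/sys/fs/cgroup/pids   # cgroup v1: separate pids hierarchy
    fi
  fi
  echo "$base/user.slice/user-$uid.slice"
}

# Usage: cat "$(user_pids_dir ubuntu)/pids.max"
```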

As you can see, the limit is almost exhausted. Once the limit was raised (as root)

echo '32768' > /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max

it became possible to log in as user ‘ubuntu’ again. The testers were able to identify the cause of the excessive thread spawning, so the issue should not recur.
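Writing straight into pids.max needs root, and as far as I understand it is a runtime-only change that disappears together with the cgroup, for example once the user's last session ends and systemd removes the slice. The durable knob lives on the systemd side (UserTasksMax= in logind.conf on older releases, TasksMax= for the user slices on newer ones; check the man pages for your version). For the one-off fix, a slightly safer wrapper could refuse values that would still block new forks. A sketch; raise_pids_max is a hypothetical helper, not part of any tool:

```shell
# Hypothetical helper: set pids.max in a cgroup directory, refusing a number at
# or below pids.current (the next fork would be rejected right away).
# Must run as root on a real cgroup; "max" removes the limit entirely.
raise_pids_max() {
  local dir="$1" new="$2" cur
  case "$new" in
    max) ;;
    ''|*[!0-9]*)
      echo "not a number: $new" >&2
      return 2
      ;;
    *)
      cur="$(cat "$dir/pids.current")" || return 1
      if [ "$new" -le "$cur" ]; then
        echo "refusing: $new <= pids.current ($cur)" >&2
        return 1
      fi
      ;;
  esac
  echo "$new" > "$dir/pids.max"
}

# Usage (as root): raise_pids_max /sys/fs/cgroup/pids/user.slice/user-1000.slice 32768
```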