Start a new topic

No threads available in pool

Hi,

 

We are having an issue where a client is downloading several files in sequence and for some reason the worker threads are not completing. After a while the threads are all assigned and the server stops being able to handle new connections, at least that is what seems to be happening from looking in the log:



SERVER|No threads left! Busy:800 Available:0 Max:800 ... SERVER|java.io.IOException: No threads available in pool. ... SERVER|Server Memory Stats: Max=1.77 GB, Free=1.15 GB No more workers:800 #######################Start Thread Dump####################### ... abcd:(410077)-195.245.181.52 (idle_time) java.lang.Thread.sleep:-2 crushftp.handlers.IdlerKiller.run:49 com.crushftp.client.Worker.run:36 java.lang.Thread.run:-1 --------------------------------------------------------------------------------------------------- abcd:(410077)-195.245.181.52 (idle_time) java.lang.Thread.sleep:-2 crushftp.handlers.IdlerKiller.run:49 com.crushftp.client.Worker.run:36 java.lang.Thread.run:-1 --------------------------------------------------------------------------------------------------- abcd:(410077)-195.245.181.52 (idle_time) java.lang.Thread.sleep:-2 crushftp.handlers.IdlerKiller.run:49 com.crushftp.client.Worker.run:36 java.lang.Thread.run:-1 ...

 


We have disabled the user account and the situation seems to have normalized but it is worrisome that a single user can break the server.

In the meantime I've tried replicating the situation by cloning the user and the thread usage count remain stable. Is there something that we can do about this?

 

Below is a sample of the file download process being done by this user:

USER|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *CONNECT SSH-2.0-J2SSH_Maverick_1.4.47__Oracle_America,_Inc.
USER|01/22/2018 07:45:59 PM| CipherCS:aes128-cbc CipherSC:aes128-cbc*
MDTM|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *getFileAttributes /*
USER|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *CONNECT SSH-2.0-J2SSH_Maverick_1.4.47__Oracle_America,_Inc.
USER|01/22/2018 07:45:59 PM| CipherCS:aes128-cbc CipherSC:aes128-cbc*
MDTM|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *getFileAttributes /*
MDTM|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *fileExists /EDP/ETD_EDP_20141124.csv:true*
MDTM|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *getFileAttributes /EDP/ETD_EDP_20141124.csv*
MDTM|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *fileExists /EDP/ETD_EDP_20141124.csv:true*
RETR|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *RETR START /EDP/ETD_EDP_20141124.csv*
RETR|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *RETR /EDP/ETD_EDP_20141124.csv:Y7kpB*
RETR|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] WROTE: *150 Opening BINARY mode data connection for /EDP/ETD_EDP_20141124.csv (27009 bytes). ("/EDP/ETD_EDP_20141124.csv") R E T R*
RETR|01/22/2018 07:45:59 PM|226-Download File Size:27009 bytes @ 26K/sec.
RETR|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] WROTE: *226 Transfer complete. MD5=0b6147077f373057803e6a60f6488bbc ("/EDP/ETD_EDP_20141124.csv") RETR*
RETR|01/22/2018 07:45:59 PM|[406896:abcd:195.245.181.52] WROTE: *Adding ETD_EDP_20141124.csv to event tracking.*
RETR|01/22/2018 07:45:59 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *RETR END   /EDP/ETD_EDP_20141124.csv:Y7kpB*
USER|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *CONNECT SSH-2.0-J2SSH_Maverick_1.4.47__Oracle_America,_Inc.
USER|01/22/2018 07:46:00 PM| CipherCS:aes128-cbc CipherSC:aes128-cbc*
MDTM|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *getFileAttributes /*
USER|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *CONNECT SSH-2.0-J2SSH_Maverick_1.4.47__Oracle_America,_Inc.
USER|01/22/2018 07:46:00 PM| CipherCS:aes128-cbc CipherSC:aes128-cbc*
MDTM|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *getFileAttributes /*
MDTM|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *fileExists /EDP/ETD_EDP_20141125.csv:true*
MDTM|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *getFileAttributes /EDP/ETD_EDP_20141125.csv*
MDTM|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *fileExists /EDP/ETD_EDP_20141125.csv:true*
RETR|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *RETR START /EDP/ETD_EDP_20141125.csv*
RETR|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *RETR /EDP/ETD_EDP_20141125.csv:myBHS*
RETR|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] WROTE: *150 Opening BINARY mode data connection for /EDP/ETD_EDP_20141125.csv (27753 bytes). ("/EDP/ETD_EDP_20141125.csv") R E T R*
RETR|01/22/2018 07:46:00 PM|226-Download File Size:27753 bytes @ 27K/sec.
RETR|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] WROTE: *226 Transfer complete. MD5=421a666c0e694f0133457eb4785ad992 ("/EDP/ETD_EDP_20141125.csv") RETR*
RETR|01/22/2018 07:46:00 PM|[406896:abcd:195.245.181.52] WROTE: *Adding ETD_EDP_20141125.csv to event tracking.*
RETR|01/22/2018 07:46:00 PM|[SFTP:406896_0:abcd:195.245.181.52] READ: *RETR END   /EDP/ETD_EDP_20141125.csv:myBHS*

 


When all threads are used, its an indication of something else.


It dumps a full thread dump...you provided a snippet of it not revealing anything.


I need to see the full thread dump, ideally as an attached file.  You can also email to our support email if you don't want to post it publicly.


Thanks,

Ben

Hi Ben,


The thread dump is attached.


Thanks,
João

txt
(252 KB)

I think you re on an old CrushFTP version...  Normally the thread dump includes the version info, yours doesn't which confirmed its definitely kind of old.


Your first action needs to be updating CrushFTP to current to see if the issue is already resolved in the current version.

I'm on 8.2.1_38 which is fairly new, the latest is _49.

Should I update first?

Nevermind...updating won't help.  I forgot that when you click dump thread list you get version info.  But when its logged you don't.


Restart CrushFTP now to get the threads cleared up.  You can increase max threads in the preferences, music tab.  I suggest 10000 instead of 800 if you are on a 64bit JVM.


Is this client opening hundreds of connections at a time?


What is memory usage like on the dashboard for used and max?

Its running on a 64-bit JVM indeed so I will follow your advice and increase the thread count to 10000 followed by a restart. I will also re-enable the user and check the memory usage.


The client seems to be downloading the files on the same connection, at least all the logging refers to the same session id 406896_0.

What is your idle timeout that you configured on the user too?

Is it the Max Idle Time setting? If so it is the default (10 minutes). Does that explain the threads not returning to the pool immediately after each file is downloaded? 

Not exactly as they should have disappeared once they disconnected...and it should have only been one thread per user session...


What SFTP client are they using, and any info in exactly what they are doing?

I'll ask but looking in the log seems they are using https://github.com/sshtools/j2ssh-maverick

We use the server library from the same vendor, but the commercial version of it.


So wondering if they are coding something odd and opening many sessions in SFTP on their one connection....


If they are using their own library, its *very* likely they have made a mistake in how they are implementing it.  Its one thing to use an existing took, but when you code the library actions yourself, we commonly see users make mistakes.


In the user info tab of the admin, do you only see one session, or many?

I've managed to capture the issue in the dashboard. I'll check the number sessions.


image


We can't replicate this issue, so its likely they have coded their use of the library wrong.  They are leaving sessions open, or they keep creating new SFTP subsystems as they go and never close them down, or don't reuse the one that just finished a transfer, etc.

Thanks, is there any way we can limit the number of sessions or subsystems that are created in order to control this type of misbehavior?

Login to post a comment