Some control panel applications, such as "Cloud Infrastructure", or the entire Provider's Control Panel become unavailable. A login attempt on the management node's UI via
http://mn_ip:8080 produces an error:
Different types of Operations Automation tasks fail with the following error:
WFLYEJB0442: Unexpected Error
A common symptom of all these tasks is that, in core.log, each of them ends with the following error message:
[task:159017197:17829 p:-default-threadpool;-w:-Idle:490 pau]: c.p.p.tracer exit by exception: com.parallels.pa.service.host.ejb.HCLSenderBean.sendHCLjava.lang.OutOfMemoryError: unable to create new native thread
Note the method in the message above that could not create a thread (sendHCL).
In console.log, OutOfMemory errors occur frequently:
SEVERE [org.glassfish.jersey.server.ServerRuntime$Responder] (pa-rest task-192) An exception was not mapped due to exception mapper failure. The HTTP 500 response will be returned.: com.google.common.util.concurrent.ExecutionError: com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: unable to create new native thread
Nevertheless, memory statistics show no shortage of resources.
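The "unable to create new native thread" variant of OutOfMemoryError usually points to a thread or process limit rather than to heap exhaustion, which fits with memory statistics looking healthy. As a rough sketch, the Python snippet below tallies OutOfMemoryError messages in log text by reason. The function name count_oom_reasons is a hypothetical helper for illustration and is not part of Operations Automation.

```python
import re
from collections import Counter

# Captures the reason text that follows "java.lang.OutOfMemoryError:" up to the end of the line.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError:\s*(.+?)\s*$")


def count_oom_reasons(log_text):
    """Return a Counter mapping each OutOfMemoryError reason to its number of occurrences."""
    reasons = Counter()
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match:
            reasons[match.group(1)] += 1
    return reasons
```

If nearly all entries fall under "unable to create new native thread" and none under reasons such as "Java heap space", the thread limit is the more likely suspect than the heap size.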
A thread leak in the sendHCL method. Threads started in this method stay open, and at some point the pau process reaches the system limit on the allowed number of threads.
The tasks "Get traffic usage" and "Collect resources usage statistics from web clusters" contribute the most, since by default they run frequently and send many requests.
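To observe the leak, the thread count of the affected process can be sampled over time. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a "Threads:" field. The helper names parse_thread_count and thread_count are hypothetical, and the pid of the pau process has to be looked up separately.

```python
from pathlib import Path


def parse_thread_count(status_text):
    """Extract the value of the 'Threads:' field from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_count(pid):
    """Read the current thread count of a live process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

A count that keeps growing between samples and never drops back would be consistent with threads being left open.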
This issue has been passed to the Engineering team for further investigation as POA-111472: "Outage of several WildFly applications, sendHCL java.lang.OutOfMemoryError".
The issue can be worked around by performing the following steps:
Increase the thread limit to 16192 for user jboss in file
jboss soft nproc 16192
A restart of OA services is required to apply the changes.
Make the "Get traffic usage info" and "Collect resources usage statistics from web clusters" tasks run less frequently (once an hour).
- Restart OA services per KB during the usual maintenance window to reset the thread count
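After the restart, it can be worth confirming that the running process actually picked up the new nproc value, since such limits are generally applied when a process starts. On Linux the nproc limit also counts threads, and the effective values are visible in the "Max processes" row of /proc/<pid>/limits. The sketch below parses that row. The function name parse_max_processes is a hypothetical helper, and the hard limit used in the usage note is an illustrative value only.

```python
def parse_max_processes(limits_text):
    """Return (soft, hard) from the 'Max processes' row of /proc/<pid>/limits content.

    A value of None stands for 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            # The row holds the soft limit, the hard limit, and the unit after the label.
            soft, hard = line[len("Max processes"):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max processes' row found")
```

For a row such as "Max processes 16192 31432 processes", the function returns (16192, 31432). A soft value that still shows the previous limit suggests the change has not been applied to the running process.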
Please contact your technical manager to clarify the status of POA-111472.