The NG Load Balancer experiences a complete outage: all routing rules stop serving requests.
The following sequence of messages appears in
Jan 19 22:54:58 loadbalancer2 nanny: [inactive] shutting down 18.104.22.168:80 due to connection failure
Jan 19 22:54:58 loadbalancer2 nanny: /sbin/ipvsadm command failed!
Jan 19 22:54:58 loadbalancer2 lvs: nanny died! shutting down lvs
Jan 19 22:54:58 loadbalancer2 lvs: shutting down virtual service 32
Jan 19 22:54:58 loadbalancer2 nanny: Terminating due to signal 15
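The message sequence above can be detected automatically. The following is a minimal sketch, not a supported tool: the function name `lvs_outage_detected` is hypothetical, and because the log location depends on the system, the log text is read from standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of any product): reads log text on stdin and
# exits with status 0 if it contains the "nanny died" message that marks the
# LVS shutdown, or with a non-zero status otherwise.
lvs_outage_detected() {
    grep -q 'lvs: nanny died! shutting down lvs'
}
```

For example, `lvs_outage_detected < "$LOGFILE" && echo "LVS outage signature found"`, where `$LOGFILE` is a placeholder for the log file to inspect.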
This may happen after removing an NG node from the load balancer using the following command:
~# ipvsadm -d -f 100 -r xxx.xxx.xxx.xxx
For the Odin Service Automation platform, the issue will be addressed under POA-100010: the LVS configuration will allow removing a single routing entry without affecting the others.
The issue originally derives from the following Red Hat behavior:
739223 Nanny crashes and shuts LVS down if a service is deleted using ipvsadm and then the corresponding real server goes down.
If an outage occurs, restart the LVS services:
# service pulse start
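After starting pulse, it can be useful to confirm that the virtual services are back. The sketch below is an assumption about how one might check this, not part of the official procedure: the function name `recover_lvs` is hypothetical, and it only prints the IPVS table with `ipvsadm -L -n` for manual review. The table may take a few moments to be repopulated.

```shell
#!/bin/sh
# Hypothetical helper: start pulse, then print the current IPVS table
# (numeric addresses) so the virtual services and real servers can be
# reviewed by eye. Must be run as root on the load balancer.
recover_lvs() {
    service pulse start || return 1
    ipvsadm -L -n
}
```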
To avoid the outage during maintenance of an NG node, follow these instructions:
Stop the Apache service on the node. The load balancer does not route requests to a node while the Apache service on it is not running.
# service httpd stop
Once the maintenance is done, start the Apache service.
# service httpd start
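The two steps above can be wrapped so that Apache is always started again after maintenance. This is a minimal sketch under stated assumptions: the function name `maintain_ng_node` is hypothetical, and the maintenance command itself is supplied by the operator as arguments.

```shell
#!/bin/sh
# Hypothetical wrapper: stop Apache, run the given maintenance command,
# then start Apache again even if the maintenance command fails.
# Returns the exit status of the maintenance command.
maintain_ng_node() {
    service httpd stop || return 1
    "$@"
    status=$?
    service httpd start || return 1
    return "$status"
}
```

For example, `maintain_ng_node ./do_maintenance.sh`, where `./do_maintenance.sh` is a placeholder for the actual maintenance task, run as root on the NG node.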