This article provides general troubleshooting steps for the most common issues with Linux Shared Hosting NG for Parallels Operations Automation (POA) 5.4. For POA 5.5, please refer to the 5.5 LSH deployment guide page 40: "Troubleshooting: Webcluster Issues".
General problems with an NG cluster
Symptoms - Websites hosted in an NG cluster do not work. In general, when a problem with NG hosting arises, there is a critical failure on one of the NG components, namely:
- Load balancer (LB)
- NG Caching Service (shstg)
- Apache server (httpd)
- NG Configuration Database (CDB)
- NFS Storage
The most common problems occur with the NG Caching Service. The Apache server, through the NG module mod_vhost, uses the pair <hostname, IP address> to obtain virtual host configuration data from the CDB server via the NG Caching Service (shstg), and shstg may stop providing data to Apache. First, check that:
- The shstg service is running on the web servers in the NG cluster and there is only one instance of it:

~# ps aux | grep shstg
root 5055 0.0 0.2 157028 3020 ? Ssl Jul02 18:56 /usr/sbin/shstg_srv /etc/h2e_shstg.conf
If it is not running, or if there are two or more instances of shstg in the process list (see KB article #113154 for more details), restart the service:

~# /etc/init.d/shstg restart
- The Apache server is running. Follow the same steps as above and start Apache if it is not running:

~# /etc/init.d/httpd start
- The NG Caching Service has access to the PostgreSQL database on the CDB server. In the default correct configuration, you will see a network connection from the NG Caching Service (shstg_srv) on the web server to port 5432 on the CDB server:

~# netstat -antp | grep :5432
tcp 0 0 10.39.84.43:43269 10.39.94.114:5432 ESTABLISHED 5055/shstg_srv
- NFS shared storage is mounted on all web servers in the cluster. Remember that NFS shared storage is mounted on web servers using automount technology, so if there are no open requests to websites, NFS storage may be automatically unmounted after an idle timeout.
To check if automount works correctly, try to list the contents of any webspace on the web server, or list the mount point where NFS storage is mounted.
For example, if the NFS volume with ID #1 is configured in the web cluster properties, then in the default installation, you might try to list the folder
/var/www/vhosts/1. Even if it is not already mounted, automount will mount it automatically.
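For example, assuming the default mount point shown above, the following commands list the folder (which triggers automount if the volume is not mounted yet) and then confirm that the NFS share appears in the mount table:

~# ls /var/www/vhosts/1
~# mount | grep /var/www/vhosts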
If both the shstg and httpd services are running and there are no suspicious errors in the logs, there may be a problem on the Load Balancer. Read KB article #114327 for more details about NG Load Balancer configuration and functionality.
pulse is the controlling daemon that spawns the lvsd daemon and performs heartbeating and monitoring of services on the real web servers in the NG cluster.
Make sure pulse and its child nanny processes are running on the LB server:
 1462 ?  Ss  1:30 pulse
 1470 ?  Ss  0:34 \_ /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
 1488 ?  Ss  3:46    \_ /usr/sbin/nanny -c -h 10.39.84.43 --server-name 10.39.94.43 -p 80 -r 80 -f 100 -s GET / HTTP/1.0\r\n\r\n -x HTTP -a 10 -I /sbin/ipvsadm -t 10 -w 32 -V 0.0.0.0 -M g -U /usr/sbin/h2e_get_cluster_load.sh --lvs
 1489 ?  Ss  3:43    \_ /usr/sbin/nanny -c -h 10.39.84.44 --server-name 10.39.94.44 -p 80 -r 80 -f 100 -s GET / HTTP/1.0\r\n\r\n -x HTTP -a 10 -I /sbin/ipvsadm -t 10 -w 32 -V 0.0.0.0 -M g -U /usr/sbin/h2e_get_cluster_load.sh --lvs
If there are problems with Apache on the web server, you will see the error message [inactive] shutting down in the /var/log/messages file on the LB server:
Jul 16 04:41:14 nglb nanny: [inactive] shutting down 10.39.84.43:80 due to connection failure
When the Apache server on the web server is back up, another entry will be written to the /var/log/messages file on the LB server:
Jul 16 04:41:54 nglb nanny: [ active ] making 10.39.84.43:80 available
So, check the system log file on the Load Balancer to detect problems with Apache on the web servers in the NG cluster.
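For example, to quickly review the most recent nanny events on the LB server:

~# grep nanny /var/log/messages | tail -n 20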
Use the ipvsadm utility to check the current load balancing statistics and rules (including the web servers' weights). At the very least, check that all web servers in the cluster are listed in the ipvsadm output:

~# ipvsadm --list
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  100 lblc
  -> 10.39.84.43:0                Route   32     0          0
  -> 10.39.84.44:0                Route   32     0          0

~# ipvsadm -L --stats
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port             Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
FWM  100                           62337   825314        0 49918802        0
  -> 10.39.84.43:0                    31      243        0    20959        0
  -> 10.39.84.44:0                     1        1        0       60        0
If the steps above do not rectify the problem, use tcpdump on the LB and web servers to monitor traffic.
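As a starting point, a capture like the following can show whether requests actually reach a particular web server; the interface name eth0 and the web server IP address are only examples and should be adjusted to your environment:

~# tcpdump -ni eth0 host 10.39.84.43 and port 80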
NG cluster performance problems
In these cases, a high load (>20-30) will be observed on the web server, and you will usually notice a lot of php-cgi processes.
This is a rare situation. It means that all processes are limited by the same resource. Currently, it is suspected that NFS could be a limiting factor under certain circumstances.
The general way to troubleshoot performance issues is to:
- Set up monitoring (example commands are shown after this list)
- Locate the bottleneck
- Fix the bottleneck
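As a simple starting point for monitoring, the standard Linux utilities below give a first impression of overall load, CPU, memory, disk, and NFS client activity on a busy web server (iostat and nfsstat are provided by the sysstat and nfs-utils packages, respectively):

~# uptime
~# vmstat 5 5
~# iostat -x 5 3
~# nfsstat -c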
The most likely bottlenecks are as follows:
- NFS server performance problems
- A slow or limited network connection between web servers and the CDB server or MySQL server(s) where customers' databases are working
- The load balancer algorithm needs tuning
An additional note about the second point above: in most cases, customer websites are not static; they have various web applications installed that work with a database, most often MySQL. The web application runs on the NG web server and connects to a MySQL database running on a remote server.
If many websites connect to the same MySQL server simultaneously, the connection between the NG web servers and the MySQL server must have enough bandwidth. Otherwise, the web application will wait for data from the MySQL server, increasing the load on the NG web server.
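To get a rough idea of how heavily a given web server talks to a MySQL server, you can count the established connections to port 3306 (the default MySQL port) on that web server; for example:

~# netstat -ant | grep ':3306' | grep ESTABLISHED | wc -l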
If the MySQL server is deployed inside a Parallels Virtuozzo Containers (PVC) container, PVC may limit outgoing traffic for such containers using the traffic shaper. This is standard PVC functionality, and for the connection between the NG web servers and MySQL servers it may become a bottleneck. The same goes for the connection between the NG web servers (caching service, shstg) and the NG Configuration Database server, which may also be deployed inside a PVC container.
To solve the problem with traffic limiting, find which MySQL servers are being used by NG websites and see if they are deployed inside a PVC container. Then, consider doing the following on the corresponding PVC server where MySQL or the NG Configuration Database servers are running:
Stop the traffic shaper on the PVC server (confirm with the Provider, since this will stop traffic shaping for all containers on the server):
~# /etc/init.d/vz shaperoff
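Before disabling the shaper, it may be worth confirming that traffic shaping is actually enabled on the hardware node and locating the container that hosts the MySQL or CDB server. Assuming a standard PVC for Linux node, something like the following can be used (the IP address is only an example):

~# grep TRAFFIC_SHAPING /etc/vz/vz.conf
~# vzlist -a -o ctid,hostname,ip | grep 10.39.94.114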