Article ID: 120355, created on Feb 27, 2014, last review on Mar 16, 2015

Applies to:

  • Odin Business Automation Standard 4.5

Symptoms

Gaps appear in the hourly traffic data for all containers in PBAS:

CP > Resources Usage > Traffic statistics

Date               Incoming traffic   Outgoing traffic   Combined traffic
...
2014-02-24 01:00   6.59MB             37.76MB            44.35MB
2014-02-24 02:00   0.00MB             0.00MB             0.00MB
2014-02-24 03:00   0.00MB             0.00MB             0.00MB
2014-02-24 04:00   0.00MB             0.00MB             0.00MB
2014-02-24 05:00   0.00MB             0.00MB             0.00MB
2014-02-24 06:00   0.00MB             0.00MB             0.00MB
2014-02-24 07:00   0.00MB             0.00MB             0.00MB
2014-02-24 08:00   0.00MB             0.00MB             0.00MB
2014-02-24 09:00   4.08MB             53.80MB            57.88MB
...

At the same time, PVA counts hourly traffic correctly on the hardware node.

The vzcoll service appears to have hung from 2014/02/24 01:39:24 until 2014/02/24 09:18:00. For example, for node #198:

~# cat /var/log/hspc/vzcoll.log | grep -n1 'Node #198' | grep 'HSPC::VZAgent::VZA46::Extract::extract_ve_net' | grep '2014/02/24'

...
[2014/02/24 01:30:38] [INFO] [9766] [HSPC::VZAgent::Common::call_func]  : Call to `HSPC::VZAgent::VZA46::Extract::extract_ve_net'
[2014/02/24 01:39:24] [INFO] [9766] [HSPC::VZAgent::Common::call_func]  : Call to `HSPC::VZAgent::VZA46::Extract::extract_ve_net'
[2014/02/24 09:18:00] [INFO] [24449] [HSPC::VZAgent::Common::call_func]  : Call to `HSPC::VZAgent::VZA46::Extract::extract_ve_net'
[2014/02/24 09:23:04] [INFO] [24449] [HSPC::VZAgent::Common::call_func]  : Call to `HSPC::VZAgent::VZA46::Extract::extract_ve_net'
...
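The same check can be automated. The following is a minimal sketch, not part of PBAS: it assumes the log lines have already been filtered for one node (as the grep command above does) and that each line starts with a `[YYYY/MM/DD HH:MM:SS]` timestamp. The function name `find_gaps` and the one-hour threshold are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the vzcoll.log excerpts above.
STAMP = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]")


def find_gaps(lines, marker="extract_ve_net", threshold=timedelta(hours=1)):
    """Return (start, end) pairs where two consecutive marker lines
    are further apart than the threshold."""
    gaps = []
    previous = None
    for line in lines:
        if marker not in line:
            continue
        match = STAMP.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Sample input: the four log lines quoted above.
call = "[INFO] [9766] [HSPC::VZAgent::Common::call_func]  : Call to `HSPC::VZAgent::VZA46::Extract::extract_ve_net'"
sample = [
    "[2014/02/24 01:30:38] " + call,
    "[2014/02/24 01:39:24] " + call,
    "[2014/02/24 09:18:00] " + call,
    "[2014/02/24 09:23:04] " + call,
]

for start, end in find_gaps(sample):
    print(start, "->", end, "gap of", end - start)
```

For the sample above, the only interval longer than one hour is the one between 01:39:24 and 09:18:00, which matches the gap seen in the traffic statistics.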

Cause

The most likely cause of the vzcoll hang is unstable network communication with certain nodes, for example Node #159:

/var/log/hspc/vzcoll.log

[2014/02/24 01:48:52] [INFO] [9766] [HSPC::Collector::log_debug_message_with_hn] Node #159 : Response received, processing ...
[2014/02/24 01:48:52] [INFO] [9766] [HSPC::Collector::log_debug_message_with_hn] Node #159 : recv_pkt() result: 2, error: 11, buf lenght 0
[2014/02/24 01:48:52] [INFO] [9766] [HSPC::Collector::process_response]  : Message received. Type: 2
[2014/02/24 09:13:08] [INFO] [9766] [HSPC::Collector::TERM]  : VZAgent collector caught SIGTERM, shutting down
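To see which nodes produce such responses, the log can be scanned for `recv_pkt()` lines that report error 11. This is a minimal sketch that assumes the line format shown in the excerpt above; the function name `count_recv_errors` is hypothetical and not part of PBAS.

```python
import re
from collections import Counter

# Matches e.g. "Node #159 : recv_pkt() result: 2, error: 11, buf lenght 0"
RECV = re.compile(r"Node #(\d+) : recv_pkt\(\) result: (-?\d+), error: (\d+)")


def count_recv_errors(lines, error_code=11):
    """Count recv_pkt() lines with the given error code, keyed by node number."""
    counts = Counter()
    for line in lines:
        match = RECV.search(line)
        if match and int(match.group(3)) == error_code:
            counts[int(match.group(1))] += 1
    return counts


# Sample input: lines from the excerpt above.
prefix = "[2014/02/24 01:48:52] [INFO] [9766] [HSPC::Collector::log_debug_message_with_hn] "
sample = [
    prefix + "Node #159 : Response received, processing ...",
    prefix + "Node #159 : recv_pkt() result: 2, error: 11, buf lenght 0",
    "[2014/02/24 01:48:52] [INFO] [9766] [HSPC::Collector::process_response]  : Message received. Type: 2",
]

for node, hits in count_recv_errors(sample).items():
    print("Node #%d: %d response(s) with error 11" % (node, hits))
```

In practice the lines would be read from /var/log/hspc/vzcoll.log instead of the `sample` list.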

This behavior is recognized as software issue PBAS-29284 ("vzcoll stability - needs improvement"), which was fixed in PBAS 4.5.2. To resolve the issue, upgrade to the latest PBAS version or use one of the hotfixes below.

Hotfix for PBA-S 4.3.3 and PBA-S 4.5.1 CentOS5, 32-bit

The hotfixes for PBA-S 4.3.3 and PBA-S 4.5.1 (CentOS5, 32-bit) are attached. To install a hotfix, download the file for your PBA-S version and install it on the PBA-S node using the following commands:

On PBA-S 4.3.3

~# rpm -Ufv hspc-vzcoll-4.3.3-56.swsoft.i386.rpm
~# service vzcoll restart

On PBA-S 4.5.1

~# rpm -Ufv hspc-vzcoll-4.5.1-23.swsoft.i386.rpm
~# service vzcoll restart

The hotfix introduces a new timeout (PROCESS_RESPONSE_TIMEOUT = 60). When a response is received with error 11 (resource temporarily unavailable), vzcoll no longer gets stuck processing it: processing stops when the timeout expires.
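The general technique, bounding the wait on a "resource temporarily unavailable" condition with a deadline, can be sketched as follows. This is an illustration only and not the vzcoll code: the function `recv_with_deadline` is hypothetical, and treating the timeout value as seconds is an assumption, since the unit is not stated above.

```python
import select
import socket
import time

# Value named in the hotfix description; the unit (seconds) is an assumption.
PROCESS_RESPONSE_TIMEOUT = 60


def recv_with_deadline(sock, size=4096, timeout=PROCESS_RESPONSE_TIMEOUT):
    """Read from a non-blocking socket, giving up once the timeout expires.

    Returns the received bytes, or None if no data arrived before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return sock.recv(size)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: no data is available right now.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # stop by timeout instead of waiting forever
            # Wait until the socket becomes readable or the deadline passes.
            select.select([sock], [], [], remaining)


# Demonstration with a local socket pair and a short timeout.
left, right = socket.socketpair()
left.setblocking(False)
print(recv_with_deadline(left, timeout=0.2))  # nothing was sent yet
right.send(b"pong")
print(recv_with_deadline(left, timeout=0.2))  # data is available now
left.close()
right.close()
```

Without the deadline check, the loop would keep waiting for as long as the peer stays silent, which is the kind of hang described in the Cause section.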

Search Words

vzcoll

hourly traffic

incorrect traffic
