
Article ID: 131743, created on Nov 7, 2017, last review on Nov 7, 2017

Applies to:

  • Operations Automation
  • Business Automation

Symptoms

A sales order fails with the error:

Execution Failed: Operations Automation is not available. Connection to 192.168.90.11:8440 has failed: Connection timeout. Please check network settings.

The corresponding pem.activateSubscription API request sent from Billing to OA does not receive a response within 10 minutes. According to /var/log/pa/core.log, the request hangs right after it finishes providing resources at the SaaS level:

Nov  7 20:45:01.615 : DBG [openapi:1605 openapi-task-95:939 pau]: c.p.p.s.p.e.SubscriptionManagerOpenAPI Entering activateSubscription, accountId: 1000010, subscriptionId: 1000050, subscriptionName: null, stId: 3, parentNotificationId: null
...
Nov  7 20:45:01.861 : DBG [openapi:1605 1:14928:7f7e757fb700 SAAS ]: [ {anonymous}::provideResourcesFromParamsImpl] <=== EXIT [0.000004]
Nov  7 20:45:01.861 : DBG [openapi:1605 1:14928:7f7e757fb700 SAAS ]: [ SaaS::SaaSManager_impl::doChangeSubscriptionLimits] <=== EXIT [0.036235]

A request that completes normally ends with a line like:

Nov  7 03:31:43.085 : DBG [openapi:10 openapi-task-10 pau]: c.p.p.s.x.c.DynamicMethodHandler XML RPC invocation of 'activateSubscription' took 4594 ms
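The check described above (an "Entering activateSubscription" line with no matching "XML RPC invocation ... took" line) can be sketched in Python. This is a minimal illustration, not a supported tool. It assumes that the "openapi:<N>" token in the bracketed prefix identifies a single request, which is inferred from the excerpts in this article rather than from documented log format. The function name find_unfinished is hypothetical. A request that is still running normally would also be reported, so treat the result as a hint only.

```python
import re

# Assumption: the "openapi:<N>" token at the start of the bracketed prefix
# identifies one API request across all of its log lines.
REQUEST_ID = re.compile(r"\[(openapi:\d+) ")
ENTER = "Entering activateSubscription"
DONE = "XML RPC invocation of 'activateSubscription' took"


def find_unfinished(lines):
    """Return {request_id: entry_line} for requests with no completion line."""
    pending = {}
    for line in lines:
        match = REQUEST_ID.search(line)
        if not match:
            continue
        request_id = match.group(1)
        if ENTER in line:
            pending[request_id] = line.rstrip("\n")
        elif DONE in line:
            # Completion seen: the request is no longer considered stuck.
            pending.pop(request_id, None)
    return pending


# Two lines taken from the excerpts in this article.
sample = [
    "Nov  7 20:45:01.615 : DBG [openapi:1605 openapi-task-95:939 pau]: "
    "c.p.p.s.p.e.SubscriptionManagerOpenAPI Entering activateSubscription, "
    "accountId: 1000010, subscriptionId: 1000050",
    "Nov  7 03:31:43.085 : DBG [openapi:10 openapi-task-10 pau]: "
    "c.p.p.s.x.c.DynamicMethodHandler XML RPC invocation of "
    "'activateSubscription' took 4594 ms",
]

for request_id, entry in find_unfinished(sample).items():
    print(request_id, "->", entry)
```

To run it against a real log, pass an open file object for /var/log/pa/core.log to find_unfinished instead of the sample list.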

The following error appears in /var/log/pa/console.log some time before the incident:

171107 10:32:58 WARN  [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] (EJB default - 7) IJ000604: Throwable while attempting to get a new connection: null: javax.resource.ResourceException: IJ031084: Unable to create connection
...
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is in recovery mode
        at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:443)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:217)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:52)
        at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:216)

Around the same time, /var/lib/pgsql/9.6/data/pg_log/postgresql-Tue.log contains:

[2017-11-07 10:41:12.178 EET] p=19835:303@0/0 c=oss@192.168.10.10/oss:[unknown] PANIC:  hash table "Shared Buffer Lookup Table" corrupted

The Virtuozzo container hosting the OA Management Node was migrated online at the time of the PostgreSQL error. This can be seen in /var/log/messages on the Virtuozzo node:

Nov  7 10:40:00 pcs01 vzmdest[181412]: Start of CT 101 migration (private /vz/private/101, root /vz/root/101, opt=24)
Nov  7 10:40:58 pcs01 vzmdest[181412]: OfflineManagement CT#101 ...
Nov  7 10:40:58 pcs01 vzmdest[181412]: done
Nov  7 10:40:58 pcs01 vzmdest[181412]: Undumping CT#101 ...
Nov  7 10:41:11 pcs01 kernel: [78571813.682698] CT: 101: restored
Nov  7 10:41:11 pcs01 vzmdest[181412]: done
Nov  7 10:41:11 pcs01 vzmdest[181412]: Resuming CT#101 ...
Nov  7 10:41:11 pcs01 vzmdest[181412]: done
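The timing correlation between the migration and the PostgreSQL PANIC can be checked with a short Python sketch. It is only an illustration under several assumptions: both hosts log in the same local time zone, the syslog year (which /var/log/messages omits) is supplied by the caller, and a PANIC counts as related if it occurs between the "Start of CT ... migration" line and up to 60 seconds after the "Resuming CT#..." line. The function names and the 60-second slack are hypothetical choices, not values taken from Virtuozzo or OA documentation.

```python
import re
from datetime import datetime, timedelta

SYSLOG_TS = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")
PG_PANIC = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \S+\].*PANIC:"
)


def parse_syslog_time(line, year):
    """Parse a syslog timestamp; the year is not in the log and must be given."""
    match = SYSLOG_TS.match(line)
    if not match:
        return None
    month, day, clock = match.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def panics_near_migration(syslog_lines, pg_lines, ct_id, year,
                          slack=timedelta(seconds=60)):
    """Return PostgreSQL PANIC lines logged during or shortly after the migration."""
    start = end = None
    for line in syslog_lines:
        if "vzmdest[" not in line:
            continue
        if f"Start of CT {ct_id} migration" in line:
            start = parse_syslog_time(line, year)
        elif f"Resuming CT#{ct_id} " in line:
            end = parse_syslog_time(line, year)
    if start is None or end is None:
        return []
    hits = []
    for line in pg_lines:
        match = PG_PANIC.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if start <= stamp <= end + slack:
                hits.append(line.rstrip("\n"))
    return hits


# Lines taken from the excerpts in this article.
messages = [
    "Nov  7 10:40:00 pcs01 vzmdest[181412]: Start of CT 101 migration "
    "(private /vz/private/101, root /vz/root/101, opt=24)",
    "Nov  7 10:41:11 pcs01 vzmdest[181412]: Resuming CT#101 ...",
]
pg_log = [
    "[2017-11-07 10:41:12.178 EET] p=19835:303@0/0 "
    "c=oss@192.168.10.10/oss:[unknown] PANIC:  hash table "
    '"Shared Buffer Lookup Table" corrupted',
]

for hit in panics_near_migration(messages, pg_log, ct_id=101, year=2017):
    print(hit)
```

The slack is needed because in the excerpts above the PANIC is logged one second after the container is resumed, so a strict "inside the migration window" test would miss it.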

Cause

A Virtuozzo online migration may corrupt the shared memory segments that the PostgreSQL server relies on.

Reference: checkpointing shared memory fails

Resolution

To fix the issue, restart the OA services on the Management Node.

Avoid using Virtuozzo online migration for the management containers (OA and BA) if they are hosted on nodes running Virtuozzo 6.0 or earlier.
