The symptoms may vary:
- failed tasks with ODBC-related errors:
PSSDBImpl: your call failed due to database server connectivity problem. The application has restored the connection. Please, retry your call.
Execution Failed: server closed the connection unexpectedly. This probably means the server terminated abnormally before or while processing the request.
- CP availability issues with the same errors in the logs.
- In the online store, domains are shown as unavailable for registration, and the logs show a connection error, e.g.:
[15-10-01 10:24:20.324 OPENSRS_Obj RQ00000 TH24894 NTE] Entering method OPENSRS.CheckAvailability(user = -1, SID = 0, lang = en, request = 0, localObject = 1, transaction = 0 (HP))
[15-10-01 10:24:20.324 OPENSRS_Obj RQ00000 TH24894 TRC] +++ ItemResult* OPENSRS::CheckAvailability(Int, Str)()
[15-10-01 10:24:20.325 generic_wor RQ00000 TH24894 DBG]  Loading readonly a row (RegistrarID=11) in table OpenSRSConf
[15-10-01 10:24:20.325 RDBMS RQ00000 TH24894 INF] Prepare [0x7f4908014390]: SELECT "RegistrarID", "UID", "Pass", "TechContactMode", "Mode", "TestHost", "TestPort", "RealHost", "RealPort", "UserArc", "DateArc" FROM "OpenSRSConf" WHERE "RegistrarID" = $1
[15-10-01 10:24:20.326 RDBMS RQ00000 TH24894 NTE] Code: 1000. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
[15-10-01 10:24:20.326 OPENSRS_Obj RQ00000 TH24894 TRC] ... *** INTERRUPTED BY EXCEPTION *** ItemResult* OPENSRS::CheckAvailability(Int, Str)
[15-10-01 10:24:20.326 OPENSRS_Obj RQ00000 TH24894 NTE] Finished method OPENSRS.CheckAvailability(user = -1, SID = 0, lang = en, request = 0, localObject = 1, transaction = 0 (HP))
- The following errors appear in the PostgreSQL logs from time to time:
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection
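To see how often the PostgreSQL server logs these client disconnects, the two messages can be counted with grep. This is only a sketch: the helper name count_pg_disconnects and the log path in the usage comment are hypothetical, and the actual log location depends on the PostgreSQL configuration.

```shell
# Count client-disconnect messages in a PostgreSQL log file.
# $1 is the path to the log file (its location is installation-specific).
count_pg_disconnects() {
    grep -c -E 'could not receive data from client: Connection reset by peer|unexpected EOF on client connection' "$1"
}

# Usage (hypothetical path, adjust to your installation):
#   count_pg_disconnects /var/lib/pgsql/data/pg_log/postgresql.log
```

A count that grows over time suggests that connections are being reset somewhere between the client and the database server.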
BA attempts to reuse a PostgreSQL connection created earlier, but the existing connection has been reset because of a networking issue. BA does not validate the connection status, so the method simply fails and the system does not attempt to initiate a new connection. This is a bug with ID #PBA-66084 ("Status of the database process transaction is associated with is not checked"). However, the root cause lies in the connection being dropped somewhere between the BA application server and the database (this can be done by a firewall, a router, or a third-party application).
The suggested solution is either to increase the idle connection timeout on the router or to move the BA database to the same network as the BA application server so that connections are not dropped. Please contact the system administrator for details.
To work around the issue, you may change the TCP keepalive settings on the database server. For example, the initial settings:
[root@hostname ipv4]# cat tcp_keepalive_time
7200
[root@hostname ipv4]# cat tcp_keepalive_intvl
75
[root@hostname ipv4]# cat tcp_keepalive_probes
9
Change them to check the behaviour:
echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 90 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1000 > /proc/sys/net/ipv4/tcp_keepalive_probes
Note that the change is reset after a server reboot.
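If the changed values turn out to help, one way to keep them across reboots is to add the corresponding net.ipv4 keys to /etc/sysctl.conf. The sketch below only prints the entries; the helper name keepalive_sysctl_lines is hypothetical, and the values are the example ones used above, not a recommendation.

```shell
# Print sysctl.conf entries for the given keepalive values.
# $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
keepalive_sysctl_lines() {
    printf 'net.ipv4.tcp_keepalive_time = %s\n' "$1"
    printf 'net.ipv4.tcp_keepalive_intvl = %s\n' "$2"
    printf 'net.ipv4.tcp_keepalive_probes = %s\n' "$3"
}

# Show the entries for the example values.
keepalive_sysctl_lines 600 90 1000

# To persist them (as root), append the output to /etc/sysctl.conf and reload:
#   keepalive_sysctl_lines 600 90 1000 >> /etc/sysctl.conf
#   sysctl -p
```

Printing the entries first makes it possible to review them before anything is written to the system configuration.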