Article ID: 131441, created on Sep 10, 2017, last review on Sep 10, 2017

  • Applies to:
  • Operations Automation
  • Business Automation

Q: I have tried to perform a manual failover to node B, but instead the node got disconnected. Why?

[root@dbB pgha]#  ./pghactl.py status --format=raw
*************** Common PGHA info ***************************
Automatic slave recovery mode: ALL
Local node address       : 192.0.2.2

*************** Pgpool cluster data ************************
Status                   : ALIVE
Quorum Status            : QUORUM_EXIST
Pgpool 192.0.2.3         : STANDBY
Pgpool 192.0.2.1         : MASTER
Pgpool 192.0.2.2         : STANDBY

*************** DB nodes ***********************************
DB node name             : a.db.node
IP address               : 192.0.2.1
Is master                : TRUE
PostgreSQL version       : 9.6.2
Status                   : UP
Pgpool attach status     : ATTACHED_AS_UP
Replication status       : SYNC_REPLICATING_MASTER

DB node name             : b.db.node
IP address               : 192.0.2.2
Is master                : FALSE
PostgreSQL version       : 9.6.2
Status                   : UP
Pgpool attach status     : ATTACHED_AS_UP
Replication status       : ACCEPTING_REPLICATION

[root@dbB pgha]#  ./pghactl.py do-failover -n 1 -r b.db.node
[root@dbB pgha]#  ./pghactl.py status --format=raw
*************** Common PGHA info ***************************
Automatic slave recovery mode: ALL
Local node address       : 192.0.2.1

*************** Pgpool cluster data ************************
Status                   : ALIVE
Quorum Status            : QUORUM_EXIST
Pgpool 192.0.2.3         : STANDBY
Pgpool 192.0.2.1         : MASTER
Pgpool 192.0.2.2         : STANDBY

*************** DB nodes ***********************************
DB node name             : a.db.node
IP address               : 192.0.2.2
Is master                : TRUE
PostgreSQL version       : 9.6.2
Status                   : UP
Pgpool attach status     : ATTACHED_AS_UP
Replication status       : NOT_REPLICATING_MASTER

DB node name             : b.db.node
IP address               : 192.0.2.1
Is master                : NO_DATA
PostgreSQL version       : NO_DATA
Status                   : NO_DATA
Pgpool attach status     : ATTACHED_AS_DOWN
Replication status       : NO_DATA

A: The 'do-failover' command is not meant to be launched manually. As the documentation states: "Performed by Pgpool when the master or slave failure is detected."

When this command is launched while both DB servers are up and running, the following happens: the cluster acknowledges a failover, and the target node (NEW_MASTER_NODE) tries to become the new master. Since full automatic recovery is normally enabled, the cluster immediately starts split-brain resolution: it compares the timestamps of the last commits on both nodes and, having detected that node A holds the most recent one, puts node B down.
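The resolution rule can be illustrated with a short sketch. This is not PGHA's actual code, and the timestamp values are invented for the example; the point is only the decision: the node with the more recent last commit stays master, the other is put down.

```shell
# Illustrative sketch of split-brain resolution (not PGHA's real code):
# compare last-commit timestamps and keep the more recent node as master.
last_commit_a="2017-09-10 12:00:05"   # invented value for node A
last_commit_b="2017-09-10 11:59:58"   # invented value for node B

ts_a=$(date -d "$last_commit_a" +%s)  # convert to epoch seconds (GNU date)
ts_b=$(date -d "$last_commit_b" +%s)

if [ "$ts_a" -ge "$ts_b" ]; then
    winner="a.db.node"; loser="b.db.node"
else
    winner="b.db.node"; loser="a.db.node"
fi
echo "keep $winner as master, put down $loser"
```

With the values above, node A wins and node B is shut down, which matches the behavior shown in the status output.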

Q: What is the proper way to test a cluster failover?

A: The correct way to simulate/perform a failover is to cause an actual outage on a node, for example:

  • stop the 'network' service
  • kill the postgres process forcibly
  • stop the DB container forcibly (in a Virtuozzo environment with the DB node installed in a container)
  • simulate a memory shortage
  • etc.
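The "kill the postgres process forcibly" option can be sketched as below. A background 'sleep' stands in for the real postgres process here, which is an assumption for illustration only; on a real node you would target the postmaster PID instead.

```shell
# Minimal runnable sketch: forcibly kill a process, as you would the
# postmaster when simulating a failover. 'sleep' is a stand-in process.
sleep 600 &            # dummy process standing in for postgres
victim=$!

kill -9 "$victim"      # SIGKILL: no chance for a clean shutdown
wait "$victim"
status=$?              # 137 = 128 + 9, i.e. terminated by SIGKILL
echo "process $victim exited with status $status"
```

An abrupt SIGKILL (rather than a normal service stop) matters: a graceful shutdown may be handled differently by the cluster than a genuine failure.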

Q: How do I relocate the master role when I would like to perform server maintenance?

A: To put a node into maintenance mode, use the enable-maintain command of the pghactl.py utility:

[root@dbB pgha]#  ./pghactl.py enable-maintain

This way, the PGHA cluster turns on maintenance mode for the hardware node and detaches the local database from the Pgpool cluster.

If the command is performed on the slave DB node, it disables synchronous replication from the master DB node.

If the command is performed on the master DB node, the local database is disabled and the database on the opposite node is promoted to master. The local Pgpool is stopped.

To restore the node's state in the cluster, stop the maintenance mode:

[root@dbB pgha]#  ./pghactl.py disable-maintain

Q: I have tried to promote node B to sync master, but nothing happens. Why?

[root@dbB pgha]#  ./pghactl.py sync-master
[root@dbB pgha]#  ./pghactl.py status --format=raw
*************** Common PGHA info ***************************
Automatic slave recovery mode: ALL
Local node address       : 192.0.2.2

*************** Pgpool cluster data ************************
Status                   : ALIVE
Quorum Status            : QUORUM_EXIST
Pgpool 192.0.2.3         : STANDBY
Pgpool 192.0.2.1         : MASTER
Pgpool 192.0.2.2         : STANDBY

*************** DB nodes ***********************************
DB node name             : a.db.node
IP address               : 192.0.2.1
Is master                : TRUE
PostgreSQL version       : 9.6.2
Status                   : UP
Pgpool attach status     : ATTACHED_AS_UP
Replication status       : SYNC_REPLICATING_MASTER

DB node name             : b.db.node
IP address               : 192.0.2.2
Is master                : FALSE
PostgreSQL version       : 9.6.2
Status                   : UP
Pgpool attach status     : ATTACHED_AS_UP
Replication status       : ACCEPTING_REPLICATION

A: When both DB nodes are available, it is not possible to promote the other DB node to master, because doing so would inevitably cause a split-brain problem. In full automatic recovery mode, the PGHA cluster immediately attempts to resolve the situation and keeps the old master in charge.

