How does High Availability work in an Operations Automation Cloud Infrastructure (OACI) environment?
When a Virtuozzo node is registered in OACI with Cloud Storage support enabled,
the OACI_hn_fail RPM is installed on the server. It stores the OACI IM address and credentials in
the /usr/local/etc/PACI_hn_fail.conf file and brings two additional shaman scripts.
When the shamand service detects that a node has crashed (that is, the node has become unavailable to the shaman master server),
the PACI_hn_fail script is called on the shaman master node. It passes the IP address of the crashed node to OACI IM, which triggers the failover procedure.
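The hand-off described above can be pictured with a minimal Python sketch. It is an illustration only: `FailoverRequest`, `report_crashed_node`, and the `send` callable are hypothetical names, and the sketch does not reproduce the real PACI_hn_fail script, its configuration file format, or the OACI IM interface.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FailoverRequest:
    """Hypothetical message that the master-side hook hands to OACI IM."""
    crashed_node_ip: str


def report_crashed_node(
    ip: str, send: Callable[[FailoverRequest], None]
) -> FailoverRequest:
    """Validate the crashed node's IP address and forward it to the IM.

    `send` stands in for whatever transport the real script uses; it is
    an assumption of this sketch, not a documented interface.
    """
    # Reject malformed addresses early; ip_address() raises ValueError.
    normalized = str(ipaddress.ip_address(ip))
    request = FailoverRequest(crashed_node_ip=normalized)
    send(request)
    return request
```

Passing the transport in as a callable keeps the sketch independent of how the IM address and credentials are actually read and used.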
The failover procedure:
3.1. If the crashed node has the "INACTIVE" state in OACI, or if the other nodes in the cluster have the "INACTIVE" state in OACI and there is no candidate node to place the resources on, no virtual environments are relocated.
3.2. If the crashed node has the "ACTIVE" or "LOCKED" state, OACI IM chooses appropriate nodes in the same Cloud Storage cluster that currently host the fewest containers and relocates the resources from the failed node to the healthy ones. During relocation, each VE gets a transient "FAILOVER_IN_PROGRESS" state in its history; after a successful failover, it gets a transient "FAILOVER_SUCCESS" state. All Load Balancers that were present on the failed node are recreated on healthy nodes.
Note: All VEs present in OACI are relocated; it is not possible to disable failover for VEs created from OACI.
- A notification is sent to the affected customers that their servers were relocated to other nodes.
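The node selection in the failover procedure can be sketched in Python as follows. This is a simplified model, not the OACI IM implementation: the `Node` type and `plan_failover` function are hypothetical, and the sketch assumes that only "ACTIVE" nodes are valid targets, that each relocated VE counts as one container toward a node's load, and that ties are broken by node name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    cluster: str      # Cloud Storage cluster the node belongs to
    state: str        # "ACTIVE", "LOCKED", or "INACTIVE"
    containers: int   # number of containers currently hosted


def plan_failover(crashed: Node, nodes: List[Node], ve_ids: List[str]) -> Dict[str, str]:
    """Return a mapping of VE id to target node name.

    An empty mapping means that no virtual environments are relocated.
    """
    # An INACTIVE crashed node is not failed over.
    if crashed.state == "INACTIVE":
        return {}

    # Candidates are healthy nodes in the same Cloud Storage cluster.
    # Assumption of this sketch: only ACTIVE nodes may receive resources.
    candidates = [
        n for n in nodes
        if n.name != crashed.name
        and n.cluster == crashed.cluster
        and n.state == "ACTIVE"
    ]
    if not candidates:
        return {}

    # Track the projected load so that VEs spread across the targets.
    load = {n.name: n.containers for n in candidates}
    plan: Dict[str, str] = {}
    for ve in ve_ids:
        # Pick the node with the fewest containers; break ties by name.
        target = min(load, key=lambda name: (load[name], name))
        plan[ve] = target
        load[target] += 1
    return plan
```

For example, with a crashed "ACTIVE" node in cluster `cs1` and two healthy `cs1` nodes hosting 3 and 2 containers, the first VE goes to the node with 2 containers and later VEs alternate between the two as their projected loads even out. Nodes in other clusters and "INACTIVE" nodes are never chosen.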