VMware | vSphere | Part 1 | Business Continuity using HA Cluster
I am starting a two-part series in which I discuss two very important features of VMware vSphere. In this first part, we begin with vSphere’s High Availability (HA) protection feature; in the second part we will explore the automated workload scheduling feature called Distributed Resource Scheduler (DRS).
There are many advantages to virtualizing your infrastructure and running virtual resources to serve business-critical workloads. VMware vSphere provides notable features and capabilities that deliver high availability in the environment, as well as automated workload scheduling to ensure the most efficient use of hardware and resources. In this two-part series, we look at two of the core cluster-level features of vSphere in the enterprise: vSphere HA and DRS. You have most likely seen both of these referenced in connection with running vSphere in the enterprise.
So, let’s start with these questions:
What is VMware vSphere HA? What does it do?
How do you benefit by running HA in your vSphere environment?
VMware vSphere Clusters
One of the clear benefits, and a best practice, when using VMware vSphere to run business-critical workloads is to run a vSphere Cluster.
What is a vSphere Cluster?
A vSphere cluster is a configuration of more than one VMware ESXi server aggregated together as a pool of resources contributed to the vSphere cluster. Resources such as CPU compute, memory, and, in the case of software-defined storage like vSAN, storage are contributed by each ESXi host.
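To make the idea of a resource pool concrete, here is a minimal sketch in Python. It is only an illustration: the `EsxiHost` record, the host names, and the capacity figures are all made up for this example and do not come from any VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EsxiHost:
    """Hypothetical record of what one ESXi host contributes to the cluster."""
    name: str
    cpu_mhz: int      # total CPU capacity in MHz
    memory_gb: int    # total physical memory in GB
    vsan_tb: float    # local capacity contributed to vSAN, 0 if unused


def cluster_capacity(hosts):
    """Sum the per-host contributions into a single cluster-wide pool."""
    return {
        "cpu_mhz": sum(h.cpu_mhz for h in hosts),
        "memory_gb": sum(h.memory_gb for h in hosts),
        "vsan_tb": sum(h.vsan_tb for h in hosts),
    }


# Made-up example hosts.
hosts = [
    EsxiHost("esxi-01", cpu_mhz=48_000, memory_gb=256, vsan_tb=4.0),
    EsxiHost("esxi-02", cpu_mhz=48_000, memory_gb=256, vsan_tb=4.0),
    EsxiHost("esxi-03", cpu_mhz=32_000, memory_gb=128, vsan_tb=2.0),
]
pool = cluster_capacity(hosts)
print(pool)
```

The point of the sketch is simply that the cluster, not the individual host, becomes the unit of capacity that workloads draw from.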
Why is running your business-critical workloads on top of a vSphere Cluster important?
When you consider the benefits provided by running a hypervisor, it allows more than one server to run on top of a single set of physical hardware. Virtualizing workloads in this way provides efficiency benefits that are orders of magnitude greater than running one server on one set of physical hardware.
However, this can also become the Achilles heel of a virtualized solution, since the impact of a hardware failure can affect many more business-critical services and applications. You can imagine that if you have only one VMware ESXi host running many VMs, the impact of losing that single ESXi host would be huge.
This is where running multiple VMware ESXi hosts in a vSphere Cluster really shines.
However, you may ask yourself: how does simply running multiple hosts in a cluster improve your high availability? How does a host in the vSphere Cluster “know” if another host has failed? Is there a special mechanism that takes care of managing the high availability of workloads running on a vSphere Cluster? Sure, there is. Let’s see.
What is HA in VMware?
VMware realised the need for a mechanism that protects against a failed ESXi host in the vSphere Cluster. Out of this need, VMware vSphere High Availability (HA) was born.
VMware vSphere HA delivers the following benefits:
- VMware vSphere HA is cost-efficient and permits automatic restarts of VMs on other vSphere hosts when a server outage or an OS failure is detected in the vSphere environment.
- Monitors all VMware vSphere hosts & VMs within the vSphere Cluster
- Delivers high-availability to most applications running in virtual machines regardless of the OS and applications.
- The beauty of VMware’s vSphere HA solution, which is implemented through the vSphere Cluster, is the simplicity with which it is configured. With a few clicks through a wizard-driven interface, high availability is configured.
How does this compare with traditional “clustering” technologies?
Windows Server Failover Clustering Comparison
Windows Server Failover Clustering (WSFC) has become the technology that most people think of when they have clustering in mind. The problem seen with WSFC is that it requires a lot of specialized expertise to run correctly, especially when it comes to upgrades, patching, and general operational tasks.
Contrasting vSphere HA with WSFC, the operational overhead is nominal.
There is little chance that HA can be configured incorrectly as it is either enabled on a cluster or not. With WSFC, there are many considerations that need to be made when configuring WSFC to avoid both configuration and implementation mistakes.
Consider the following:
- Failover clustering (FC) requires applications that support clustering (SQL, etc.)
- Failover clustering (FC) requires that quorum be configured correctly.
- FC is not supported by many legacy operating systems and applications.
- FC adds the complexity of cluster network names, resources, and networking.
Windows Server Failover Clustering is advertised to provide near zero-downtime at the application level. However, when you add in the expertise required for a properly functioning HA solution, along with the proper implementation of WSFC, the risks can begin to outweigh the benefits of using WSFC for high-availability of applications and services. This is especially true for the many organizations that may not truly need a “zero downtime” solution. Additionally, your application has to be designed to take advantage of WSFC and work properly with WSFC technology.
While vSphere HA does require a restart of the virtual machines on a healthy host when a failover occurs, it requires no installation of additional software inside the guest virtual machines, no complex configurations of additional clustering technologies, and applications or OS’s do not have to be designed to work with particular clustering technology.
Legacy operating systems and applications generally have limited abilities when it comes to supported technologies to provide high-availability. So, there literally may be no native options to provide failover functionality in the case of hardware failures.
The vSphere HA high-availability mechanism works and is simple to implement, configure, and manage. Additionally, this is a technology that is well tested in thousands of VMware customer environments, so it has a stable and long history of successful deployments.
Turning on VMware vSphere HA is a simple toggle at the vSphere Cluster level.
General Overview of vSphere HA Behaviour
By using the benefits provided to the ESXi hosts in a vSphere Cluster, in its most basic form, vSphere HA implements a monitoring mechanism between the hosts in the vSphere Cluster.
The monitoring mechanism provides a way to determine whether any host in the vSphere Cluster has failed.
In the infographic below, a two-node vSphere Cluster has experienced a failure of one of the ESXi hosts in the vSphere Cluster. The vSphere Cluster has vSphere HA enabled at the cluster level.
After vSphere HA recognizes that a host in the vSphere Cluster has failed, the HA process moves the registration of VMs from the failed host over to a healthy host.
After the VMs are registered on a healthy host, vSphere HA restarts all the VMs of the failed host on a healthy ESXi host in the cluster where the VMs were reregistered. The only downtime incurred is with the restart of the VMs on a healthy host in the vSphere Cluster.
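The two steps above (re-register, then restart) can be sketched as a small simulation. This is a toy model and not how the HA agent is implemented: the `fail_over` function, the VM and host names, and the "least-loaded host" placement rule are all assumptions made for illustration, since real HA placement also weighs resources and reservations.

```python
def fail_over(placement, failed_host, healthy_hosts):
    """Return (new_placement, restarted_vms) after `failed_host` goes down.

    placement: dict mapping VM name -> name of the host it is registered on.
    healthy_hosts: list of surviving host names.
    """
    if not healthy_hosts:
        raise ValueError("no healthy host left to restart VMs on")
    new_placement = dict(placement)
    restarted = []
    # Count VMs already running on each surviving host.
    load = {h: 0 for h in healthy_hosts}
    for host in placement.values():
        if host in load:
            load[host] += 1
    for vm, host in placement.items():
        if host != failed_host:
            continue
        # Assumed placement rule: pick the least-loaded healthy host.
        target = min(healthy_hosts, key=lambda h: load[h])
        new_placement[vm] = target   # step 1: re-register the VM
        load[target] += 1
        restarted.append(vm)         # step 2: power it on again there
    return new_placement, restarted


# Two-node cluster where esxi-01 fails.
placement = {"web01": "esxi-01", "db01": "esxi-01", "app01": "esxi-02"}
after, restarted = fail_over(placement, "esxi-01", ["esxi-02"])
print(after)
print(restarted)
```

Note that `app01` is never touched: only the VMs that lived on the failed host incur the restart downtime.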
VMs are moved to a healthy ESXi host and restarted there
vSphere HA Technical Overview
Prerequisites for vSphere HA
You may wonder what underlying prerequisites are required for vSphere HA to work. Do you just need a VMware Cluster for HA to work?
Unlike Windows Server Failover Clustering, there are only a few requirements that need to be in place for HA to work. The requirements are:
- At least two ESXi hosts
- At least 4 GB of memory configured on each host
- vCenter Server
- vSphere Standard License
- Shared storage for VMs
- Pingable gateway or another reliable network node
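The checklist above is easy to turn into a pre-flight check. The sketch below is a hypothetical helper, not a VMware tool: the dictionary layout and key names such as `licensed_for_ha` are invented for this example, and in practice you would fill them in from your own inventory data.

```python
def check_ha_prerequisites(cluster):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    hosts = cluster["hosts"]
    if len(hosts) < 2:
        problems.append("need at least two ESXi hosts")
    for host in hosts:
        if host["memory_gb"] < 4:
            problems.append(f"{host['name']}: less than 4 GB of memory")
    if not cluster["vcenter_managed"]:
        problems.append("cluster is not managed by vCenter Server")
    if not cluster["licensed_for_ha"]:
        problems.append("license does not include vSphere HA")
    if not cluster["shared_datastores"]:
        problems.append("no shared storage for VMs")
    if not cluster["isolation_address_pingable"]:
        problems.append("gateway or isolation address does not answer pings")
    return problems


# Made-up cluster description that satisfies every requirement.
ok = {
    "hosts": [
        {"name": "esxi-01", "memory_gb": 128},
        {"name": "esxi-02", "memory_gb": 128},
    ],
    "vcenter_managed": True,
    "licensed_for_ha": True,
    "shared_datastores": ["ds-shared-01"],
    "isolation_address_pingable": True,
}
# Same cluster reduced to one host and with its shared storage removed.
bad = dict(ok, hosts=ok["hosts"][:1], shared_datastores=[])

print(check_ha_prerequisites(ok))
print(check_ha_prerequisites(bad))
```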
VMware vSphere HA Master vs Subordinate Hosts
When you enable vSphere HA on a cluster, a particular host in the vSphere Cluster is designated as the master of vSphere HA. The remaining ESXi hosts in the vSphere Cluster are configured as subordinate (slave) hosts in the vSphere HA configuration.
What role does the vSphere HA ESXi host that is designated as the master play? The vSphere HA master node:
- Monitors the state of the subordinate hosts – If a subordinate host fails or is unreachable, the master host identifies which VMs need to be restarted.
- Monitors the power state of all VMs that are protected. If a VM fails, the master vSphere HA node ensures the VM is restarted. The vSphere HA master decides where the VM restart takes place (on which ESXi host).
- Keeps track of all the cluster hosts and VMs that are protected by vSphere HA.
- Is designated as the mediator between the vSphere Cluster and vCenter Server. The HA master reports the cluster health to vCenter and provides the management interface to the cluster for vCenter Server.
- Can run virtual machines itself and monitor their status.
- Stores the list of protected VMs in cluster datastores.
vSphere HA Subordinate Hosts:
- Run virtual machines locally.
- Monitor the runtime states of the VMs in the vSphere Cluster.
- Report updates of state to the vSphere HA master.
Master Host Election and Master Failure
How is the vSphere HA master host selected?
When vSphere HA is enabled for a cluster, all active hosts (those not in maintenance mode, etc.) participate in electing the master host. If the elected master host fails, a new election takes place in which a new master HA host is elected to fulfil that role.
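The election and re-election flow can be sketched in a few lines. Please treat the ranking rule as an assumption: the post does not say how the winner is chosen, so this sketch favours the host with the most mounted datastores and uses the host name as an arbitrary tiebreak. The `elect_master` function and the host records are hypothetical.

```python
def elect_master(hosts):
    """Pick a master among hosts that are eligible to take part.

    hosts: list of dicts with keys "name", "alive", "in_maintenance"
    and "datastores" (number of mounted datastores).
    Returns the winning host name, or None if nobody is eligible.
    """
    candidates = [h for h in hosts if h["alive"] and not h["in_maintenance"]]
    if not candidates:
        return None
    # Assumed ranking: most datastores wins, host name breaks ties.
    winner = max(candidates, key=lambda h: (h["datastores"], h["name"]))
    return winner["name"]


hosts = [
    {"name": "esxi-01", "alive": True, "in_maintenance": False, "datastores": 4},
    {"name": "esxi-02", "alive": True, "in_maintenance": False, "datastores": 3},
    {"name": "esxi-03", "alive": True, "in_maintenance": True, "datastores": 5},
]
first = elect_master(hosts)

# The elected master fails, so the remaining hosts hold a new election.
hosts[0]["alive"] = False
second = elect_master(hosts)
print(first, second)
```

Here `esxi-03` never wins even though it mounts the most datastores, because a host in maintenance mode does not participate.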
VMware vSphere HA Cluster Failure Types
In a vSphere HA enabled cluster, there are three types of failures that can happen to trigger a vSphere HA failover event. Those host failure types are:
- Failure – A failure is exactly what you think. A host has stopped working in some form or fashion due to hardware or other issues.
- Isolation – The isolation of a host generally happens due to a network event that isolates a particular host from the other hosts in the vSphere HA cluster.
- Partition – A partition event is characterized by a subordinate host losing network connectivity to the master host of the vSphere HA cluster.
Heartbeating, Failure Detection, and Failure Actions
How does the master node determine if there is a failure of a particular host?
There are several different mechanisms the master uses to determine if a host has failed:
- The master node exchanges network heartbeats with the other hosts in the cluster every second.
- After the network heartbeat has failed, the master host performs a host liveness check.
- The host liveness check determines if the subordinate host is exchanging heartbeats with one of the datastores. The master also sends ICMP pings to the host’s management IP addresses.
- If direct communication with the HA agent of a subordinate host from the master host is not possible and the ICMP pings to the management address fail, the host is viewed as failed and VMs are restarted on a different host.
- If it is found that the subordinate host is exchanging heartbeats with the datastore, the master host assumes the host is in a network partition or is network isolated. In this case, the master simply monitors the host and VMs.
- Network isolation is the event where a subordinate host is running but can no longer be seen by the HA management agents on the management network. If a host stops seeing HA agent traffic on the management network, it attempts to ping the cluster isolation addresses. If this ping fails, the host declares itself isolated from the network.
- In this case, the master node monitors the VMs that are running on the isolated host. If the VMs power off on the isolated host, the master node restarts the VMs on another host.
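The decision steps in the list above can be written down as a small function. This is a sketch of the logic as described in this post, not the actual HA agent code. The names `HostState`, `classify_host`, and `vms_to_restart` are made up, and the branch where a host still answers pings but shows no datastore heartbeat is not covered by the text, so the `AGENT_UNREACHABLE` outcome there is an assumption.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    AGENT_UNREACHABLE = "agent unreachable"   # assumed extra state
    FAILED = "failed"


def classify_host(network_heartbeat, datastore_heartbeat, ping_reply):
    """Classify a subordinate host from the master's point of view."""
    if network_heartbeat:
        return HostState.HEALTHY
    # Network heartbeats stopped, so run the host liveness check.
    if datastore_heartbeat:
        # Still writing datastore heartbeats: running, but cut off.
        return HostState.ISOLATED_OR_PARTITIONED
    if ping_reply:
        # Not described in the text; treated as "not failed" (assumption).
        return HostState.AGENT_UNREACHABLE
    return HostState.FAILED


def vms_to_restart(state, vms_on_host, powered_off_vms):
    """Return the VMs the master should restart on another host."""
    if state is HostState.FAILED:
        return list(vms_on_host)
    if state is HostState.ISOLATED_OR_PARTITIONED:
        # Only restart VMs that have powered off on the cut-off host.
        return [vm for vm in vms_on_host if vm in powered_off_vms]
    return []


dead = classify_host(False, False, False)
cut_off = classify_host(False, True, False)
print(dead, vms_to_restart(dead, ["web01", "db01"], set()))
print(cut_off, vms_to_restart(cut_off, ["web01", "db01"], {"db01"}))
```

The key design point is that a lost network heartbeat alone never triggers restarts. The master waits for the liveness check before declaring a host failed.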
As mentioned above, one of the signals used for failure detection is datastore heartbeating. What is this exactly? VMware vCenter selects a preferred set of datastores for heartbeating. Then, vSphere HA creates a directory at the root of each datastore that is used both for datastore heartbeating and for keeping track of the list of protected VMs. This directory is named .vSphere-HA.
There is an important note to remember regarding vSAN datastores. A vSAN datastore can’t be used for datastore heartbeating. If you only have a vSAN datastore available, there can be no heartbeat datastores used.
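As a rough illustration of the vSAN restriction, the sketch below filters a list of datastores down to heartbeat candidates. It is not vCenter's real selection algorithm. The `pick_heartbeat_datastores` function, the datastore records, the preference for datastores mounted by the most hosts, and the count of two are all assumptions for this example.

```python
def pick_heartbeat_datastores(datastores, wanted=2):
    """Choose heartbeat datastores, skipping any vSAN datastore.

    datastores: list of dicts with "name", "type" (e.g. "VMFS", "NFS",
    "vsan") and "hosts" (set of host names that mount the datastore).
    """
    eligible = [d for d in datastores if d["type"].lower() != "vsan"]
    # Assumed preference: datastores seen by the most hosts come first,
    # with the name as a tiebreak so the result is deterministic.
    eligible.sort(key=lambda d: (-len(d["hosts"]), d["name"]))
    return [d["name"] for d in eligible[:wanted]]


all_hosts = {"esxi-01", "esxi-02", "esxi-03"}
datastores = [
    {"name": "vsanDatastore", "type": "vsan", "hosts": all_hosts},
    {"name": "ds-a", "type": "VMFS", "hosts": all_hosts},
    {"name": "ds-b", "type": "NFS", "hosts": {"esxi-01", "esxi-02"}},
    {"name": "ds-c", "type": "VMFS", "hosts": all_hosts},
]
chosen = pick_heartbeat_datastores(datastores)
vsan_only = pick_heartbeat_datastores(datastores[:1])
print(chosen)
print(vsan_only)
```

With only a vSAN datastore available the result is an empty list, which mirrors the note above: there are simply no heartbeat datastores to use.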
VM and Application Monitoring
Another really powerful feature of vSphere HA is the ability to monitor individual VMs via VMware Tools and restart any virtual machines that fail to respond to VMware Tools heartbeats. Application Monitoring can restart a VM if the heartbeats for an application that is running are not received.
- VM Monitoring – With VM Monitoring, the VM Monitoring service uses VMware Tools to determine whether each VM is running by checking for both VMware Tools heartbeats and disk I/O activity from the VM. If these checks fail, the service concludes that the guest operating system has most likely failed, and the VM is restarted. The additional disk I/O check helps avoid unnecessary VM resets when VMs or applications are still functioning properly.
- Application Monitoring – The application monitoring function is enabled by obtaining the appropriate SDK from a third-party software vendor that allows setting up customized heartbeats for the applications to be monitored by the vSphere HA process. Much like the VM Monitoring process, if application heartbeats stop being received, the VM is reset.
Both of these monitoring functions can be further configured with monitoring sensitivity and also maximum per-VM resets to help to avoid resetting VMs repeatedly for software or false positive errors.
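Here is a minimal sketch of how sensitivity and the maximum per-VM resets might combine into a reset decision. The `should_reset_vm` function is hypothetical, and the default numbers (a 30-second failure interval, 3 resets per hour) are illustrative values chosen for this example, not settings taken from this post.

```python
def should_reset_vm(seconds_since_heartbeat, seconds_since_disk_io,
                    recent_reset_times, now,
                    failure_interval=30, max_resets=3, reset_window=3600):
    """Decide whether VM Monitoring should reset a VM.

    recent_reset_times: timestamps (in seconds) of earlier resets of this VM.
    now: current timestamp in the same unit.
    """
    heartbeat_lost = seconds_since_heartbeat > failure_interval
    io_idle = seconds_since_disk_io > failure_interval
    # Both signals must be silent; disk I/O alone proves the VM is alive.
    if not (heartbeat_lost and io_idle):
        return False
    # Cap the number of resets inside the window to avoid reset loops.
    resets_in_window = [t for t in recent_reset_times
                        if now - t <= reset_window]
    return len(resets_in_window) < max_resets


now = 10_000
hung = should_reset_vm(45, 50, [], now)
busy = should_reset_vm(45, 5, [], now)
capped = should_reset_vm(45, 50, [9_000, 9_500, 9_900], now)
print(hung, busy, capped)
```

A lower `failure_interval` corresponds to a higher monitoring sensitivity: failures are caught sooner, at the cost of more false positives.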
So, VMware vSphere HA is a great way to ensure that your vSphere Cluster provides very resilient high-availability to protect against general host failures of ESXi hosts in your vSphere Cluster.
This concludes the first part. In the next part we will look in detail at DRS, another important feature, which ensures efficient use of resources.
So be sure to read my next blog on VMware vSphere DRS.
Sumaiyya S Bagwan | Technical Trainer
SevenMentor Pvt. Ltd.
© Copyright 2019 | Sevenmentor Pvt Ltd.