Ensuring that services remain available to customers is one of the key challenges in IT. Availability can be provided at several levels:
- At the application level;
- At the virtual machine (VM) level;
- At the storage level.
In this article, we review the scenarios for each of them.
1. FAULT TOLERANCE AT THE APPLICATION LEVEL
At this level, fault tolerance is provided by redundant computers (cluster nodes) that are connected by communication channels and presented to the end user as a single service, called a cluster. The only question is where to place the nodes. They can all be placed in a single data center; this eliminates problems caused by scheduled maintenance or the hardware failure of one node, because the active node can be switched from one host to another.
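The switching of the active node can be illustrated with a minimal sketch of heartbeat-based failover in an active/passive cluster. Everything in it is hypothetical (the `Node` and `Cluster` classes, the host names and the 5-second timeout); real clustering software, such as Windows Server Failover Clustering, adds quorum, fencing and other safeguards that are omitted here.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: a node silent for longer than this is treated as failed.
HEARTBEAT_TIMEOUT = 5.0  # seconds


@dataclass
class Node:
    name: str
    host: str
    last_heartbeat: float = 0.0


class Cluster:
    """Toy active/passive cluster: one node serves clients, the others stand by."""

    def __init__(self, nodes: List[Node], active: str):
        self.nodes = {node.name: node for node in nodes}
        self.active = active

    def heartbeat(self, name: str, now: float) -> None:
        # Record that the node reported in at time `now`.
        self.nodes[name].last_heartbeat = now

    def is_healthy(self, name: str, now: float) -> bool:
        return now - self.nodes[name].last_heartbeat <= HEARTBEAT_TIMEOUT

    def check(self, now: float) -> str:
        """Keep the active node if it is healthy; otherwise promote a healthy standby."""
        if not self.is_healthy(self.active, now):
            for name in self.nodes:
                if name != self.active and self.is_healthy(name, now):
                    self.active = name
                    break
        return self.active
```

For example, if `node1` on `host-a` stops sending heartbeats while `node2` on `host-b` keeps reporting, `check()` moves the active role to `node2`, and clients keep being served by the cluster as a whole.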
Alternatively, you can take a more comprehensive approach and place the nodes in different data centers. This significantly reduces risk and extends protection to more failure points (storage, hypervisors, communication channels, and geographic location). In a VMware vCloud Director infrastructure, this is done by connecting an additional virtual data center (VDC) to your organization. You can then select the VDC in the cloud control panel.
Each VDC is independent: in it you can create your own vApps, separate networks, VMware vShield Edge gateways and other virtual infrastructure objects. To do this, select the required VDC on the Datacenters tab of the panel. The selected VDC is shown at the top of the screen.
When creating a new vApp, you can select in advance the VDC in which its virtual machines will be located.
In vCloud Director, the window title shows where your vApp is located.
By creating vApps in different data centers and placing cluster nodes in them, you can provide application-level resilience for many services. For example, this approach is used for:
- Microsoft Exchange, using a Database Availability Group (DAG);
- Microsoft Active Directory, using replication between domain controllers;
- DFS (Distributed File System);
- Microsoft SQL Server, using AlwaysOn Availability Groups (starting with SQL Server 2012), which maintain secondary replicas with either synchronous replication (slower, no data loss) or asynchronous replication (faster, possible data loss). For more details, see https://docs.microsoft.com/ru-ru/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-2017
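The trade-off between the two modes can be shown with a toy model. It is not how SQL Server is implemented internally, and the `PrimaryReplica` and `Secondary` classes are hypothetical: in synchronous mode a commit completes only after the secondary has the record, while in asynchronous mode records still waiting to be sent are lost if the primary fails first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Secondary:
    log: List[str] = field(default_factory=list)


@dataclass
class PrimaryReplica:
    secondary: Secondary
    synchronous: bool
    log: List[str] = field(default_factory=list)
    send_queue: List[str] = field(default_factory=list)

    def commit(self, txn: str) -> None:
        self.log.append(txn)
        if self.synchronous:
            # Synchronous: the commit returns only after the secondary has the record,
            # so every commit pays a round trip to the other data center.
            self.secondary.log.append(txn)
        else:
            # Asynchronous: the commit returns at once; the record is sent later.
            self.send_queue.append(txn)

    def flush(self) -> None:
        """Send queued records to the secondary (asynchronous mode)."""
        self.secondary.log.extend(self.send_queue)
        self.send_queue.clear()


def lost_on_failover(primary: PrimaryReplica) -> List[str]:
    """Transactions committed on the primary that the secondary never received."""
    return [txn for txn in primary.log if txn not in primary.secondary.log]
```

If the primary fails right after committing `t2`, a synchronous pair loses nothing, while an asynchronous pair that last sent its queue after `t1` loses `t2`.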
Network connectivity between virtual machines in different data centers can be provided in the following ways:
- By extending VLANs between data centers through additional Cloud4Y service connections. This gives all VMs a single private network (private, or "gray", addresses).
- Over the public Internet, by creating an additional VMware vShield Edge with a public ("white") IP address in the second data center and connecting through the public IP addresses, or by setting up a VPN tunnel between the Edges over private addresses. In the VPN case, the private subnets in the two data centers must be different and must not overlap.
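Before setting up the VPN, you can check that the private subnets meet this requirement. Below is a minimal sketch using Python's standard `ipaddress` module; the function name and the example subnets are hypothetical.

```python
import ipaddress
from typing import List


def check_vpn_subnets(site_a: str, site_b: str) -> List[str]:
    """Return a list of problems that would prevent routing between two sites over a VPN."""
    problems = []
    net_a = ipaddress.ip_network(site_a)
    net_b = ipaddress.ip_network(site_b)
    # Both sides of the tunnel should use private ("gray") address ranges.
    for label, net in (("site A", net_a), ("site B", net_b)):
        if not net.is_private:
            problems.append(f"{label}: {net} is not a private (gray) range")
    # Overlapping ranges make it impossible to tell which side an address belongs to.
    if net_a.overlaps(net_b):
        problems.append(f"{net_a} overlaps {net_b}")
    return problems
```

For example, `192.168.10.0/24` and `192.168.20.0/24` pass the check, while `10.0.0.0/16` in one data center and `10.0.5.0/24` in the other are reported as overlapping.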
Both options can be combined. This lets servers communicate directly without a VPN, while also providing two public entry points (public IP addresses) in different data centers through which users connect, making the service highly available.
2. FAULT TOLERANCE AT THE VIRTUAL MACHINE LEVEL
To make a virtual machine highly available, we suggest using Veeam Backup & Replication.
Its replication feature creates ready-to-start copies (replicas) of a virtual machine in another data center. Veeam replication speeds up disaster recovery, minimizes data loss and helps ensure business continuity.
If the original virtual machine in one data center stops working for any reason, you can quickly fail over to its replica in the other data center and restore business-critical services and applications with minimal downtime. Users can continue working after a short interruption while IT staff troubleshoot the affected data center.
A replica is created by a replication job that runs on a schedule. During the first session of the job, Veeam Backup & Replication copies the entire VM image and registers the replica VM on the target ESX(i) host. In subsequent sessions, it copies only the data blocks that changed since the previous session (incremental changes) and creates a new recovery point for the replica.
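The incremental scheme can be illustrated with a simplified model in which a VM disk is split into fixed-size blocks, only blocks whose contents changed since the previous session are copied, and the replica state after each session is kept as a recovery point. This is a hypothetical sketch: it detects changes by hashing blocks, whereas Veeam on VMware typically relies on the hypervisor's Changed Block Tracking (CBT), and the 4-byte block size is chosen only for readability.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


def split_blocks(image: bytes) -> List[bytes]:
    return [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class ReplicationJob:
    def __init__(self) -> None:
        self.hashes: List[str] = []                        # block hashes from the last session
        self.replica: Dict[int, bytes] = {}                # replica disk on the target host
        self.recovery_points: List[Dict[int, bytes]] = []

    def run(self, image: bytes) -> int:
        """Run one session and return how many blocks were copied."""
        blocks = split_blocks(image)
        # On the first session there are no hashes yet, so every block is copied.
        changed = {
            index: block
            for index, block in enumerate(blocks)
            if index >= len(self.hashes) or self.hashes[index] != digest(block)
        }
        self.replica.update(changed)
        self.hashes = [digest(block) for block in blocks]
        self.recovery_points.append(dict(self.replica))  # snapshot = new recovery point
        return len(changed)

    def restore(self, point: int) -> bytes:
        """Rebuild the disk image as it was at the given recovery point."""
        snapshot = self.recovery_points[point]
        return b"".join(snapshot[i] for i in sorted(snapshot))
```

Running the job on `b"AAAABBBBCCCC"` copies all three blocks; running it again after the middle block changes copies only that one block, and both states remain available as recovery points.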
Recovery points let you "roll back" the replica to the required state. We recommend keeping several recovery points so that the replica stays usable: if the latest recovery point turns out to be unusable, you can fall back to an earlier one.
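The retention advice can be sketched as a fixed-size window of recovery points, searched from newest to oldest for the first one that works. The `RecoveryPoints` class, the retention count and the `works` check are hypothetical; in practice the check would be whatever verification you use, such as test-booting the replica.

```python
from collections import deque
from typing import Callable, Deque, Optional


class RecoveryPoints:
    """Keep only the most recent `keep` recovery points (hypothetical retention policy)."""

    def __init__(self, keep: int):
        # A deque with maxlen drops the oldest point automatically when full.
        self.points: Deque[str] = deque(maxlen=keep)

    def add(self, point: str) -> None:
        self.points.append(point)

    def latest_working(self, works: Callable[[str], bool]) -> Optional[str]:
        """Walk from newest to oldest and return the first point that passes the check."""
        for point in reversed(self.points):
            if works(point):
                return point
        return None
```

With three points retained and the newest one found to be broken, `latest_working()` returns the previous point; if none of the retained points works, it returns `None`, which is why keeping more than one point matters.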
Once the problem in the original data center is resolved, you can fail back from the replica to the original virtual machine or continue using the replica as the production VM.
3. FAULT TOLERANCE AT THE DATA STORAGE SYSTEM (DSS) LEVEL
For customers with specific disaster-tolerance requirements, we offer SyncCluster, a solution that provides high fault tolerance and service availability.
Cloud4Y has developed SyncCluster, a solution new to the Russian market, with a 99.99% SLA. SyncCluster is a unique service that combines a clustered storage array with synchronous mirroring.
The storage system is deployed in Moscow: one half is located in a Tier 3 data center and the other half in another Tier 3 data center 10 km away, and both halves operate synchronously as a single system. Every write is committed to both storage sites. If a failure occurs in one of the data centers (a power outage, failure of any part of the storage system, controller failure, failure of multiple disks in one disk group within a short period, or loss of the communication channels between data centers), your data remains available with RPO = 0 and RTO = 10 minutes; that is, no transaction is lost.
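The reason RPO is zero can be shown with a toy model of a mirrored volume: a write is acknowledged to the server only after every available site holds it, so losing one site never loses an acknowledged write. The `StorageSite` and `MirroredVolume` classes and the site names are hypothetical; the switchover time (RTO) and the resynchronization of a recovered site are not modeled.

```python
from typing import Dict, List


class StorageSite:
    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.blocks: Dict[int, bytes] = {}


class MirroredVolume:
    """Toy two-site volume with synchronous mirroring (hypothetical model)."""

    def __init__(self, sites: List[StorageSite]):
        self.sites = sites

    def write(self, address: int, data: bytes) -> None:
        online = [site for site in self.sites if site.online]
        if not online:
            raise RuntimeError("no storage site available")
        for site in online:
            site.blocks[address] = data
        # Returning here is the acknowledgement to the server: every online copy
        # already holds the data, so a later failure of one site cannot lose it.

    def read(self, address: int) -> bytes:
        # Serve reads from the first site that is still online.
        for site in self.sites:
            if site.online:
                return site.blocks[address]
        raise RuntimeError("no storage site available")
```

If `DC-1` fails after two writes, both are still readable from `DC-2`, and new writes continue on the surviving site.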
Whereas in the previous solution replication is performed at the VM level at set intervals, here replication is synchronous and continuous at the storage level. This avoids extra load on the servers, provides an SLA of up to 99.99% and protects data from loss. The solution is particularly suitable for software that does not support application-level replication.
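To make the 99.99% figure concrete, the allowed downtime is the unavailable fraction of the measurement period. The short calculation below assumes a 30-day month and a 365-day year; the actual measurement period is defined by the contract.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(sla_percent: float, period_days: float) -> float:
    """Maximum downtime an availability SLA allows over a period, in minutes."""
    return (1 - sla_percent / 100) * period_days * MINUTES_PER_DAY


for label, days in (("30-day month", 30), ("365-day year", 365)):
    print(f"99.99% over a {label}: {downtime_budget_minutes(99.99, days):.2f} min")
```

This gives about 4.32 minutes of allowed downtime per 30-day month and about 52.56 minutes per 365-day year.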
Cloud4Y's SyncCluster is a cost-effective solution that provides continuous availability of services and data for companies where business continuity is critical. We offer MetroCluster as a service for a monthly fee, which lets our customers reduce capital expenditure on purchasing and maintaining hardware.