While cloud computing has proven to be beneficial for many organisations, IT departments have been slow to trust the cloud for business-critical Microsoft SQL Server workloads. One of their primary concerns is the availability of their SQL Server, because traditional shared-storage, high-availability clustering configurations are not practical or affordable in the cloud.
Amazon Web Services and Microsoft Azure both offer service level agreements (SLAs) that guarantee 99.95 percent uptime of IaaS servers – at most about 4.38 hours of downtime per year. Both SLAs require deployment in two or more AWS Availability Zones or Azure Fault Domains respectively. Availability Zones and Fault Domains let you run instances in locations that are physically independent of each other, with separate compute, network, storage, and power for full redundancy. AWS has two or three Availability Zones per region, and Azure offers up to three Fault Domains per ‘Availability Set’.
This arrangement guarantees that 99.95 percent of the time at least one of the locations – Availability Zones or Fault Domains – will be operational. In the event of a failure of one location, a load balancer will redirect traffic to the instances in the other location.
For web servers and other non-transactional applications this can be sufficient for high availability. However, simply redirecting clients to a different SQL Server instance does nothing to address the fact that each instance will now have a different data set. Something needs to be done to ensure that the data remains in sync between the SQL Server instances and that client redirection is done seamlessly with minimal downtime.
For a company experiencing downtime in their Microsoft SQL Server and other important application environments, the modest service fee refunds – a 10 percent refund for falling short of 99.95 percent uptime, and a 25-30 percent refund for falling short of 99 percent uptime – may be of little consolation in the event of a cloud outage. According to analyst firm CloudHarmony, Amazon EC2 and Amazon EBS combined had 46 outages ranging from 19 seconds to 2.8 hours from mid-June 2014 to mid-June 2015. Microsoft Azure Virtual Machines and Object Storage experienced 242 outages ranging from 10.4 minutes to 13.16 hours during the same period.
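The 4.38-hour figure quoted for a 99.95 percent SLA is simple arithmetic, and the same calculation helps put the outage durations above in context. The following is a minimal Python sketch, assuming a 365-day year; the function name `allowed_downtime_hours` is illustrative and not part of any provider's tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime an uptime percentage leaves room for."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # The two thresholds mentioned in the refund tiers above.
    for sla in (99.95, 99.0):
        budget = allowed_downtime_hours(sla)
        print(f"{sla}% uptime leaves {budget:.2f} hours of downtime per year")
```

By this measure, a single 13.16-hour outage is roughly three times the annual allowance of a 99.95 percent SLA, although outage counts like those above are not necessarily measured the way a provider's SLA measures uptime.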
High availability in cloud environments
To make the cloud practical for business-critical applications, you need a way to mitigate downtime using high availability protection – traditionally failover clusters. In a failover cluster, two or more servers are configured with shared storage – typically a SAN. In the event of a failure on the primary server, software such as Windows Server Failover Clustering (WSFC) moves the application operation to the secondary server. Since both servers share storage, operations can continue without data loss. Seamless failover and failback also allow software updates and patches to be installed while minimising the downtime associated with planned maintenance.
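The behaviour described above can be illustrated with a toy model. The sketch below is not how Windows Server Failover Clustering is implemented or configured; the classes `SharedStorage`, `Node`, and `FailoverCluster` are hypothetical names used only to show why shared storage lets a secondary server continue without data loss.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stands in for the SAN volume that every cluster node can reach."""
    rows: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    healthy: bool = True


class FailoverCluster:
    """Toy cluster: one active node at a time, all nodes share one volume."""

    def __init__(self, nodes, storage):
        self.nodes = list(nodes)
        self.storage = storage
        self.active = self.nodes[0]  # the first node starts as primary

    def _ensure_active_node(self):
        # Fail over to the first healthy node if the active one is down.
        if self.active.healthy:
            return
        for node in self.nodes:
            if node.healthy:
                self.active = node
                return
        raise RuntimeError("no healthy node available")

    def write(self, key, value):
        self._ensure_active_node()
        self.storage.rows[key] = value
        return self.active.name  # which node served the request

    def read(self, key):
        self._ensure_active_node()
        return self.storage.rows[key]


if __name__ == "__main__":
    primary, secondary = Node("node-a"), Node("node-b")
    cluster = FailoverCluster([primary, secondary], SharedStorage())
    cluster.write("order-1", "paid")
    primary.healthy = False          # simulate a failure on the primary
    print(cluster.read("order-1"), "served by", cluster.active.name)
```

Because the data lives on the shared volume rather than on either server, the secondary node sees every write the primary made before it failed.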
The problem is that cluster-aware shared storage is not available in most cloud environments, including both AWS and Azure. This gives database administrators two basic options: keep Microsoft SQL Server and other critical applications on-premises, or add replication software to create a SAN-less cluster in the cloud.
Clusters for high availability within the cloud
SAN-less clusters offer a simple, highly efficient way to implement a failover cluster in a cloud. You either use purpose-built SAN-less clustering software on its own or add it to your Windows Server Failover Clustering environment. The software uses efficient replication to synchronise storage across two or more servers, whether they are physical, virtual, or cloud-based.
Because the software continuously synchronises data from primary to remote storage using real-time, block-level replication, the storage appears to WSFC as a traditional SAN regardless of the type of storage or where it is located. SAN-less clustering software is designed to be storage agnostic; that is, it is capable of working with the local or direct-attached storage normally used in public clouds, as well as with SANs, iSCSI storage, and network-attached storage.
Importantly for high-availability (HA) cloud configurations, synchronisation software also handles write acknowledgements in a way that assures satisfactory performance over a WAN link to an Availability Zone or Fault Domain in a distant data centre. Some solutions even offer data compression and advanced bandwidth management techniques to further improve WAN performance.
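How write acknowledgements are handled is product-specific and not detailed here. As a general illustration only, the Python sketch below contrasts two common mirroring modes: synchronous (a write is acknowledged only after both copies hold the block) and asynchronous (a write is acknowledged after the local copy, with the block queued for the replica). The classes `BlockDevice` and `MirroredVolume` are hypothetical and do not represent any vendor's implementation.

```python
from collections import deque


class BlockDevice:
    """Toy block store: maps a block index to its bytes."""

    def __init__(self):
        self.blocks = {}

    def write_block(self, index, data):
        self.blocks[index] = bytes(data)


class MirroredVolume:
    """Replicates every block written on the primary to a replica device."""

    def __init__(self, primary, replica, synchronous=True):
        self.primary = primary
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # blocks not yet shipped (asynchronous mode)

    def write(self, index, data):
        """Write one block; returning True models the acknowledgement."""
        self.primary.write_block(index, data)
        if self.synchronous:
            # Acknowledge only after the replica also holds the block.
            self.replica.write_block(index, data)
        else:
            # Acknowledge now; the replica catches up when flush() runs.
            self.pending.append((index, bytes(data)))
        return True

    def flush(self):
        """Ship queued blocks in write order; return how many were sent."""
        shipped = 0
        while self.pending:
            index, data = self.pending.popleft()
            self.replica.write_block(index, data)
            shipped += 1
        return shipped

    def in_sync(self):
        return self.primary.blocks == self.replica.blocks


if __name__ == "__main__":
    volume = MirroredVolume(BlockDevice(), BlockDevice(), synchronous=False)
    volume.write(0, b"page-0")
    print("in sync before flush:", volume.in_sync())
    volume.flush()
    print("in sync after flush:", volume.in_sync())
```

The trade-off the sketch exposes is the usual one: synchronous mode keeps both copies identical at every acknowledged write but adds the round trip to the replica to each write, while asynchronous mode acknowledges sooner and leaves a window in which the replica lags.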
Being agnostic to storage systems also facilitates the use of hybrid cloud configurations where, for example, a cluster protecting SQL Server applications in an enterprise data centre using a SAN can be extended to a cluster node in a cloud. This configuration provides a cost-efficient disaster recovery (DR) option without the cost and complexity of managing your own secondary data centre.
Companies can use SAN-less clustering software that is fully integrated with WSFC, which lets them deploy clusters in a cloud without the need for specialised training or changes to standard IT operations. Other SAN-less clustering software supports Linux as well as Windows environments, where it monitors the complete application stack, manages application failover, and synchronises storage. It enables complete configuration flexibility and provides a simple, cost-efficient HA and DR solution where traditional clusters are impossible or impractical.
The ability to use the familiar and proven Windows Server Failover Clustering technology in both Amazon Web Services and Microsoft Azure clouds makes SAN-less clustering software an affordable solution that is worth considering for SQL Server and other business-critical applications.