A 5-part blog series by Zensar’s Architecture Centre of Excellence
In the first part, we looked at some of the critical factors to consider when ensuring business continuity in the IT landscape. Of the five key areas to focus on, let’s look at high availability architecture in this part.
What does high availability (HA) of IT systems mean?
An IT system qualifies as highly available (HA) when it is accessible at all times, with minimal or no downtime, over a specified evaluation period (e.g. a year).
HA is usually specified in terms of uptime, such as 99.9% over a year or a month, and this figure includes any planned shutdowns for maintenance and system upgrades. Higher uptime figures naturally come at a higher cost. The following factors should be considered when stating uptime requirements:
Do you serve users only in a certain geography/time zone, or should the system be available globally?
Do you provide services only in a certain time window (such as 9am to 6pm on weekdays), or are they available 24x7?
Do you experience seasonal peak loads, such as during festive holiday periods?
What revenue is at risk if your systems go down for a certain period, given your business domain (retail, banking, etc.)?
What percentage of your IT budget can you afford to spend, relative to your revenue model?
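To make uptime figures concrete, the short sketch below converts an uptime percentage into the total downtime it allows over a period. It assumes a 365-day year and a 30-day month, and the function name `allowed_downtime` is illustrative only. Because the uptime figure includes planned shutdowns, maintenance windows consume the same budget as unplanned outages.

```python
from datetime import timedelta

# Assumed period lengths: a 365-day year and a 30-day month.
PERIODS = {"year": timedelta(days=365), "month": timedelta(days=30)}


def allowed_downtime(uptime_percent: float, period: str = "year") -> timedelta:
    """Maximum total downtime (planned + unplanned), rounded to whole seconds,
    that an uptime target permits over the given period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_seconds = PERIODS[period].total_seconds()
    return timedelta(seconds=round(total_seconds * (100 - uptime_percent) / 100))


for target in (99.0, 99.9, 99.99):
    print(f"{target}%: {allowed_downtime(target)} per year, "
          f"{allowed_downtime(target, 'month')} per month")
```

For example, a 99.9% target leaves 8 hours 45 minutes 36 seconds of downtime per year, or about 43 minutes per 30-day month, which must also cover any planned upgrades.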
To maintain high availability, below are some of the architectural best practices:
Sounds too technical? Let’s look at a simple example:
If a company has a single instance of an email server serving 1,000 employees, and that server goes down because of a failure in the data centre, the server is a single point of failure for the email system. This can be avoided by keeping a redundant secondary instance of the email server running. Both instances must be configured for resiliency so that the secondary instance can take over the service if the primary instance fails. Also, as mentioned earlier, both instances should be hosted independently and isolated from each other.
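The failover in the email example can be pictured from the caller’s side with a minimal sketch: try the primary instance and fall back to the secondary if it is unreachable. The function and server names here are hypothetical stand-ins; in real deployments this switch is usually handled by infrastructure such as clustering, DNS or a load balancer rather than by each client.

```python
from typing import Callable, Sequence


class AllInstancesDown(Exception):
    """Raised when neither the primary nor any standby instance responds."""


def send_with_failover(message: str,
                       instances: Sequence[tuple[str, Callable[[str], None]]]) -> str:
    """Try each email server instance in priority order and return the name
    of the first one that accepts the message."""
    errors = []
    for name, send in instances:
        try:
            send(message)
            return name
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")  # remember why this instance failed
    raise AllInstancesDown("; ".join(errors))


# Hypothetical stand-ins for two independently hosted email servers.
def primary_send(message: str) -> None:
    raise ConnectionError("primary data centre unreachable")


def secondary_send(message: str) -> None:
    pass  # the secondary accepts the message


served_by = send_with_failover("Quarterly report attached",
                               [("primary", primary_send), ("secondary", secondary_send)])
print(served_by)  # secondary
```

The key point is that the secondary only helps if it does not share the primary’s failure causes, which is why the two instances should be isolated and independent.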
How to achieve high availability (HA)?
The main categories of IT infrastructure resources are compute, network and storage. Thanks to recent advances in technology, these can now be controlled by software (i.e. automated systems) and hosted on-premises, in the cloud, or in a mix of both. Maintaining HA across such complex infrastructure combinations therefore becomes challenging.
The graphic below shows the various options; in most cases, a combination of them is implemented:
Clustering – A cluster is a logical group of independent servers (called nodes), connected to each other, that provide services to their clients with failover and redundancy features. A cluster implements most of the HA best practices mentioned above and is normally a pre-built solution provided by the server vendor, although it is possible to build one yourself. These solutions take care of aspects such as session and state persistence and synchronization, transaction management, replication of configuration changes, and location transparency, so that users get a uniform experience and the cluster appears as a single logical server.
Many databases, application servers and web servers can therefore be clustered to provide HA. However, clustering has historically been an on-premises solution. Following the latest trends, many IT operations are being migrated to the cloud under “pay per use” and “data centre as a service” models.
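To illustrate what session state synchronization and location transparency buy you, here is a toy, in-memory sketch: every session write is replicated to all live nodes, so when one node fails, another can serve the same session without the user noticing. The `Cluster` and `Node` classes are hypothetical and deliberately simplified; vendor clustering products also handle resynchronizing a node when it rejoins, transactions and configuration replication, none of which is modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    sessions: dict[str, dict] = field(default_factory=dict)  # session id -> state


class Cluster:
    """Toy cluster that replicates session state to every live node."""

    def __init__(self, names: list[str]):
        self.nodes = [Node(n) for n in names]

    def _live(self) -> list[Node]:
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("no live nodes in cluster")
        return live

    def write_session(self, session_id: str, key: str, value) -> None:
        # Synchronous replication: update the session copy on every live node.
        for node in self._live():
            node.sessions.setdefault(session_id, {})[key] = value

    def read_session(self, session_id: str) -> tuple[str, dict]:
        # Location transparency: the caller does not pick a node;
        # the first live node answers with its replicated copy.
        node = self._live()[0]
        return node.name, dict(node.sessions.get(session_id, {}))


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.write_session("user-42", "cart", ["book"])
cluster.nodes[0].alive = False  # node-a fails
print(cluster.read_session("user-42"))  # ('node-b', {'cart': ['book']})
```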
Dynamic server farms – Non-clustered dynamic server farms are created from virtualized infrastructure, typically using mechanisms such as Infrastructure as a Service (IaaS), usually in a cloud environment. This can be a public cloud such as Amazon AWS or Microsoft Azure, or a private cloud built using tools like OpenStack. The difference is that the out-of-the-box features provided by clustering aren’t available here; instead, various tools and APIs are emerging that each provide an individual feature, such as state persistence or caching/storage. A myriad of DevOps tools are used to provide effective HA with dynamic server farms.
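Much of the HA tooling around dynamic server farms follows a reconciliation pattern: compare the desired number of healthy instances with what is actually running, remove failed instances and launch replacements. The sketch below models that loop in memory; `ServerFarm`, `_launch` and the `vm-N` identifiers are hypothetical stand-ins for real IaaS create/terminate calls, and the sketch only scales up, never down.

```python
import itertools


class ServerFarm:
    """Toy dynamic farm that replaces unhealthy instances to keep a desired size."""

    def __init__(self, desired: int):
        self.desired = desired
        self._ids = itertools.count(1)
        self.instances: dict[str, bool] = {}  # instance id -> healthy?

    def _launch(self) -> str:
        # Stand-in for an IaaS "create instance" call (hypothetical).
        instance_id = f"vm-{next(self._ids)}"
        self.instances[instance_id] = True
        return instance_id

    def reconcile(self) -> list[str]:
        """Drop unhealthy instances, then launch replacements until at least
        `desired` healthy instances exist. Returns the newly launched ids."""
        for instance_id, healthy in list(self.instances.items()):
            if not healthy:
                del self.instances[instance_id]  # stand-in for "terminate instance"
        launched = []
        while len(self.instances) < self.desired:
            launched.append(self._launch())
        return launched


farm = ServerFarm(desired=3)
farm.reconcile()                # initial launch: vm-1, vm-2, vm-3
farm.instances["vm-2"] = False  # a health check marks vm-2 as failed
print(farm.reconcile())         # ['vm-4']
print(sorted(farm.instances))   # ['vm-1', 'vm-3', 'vm-4']
```

In practice, a loop like this runs continuously and is fed by the health monitoring described later in this post.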
Redundancy using hot, warm or cold standby servers – Redundancy is established by keeping spare server instances that remain idle during normal operation and become active and start serving requests if the existing active servers go down, i.e. during failover. These standby instances can be kept in a hot, warm or cold state, depending on uptime and disaster recovery requirements. A hot server is an active server member capable of serving requests at any time, but the load balancer treats it as redundant and only diverts requests to it during failover. A warm server has the necessary software installed and is running, but is not serving any requests. A cold server is the same as a warm server except that it is not running.
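One way to see why hot standbys give the fastest failover (and cost the most) is to list the steps each state must go through before it can take traffic. The sketch below encodes the definitions above; the step wording is illustrative, and the middle step for warm servers (making the application ready to serve, for example by syncing data) is an assumption rather than something the text specifies.

```python
from enum import Enum


class Standby(Enum):
    HOT = "hot"    # running and able to serve at any time; only excluded from routing
    WARM = "warm"  # software installed and running, but not serving requests
    COLD = "cold"  # software installed, but the server is not running


def failover_steps(state: Standby) -> list[str]:
    """Steps needed to bring a standby of the given state into service (illustrative)."""
    steps = []
    if state is Standby.COLD:
        steps.append("start the standby server")
    if state in (Standby.COLD, Standby.WARM):
        # Assumption: a warm server needs its application made ready to serve.
        steps.append("make the application ready to serve requests")
    steps.append("load balancer diverts requests to the standby")
    return steps


for state in Standby:
    print(f"{state.value}: {failover_steps(state)}")
```

Each extra step adds recovery time, so the choice between hot, warm and cold is a trade-off between the uptime target and the cost of keeping spare capacity running.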
Load-balancing – A load balancer is the equipment that fronts all server requests and routes each request to an active node (server) according to a configured algorithm, such as round-robin or least-utilized. Load balancers are usually hardware devices, but software load balancers are also available. Load balancers help implement many of the HA best practices.
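The two algorithms named above can be sketched in a few lines. In this minimal model, which is not any particular product’s implementation, each backend carries a health flag (set by monitoring) and a count of in-flight requests used as a proxy for utilization. Round-robin walks the ring and skips unhealthy nodes; least-utilized picks the healthy node with the fewest active requests. The `Backend` and `LoadBalancer` names are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    active_requests: int = 0  # proxy for utilization


class LoadBalancer:
    def __init__(self, backends: list[Backend]):
        self.backends = backends
        self._ring = itertools.cycle(range(len(backends)))

    def round_robin(self) -> Backend:
        # Walk the ring at most once, skipping nodes marked unhealthy.
        for _ in range(len(self.backends)):
            backend = self.backends[next(self._ring)]
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends")

    def least_utilized(self) -> Backend:
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        return min(healthy, key=lambda b: b.active_requests)


pool = [Backend("web-1"), Backend("web-2", active_requests=5),
        Backend("web-3", active_requests=2)]
lb = LoadBalancer(pool)
pool[1].healthy = False  # web-2 fails its health check
print([lb.round_robin().name for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
print(lb.least_utilized().name)                   # web-1
```

Because the load balancer stops routing to a failed node, a node failure becomes a capacity reduction rather than an outage.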
Different geographic locations for isolation and independence – Two different geographic locations (a primary and a secondary data centre) are used to host the set of HA servers so that isolation and independence can be achieved. According to some industry standards, the two locations should be at least 50 miles apart.
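A quick way to check a candidate primary/secondary pair against the 50-mile guideline is the great-circle distance between their coordinates. This sketch uses the haversine formula, which assumes a spherical Earth (mean radius about 3,958.8 miles) and is accurate enough for this purpose. The site coordinates are examples only: approximate city-centre points for London and Manchester.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical-Earth approximation


def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_far_enough(primary: tuple[float, float], secondary: tuple[float, float],
                     minimum_miles: float = 50.0) -> bool:
    """True if two (lat, lon) sites are at least `minimum_miles` apart."""
    return distance_miles(*primary, *secondary) >= minimum_miles


# Example coordinates (approximate): London and Manchester.
print(sites_far_enough((51.5074, -0.1278), (53.4808, -2.2426)))  # True
```

Distance alone does not guarantee independence; the sites should also avoid sharing power grids, network carriers and other common failure causes.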
Asset health monitoring – Monitoring server health is vital to ensuring high availability. Mechanisms such as heartbeat checks (i.e. sending dummy requests to a server) are used, and key parameters such as CPU utilization and storage are also monitored. Modern automated mechanisms send notifications so that corrective action can be taken. Tools like Nagios do such monitoring effectively.
Server hardening – As a prerequisite to HA, servers should be stress-tested for peak loads, and the necessary auditing/logging mechanisms should be in place to investigate the results.
The diagram below shows a typical deployment/physical architecture that uses clustering for IBM web and application servers and also uses load balancing, independence and isolation features.
About the Author
Vijaykumar Dixit – Vijay has over two decades’ industry experience, including over 15 years in IT. He is a TOGAF 9.1 certified practitioner and an Oracle Certified Master Java EE architect. He holds a bachelor’s degree in Engineering and a PG diploma in advanced computing. He is currently part of Zensar’s Architecture Centre of Excellence.
His areas of expertise are enterprise architecture, solution design consulting, SaaS product development and cloud computing. He has helped clients define and set up technology roadmaps, establish architecture governance and best practices, and achieve a greater return on IT spend by promoting service-oriented architectures, cloud migration, and application portfolio review and rationalization. He has been a key contributor to Java-based enterprise solution development and also has a sound background in .NET (Microsoft), open-source bespoke development and COTS-based integration. He has worked for blue-chip clients such as Boots PLC, TNT Logistics, Carphone Warehouse and Verizon, government organizations such as ONS and NHS in the UK, and Liberty, Discovery, etc. in South Africa.