
High Availability for Microsoft Applications

High availability describes a concept that can also be called operational continuity over a particular period of time. This level of service is reflected in the amount of uptime you are able to provide, so that your suppliers, clients and staff have access to the system as a whole, the services you provide and the core of your business: the applications. Availability targets are ultimately subjective and depend on your business's needs, and the success of your operational continuity is a combination of achieving business goals and meeting your technical requirements. As you optimize the performance of your Microsoft applications, your target should be to keep your applications and workloads up and running by combining reliable hardware platforms with resilient mission-critical applications.

Microsoft's commitment to high availability

Microsoft is fully committed to high availability, and its websites offer a series of recommendations on how to prepare your network for it. Key recommendations include:

  • Always select the most robust hardware platform your budget can afford
  • Adopt the Windows Server® 2008 High Availability Program for Windows Server® Enterprise
  • Follow the recommendations for Windows Server® 2008 for Itanium-Based Systems
  • Embrace the Windows Server® 2008 Datacenter program

Having followed this advice, you will have access to the programs, support and guidance from Microsoft that help you make the right decisions and choose hardware platforms that will keep your business running.

Having given its advice on how to deploy the right hardware platforms, Microsoft turns its attention to what you need to do at the application level. Microsoft has built high availability into the applications that are likely to be essential to your business by allowing data to be replicated across multiple instances of an application. This matters because such replication:

  • Ensures service availability
  • Guarantees business continuity
  • Is built into Microsoft® Exchange 2007 and all later versions
  • Is built into Microsoft® SQL Server 2005 and all later versions

With the launch of Windows Server 2008, both the Enterprise and Datacenter editions shipped with the ability to keep the operating system and applications running in high availability by clustering the servers. How does this work?

What is a server cluster?

A cluster consists of a group of networked servers, often called nodes. These nodes are able to run the workloads of any of the other servers in their cluster and are ready to take up the load should one or more of those servers suffer a hardware or software failure. Clustered computers are aware of their peers, continually monitor peer performance and, using pre-programmed algorithms, can decide when to take over from another server in the cluster once intervention is required.
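As an illustration only, the minimal sketch below shows the kind of peer monitoring and failover decision described above. The node addresses, port and timeout are invented for the example and do not represent the actual heartbeat protocol used by any Microsoft clustering product.

```python
import socket
import time

# Hypothetical illustration of peer monitoring in a cluster: each node
# periodically checks its peers and takes over their workload if a peer
# stops answering within a timeout. Real cluster software (for example
# Windows Server Failover Clustering) uses its own heartbeat protocol;
# the names and values here are invented for this sketch.

PEERS = {"node-b": ("10.0.0.2", 3343), "node-c": ("10.0.0.3", 3343)}
HEARTBEAT_TIMEOUT = 5.0  # seconds a peer may stay silent before failover


def peer_is_alive(address, port, timeout=HEARTBEAT_TIMEOUT):
    """Return True if the peer accepts a TCP connection within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def take_over_workload(peer_name):
    """Placeholder for bringing the failed peer's clustered services online locally."""
    print(f"{peer_name} is unreachable - taking over its clustered services")


def monitor_loop():
    """Continuously check every peer and fail over when one stops responding."""
    while True:
        for name, (address, port) in PEERS.items():
            if not peer_is_alive(address, port):
                take_over_workload(name)
        time.sleep(HEARTBEAT_TIMEOUT)
```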

What are the key applications that require server clustering?

Business-critical applications are at the top of the list of applications that benefit from server clustering. Make a list of the applications you run on your network servers and assess the impact that losing each of them would have. These can include:

Critical

  • Online banking services
  • E-commerce applications
  • Company email applications
  • All line-of-business (LOB) class applications

Serious

  • Unified communications and conferencing applications
  • CRM systems
  • Collaborative applications, for example Microsoft SharePoint

What can cause a service outage where the application stops responding?

It is important to take into account two loss-of-service scenarios, and to do this analysis it helps to spend a moment on the OSI seven-layer model. Each layer defines a function of an element in the network, starting from the most basic and finishing with the top-level, most complex elements. On this basis, the physical cabling of your network is categorized as Layer 1 and the applications that users access as part of their daily work are categorized as Layer 7, with the other layers in between. The two layers that matter most for high availability are:

  • Layer 4, the transport layer
  • Layer 7, the application layer

Layer 4 load balancers check the health of the servers themselves; each server could be virtualized and running many applications, perhaps 300 or 400. A Layer 4 load balancer can monitor the health of the server and decide whether to take it out of the pool or continue to use it, but it cannot monitor the health of the individual applications. The result is that the server could be performing perfectly while an application, for example Microsoft Lync, has hung, yet the load balancer continues sending requests to it. A widely used example of a Layer 4 load balancer is Windows Network Load Balancer, or WNLB.

Layer 7 load balancers health-check the applications themselves. These load balancers, available as hardware and/or virtual appliances from manufacturers such as Kemp Technologies, can monitor the performance of each individual application and, should it fall below the KPIs defined by network management, switch the service to backup servers. This form of load balancing is naturally more precise and effective than Layer 4 load balancing alone.
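The difference between the two approaches can be shown with a minimal sketch for a hypothetical web application: the host, port, URL and status-code test below are placeholder assumptions, not the behaviour of any particular load balancer product. A Layer 4 check only confirms that the server accepts connections, while a Layer 7 check issues a real application request and inspects the response.

```python
import socket
import urllib.request

# Illustrative comparison of the two health-check styles described above.
# Host, port and path are placeholder values for the sketch.


def layer4_check(host, port, timeout=2.0):
    """Layer 4: succeed if the server accepts a TCP connection.
    Says nothing about whether the application on top is healthy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def layer7_check(url, timeout=2.0):
    """Layer 7: issue an HTTP request and require a healthy response.
    Fails if the application is hung even though the server is up."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        return False


if __name__ == "__main__":
    # A hung web application would typically still pass the Layer 4 check
    # but fail, or time out on, the Layer 7 check.
    print("L4 healthy:", layer4_check("app.example.com", 443))
    print("L7 healthy:", layer7_check("https://app.example.com/healthcheck"))
```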

Microsoft Windows Server 2008 Hyper-V® defines the changes

One of the inherent weaknesses of failover clusters is that the applications, as they run, need to monitor each other's performance. This is a weakness because the applications should be optimized to serve their users, not obliged to act as "traffic cops" and monitor performance.

Windows Server 2008 introduced the ability for the hypervisor layer to intervene without troubling the application server functions. Windows Server 2008 Hyper-V® opens new frontiers for high availability with functionality known as quick migration, which combines failover clustering with server virtualization and is aware of both virtualized servers and the physical hosts that run them. This combination is a fundamental step forward, as it prevents any single physical server from becoming a vulnerable point of failure for your network.

Dealing with downtime

Downtime occurs for two reasons: it is either planned or unplanned. Serious downtime can be caused by:

  • Network failure, either LAN or WAN
  • A server fault that takes the server offline

Sometimes it is necessary to take servers offline for planned events, including maintenance of the hardware or upgrades to applications or server operating systems. Unplanned downtime can take place at any moment and is beyond the control of IT administrators. Causes range from minor issues, such as a failed hard disk or power supply, to a catastrophic event such as a fire, a flood or an earthquake. The important point is that downtime, be it planned or unplanned, will eventually occur: it is not a case of if it happens but when it will happen.

Making sure your servers are located in a secure setting is of top importance. For example, if your servers are located in parts of the USA that are at risk from hurricanes, the premises should be constructed to be hurricane-proof, and comprehensive firefighting installations should be in place to protect your servers from fire. However, you can never be 100% sure your premises are invincible, so making provision for backup facilities in a different location makes good sense. The intelligent use of geographic load balancers, for example the Kemp Technologies Geographic Load Master, lets you divert traffic to your backup sites should the primary site be taken offline.
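As a simplified illustration of the decision a geographic load balancer makes, the sketch below hands out the primary site's address while that site passes a health check and falls back to the secondary site otherwise. The site addresses and health-check URL are placeholders invented for the example; real geographic load balancers implement this at the DNS level with their own health-checking.

```python
import urllib.request

# Simplified illustration of geographic failover: answer with the primary
# site's address while it is healthy, otherwise fall back to the backup
# site. All addresses and URLs here are placeholder values.

PRIMARY_SITE = {"name": "primary-dc", "ip": "203.0.113.10",
                "health_url": "https://primary.example.com/health"}
BACKUP_SITE = {"name": "backup-dc", "ip": "198.51.100.20"}


def site_is_healthy(health_url, timeout=2.0):
    """Return True if the site's health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False


def address_for_next_client():
    """Return the site address that should be handed to the next client."""
    if site_is_healthy(PRIMARY_SITE["health_url"]):
        return PRIMARY_SITE["ip"]
    return BACKUP_SITE["ip"]
```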

Regular server maintenance makes sense: it allows you to clean up a server and restore it to its original performance levels, and having installed backup servers and load balancers, and therefore increased the redundancy in your network, means these server outages will have less effect on your users. If maintenance is not performed, minor problems will eventually grow more serious and the server will stop working. As you plan your backup facilities, consider the cost to the business of unplanned downtime, both in terms of business lost and damage to the image of the organization.

Hardware reliability needs to be measured

Don’t confuse the terms server uptime and server availability; they are two different things. Your servers could be running fine yet be unavailable to users because a component in your network, a router, firewall or piece of WAN equipment, has failed, and this counts against server availability. By selecting servers with dual power supplies and multiple network cards you can increase their reliability; however, to achieve a truly highly available network, make sure you also install two or more load balancers configured in high availability mode.

Defining the downtime rules

If you ask an IT Manager about the level of downtime the organization permits, the reply needs to be more than just a percentage such as 99%. Expressed on an annual basis, the actual downtime values are as follows (a short calculation reproducing them appears after this list):

  • 99% = 87 hours 36 minutes
  • 99.9% = 8 hours 45 minutes
  • 99.99% = 52 minutes 34 seconds
  • 99.999% = 5 minutes 15 seconds
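These figures follow directly from simple arithmetic: the permitted downtime is the unavailable fraction of a 365-day year. The short sketch below reproduces the values in the list; it is purely a convenience calculation, not a product feature.

```python
# Convert an availability percentage into the maximum downtime allowed
# per 365-day year, reproducing the figures in the list above.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def annual_downtime_minutes(availability_percent):
    """Return the minutes per year a service may be down at this availability."""
    unavailable_fraction = 1 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for level in (99.0, 99.9, 99.99, 99.999):
    total_seconds = annual_downtime_minutes(level) * 60
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    print(f"{level}% availability = {int(hours)} h {int(minutes)} min {round(seconds)} s of downtime per year")
```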

The cost of minimizing your permitted downtime varies server by server, and it is further complicated because different server functions have different levels of criticality. A print server going offline is more likely to be annoying than critical; it is a different matter if your mission-critical database server fails, as the damage to the business is immediate. Bear these different levels of criticality in mind as you estimate the cost of raising the reliability of your systems: if it will cost you $95,000 to raise the reliability of a server from 99.99% to 99.999%, but your business would only lose $1,000 for each minute of downtime, the investment does not make a good return.
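As a rough check of that example, and treating the $95,000 as an annual cost purely for the sake of the comparison: moving from 99.99% to 99.999% availability avoids roughly 47 minutes of downtime per year, so at $1,000 per minute the avoided loss is around $47,000, well below the quoted cost of the upgrade.

```python
# Back-of-envelope check of the example in the paragraph above. The $95,000
# upgrade cost and $1,000-per-minute loss come from the text; treating the
# cost as annual is an assumption made only for this comparison.

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes(availability_percent):
    """Minutes of downtime permitted per year at this availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100.0)


minutes_saved = downtime_minutes(99.99) - downtime_minutes(99.999)  # about 47 minutes
loss_avoided = minutes_saved * 1_000   # dollars of downtime loss avoided per year
upgrade_cost = 95_000                  # quoted cost of the reliability upgrade

print(f"Downtime avoided: {minutes_saved:.1f} minutes per year")
print(f"Loss avoided:     ${loss_avoided:,.0f} per year versus ${upgrade_cost:,} cost")
```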

Perhaps the most intelligent way to measure server performance is not whether it can handle 80, 100 or 200 simultaneous sessions, but the effective time it takes users to complete their transactions. If you run an e-commerce site where the percentage of users who can complete their transactions at peak traffic periods is too low, it is not the number of users who can connect but the number who are unable to complete their purchases that should concern you: your servers can still be running while you lose revenue as disappointed potential customers abandon your site.

Microsoft's approach to high availability

High availability solutions owe their success to how much redundancy you deploy in your network to minimize the risk of a single point of failure taking out your mission-critical servers. By employing a combination of high-performance network servers from leading vendors together with load balancers deployed in high availability mode, you can reduce the impact of a server failure. Microsoft has designed its products and programs to work with these hardware platforms to maintain high availability and keep critical business IT systems up and running.

Advantages of Microsoft High Availability

Microsoft has maintained a strong strategic objective of helping its users achieve high availability with the Microsoft applications they use. High availability is woven into Microsoft products and programs, none more so than Windows Server® 2012, which comes complete with the High Availability Program for Windows Server 2012 Enterprise and Windows Server® 2012 Datacenter. Follow the high availability recommendations for the type of hardware platform you should install in order to deploy a sufficiently powerful and reliable system.

Compared with Microsoft Exchange 2007, the design of Microsoft Exchange 2010 is completely different: high availability is built into the core of the application, which allows it to support cluster service availability, automatic recovery and data availability on an end-to-end basis. The introduction of this new, streamlined core architecture, known as the database availability group (DAG), has greatly simplified the task of cluster implementation and maintenance.

The support overhead for IT departments maintaining Microsoft applications and programs on server clusters has been considerably reduced with the later generations of the core applications Lync, SharePoint and Exchange. This is a great benefit in terms of both the complexity of the skills IT departments need just to maintain their clusters and the amount of time necessary for maintenance work.

In addition, Microsoft has expanded its best-practice recommendations beyond advice for network servers to include the deployment of hardware network load balancers. In fact, since the launch of Microsoft Exchange 2010, network managers have been told specifically to use certified network load balancers, for example the Kemp Technologies Load Master, instead of relying on Windows Network Load Balancer (WNLB) as they had done previously for Exchange 2003 or Exchange 2007.

In conclusion, Microsoft has continued in its role as a trusted adviser to its users regarding the setup, implementation and clustering of servers for Microsoft applications and programs. Over time this role has extended to cover the complete cluster architecture and supporting infrastructure.
