In-Depth
Always There for You
Hardware and software products use different approaches to high availability.
- By Peter Varhol
- 04/01/2008
Thanks to innovative approaches to high availability, users are experiencing almost no downtime with their mission-critical applications. Many such applications now see an hour or less of downtime a year. E-mail may be down for longer, but usually because of Internet or connectivity problems rather than server or application failures.
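To put "an hour or less of downtime a year" in more familiar terms, the short calculation below converts annual downtime into an availability percentage. It is a minimal sketch that assumes a 365-day (8,760-hour) year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def availability(downtime_hours_per_year: float) -> float:
    """Return availability as the percentage of the year the service is up."""
    return 100.0 * (1.0 - downtime_hours_per_year / HOURS_PER_YEAR)


for hours in (1.0, 8.76, 87.6):
    print(f"{hours:6.2f} h/year of downtime -> {availability(hours):.4f}% availability")
```

By this measure, one hour of downtime a year works out to roughly 99.99 percent availability, while "three nines" (99.9 percent) still allows nearly 8.8 hours.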
I looked at software solutions from AppAssure Software Inc., SteelEye Technology Inc. and Quest Software Inc., and also briefly delved into hardware approaches offered by Marathon Technologies Corp. and Coyote Point Systems Inc. I found different levels of protection using different approaches.
The software products examined here all make high availability a reality at relatively low cost. They run on standard hardware and provide a very high level of uptime for critical applications. However, they tend to focus on specific applications rather than on everything running on the server. The hardware products, by contrast, fortify separate parts of the infrastructure against failure or overload.
AppAssure Replay for Exchange
Replay for Exchange provides a backup and rapid-restore solution specifically targeting Exchange 2000, 2003 and 2007. It uses a distinctive block-level method to restore any component of Exchange, from individual data to an entire hard disk, in minutes rather than hours or days.
REDMOND RATING
| Category (weight) | Replay for Exchange | LifeKeeper for SQL Server | Recovery Manager for Active Directory |
|---|---|---|---|
| Documentation (20%) | 9.0 | 10.0 | 8.0 |
| Installation (20%) | 10.0 | 8.0 | 9.0 |
| Ease of Use (20%) | 9.0 | 9.0 | 9.0 |
| Feature Set (20%) | 8.0 | 8.0 | 8.0 |
| Administration (20%) | 9.0 | 8.0 | 9.0 |
| Overall Rating | 8.8 | 8.6 | 8.6 |
Key: 1: Virtually inoperable or nonexistent; 5: Average, performs adequately; 10: Exceptional
Replay works within an existing IT infrastructure and in general requires no additional hardware beyond the Replay backup server. Depending on your infrastructure, you may want additional storage space for its backups, or additional servers in case your Exchange server hardware fails altogether, but none of that's strictly required. It all depends on the type of failures you might have, and how quickly you want to get back online. For corrupted mailboxes, you can just drag another server out of the closet. You don't need any other hardware and your downtime will be measured in minutes.
Replay works by recording data changes at the level of the physical disk block. It takes a complete snapshot at the beginning of the backup period, and then does continuous incremental updates. Because it works at the physical block level, these updates tend to be fast. When you restore, whether it be a mailbox or an entire server, the restore process is especially fast. Replay continuously updates the image of the entire server, enabling system and bare-metal recoveries from any point in a few minutes, regardless of the size of the data set.
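The block-level idea can be illustrated with a simplified model; this is not AppAssure's implementation. Here a disk is a dictionary mapping block numbers to contents, one full base image is kept, and each recovery point stores only the blocks that changed. The class and method names are hypothetical, and for simplicity the sketch finds changes by comparing images, whereas a real product would track writes as they happen.

```python
from typing import Dict, List

BlockMap = Dict[int, bytes]  # block number -> block contents (hypothetical model)


class BlockSnapshotStore:
    """Sketch of block-level capture: one full base image, then a list of
    incremental deltas holding only the blocks changed since the last point."""

    def __init__(self, base_image: BlockMap) -> None:
        self.base: BlockMap = dict(base_image)  # full snapshot at start of period
        self.increments: List[BlockMap] = []    # one delta per recovery point

    def restore(self, point: int) -> BlockMap:
        """Rebuild the whole disk image as of a recovery point (0 = base image)."""
        image = dict(self.base)
        for delta in self.increments[:point]:
            image.update(delta)  # later deltas overwrite earlier block contents
        return image

    def capture(self, current: BlockMap) -> int:
        """Store only blocks that differ from the latest point; return the new point."""
        latest = self.restore(len(self.increments))
        changed = {n: data for n, data in current.items() if latest.get(n) != data}
        self.increments.append(changed)
        return len(self.increments)


# Example: three "blocks" of a tiny disk; each capture records only what changed.
store = BlockSnapshotStore({0: b"boot", 1: b"mbx-a", 2: b"log-1"})
store.capture({0: b"boot", 1: b"mbx-a", 2: b"log-2"})  # stores block 2 only
store.capture({0: b"boot", 1: b"mbx-b", 2: b"log-2"})  # stores block 1 only
as_of_point_1 = store.restore(1)  # mailbox block still holds b"mbx-a"
```

Because each delta holds only changed blocks, captures stay small, and any recovery point can be rebuilt by replaying deltas over the base image.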
I used Replay in a virtual environment, so setup efforts weren't an issue. I used the Replay tools to set up and monitor my images, and kept track of the state of those images. I didn't want to bring down the virtualized Exchange server, but I could -- and did -- mount anything from an individual e-mail to an entire server. The individual e-mail restore took only an instant, and the server restore under 15 minutes.
Captured Exchange databases are guaranteed to mount after rollback, because they're at the block level rather than file level. In addition, Replay validates the storage groups by performing an Exchange Information Store Mount Store procedure, which includes a log roll. As an added advantage, you can perform your Exchange backups off-line from the Replay server. You don't need to have a backup window built into the production server in order to do your backups.
Replay also provides a decent set of tools for managing your high-availability Exchange environment. The console displays information about the repository volumes, including their capacity and any alerts. It monitors the health of the production Exchange servers, and lets you open and mount any recovery point on your image.
Replay works only with Exchange 2000, 2003 and 2007 (I tested with 2003). Because it does bare-metal backup and restore based on the underlying Exchange disk format, it isn't readily transferable to other applications. For Exchange high availability, however, it's difficult to get simpler or more robust. If e-mail is mission-critical for you, Replay is one of the simplest ways to ensure its availability. Anyone who has experienced Exchange downtime, or spent hours waiting for an Exchange backup to load, will appreciate its flexibility.
Figure 1. AppAssure Replay for Exchange: Replay lets you search and restore entire Exchange stores, individual mailboxes and single e-mails.
SteelEye LifeKeeper for SQL Server
LifeKeeper provides a backup and recovery solution for SQL Server and Exchange, sold as two separate products: one for SQL Server and one for Exchange. I looked at the SQL Server version, as it was easier for me to get a test SQL Server environment up and running. LifeKeeper for SQL Server uses server virtualization on an alternative server, so users typically don't notice that the application is running on a different node within a cluster and don't have to reconfigure their applications.
As that implies, LifeKeeper requires additional hardware. You need at least one additional server to serve as the backup. You can install it in heterogeneous server environments, so you don't have to buy the same hardware as your production SQL Server system. In fact, my alternative server had less disk space, less memory and a less-powerful processor than the production server.
LifeKeeper actively monitors and protects all resources required by the SQL Server application, and actively checks that the application itself is running. If there's a failure, it can switch over to the backup server almost instantaneously.
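The article doesn't describe how LifeKeeper decides that a failure has occurred, so the following is only a generic sketch of health-check-driven failover. It assumes a simple rule, switching to the standby after a set number of consecutive failed checks; the class name and the threshold of 3 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Hypothetical health monitor that switches to the standby server only
    after `threshold` consecutive failed checks, so a single lost probe
    does not trigger a failover."""
    check: Callable[[], bool]  # returns True when the protected service answers
    threshold: int = 3
    active: str = "primary"
    failures: int = 0

    def poll(self) -> str:
        """Run one health check and return the name of the active server."""
        if self.check():
            self.failures = 0  # any success resets the failure count
        else:
            self.failures += 1
            if self.active == "primary" and self.failures >= self.threshold:
                self.active = "standby"  # promote the backup server
                self.failures = 0
        return self.active


# Simulated probe results: a single blip, then a sustained outage.
results = iter([True, False, True, False, False, False, True])
monitor = FailoverMonitor(check=lambda: next(results))
history = [monitor.poll() for _ in range(7)]
```

The isolated failure in the second check is ignored, while three failures in a row promote the standby. This trade-off between false alarms and switchover speed is one any monitor of this kind has to make.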
Installing LifeKeeper took several hours, in part because I also had to install and configure SQL Server on my systems first. Once it's up and running, though, it performs well. SteelEye provided a support engineer to assist with installation and told me it offers the same service to everyone evaluating the software. I worked remotely with the engineer, who walked me through an hour-long setup of the basic software. Even so, the documentation is clear enough for you to do it without assistance.
Figure 2. SteelEye LifeKeeper for SQL Server: LifeKeeper lets administrators look at SQL Server status and properties to determine the health of the production database.
LifeKeeper uses floating IP addresses, which move with the application on failover, so clients never need to be reconfigured. It also ensures that all necessary files are available on both servers so the application can restart where it left off. Because LifeKeeper monitors every resource SQL Server depends on, the switchover happens quickly and without user or administrator intervention.
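The floating-address idea can be modeled in a few lines. The sketch below is a generic illustration rather than LifeKeeper's mechanism: clients always send traffic to one address, and failover changes only which node owns it. The class, the address 10.0.0.50 and the node names are hypothetical.

```python
class FloatingIP:
    """Hypothetical model of a floating (virtual) IP address: clients always
    use the same address, and failover changes only which node answers it."""

    def __init__(self, address: str, owner: str, standby: str) -> None:
        self.address = address
        self.owner = owner
        self.standby = standby

    def fail_over(self) -> None:
        # Swap roles. In a typical cluster the new owner would also announce
        # the move (for example with a gratuitous ARP) so switches learn it.
        self.owner, self.standby = self.standby, self.owner

    def route(self, dest: str) -> str:
        """Return the node that receives traffic sent to `dest`."""
        if dest != self.address:
            raise ValueError(f"{dest} is not the floating address")
        return self.owner


vip = FloatingIP("10.0.0.50", owner="sql-node-1", standby="sql-node-2")
before = vip.route("10.0.0.50")
vip.fail_over()
after = vip.route("10.0.0.50")  # same address, different server
```

The client-side view never changes: it keeps using 10.0.0.50 before and after the switch, which is why no reconfiguration is needed.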
The support engineer and the product documentation both told me I could physically pull the server's power cord and failover would work just fine, but I couldn't bring myself to treat a server like that. Instead, I simulated failures such as a network disconnect, and the alternative server took over as the production server with no discernible pause. I pinged the IP address throughout, and I usually lost a single packet at most. In practice, you almost certainly won't lose data on a LAN.
LifeKeeper is more complex to set up and administer, but it also offers the potential for better uptime. It works with whatever additional server hardware you have available, and you can implement it fairly quickly. Because it can protect SQL Server systems (in addition to Exchange), it offers high availability for the back end of a wide range of applications. Anyone looking to improve uptime with SQL Server should look into LifeKeeper.
Quest Recovery Manager for Active Directory
Recovery Manager for Active Directory offers an easy-to-use solution for fast, granular, online recovery of Microsoft Active Directory services in the event of software failure or human error. Rather than manually rebuilding those services, Recovery Manager provides a way to quickly diagnose the problem and automate the recovery.
The product provides automated backup and online, granular recovery of AD, System State and Group Policy data. It integrates with existing AD administration tools, including AD Users & Computers, AD Sites & Services and AD Domains & Trusts. You can also script jobs using Microsoft PowerShell to automate certain processes.
Figure 3. Quest Recovery Manager: Recovery Manager for Active Directory provides backup and restore capabilities for AD objects.
I installed Recovery Manager on the administrative system for my AD and began the automated backup process. Recovery Manager provides wizard-based procedures for recovering AD. It also lets you restore individual directory objects, object attributes and the entire AD database remotely, without taking domain controllers offline or disrupting users logged on to the network. You can schedule backups for off-peak hours.
I chose to leave my backups on my domain controller, although I could have easily offloaded them to another server or storage unit. I let the backup process run for two days, including several incremental backups, in order to better simulate a live environment. During the two days, I used the Recovery Manager console to determine what had changed in AD through snapshot comparisons of backups with the live network.
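Snapshot comparison boils down to a three-way diff between a backup and the live directory. The sketch below is a simplified illustration, not Quest's implementation. It assumes a flat model in which each AD object is a distinguished name mapped to a dictionary of attributes; the function name and sample objects are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical flat model: distinguished name -> attribute dictionary.
Directory = Dict[str, Dict[str, str]]


def compare_snapshots(backup: Directory, live: Directory) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, deleted, modified) object names, each sorted."""
    added = sorted(live.keys() - backup.keys())      # only in the live directory
    deleted = sorted(backup.keys() - live.keys())    # only in the backup
    modified = sorted(dn for dn in backup.keys() & live.keys() if backup[dn] != live[dn])
    return added, deleted, modified


backup = {
    "CN=Alice,OU=Staff,DC=example,DC=com": {"title": "Engineer"},
    "CN=Bob,OU=Staff,DC=example,DC=com": {"title": "Analyst"},
}
live = {
    "CN=Alice,OU=Staff,DC=example,DC=com": {"title": "Manager"},
    "CN=Carol,OU=Staff,DC=example,DC=com": {"title": "Intern"},
}
added, deleted, modified = compare_snapshots(backup, live)
```

Here Carol shows up as added, Bob as deleted and Alice as modified. That kind of report lets an administrator see what changed before deciding what to restore.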
Marathon everRun
The everRun HA solution from Marathon Technologies Corp. offers a unique approach to high availability that combines virtual images with technology that enables two appropriately configured servers to run the same application in almost exactly the same memory and processor state.
Consider two servers in a production IT environment. From an external point of view, these servers are performing identical tasks, running the same application image. One of these servers -- the primary -- is running the applications in real time. The other is the secondary server. All disk writes are mapped to both servers. If there's a device failure on the primary server, everRun copies the contents of memory and the processor state over to the secondary server using a private Ethernet connection. Together, these two servers make up a single virtual server, which is what the application user sees. The user logs on to one application, and reads and writes data located in one place. everRun handles the simultaneous access to both physical servers.
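A rough model of the two-servers-as-one arrangement appears below. It is a hypothetical sketch, not Marathon's design. It captures only the mirroring of disk writes described above, so that either node holds the same data, and a takeover when the primary fails; memory and processor-state transfer is not modeled. All class and server names are illustrative.

```python
from typing import Dict, Optional


class Node:
    """One physical server in the pair (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disk: Dict[str, str] = {}
        self.healthy = True


class MirroredPair:
    """Sketch of two servers presented as one: every write is mirrored to both
    nodes, so the secondary can take over with identical data. Memory and
    processor-state transfer is not modeled."""

    def __init__(self, primary: Node, secondary: Node) -> None:
        self.primary = primary
        self.secondary = secondary

    def _fail_over_if_needed(self) -> None:
        if not self.primary.healthy and self.secondary.healthy:
            self.primary, self.secondary = self.secondary, self.primary

    def write(self, key: str, value: str) -> None:
        self._fail_over_if_needed()
        for node in (self.primary, self.secondary):
            if node.healthy:
                node.disk[key] = value  # mirror the write to each healthy node

    def read(self, key: str) -> Optional[str]:
        self._fail_over_if_needed()
        return self.primary.disk.get(key)  # users always read from "one" server


a, b = Node("server-a"), Node("server-b")
virtual_server = MirroredPair(a, b)
virtual_server.write("order-42", "shipped")
a.healthy = False  # simulate a device failure on the primary
value = virtual_server.read("order-42")
```

The caller only ever talks to `virtual_server`, which mirrors the article's point that the user sees a single virtual server while the software handles both physical machines.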
Using the everRun software is easy. You can log on and see a console that shows the details of the hardware configuration. For each device, you see two configurations: one for the primary server, one for the secondary. You can also display a graphic of the two servers with the connections between them, and each of the storage devices on each box. This window provides you with the status of each device. From this console, you can take devices and even entire servers offline.
Marathon's everRun products provide a unique way to achieve availability and fault tolerance through the use of virtual images and hard-wired communications between the servers. The approach lets you use standard hardware, possibly even hardware you already have, and the results are impressive in terms of performance and response time when a failure occurs. If you're looking for high availability, everRun is a lower-cost route than many traditional solutions.
-P.V.
Last, I made sure I had my original AD files backed up before I started deleting user and permission objects. At the next backup cycle, Recovery Manager determined that those objects were missing, and alerted me on the console. From the console, I was able to initiate a restore for those files (or the entire directory structure if I had to), and was able to repair the damage I did to my network in just a few minutes.
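The granular restore described here, bringing back only what was deleted, can be sketched as follows. This is an illustrative model rather than Recovery Manager's behavior. It assumes each AD object is a distinguished name mapped to an attribute dictionary, and the function name and sample objects are hypothetical.

```python
from typing import Dict, List

Directory = Dict[str, Dict[str, str]]  # hypothetical: object name -> attributes


def restore_missing(backup: Directory, live: Directory) -> List[str]:
    """Copy objects that exist in the backup but not in the live directory
    back into `live`, and return their names. Objects still present are left
    alone, so this is a granular restore rather than a full rollback."""
    restored: List[str] = []
    for name in sorted(backup.keys() - live.keys()):
        live[name] = dict(backup[name])  # copy, so later edits don't touch the backup
        restored.append(name)
    return restored


backup = {
    "CN=Alice,OU=Staff,DC=example,DC=com": {"memberOf": "Engineering"},
    "CN=Bob,OU=Staff,DC=example,DC=com": {"memberOf": "Sales"},
}
live = {"CN=Alice,OU=Staff,DC=example,DC=com": {"memberOf": "Engineering"}}  # Bob was deleted
restored = restore_missing(backup, live)
```

Restoring only the missing objects leaves any legitimate changes made since the backup intact. That is the advantage of object-level recovery over rolling back the whole directory.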
Quest bundles Recovery Manager with Active Administrator in an offering called Availability Suite for Active Directory. Together, they combine monitoring the state of AD with recovering from problems: you know when a problem occurs, why it occurred and how to resolve it.
Coyote Point Systems Equalizer E350si
The Equalizer E350si focuses on high availability from the standpoint of managing application traffic. By load balancing across a cluster or server farm, the E350si makes it possible to achieve both uptime and good response times for the applications it manages.
Coyote Point Equalizer E350si: The E350si parcels out work to multiple Web servers to help prevent overload.
I looked at the Equalizer E350si-R, which is billed as offering near-gigabit performance for cluster management with Web applications. I didn't have much load to balance on my Web server, so I concentrated on getting the hardware onto my network and monitoring traffic. E350si-R virtual clusters present a single IP address to clients on your network, in effect letting you run an application on multiple servers while making it look like a single server to users.
Setup was easy as the E350si-R comes ready to install out of the box. It appears to run a version of Linux that boots right up and gives you a simplified configuration console. I inserted it onto my test network by placing it on a switch in front of two servers. I then used a test client to generate automated random requests to the Web server (thanks, Visual Studio Team System), and found to my satisfaction that traffic started showing up on both Web servers.
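The article doesn't say which balancing policy was active during the test, so the sketch below assumes the simplest one, round-robin, purely to illustrate how one virtual address can spread requests across several servers. A real appliance would also check server health and could weight servers by load. The class name, the address 192.168.1.100 and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Iterable, List


class RoundRobinCluster:
    """Hypothetical virtual cluster: clients use one virtual IP address, and
    each new request is handed to the next server in a fixed rotation."""

    def __init__(self, virtual_ip: str, servers: Iterable[str]) -> None:
        self.virtual_ip = virtual_ip
        self.servers: List[str] = list(servers)
        if not self.servers:
            raise ValueError("a cluster needs at least one server")
        self._rotation = cycle(self.servers)

    def dispatch(self) -> str:
        """Pick the server that will handle the next request."""
        return next(self._rotation)


# Ten test requests against two Web servers behind one address.
cluster = RoundRobinCluster("192.168.1.100", ["web-1", "web-2"])
counts = Counter(cluster.dispatch() for _ in range(10))
```

With two servers, ten requests split evenly, five each. That matches in spirit the observation that traffic began showing up on both Web servers.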
The Equalizer also helps when you add SSL security to an application after the fact. SSL processing moves from the servers to the Equalizer, so applications run faster and are more responsive. I didn't test this feature, but I did note that you can apply SSL at the E350si-R.
Coyote Point offers a range of products from department- to enterprise-level load balancers. If you need to make sure that your Web server doesn't fall over at unexpectedly high levels of traffic, and especially if you haven't load-tested a new application, the E350si-R can help ensure uptime under peak loads.
-P.V.
Losing your AD files can be a catastrophic affair that can take days to fully recover from, especially on large networks. Recovery often means re-creating files from scratch, possibly without ever getting them exactly right.
Having automatic backups and being able to easily restore files and get the network operating properly again provides a high-availability solution for your Microsoft network.