Architecting Microsoft Exchange for Maximum Availability
By VERITAS Software Corporation (Issue 3 2001)
This article highlights the components of a Microsoft Exchange 2000 system, provides tips on capacity planning, and describes how to achieve basic, enhanced, and high availability on Exchange systems.
Most organizations that rely on Windows®-based messaging systems have the same requirements for data availability as those using UNIX®-based systems. Historically, system administrators have considered UNIX environments more stable and robust than Windows-based systems, typically attributing the difference to the stability of the base operating system.
The UNIX environment provides a level of flexibility that can reduce or eliminate certain aspects of downtime. Today, a series of software products developed by VERITAS makes these same levels of flexibility available in Windows-based Microsoft® Exchange® systems.
Components of an Exchange 2000 system
Availability can easily be defined as the ability to access or process data. Even basic messaging systems include many components. The question is whether all system components must be available all the time. Exchange functionality relies on the components shown in Figure 1.
Figure 1. Components of an Exchange 2000 system
Administrators can address the challenge of system availability by compartmentalizing the various services and their dependencies, and applying intelligent hardware and software design to meet the desired level of availability.
Requisite Windows 2000 services
In addition to the basic operating system services required for the application to operate, Exchange 2000 relies on Windows 2000 for several critical services.
Domain controllers. Domain controllers contain an instance of Active Directory® (configuration, schema, and domain naming contexts) within their domain. During startup, Exchange 2000 locates and binds to a Windows 2000 domain controller to gather information about the Exchange environment from the configuration naming context. The Exchange server continues to reference this domain controller during operations.
Global Catalog. The Global Catalog holds a partial set of attributes within all domains in the forest. Exchange 2000 uses the Global Catalog services for message routing and mailbox lookup. For example, Exchange 2000 adds 126 attributes to the Global Catalog during the initial installation.
Network services. Basic network services are also critical to proper operation of Exchange 2000. Domain Name Service (DNS) in particular is heavily relied upon for basic operations such as starting Exchange services.
Exchange 2000 storage groups
The ability to segment databases into groups that contain several databases enables much faster recovery in the event of database corruption or component failure. Storage groups, which are contained within an Exchange server, allow administrators to aggregate multiple databases and treat them as a logical unit. This capability greatly eases some scalability and reliability issues for large databases.
The storage group itself has only a few properties, such as the system path and log file locations and an option for enabling circular logging. The individual databases retain their existing property settings. Up to five databases can be hosted within a single storage group, and up to four storage groups may be hosted on a single server. A single Exchange server can have up to 20 database stores.
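The limits described above are easy to encode as a sanity check. A minimal Python sketch (the function name is illustrative; the limits themselves come from the text):

```python
# Exchange 2000 per-server limits described above.
MAX_DATABASES_PER_STORAGE_GROUP = 5
MAX_STORAGE_GROUPS_PER_SERVER = 4
MAX_STORES_PER_SERVER = MAX_STORAGE_GROUPS_PER_SERVER * MAX_DATABASES_PER_STORAGE_GROUP  # 20

def validate_layout(storage_groups):
    """storage_groups: list of database counts, one entry per storage group."""
    if len(storage_groups) > MAX_STORAGE_GROUPS_PER_SERVER:
        return False
    if any(n > MAX_DATABASES_PER_STORAGE_GROUP for n in storage_groups):
        return False
    return sum(storage_groups) <= MAX_STORES_PER_SERVER

print(validate_layout([5, 5, 5, 5]))  # True: 4 groups x 5 databases = 20 stores
print(validate_layout([6, 2]))        # False: one group exceeds 5 databases
```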
Storage groups contain mailbox and public folder stores. Any particular storage group can contain any combination of mailbox and public stores. Each storage group actually runs as a separate instance of store.exe. Although all databases within a storage group run under one store.exe instance, individual databases can be mounted and dismounted without affecting other databases in the same storage group.
This setup allows recovery and maintenance of individual databases. However, each storage group contains one set of transaction logs that is shared among all databases in the storage group.
The method by which messages move among various servers within an organization has changed significantly in Exchange 2000, which uses the concept of routing groups. These groups supply single-hop routing of messages and an Open Shortest Path First (OSPF) style of message routing between those groups. In addition, Simple Mail Transfer Protocol (SMTP) is the primary transport for both internal and external messages.
Message transport relies heavily on the Global Catalog, DNS, Internet Information Services (IIS) and the Information Store. If any one of these components fails, message delivery halts. Thus, when dealing with the availability of message transport, consider clustering some components, such as IIS, for redundancy.
Users' ability to access messaging services is critical to any messaging system. Failure of the network medium between client and server, or of other components, may affect a user's access to private and public stores. Clients may access the messaging service through several mechanisms, including Messaging Application Programming Interface (MAPI), HTTP, Internet Message Access Protocol (IMAP), and Post Office Protocol (POP).
Exchange 2000 offers a front-end/back-end configuration that facilitates user access. One of the primary benefits of the front-end server is to increase overall availability by disassociating the client from the namespace of the back-end servers. Availability of front-end servers is achieved through a distribution of the servers in conjunction with either DNS round-robin or Network Load Balancing (NLB) to distribute requests among the front-end servers.
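DNS round-robin simply hands out front-end addresses in rotation, spreading client sessions across the pool. A toy Python sketch of the idea (the hostnames are hypothetical):

```python
from itertools import cycle

# Hypothetical pool of front-end servers; a round-robin DNS returns
# addresses in rotation, distributing client sessions across the pool.
front_ends = ["fe1.example.com", "fe2.example.com", "fe3.example.com"]
rotation = cycle(front_ends)

def resolve():
    """Return the next front-end address, as round-robin DNS would."""
    return next(rotation)

# Six consecutive lookups walk the pool twice.
assignments = [resolve() for _ in range(6)]
print(assignments)
```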
From the client perspective, two factors affect Exchange 2000 availability: direct access to the servers that host the private and public stores, and the use of front-end servers that increase availability (and security) for clients that access Exchange using Internet protocols such as POP, IMAP, and HTTP.
Administrators can make message store servers (that is, back-end servers) directly available through clustering. Normally, this configuration offers little benefit to performance, but it also does not present a penalty. When the client-access topology calls for the implementation of a front-end/back-end configuration, front-end server availability will also become an issue.
Planning Exchange system capacity
Although administrators often must retrofit messaging systems to increase availability, the easiest way to achieve the desired availability level is to include the necessary requirements during the planning phase. Capacity refers to the number of users a system can host in terms of its processing (speed, memory, and bandwidth) and storage capabilities.
Transport capacity is significant, even when using a single server. Routing groups ensure single-hop message routing between member servers; the improvements to SMTP, such as command pipelining (sending multiple SMTP commands without waiting for individual responses) and chunking (transferring message data in large blocks rather than line by line), help ensure optimum performance.
Although several factors are important for individual client-protocol access, these few key points relate to capacity and configuration:
- All client Internet protocols (IMAP, POP, and HTTP) require no disk resources on the front-end server.
- store.exe can be disabled on the front-end server to conserve memory.
- High bandwidth should be available between the front-end and back-end devices, particularly for HTTP.
- The ratio of front-end to back-end servers is generally 1:4. In addition, at least one front-end server should be provided for every 5,000 clients. For availability purposes, a minimum of two front-end servers should be available for NLB systems.
- MAPI clients receive a referral the first time they attempt to access Exchange through a front-end server.
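The sizing rules in the list above can be combined into a single calculation. A Python sketch (the function name and defaults are illustrative; the 1:4 ratio, 5,000-client rule, and two-server NLB minimum come from the text):

```python
import math

def front_end_servers_needed(back_end_servers, clients,
                             ratio=4, clients_per_fe=5000, nlb=True):
    """Size the front-end tier from the rules of thumb above:
    one front end per four back ends, at least one per 5,000 clients,
    and a minimum of two when NLB is used."""
    by_ratio = math.ceil(back_end_servers / ratio)
    by_clients = math.ceil(clients / clients_per_fe)
    minimum = 2 if nlb else 1
    return max(by_ratio, by_clients, minimum)

# Eight back-end servers hosting 12,000 clients: the client rule
# dominates (12,000 / 5,000 rounds up to 3).
print(front_end_servers_needed(back_end_servers=8, clients=12000))  # 3
```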
The back-end processing, routing, and message delivery capacities of Exchange 2000 far exceed those of Exchange 5.5. Performance under heavy client load, however, will vary widely depending on the type of client access in use and the Exchange topology selected.
Back-end server capacity is strictly a matter of power combined with adequate network throughput. Windows 2000 and Exchange 2000 fully use multiprocessor systems when performing back-end operations. Message routing between routing groups can be optimized after establishing the proper routing group connectors.
Partitioning the store into databases enables faster recovery of individual databases, but the overall size of the data to be restored does not change in a conventional configuration. For example, 50 GB of storage across 10 databases is still 50 GB of storage during a recovery. One best practice for managing Exchange storage groups is to use them sparingly to accomplish specific objectives. Each instance of store.exe commits about 100 MB of RAM and establishes another set of transaction logs.
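The arithmetic above can be made explicit. A small Python sketch (the 100 MB figure and the 50 GB example come from the text; the function names are illustrative):

```python
def storage_group_overhead_mb(num_storage_groups, mb_per_instance=100):
    """Approximate RAM committed by store.exe instances (roughly
    100 MB each per the figure quoted above; one instance per
    storage group)."""
    return num_storage_groups * mb_per_instance

def per_database_restore_gb(total_gb, num_databases):
    """Partitioning shrinks the unit of recovery, not the total:
    a full-server restore still moves total_gb off the media."""
    return total_gb / num_databases

print(storage_group_overhead_mb(4))     # 400 MB for four storage groups
print(per_database_restore_gb(50, 10))  # 5.0 GB per individual database
```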
Storage groups can be mounted independently of one another so that during the recovery process, one storage group can be restored and mounted, then the next, and so on. This enables other parts of the messaging system to be brought up while other groups are being restored.
It is best to partition databases within storage groups according to the tolerance for recovery time. During a recovery from tape, VERITAS® Backup Exec® will recognize the data that has been moved to remote storage and restore only those messages physically present in the store.
Capacity alone is not valuable without fast access to the data, however. Both storage configuration and physical file placement within the storage system assist storage system optimization. On most hardware platforms, the performance advantage of separating the log files that use sequential access from the database and .stm files that use random access is substantial.
Even if disk I/O is not a significant factor, placing these files on different spindles helps administrators analyze performance and determine the costs of user actions. Servers with multiple drives will benefit by using separate spindles for binaries and the system swap file, one spindle for each log file, and separate striped sets containing as many spindles as possible for each database file (both .edb and .stm files). At a minimum, log files and the store databases should be separate. Strictly from a performance perspective, the best performance will be gained by a mirrored-striped set. Testing has shown that read performance can be double that of a simple volume, while not sacrificing any write performance.
Some portion of the disk subsystem may still become a bottleneck. Therefore, the ability to address this occurrence without disruption of service is obviously important. VERITAS Volume Manager™ provides the ability to move data from one physical disk to another with no disruption to data access.
Achieving Exchange system availability
The distributed nature of Exchange 2000 and underlying Windows 2000 services includes some built-in tolerance to failure. Because availability in this context refers to the messaging system as a whole, several steps can be taken to ensure that no part of the system is unavailable for any length of time.
Administrators can approach availability in a tiered fashion: basic, enhanced, and high availability. Basic availability relies on good backup and recovery procedures wrapped into a disaster recovery plan. Enhanced availability removes a single point of failure in storage subsystems by introducing redundancy. High availability adds redundant access to services through clustering and sophisticated server and data management.
Regularly scheduled backups provide basic availability. Backups should be organized to allow recovery of either the entire server or individual databases within any storage group. The backup applet that ships with Windows 2000 works well in small organizations that maintain one or two Exchange servers, but the applet does not support backing up open files, so plans should include database recovery to a separate server if the operating system fails.
VERITAS Backup Exec and VERITAS NetBackup™ include the functionality to implement a more sophisticated disaster recovery solution. Administrators can back up the operating system on a differential basis and the Exchange database files on either a full or a partial basis. A full disaster recovery plan is beyond the scope of this article, but do consider the following: Exchange 2000 provides the ability to segment databases. When segmenting, plan on a maximum database size to allow for recovery from the backup media in the desired amount of time. The limit set will depend on the server and backup device hardware used. This limit will directly affect the number of users that a particular server can host.
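The sizing rule described above reduces to simple arithmetic. A hedged Python sketch (the rates below are placeholder assumptions; measure the actual throughput of your own server and backup hardware):

```python
def max_database_size_gb(restore_rate_gb_per_hour, recovery_window_hours):
    """Cap database size so a single database can be restored from
    backup media within the desired recovery window."""
    return restore_rate_gb_per_hour * recovery_window_hours

# Illustrative: a backup device sustaining an assumed 20 GB/hour,
# with a 2-hour recovery window, caps each database at about 40 GB.
print(max_database_size_gb(20, 2))  # 40
```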
Although database partitioning provides a clear advantage during the restoration of an individual database, the Exchange 2000 store.exe must be taken offline during the restore process. Also realize that a complete loss of data on the server will force the complete restoration of data from backup media. Administrators will need to account for the total time necessary to restore all data, regardless of the database size.
Targeted availability. Often, the availability level chosen is dictated by the most demanding requirement in the organization. For example, a few key employees (perhaps executive staff) who need constant and immediate access to mail services may set the availability level for the whole system. This may mean that an entire system must be architected at an availability level higher than would otherwise be needed. However, the Backup Exec Mailbox Level Restore functionality, which enables fast recovery of individual mailboxes (even to the message level), lets administrators offer the requisite Service Level Agreement (SLA) to those who need it.
The key attributes of enhanced availability on Exchange systems include:
- Fault-tolerant storage and storage management
- Expanded backup functionality, including improved restores
- Database size management for more efficient backups and better performance
Volume management for enhanced availability. Introducing fault-tolerant storage and management into Exchange will directly affect performance and availability, but will have very little impact on the overall messaging system design. The important topics to address are the configuration of the storage subsystem itself and the management of the storage subsystem and associated data. Apart from a storage area network (SAN), one optimal drive configuration that meets both performance and high-availability objectives follows:
- Operating system. Maintain system files and binaries on a mirrored or duplexed drive set. This approach provides the ability to recover the system by booting from the mirror if a single drive fails.
- Transaction log. Maintain transaction log files on a separate spindle of the disk subsystem for performance, and keep the files distinctly separate from the database for recovery. It is much easier to recover from a corrupt database if the transaction log files are available for replay. Any fault-tolerant disk configuration that includes the system partition is acceptable, but a separate mirrored drive set is preferable.
- Database. For fault tolerance and capacity, maintain the message store databases on at least a RAID-5 disk group. Because databases experience a high level of random access, it is preferable to maintain them on a mirrored striped set. The ability to place each instance of store.exe on a separate instance of a mirrored striped set will further increase performance and reliability, but the storage systems that can accommodate this type of configuration are expensive.
Extend online backups. The Microsoft Exchange agent allows online backups of Exchange servers, including individual mailboxes and messages; however, other files on the server may be online and active during the backup. The Microsoft Exchange agent uses the Exchange backup application programming interfaces (APIs) to fully protect the Information Store and Directory Service. If users store their personal stores (.pst files) on the server rather than locally at their desktop, these files also must be backed up online.
The Open File Option in Backup Exec allows any open file to be protected online while it is active. The Open File Option creates a point-in-time or static view of a volume. Original data is buffered on the static volume when changes are made to the files during a backup. Buffered data, not the changed data, is then backed up. By buffering and backing up the original data, synchronization between files is maintained, and the files can be restored in a consistent state.
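The buffering scheme described above is a form of copy-on-write. A toy Python model of the idea (this illustrates the general technique, not the actual Open File Option implementation):

```python
class CopyOnWriteSnapshot:
    """Toy model of a point-in-time view: before a block is changed,
    its original contents are buffered; reads of the snapshot prefer
    the buffer, so a backup sees the volume as it was."""
    def __init__(self, volume):
        self.volume = volume          # live data: block id -> bytes
        self.buffer = {}              # original contents of changed blocks

    def write(self, block, data):
        if block not in self.buffer:  # first change: preserve the original
            self.buffer[block] = self.volume[block]
        self.volume[block] = data

    def read_snapshot(self, block):
        return self.buffer.get(block, self.volume[block])

vol = {0: b"AAA", 1: b"BBB"}
snap = CopyOnWriteSnapshot(vol)
snap.write(1, b"XXX")
print(snap.read_snapshot(1))  # b'BBB' - the backup still sees original data
print(vol[1])                 # b'XXX' - the live volume carries the change
```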
Both Backup Exec and NetBackup support high-speed media. This capability enables backing up data to a hard drive very quickly, minimizing the time that system performance is affected. Once data is on disk, data can be archived more leisurely on a tape backup.
Simplify restores with IDR. If a disaster occurs, rebuilding a downed server is time-consuming. Installing a new operating system, installing the backup application, and recataloging tapes takes time and invites mistakes.
Backup Exec and the NetBackup Intelligent Disaster Recovery Option™ (IDR) automate this process if they are part of the Exchange protection solution. (Note: IDR is not supported on Microsoft Small Business Server 2000 systems.) IDR works through an agent incorporated into local or remote Windows NT® and Windows 2000 computers. During the boot process (either from diskette or CD), the recovery agents perform a restore from the local or remote server containing the backup media, completely automating the system-recovery process.
Improve recovery time. Administrators can partition Exchange 2000 databases so that corruption and associated recovery minimize impact, and administrators can control database size, which also minimizes downtime for recovery. Still, the ability to store many types of files—video, graphics, audio, and documents—in the Web store means that more data will be stored and the databases will expand.
Remote Storage for Microsoft Exchange makes space available by moving bulky message attachments from the public and private databases of the Exchange Information Store and onto a secondary storage device, such as disk or tape. This activity can reduce backup and restore times. Figure 2 illustrates the Remote Storage for Microsoft Exchange architecture and functionality. The tool has both server and client components:
Figure 2. Remote Storage for Exchange architecture and functionality
Server. Remote Storage Server stores attachment tables and configuration data in a new or existing Microsoft SQL Server 7.0 database. The Remote Storage Server service, which runs on either an existing or dedicated backup media server, sets the policies that will be applied to the attachments. Other Remote Storage for Microsoft Exchange components communicate with the Remote Storage Server.
The Remote Storage Media agent communicates with the media managers of NetBackup DataCenter and Backup Exec for Windows NT 4.0 and Windows 2000 or the hard drive media manager. The Remote Storage Media agent communicates with the Remote Storage for Microsoft Exchange agent to receive attachments that will be relocated to secondary storage. Existing storage devices can be shared among the backup programs and Remote Storage Services, eliminating the need for additional hardware. Suitable media include tapes and disks.
The Remote Storage for Microsoft Exchange agent runs as a service on each Exchange server to be managed. A single Remote Storage Server can manage one or more Exchange servers. A single Exchange server can have attachments relocated to either disk or tape, but not both. Each Exchange server supports single instance storage.
Client. The Microsoft Outlook® Mail client extensions redirect requests for relocated attachments from the Exchange server to the Remote Storage Server. These client files can be "pushed" to the workstations in a mail message. Remote Storage for Microsoft Exchange is transparent to the Outlook user. The icon of a relocated attachment appears with a small clock in the lower left corner, indicating it has been relocated to the Remote Storage Server.
A simple double-click on the attachment icon sends a request to the Remote Storage Server to copy the attachment to the workstation and open it. Remote Storage for Microsoft Exchange provides a dialog that pops up to inform the Outlook user of the current activity and the estimated time to copy the attachment to the workstation.
The Remote Storage for Exchange architecture includes the relocation process and attachment retrieval. Relocation begins when the Remote Storage for Microsoft Exchange agent looks at the client's mailbox, determines which attachments meet the criteria (set by the administrator) for relocation, and migrates them to the Remote Storage Server. A structured query language (SQL) entry is written that contains the information about the physical location of the file, and a request is sent to the Remote Storage Media Server, which writes the attachment to disk or tape.
For the retrieval process, a client requests the attachment. The client extension intercepts the request normally sent to the Exchange server and redirects it to the Remote Storage Server. Then a query is made to the SQL database, which quickly provides the Remote Storage Server with the information necessary to retrieve the attachment and copy it to the client workstation for the Outlook session. A VERITAS data analyzer tool can help administrators determine the number, size, and age of attachments in the Information Store. This tool also allows administrators to run a what-if analysis on the existing attachments to determine the maximum space savings.
Administrators can achieve highly available Exchange systems by introducing redundancy through clustering, site-level protection through replication, and further enhanced backup and restore functionality.
Clustering. Although clustering is the best method to ensure that Exchange Server's private and public stores are consistently available, most organizations will establish a front-end/back-end configuration for servicing POP, IMAP, and HTTP clients. Redundancy of front-end servers combined with DNS round-robin or NLB will increase the availability of Exchange to the clients and also increase performance.
VERITAS ClusterX™ extends cluster management to the enterprise through tightly integrated functions such as single-seat configuration, status, and analysis.
The entity that fails over in a Microsoft Cluster Server (MSCS) environment is a disk group, which stores all of the active Exchange database files. Because only one disk group can exist in a system, the configuration of standard MSCS disk groups limits the functionality of a conventional MSCS cluster. VERITAS Volume Manager for Windows 2000 allows administrators to create flexible storage configurations that integrate with an MSCS cluster.
Volume Manager allows online storage reconfiguration in a clustered environment. This integration allows the cluster service to migrate the required storage automatically between nodes during failover. This capability enhances the functionality of MSCS, especially in an Exchange environment, which often requires adjustments to the storage subsystem and data placement. In addition, because Volume Manager virtualizes the disk subsystem and presents complex arrangements as a single volume to Windows 2000, it is possible to have a more robust disk subsystem presented to the cluster service as the quorum disk.
Ownership of the quorum disk indicates which node controls the cluster. To own a quorum disk, the server must successfully import the disk group and reserve a majority of SCSI IDs within the disk group. In the optimum configuration, three mirrored sets are marked for quorum disk use. The objective is to use the most desirable disk configuration possible for the cluster's quorum disk. Remember that the cluster is of little use without a robust storage subsystem configuration.
Replication and efficient upgrades. Given that the average messaging infrastructure has a life cycle of only a few years between upgrades, the ability to perform upgrades and migrations efficiently will affect overall system availability. The planned downtime required by most conventional upgrade techniques compromises high availability.
With appropriate configuration and setup, VERITAS Storage Replicator™ eases migration of an Exchange server. Storage Replicator allows continuous and timely mirroring of data across local or wide geographical areas. For a planned outage or maintenance, recovery is simply restoring the Exchange data at the secondary server and restarting the Exchange service. The Exchange data at the secondary server is accurate up to the minute, which minimizes data loss.
Accelerated recovery. When dealing with the upper limit of high availability, the amount of time necessary to recover from component failure is critical. Recovery time for a system or application depends on two factors: how current the backed-up data is, and how quickly that data can be restored from media.
Backup-to-disk functionality and replication are two mechanisms for accelerating the recovery of applications and restoration of data. NetBackup lets administrators use disks as backup media. Restores from disk can be made many times faster than restores from tape.
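The benefit is straightforward to quantify. A small Python sketch comparing restore times at assumed sustained rates (both rates are illustrative assumptions, not measured figures):

```python
def restore_hours(data_gb, rate_gb_per_hour):
    """Time to restore a data set at a given sustained rate."""
    return data_gb / rate_gb_per_hour

# Illustrative comparison: the same 50 GB store restored from disk
# versus tape, at assumed sustained rates of 100 and 20 GB/hour.
print(restore_hours(50, 100))  # 0.5 hours from disk
print(restore_hours(50, 20))   # 2.5 hours from tape
```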
VERITAS Storage Replicator enables accelerated recovery by facilitating the migration of Exchange services from a source to a target server. When combined with file-system mirroring, storage replication can bring data online quickly.
Replication is an especially valuable approach to ensure business continuance in the case of a major site outage that takes an entire physical location offline. A separate physical location could host the replica data and NetBackup could produce timely images of the system. For a complete site outage, recovery is as simple as restoring the missing systems from disk to the new host servers that also contain the replica.
VERITAS provides high availability for Exchange users
New capabilities in Exchange 2000 improve system availability. Capacity planning and tools such as those from VERITAS help Exchange system administrators achieve even higher availability of their data and systems while easing system administration. VERITAS products are designed and integrated with one another to solve data-availability issues without introducing complex administration and management paradigms.
VERITAS Software Corporation is a leading provider of data-availability software solutions that enable customers to protect and access their business-critical data.
For more information
For a complete copy of the white paper Microsoft Exchange Without Interruption: Achieving High Availability with VERITAS Solutions, please visit http://www.veritas.com/downloads/news/microsoft_exchange_high_availability.pdf
For more information, service, and support, visit www.veritas.com