Impact of Hyper-Threading Technology on Exchange 2000 Performance
Scott Stanford and Ramesh Radhakrishnan (August 2002)
The Intel® Xeon™ processor offers Hyper-Threading technology that can improve the performance of workloads running Microsoft® Exchange 2000 Enterprise Server. In this article, the Dell™ team compared performance using relevant PerfMon counters for varying workloads of the MAPI (Messaging Application Programming Interface) Messaging Benchmark 2 (MMB2). The tests evaluate the impact of Hyper-Threading technology and approximate the performance benefits that can be attained using this technology.
Intel® Xeon™ processors include Hyper-Threading technology¹ to enable improved system and application performance through increased utilization of processor resources. In theory, Hyper-Threading can improve performance by allowing the processor to execute multiple instruction streams with very low or no context-switching overhead.
Hyper-Threading achieves this functionality by duplicating the architectural state of a physical processor so that the processor appears as two logical processors to the operating system. The architectural state consists of the register set, control registers, advanced programmable interrupt controller (APIC) registers, and certain machine state registers. All other physical resources in the processor, such as caches, execution units, and buses, are shared between the two logical processors.
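As a rough illustration of what the operating system sees, the following Python sketch counts logical processors. The helper name `expected_logical_processors` and its parameters are assumptions made for this example, and the calculation assumes single-core processors that expose two architectural states when Hyper-Threading is enabled. `os.cpu_count()` is the standard-library call that reports the logical CPUs visible on the machine running the script.

```python
import os


def expected_logical_processors(physical_processors: int, hyper_threading: bool) -> int:
    """Logical processors the OS should see for single-core processors.

    Assumes each physical processor exposes two architectural states when
    Hyper-Threading is enabled and one when it is disabled.
    """
    if physical_processors < 1:
        raise ValueError("need at least one physical processor")
    return physical_processors * (2 if hyper_threading else 1)


if __name__ == "__main__":
    # A two-processor server, such as the one tested later in this article.
    print("HT disabled:", expected_logical_processors(2, hyper_threading=False))
    print("HT enabled: ", expected_logical_processors(2, hyper_threading=True))
    # Logical CPUs visible on the machine running this script (may be None).
    print("This machine:", os.cpu_count())
```

The shared caches, execution units, and buses are not modeled here; the sketch only shows why a two-processor server appears as four processors to the scheduler.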
Today's commonly used enterprise applications (Web servers, file and print servers, and mail servers, for example) consist of multiple threads or processes that can take advantage of Hyper-Threading technology. These applications can achieve substantial performance improvement on symmetric multiprocessing (SMP) systems by executing the multiple threads on different CPUs. Performance studies by Intel have shown that not all the resources of the CPU are utilized during the execution of such applications.² Hyper-Threading technology is an attempt to increase utilization of CPU resources by executing these multiple threads on a single processor simultaneously without expensive switching overhead.
Because processor resources are shared between the logical processors, the performance improvement for most applications is not expected to match the improvement obtained by adding another physical CPU to the system. In the worst-case scenario, Hyper-Threading might degrade application performance because of scheduling conflicts and contention for the shared resources. Understanding the application architecture and its execution behavior is an important step in learning how Hyper-Threading technology can affect application performance.
Test examines Exchange 2000 performance
Dell evaluated the performance impact of Hyper-Threading technology on Microsoft® Exchange 2000 Enterprise Server by analyzing system performance under the MAPI (Messaging Application Programming Interface) Messaging Benchmark 2 (MMB2) workload.³ The testing team used the Dell™ PowerEdge™ 4600 system as the mail server to host the MAPI-based workload. To generate a typical office mail workload, the team used the Microsoft LoadSim tool. The hardware setup is illustrated in Figure 1.
Figure 1. Physical message path for the MMB2 tests on a PowerEdge 4600 server
For these tests, the PowerEdge 4600 server was configured with 4 GB of double data rate (DDR) memory and two Intel Xeon processors at 2.2 GHz. On the software side, the server ran the Microsoft Windows® 2000 Advanced Server operating system with Service Pack 2 (SP2) and Exchange 2000 Enterprise Server with SP2.
The Exchange-LoadSim configuration
The Dell team conducted the tests in accordance with the MMB2 LoadSim guidelines. To this end, the total number of user mailboxes was distributed evenly across the four databases and logically contained within two storage groups.
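The even distribution described above can be sketched in a few lines of Python. The function name, the `SG`/`DB` labels, and the assumption of two databases per storage group are illustrative only; the article states just that four databases were contained within two storage groups.

```python
def distribute_mailboxes(total_users, storage_groups=2, databases_per_group=2):
    """Spread user mailboxes as evenly as possible across all databases.

    Returns a dict keyed by (storage group label, database label).
    Labels and the two-databases-per-group split are assumptions.
    """
    total_databases = storage_groups * databases_per_group
    base, extra = divmod(total_users, total_databases)
    layout = {}
    index = 0
    for group in range(1, storage_groups + 1):
        for _ in range(databases_per_group):
            # The first `extra` databases take one additional mailbox
            # when the user count does not divide evenly.
            users = base + (1 if index < extra else 0)
            layout[(f"SG{group}", f"DB{index + 1}")] = users
            index += 1
    return layout


if __name__ == "__main__":
    for (group, database), users in distribute_mailboxes(3000).items():
        print(group, database, users)
```

For the 3000-user workload this gives 750 mailboxes per database, and for the 7,300-user workload 1,825 per database.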
Figure 1 shows the physical setup and steps involved during the execution of the MMB2 workload. The events occur in the following sequence:
- Client performs actions (for example, read, write, delete, and move)
- Message passes to server's network interface card (NIC)
- Open Systems Interconnection (OSI) deconstruction occurs and message enters message processing stream
- Message goes to log controller while awaiting categorization and destination processing
- Message is handed to logs after log memory buffer transfer completes
- Logs roll to information store database files (streaming or MAPI)
- Message commits to database
- Message is delivered to client
In all the test cases, RAID-10 was used for the Microsoft Active Directory® database and transaction log RAID volumes. For the light to moderate workloads, RAID-5 was used for the Exchange information store databases while RAID-10 was used for the Exchange information store transaction logs. For the heavy workloads, RAID-0 was used for the Exchange information store transaction logs and the Exchange information store database volumes.
Exchange 2000 with SP1 versus Exchange 2000 with SP2
Although this article focuses on Exchange 2000 Enterprise Server with SP2, the Dell team did run similar MMB2 workload tests on a server using Exchange 2000 Enterprise Server with Service Pack 1 (SP1). Figure 2 shows the time spent by various processes when running the MMB2 workload on a PowerEdge 4600 server with SP1 installed and a PowerEdge 4600 server with SP2 installed. This data was obtained by using the Intel VTune™ software-profiling tool. These two servers differed only by service pack version; all other configuration parameters were identical.
Figure 2. Comparing Hyper-Threading performance for Exchange 2000 with SP1 and Exchange 2000 with SP2 running on a PowerEdge 4600 server
As illustrated in Figure 2, Exchange 2000 with SP2 improves performance by efficiently using processor resources and reducing processor idle time. The tests performed by the Dell team show that, on systems with Hyper-Threading enabled, Exchange 2000 with SP2 performs more efficiently than Exchange 2000 with SP1. The threading model used in SP2 is an improved version of the model used in SP1, and the improved model is better mapped to Hyper-Threading technology.
SP2 introduces an asynchronous threading model,⁴ which does not hold processor resources for extended periods of time and therefore is better suited for Hyper-Threading technology. The SP2 threading model decreased the percentage of processor idle cycles (as illustrated in Figure 2), which translates to higher messaging throughput and lower response times for the MMB2 workload. SP1 uses a synchronous threading model that can degrade performance on Hyper-Threading systems by introducing additional idle cycles.
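The general difference between synchronous and asynchronous handling can be illustrated with a small Python sketch. This is not Exchange code and says nothing about how SP1 or SP2 is implemented internally; the function names, the message IDs, and the `WAIT_SECONDS` value are all hypothetical. It only shows that a synchronous handler is held for the whole wait, while asynchronous handlers yield during the wait so that other work can start.

```python
import asyncio
import time

WAIT_SECONDS = 0.01  # stand-in for a disk or network wait (hypothetical value)


def handle_synchronously(message_ids):
    """Each handler blocks for the whole wait before the next one starts."""
    events = []
    for message_id in message_ids:
        events.append(("start", message_id))
        time.sleep(WAIT_SECONDS)  # the worker is held while waiting
        events.append(("done", message_id))
    return events


async def _handle_one(message_id, events):
    events.append(("start", message_id))
    await asyncio.sleep(WAIT_SECONDS)  # yields control while waiting
    events.append(("done", message_id))


def handle_asynchronously(message_ids):
    """Handlers yield during the wait, so all of them start before any finishes."""
    events = []

    async def run_all():
        await asyncio.gather(*(_handle_one(m, events) for m in message_ids))

    asyncio.run(run_all())
    return events


if __name__ == "__main__":
    print("synchronous: ", handle_synchronously([1, 2, 3]))
    print("asynchronous:", handle_asynchronously([1, 2, 3]))
```

In the synchronous run the start and done events alternate; in the asynchronous run all three handlers start before the first one completes, which is the behavior that leaves fewer idle gaps for a second logical processor to fill.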
Tests performed by the Dell team show that the performance of Exchange 2000 on servers with Hyper-Threading technology can be affected by the service pack version used in the system configuration. For the MMB2 workload used in this study, Hyper-Threading-enabled systems running Exchange 2000 with SP2 performed better than Hyper-Threading-enabled systems running Exchange 2000 with SP1. Because the logical processors in a Hyper-Threading processor share resources, the right threading model should be used to enable efficient resource sharing and thereby maximize performance benefits.
Light to moderate workloads achieve faster response times
To take advantage of the 10 internal hard drive slots in the PowerEdge 4600 server, the Dell team set up the operating system, Exchange 2000 executables, and paging file on a RAID-1 container in the 1x2 drive bay. The Active Directory database, Active Directory logs, and the Exchange 2000 transaction logs were spread across a total of four RAID-10 containers in the 1x8 drive bay.
The Dell PowerEdge Expandable RAID Controller 3, Dual Channel Integrated (PERC 3/Di), with 128 MB of on-board cache memory and battery backup support, controlled the SCSI channels A and B. The four Exchange information store databases were housed in four Dell PowerVault™ 210S SCSI disk storage enclosures. Two Dell PERC 3/DC (Dual Channel) cards provided support for four RAID-5 containers.
Figure 3 shows the MMB2 response times for light to moderate workloads; the response times were measured during heavy processor, memory bus, and disk I/O subsystem utilization. Understanding how the user mailbox distribution relates to increasing workloads is important. The first and second database RAID controllers were not equally stressed in the test cases with a lighter workload. In more moderate workload scenarios, the RAID controllers were equally stressed, thus minimizing any effect on response time by the disk I/O latencies that may occur at lighter workloads because of a smaller number of spindles. The average disk seconds-per-write latencies and the disk bytes-per-second throughput illustrate this point and are important to keep in mind when using server response times to evaluate the impact of Hyper-Threading.
Figure 3. Response time for light to moderate MMB2 workloads
At 3000 users, 95th percentile response times jumped to nearly three times those of the 2000-user workload. Such an increase is primarily caused by the I/O patterns for RAID-5, whereby four I/Os per write (read data, read parity, write data, write parity) are required to keep the parity consistent. As more RAID-5 volumes are stressed during the test, latency impacts that are directly related to the RAID-5 algorithm cause a corresponding increase in response time. MMB2 I/O patterns are approximately 40 percent writes and 60 percent reads, so any I/O requirements for satisfying RAID-5 algorithms are further skewed toward write latencies.⁵ Administrators may see similar behavior occur as more mail databases are brought online to handle company growth or as users' mailbox storage requirements increase.
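The effect of the RAID-5 write penalty on a 40/60 write/read mix can be worked through with a short Python sketch. The function name and the 1,000 I/Os-per-second front-end figure are hypothetical and are not measurements from these tests; the write penalties used are the commonly cited values of one back-end I/O per write for RAID-0, two for RAID-10, and four for RAID-5.

```python
# Back-end disk I/Os generated per front-end write (commonly cited values).
WRITE_PENALTY = {"RAID-0": 1, "RAID-10": 2, "RAID-5": 4}


def backend_iops(frontend_iops, write_percent, raid_level):
    """Estimate back-end disk I/Os per second for a given read/write mix.

    Reads cost one back-end I/O each; writes cost WRITE_PENALTY[raid_level].
    """
    if not 0 <= write_percent <= 100:
        raise ValueError("write_percent must be between 0 and 100")
    writes = frontend_iops * write_percent / 100
    reads = frontend_iops - writes
    return reads + writes * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    # Hypothetical load: 1,000 front-end I/Os per second, 40 percent writes.
    for level in ("RAID-0", "RAID-10", "RAID-5"):
        print(f"{level}: {backend_iops(1000, 40, level):.0f} back-end I/Os per second")
```

Under these assumptions the same front-end load produces 1,000 back-end I/Os per second on RAID-0, 1,400 on RAID-10, and 2,200 on RAID-5, which is why write-heavy growth shows up first as RAID-5 latency.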
Figures 4 and 5 show some of the important PerfMon counters that administrators can use to compare the two configurations: Hyper-Threading enabled and Hyper-Threading disabled. Figure 4 compares the performance counters of the two configurations for the light workload; Figure 5 compares the counters for the moderate workload. These performance counters include the following:
- Information store send queue size: The number of messages waiting for delivery to Microsoft Internet Information Services (IIS) for Simple Mail Transfer Protocol (SMTP) service and final destination processing
- SMTP server local queue length: The number of messages waiting for delivery to the local information store database (after processing by IIS SMTP categorization)
- Deferred procedure calls (DPCs) queued/second: The rate at which DPCs are added to a processor's queue; DPCs are lower priority interrupts that are handled on a per processor basis
Figure 4. Performance counters for 1000-user workload
Figure 5. Performance counters for 3000-user workload
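A comparison of counter averages between the two configurations can be sketched in Python as follows. The sample values are hypothetical and are not data from Figures 4 or 5, and the column headers are simplified assumptions: a real PerfMon CSV export includes a timestamp column and full counter paths. Only standard-library modules are used.

```python
import csv
import io
from statistics import mean

# Hypothetical samples, NOT measurements from the article.
HT_DISABLED_CSV = """\
Processor Queue Length,Context Switches/sec
6,4100
8,4300
7,4200
"""

HT_ENABLED_CSV = """\
Processor Queue Length,Context Switches/sec
3,4500
4,4700
2,4600
"""


def counter_averages(csv_text):
    """Return the mean of every counter column in a simplified counter log."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("counter log contains no samples")
    return {name: mean(float(row[name]) for row in rows) for name in rows[0]}


def compare_runs(disabled, enabled):
    """Return enabled-minus-disabled deltas for counters present in both runs."""
    return {name: enabled[name] - disabled[name] for name in disabled if name in enabled}


if __name__ == "__main__":
    deltas = compare_runs(counter_averages(HT_DISABLED_CSV),
                          counter_averages(HT_ENABLED_CSV))
    for name, delta in deltas.items():
        print(f"{name}: {delta:+.1f}")
```

A negative delta means the counter was lower with Hyper-Threading enabled; sample windows for the two runs should cover comparable periods before the averages are compared.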
For both light and moderate workloads, the processor utilization and processor queue length were lower when Hyper-Threading was enabled. In contrast, the context switches per second and interrupts per second were slightly higher because additional thread switches occur when Hyper-Threading is enabled. However, because more processor cycles were available to service these interrupts, Hyper-Threading did not degrade system performance.
The queue lengths for Exchange 2000 processes were slightly higher when Hyper-Threading was enabled, but more processing power was available, so the queues were serviced faster and response times improved, as shown in Figure 3. The higher Exchange-related queue lengths can also be attributed to slight variations in the performance log's sample time ranges.
Heavy workloads experience dramatic performance improvement
For the heavy workloads, the operating system, Exchange 2000 executables, and paging file were set up on a RAID-10 container in the 1x8 drive bay and spanned four disks. The Active Directory database and Active Directory logs were spread across another two RAID-10 containers and spanned the last four disks in the 1x8 drive bay.
The Dell PERC 3/Di card with 128 MB of on-board cache memory and battery backup support controlled the SCSI channels A and B. The four Exchange information store databases were housed in four PowerVault 210S enclosures. One Dell PERC 3/DC card provided support for the Exchange 2000 transaction logs, while four Dell PERC 2/Quad Channel (QC) cards managed the four Exchange information store databases.
In contrast to the light and moderate workloads, the team simulated 7,300 MAPI users in the heavy workload scenario, an increase of more than 4,000 users over the moderate workload. Figure 6 shows that the average processor utilization and processor queue length were higher when Hyper-Threading was disabled. As with the light and moderate workloads, the heavy workload experienced higher context switches and interrupts per second when Hyper-Threading was enabled. Under such a heavy workload, Hyper-Threading improves performance, as demonstrated by the smaller information store and SMTP server local queue lengths (see Figure 6).
Figure 6. Performance counters for 7,300-user workload
When compared to the light and moderate workloads, the heavy workload scenario experiences a dramatic reduction in Exchange-related queue lengths when Hyper-Threading is enabled. The benefits of Hyper-Threading for this heavy workload scenario are threefold:
- Reduced processor queue. Processor queue length was reduced by more than half, thereby reducing the number of waiting threads in the processor queue.
- Reduced information store send queue. Information store send queue size was reduced by a factor of six.
- Reduced SMTP server local queue. SMTP server local queue length was reduced by a factor of 149, which means that enough processor headroom was available to handle waiting message threads and retire them to the final local database store. With Hyper-Threading disabled, not enough processor headroom was available to either the SMTP service or the send queue-related services to efficiently retire pending messages. Furthermore, the processors handled more than 450 additional interrupts per second with Hyper-Threading enabled, thus reducing any potential bottlenecks in completing pending disk I/O transactions. Also, the average length of the SMTP server local queue in the Hyper-Threading disabled state was more than 4,900, an indication that the server was overwhelmed when functioning in just a two-processor, non-Hyper-Threading state.
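The reduction factors quoted above are simple ratios of the Hyper-Threading disabled value to the enabled value. As a worked example in Python, the sketch below uses only the two SMTP figures stated in the text (a disabled-state average of more than 4,900 and a reduction factor of 149) to estimate the enabled-state queue length. The function names are assumptions, and the result is an inferred estimate, not a value reported in Figure 6.

```python
def reduction_factor(disabled_value, enabled_value):
    """Ratio of the Hyper-Threading disabled value to the enabled value."""
    if enabled_value <= 0:
        raise ValueError("enabled_value must be positive")
    return disabled_value / enabled_value


def implied_enabled_value(disabled_value, factor):
    """Enabled-state value implied by a disabled-state value and a reduction factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return disabled_value / factor


if __name__ == "__main__":
    # 4,900 is the lower bound stated in the text, so this is a lower-bound estimate.
    estimate = implied_enabled_value(4900, 149)
    print(f"Implied SMTP local queue length with HT enabled: about {estimate:.0f}")
```

Because the text gives the disabled-state average only as "more than 4,900", the enabled-state average works out to roughly 33 messages or slightly more.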
Hyper-Threading technology can accelerate Exchange 2000 performance
The lower response times and smaller resource queue lengths exhibited in these studies show that Hyper-Threading technology can improve the performance of Exchange 2000 Enterprise Server with SP2. However, this improvement varies based on the size of the load placed on the server and on disk I/O configurations. For a higher load, the benefits of Hyper-Threading technology are more prominent in message throughput. For lighter loads, the performance improvements occur in the form of lower messaging response times.
Scott Stanford (firstname.lastname@example.org) is a systems engineer in the Dell System Performance and Analysis Lab. His current work focuses on Exchange 2000 Server benchmarking. Scott served in the U.S. Peace Corps in Nepal and with the U.S. Army, 24th Infantry Division. Prior to Dell, he worked in the public sector as an information services manager. He holds an M.S. in Community and Regional Planning from the University of Texas at Austin and a B.S. in Parks and Recreation Science from Texas A&M University. Scott is A+ and N+ certified and a Microsoft Certified Systems Engineer (MCSE).
Ramesh Radhakrishnan, Ph.D. (email@example.com) is a design engineer consultant with the Dell System Performance and Analysis Lab. His responsibilities include performance analysis of Dell servers and characterization of enterprise-level benchmarks. Ramesh received a Ph.D. in Computer Engineering from the University of Texas at Austin.
FOR MORE INFORMATION
Microsoft Exchange Server: http://www.microsoft.com/exchange
Microsoft Exchange 2000 MMB2: http://www.microsoft.com/exchange/techinfo/planning/2000/PerfScal.asp
Microsoft LoadSim: http://www.microsoft.com/exchange/downloads/2000/LoadSim.asp