A Discussion of the Exchange Server Mail Storage System: Management Tips
Author: Frank Yu, analyst
Introduction:
Now that we understand how the Exchange Server store works and what it does, let's look at some management tips for the messaging storage system. An administrator who has mastered the principles will understand these techniques more deeply and can apply them in daily work with confidence and ease.
Selection and design of storage system hardware and software for Exchange
Let's start by looking at how to select the appropriate disk hardware for the Exchange server's database files and log files.
As the previous installment explained when discussing the role of log files in database recovery, a damaged database can be restored to its state just before the problem occurred by restoring the backup from tape and replaying the log files that still exist on the system. Database files and log files therefore need to be stored on different physical disks, so that a disk hardware failure cannot damage both the database and the logs. Microsoft's documentation clearly states that if either the database or the logs are lost, they can be recovered as long as there is a valid backup. However, if the database and the logs are damaged at the same time, the system can only be restored to its state at the time of the backup.
Typically, the most important server storage systems in an enterprise are RAID arrays implemented in hardware. The common RAID levels are RAID 5 and RAID 1. Their characteristics are as follows:
RAID 5: Data is striped across the disks in the array, and parity data is distributed across those same disks, so the array tolerates the failure of a single disk. RAID 5 uses parity to protect the data, but instead of dedicating a separate disk to parity, it interleaves the parity segments with the data segments on every disk. If any one drive is damaged, its data can be rebuilt from the data and parity on the other drives. Disk utilization is (n-1)/n, where n is the number of disks.
RAID 1: The disks in the array are divided into two identical groups that mirror each other. When any disk suffers a media failure, the data can be recovered from its mirror, which improves the system's fault tolerance. Data is still transferred in blocks, in parallel. RAID 1 therefore offers good read and write speed (writes need no parity calculation) and also strengthens reliability. The disadvantage is low disk utilization: the redundancy is 50%.
In summary, RAID 5 focuses on data security, while RAID 1 (mirrored disks) emphasizes read and write speed on the premise that data security is guaranteed.
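The utilization figures above can be turned into a quick capacity calculation. The following is a minimal Python sketch, not a vendor tool; the function names and the 36 GB disk size in the usage note are illustrative assumptions.

```python
def usable_capacity_gb(level: str, disks: int, disk_gb: float) -> float:
    """Usable capacity of a RAID 1 or RAID 5 array built from identical disks."""
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # One disk's worth of space holds parity: utilization is (n-1)/n.
        return (disks - 1) * disk_gb
    if level == "RAID1":
        if disks < 2 or disks % 2:
            raise ValueError("RAID 1 needs an even number of disks")
        # Half of the disks mirror the other half: utilization is 50%.
        return disks / 2 * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


def utilization(level: str, disks: int) -> float:
    """Fraction of raw capacity that is usable for data."""
    return usable_capacity_gb(level, disks, 1.0) / disks
```

For example, four hypothetical 36 GB disks in RAID 5 give usable_capacity_gb("RAID5", 4, 36) = 108 GB, while two 18 GB disks in RAID 1 give 18 GB.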
The following figure shows Microsoft's recommended storage hardware layout for the Exchange store.
As we can see, the database files (EDB and STM files) are placed on a RAID 5 array, and the log files of each storage group are stored on their own RAID 1 array.
Microsoft designed this layout to get the most performance out of the Exchange store. Database files tend to be very large, and they are read and written very frequently during routine operation. From a security standpoint, database files are far more important than log files. Storing the database files on RAID 5 therefore maximizes their safety: under frequent reads and writes, parity protects the data from errors, and a disk hardware failure leaves the system unaffected.
For the log files, readers should recall the role of the log file discussed in the previous installment: to get in-memory transactions written to the hard disk as quickly as possible. Unless a recovery from backup tape takes place, an Exchange log file is written once and read once in its lifetime. Exchange Server writes in-memory data to log files of 5 MB each, and the read occurs when Exchange Server writes the contents of the log to the database. The disk system that holds the log files is therefore not under heavy read/write pressure, but it does need very fast writes. Two things guarantee the write speed. First, a RAID 1 array writes faster than RAID 5 because it does not need to calculate parity, which saves a lot of time. Second, each storage group has a RAID 1 array to itself: the array holds only the log files of that storage group and nothing else, so fragmentation on the disk is kept to a minimum. Ideally, the sectors of the log files are contiguous, so the disk does not have to reposition its heads because of fragmentation while writing, which maximizes write performance.
After determining the type of disk, we need to plan the disk size. The capacity of the RAID 5 array that holds the database files is determined by the actual number of mailboxes and the size of each mailbox, plus a certain amount of free space. Take an enterprise with 300 users, each with a 100 MB mailbox. In theory, the mailbox store occupies at most 300 × 100 MB, or 30 GB. In practice, we also need to consider the following factors:
First: the deleted item retention time. On Exchange Server we usually set how long deleted messages remain on the server (Store -> Limits -> Deletion settings), which makes it easy for users to recover deleted messages. The backup structure of Exchange Server makes it difficult to restore a single message, so a deleted item retention time helps recover information that was deleted by mistake. It is usually set to between 15 and 30 days. Be aware that once this setting is turned on, deleted messages are not erased from the database immediately, so the setting consumes a certain amount of disk space. If the retention time is set to 15 days, we need to estimate the number and size of the messages each user deletes over that period for further planning; a conservative estimate is that deleted messages amount to 30% to 50% of the mailbox. Such estimates are usually inaccurate. To track the actual activity of each mailbox on the server, you can use a product called "Quest Reports", a web-based program that gives the administrator a detailed, up-to-date report on the size of each mailbox. The company's website is: http://www.quest.com/messagestats/
Second: the space required for database maintenance. When we perform an offline defragmentation (offline defrag) of an Exchange Server database of 20 GB (EDB file plus STM file), we need an additional 20 GB of space to hold the defragmented database files. In addition, when a database needs repair, we usually make a backup copy on the server first, and that space must also be taken into account.
Therefore, the capacity of the RAID 5 array that holds the database files is typically 1.5 to 2 times the mailbox size multiplied by the number of users.
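The rule of thumb above is simple arithmetic, sketched here in Python. The function name is an illustrative assumption, and the conversion uses 1 GB = 1000 MB to match the 300 × 100 MB = 30 GB figure in the text.

```python
def database_volume_range_gb(users: int, mailbox_mb: float,
                             low: float = 1.5, high: float = 2.0):
    """Return the (minimum, maximum) suggested database volume size in GB."""
    # Theoretical maximum footprint of the mailbox store.
    base_gb = users * mailbox_mb / 1000
    # Headroom for deleted item retention and maintenance copies.
    return base_gb * low, base_gb * high
```

For the 300-user example, database_volume_range_gb(300, 100) suggests planning for 45 to 60 GB.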
The disk space for the log files is determined by the full backup cycle (the system automatically clears the log files when a full backup completes). If the enterprise makes a full backup every week, the log disk must hold at least one week's worth of log files, with extra headroom for unexpected events such as a failed backup or a tape drive failure. Typically, we can build a mirrored array from 18 GB SCSI disks and then adjust the full backup schedule according to the growth rate of the log files.
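A rough way to check whether a log volume is large enough is sketched below in Python. The 5 MB log file size comes from the text; the function name, the idea of budgeting for a number of missed backup cycles, and the 200 logs per day in the usage note are assumptions for illustration only.

```python
LOG_FILE_MB = 5  # size of one Exchange transaction log file, as stated in the text


def log_volume_mb(logs_per_day: int, days_between_full_backups: int,
                  missed_backups: int = 1) -> int:
    """Space needed to keep logs until the next successful full backup.

    missed_backups is the number of failed backup cycles to budget for.
    """
    cycles = 1 + missed_backups
    return logs_per_day * LOG_FILE_MB * days_between_full_backups * cycles
```

With a hypothetical 200 log files per day, weekly full backups and one missed backup, log_volume_mb(200, 7) gives 14000 MB, which still fits on an 18 GB mirrored pair.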
Performance monitoring and optimization of the storage engine
As administrators, we need to monitor the performance of the Exchange Server store closely. The performance counters below deserve constant attention:
MSExchangeIS\Active User Count
MSExchangeIS\User Count
The two counters above show the number of active users and the number of logged-on users on the server. In general, Active User Count is always less than User Count. Because Exchange Server uses some system mailboxes internally for communication between servers, User Count stays at around 20 even when no users are online, which is normal.
MSExchangeIS\RPC Averaged Latency
MSExchangeIS\RPC Operations/sec
MSExchangeIS\RPC Packets/sec
MSExchangeIS\RPC Requests
The four counters above reflect how quickly the Exchange Server store responds to RPC requests, and they are the best indicators of the current server load and response speed. RPC Operations/sec and RPC Packets/sec represent the rate of RPC requests the server receives per second, measured in operations and in packets respectively (every Outlook MAPI client connection sends a large number of RPC requests to the server when reading and sending messages). RPC Requests represents the requests that Exchange Server is processing at this moment. In general, Exchange Server can handle up to 100 requests at the same time, so if this counter exceeds 100, Exchange Server has a serious performance problem. The last and most important counter is RPC Averaged Latency, which represents the average response time, in milliseconds, of the last 1024 RPC requests. In general this counter should stay low. If it is larger than 100 for a long time, the clients' Outlook becomes slow or even crashes.
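The two thresholds mentioned above can be checked against a series of counter samples. The following Python sketch assumes the samples have already been collected elsewhere (for example, exported from Performance Monitor); the function name and the choice of three consecutive samples as "a long time" are assumptions, not Microsoft guidance.

```python
RPC_REQUESTS_LIMIT = 100     # concurrent requests the text says the store can handle
RPC_LATENCY_LIMIT_MS = 100   # sustained latency above this slows Outlook clients


def rpc_warnings(rpc_requests_samples, latency_ms_samples, sustained=3):
    """Return warning strings for a series of counter samples."""
    warnings = []
    if any(r > RPC_REQUESTS_LIMIT for r in rpc_requests_samples):
        warnings.append("RPC Requests exceeded 100")
    # Count consecutive latency samples above the limit.
    run = 0
    for value in latency_ms_samples:
        run = run + 1 if value > RPC_LATENCY_LIMIT_MS else 0
        if run >= sustained:
            warnings.append("RPC Averaged Latency stayed above 100 ms")
            break
    return warnings
```

A single latency spike is ignored; only a run of high samples is reported, which matches the "lasts for a long time" condition in the text.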
Many factors affect RPC Averaged Latency. Backups, online defragmentation, anti-virus software scanning the databases, and so on all cause the value to rise. It is also worth noting that an improperly configured network environment can cause problems. I once encountered a serious problem caused by a mismatch between the speed of a switch port and the speed of the NIC on the Exchange server. The performance of the customer's mail system had suddenly dropped dramatically, RPC Averaged Latency reached five-digit values, and no user could open a mailbox. After ruling out problems with Exchange and Windows, we learned from the customer that they had replaced the switch connected to the Exchange server the day before. Exchange Server is application-tier software and arguably does not, and should not, depend on data-link layer devices. But after checking Microsoft's Knowledge Base, we found article 330343, "Poor performance when network adapter is set to auto sense". According to the article, Exchange Server can suffer serious performance problems if the network card or the switch port is set to detect speed automatically. We first checked the Exchange server: its network card was set to 100M full duplex, which meets Microsoft's requirements. We then checked the switch and found that the port connected to the Exchange server's network card was set to auto-detect, and the link had negotiated 100M half duplex. After the port was changed to a fixed 100M full-duplex setting, the fault disappeared immediately, RPC Averaged Latency returned to less than 20, and users had no problem sending or receiving mail.
Our later analysis suggested that Exchange Server may transmit RPC information in specially formatted packets, which places high demands on the network link. Switches are often put into service straight after power-up, and administrators tend to overlook their settings.
MSExchangeIS\VM Largest Block Size
MSExchangeIS\VM Total 16MB Free Blocks
MSExchangeIS\VM Total Free Blocks
MSExchangeIS\VM Total Large Free Block Bytes
These four counters relate to the memory usage of the Exchange Server store process. As is well known, the Store.exe process on Exchange Server is a heavy memory consumer: to improve performance, the ESE database engine requests a large amount of memory as cache. On an Exchange Server system with more than 300 users, Store.exe generally consumes more than 1 GB of physical memory. In the Windows operating system, memory is divided into physical memory and virtual memory. Physical memory is the memory installed in the machine; virtual memory is the range of addresses the CPU can address. On Windows 2000, the size of physical memory depends on how much memory is installed, and virtual memory is 4 GB by default. (For more on Windows 2000 memory, see chapter 6, "Memory Management", of the book Inside Windows 2000.) As shown in the left part of the figure below, each process has a 4 GB address space; by default, 2 GB is reserved for the operating system and 2 GB is available to the application.
Exchange Server allocates and frees memory frequently within its 2 GB user address space while running. This fragments the address space: the free regions become discontiguous. Of the four counters above, VM Largest Block Size is the size of the largest contiguous free block in the user address space; VM Total 16MB Free Blocks is the number of contiguous free blocks larger than 16 MB; VM Total Free Blocks is the total number of free blocks; and VM Total Large Free Block Bytes is the total amount of free memory in large free blocks.
When VM Largest Block Size falls below 32 MB, a warning with event ID 9582 is recorded in the Event Viewer. When VM Total 16MB Free Blocks drops to zero and the largest allocatable block is smaller than 16 MB, an error with event ID 9582 is recorded in the Event Viewer.
Source: MSExchangeIS
Category: Performance
ID: 9582
Type: Warning/Error
Description: The virtual memory necessary to run your Exchange server is fragmented in such a way that performance may be affected. It is highly recommended that you restart all Exchange services to correct this issue.
This event occurs when the virtual address space of Exchange Server is already heavily fragmented. Because memory allocations can no longer be satisfied, the performance and stability of Exchange Server may suffer.
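The warning and error conditions for event 9582 described above can be expressed as a small classification function. This Python sketch only mirrors the thresholds given in this article; the function name and return values are illustrative assumptions, and the counter values are assumed to have been read elsewhere.

```python
MB = 1024 * 1024


def classify_vm_fragmentation(largest_block_bytes: int,
                              free_16mb_blocks: int) -> str:
    """Map the two VM counters to 'ok', 'warning' or 'error'."""
    # Error: no free block of 16 MB remains.
    if free_16mb_blocks == 0 and largest_block_bytes < 16 * MB:
        return "error"
    # Warning: the largest contiguous free block has dropped below 32 MB.
    if largest_block_bytes < 32 * MB:
        return "warning"
    return "ok"
```

For example, a largest free block of 20 MB yields "warning", and a largest free block of 8 MB with no 16 MB blocks left yields "error".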
For problems of this kind, refer to Microsoft Knowledge Base article 325044, "Troubleshoot Virtual Memory Fragmentation in Exchange 2003 and Exchange 2000", which analyzes the causes of virtual memory fragmentation in detail and explains how to deal with them.
To meet the memory requirements of server software, the Advanced Server and Datacenter Server editions of Microsoft Windows support expanding the user address space to 3 GB, which effectively alleviates virtual memory fragmentation. The feature is enabled by editing boot.ini on the system partition. For the procedure, refer to Microsoft Knowledge Base article 291988, "A Description of the 4 GB RAM Tuning Feature and the Physical Address Extension Switch".
For Exchange Server, Microsoft recommends turning on the operating system's /3GB switch when the server has more than 1 GB of RAM installed; otherwise performance problems may occur. Reference documents:
266096 Exchange Requires /3GB Switch with More Than 1 GB RAM
328882 Exchange Memory Use and the /3GB Switch
Recommendations for performance tuning of the Exchange Server store
1. Ensure that the Exchange server's network card and switch port settings are correct.
2. On a server with more than 1 GB of physical memory, install the Advanced Server edition of Windows and enable the /3GB switch in boot.ini.
3. Where possible, create as few storage groups as you can. As the previous installment explained, each storage group corresponds to one instance of the ESE database engine, and each instance created in Store.exe consumes 10 MB of memory.
The role of database defragmentation and points to consider
A running Exchange Server performs online defragmentation (online defrag) in the background during the time window specified by the administrator. Online defragmentation primarily performs the following actions:
1. Determine if there are deleted mailboxes in the store by querying the Active Directory.
2. Physically delete all messages and mailboxes that exceed the retention time.
3. Perform online defragmentation.
For the first action, Exchange Server queries Active Directory to make sure that the user information in Active Directory is synchronized with the mailbox information saved in the Exchange store, and it marks deleted mailboxes with a special flag. This adds little load to Exchange Server, but it does put some pressure on the Active Directory domain controllers. We generally run online defragmentation at night, when the Active Directory load is not a concern. In some large multinational enterprises, however, the domain controllers serve users in every time zone, so the online defragmentation window needs to be chosen carefully to avoid affecting users.
The second and third actions put load on the Exchange server itself, mainly in the form of intensive disk operations. During online defragmentation, users' access to their mailboxes can become noticeably slower. When a backup and online defragmentation overlap on Exchange Server, online defragmentation is suspended and resumes only after the backup is complete. For details on online defragmentation, refer to Microsoft Knowledge Base article 271222, which describes the performance and scalability characteristics of MDB online maintenance.
Under normal circumstances, online defragmentation stops at the time specified by the administrator and records the following in the event log:
Event: 1221
Source: MSExchangeIS Private
Type: Information
Category: General
Description: The database has nnn megabytes of free space after online defragmentation has terminated.
This event reports the amount of fragmented free space that Exchange Server found in the database during online defragmentation. Online defragmentation only marks this free space and calculates its size; it does not physically compact the file to return the space to the disk. To reclaim the space physically, you need to perform an offline defragmentation. When the free space reported by the event above reaches a certain scale (10% to 15% of the database file), an offline defragmentation is warranted.
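The 10% to 15% rule above can be applied directly to the text of event 1221. The following Python sketch parses the free space figure from the event description and compares it with the database size; the function names and the default threshold of 10% are assumptions, and the description string is assumed to have been read from the event log by other means.

```python
import re

# Matches the free space figure in the event 1221 description.
FREE_SPACE_RE = re.compile(r"has (\d+) megabytes of free space")


def free_space_mb(description: str):
    """Extract the free space (MB) from an event 1221 description, or None."""
    match = FREE_SPACE_RE.search(description)
    return int(match.group(1)) if match else None


def offline_defrag_advised(description: str, database_size_mb: float,
                           threshold: float = 0.10) -> bool:
    """True when the reported free space reaches the given share of the file."""
    free = free_space_mb(description)
    if free is None:
        raise ValueError("description does not look like event 1221")
    return free / database_size_mb >= threshold
```

For a 20 GB database reporting 3000 MB of free space, the ratio is 15%, so an offline defragmentation would be advised under this rule.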
For offline defragmentation, we usually follow this process:
1. Perform a full backup of the store you are working on before you start the offline defragmentation.
2. Dismount the store.
3. Use eseutil /mh to confirm that the EDB and STM files are in the "Clean Shutdown" state (discussed in more detail in the previous installment).
4. Run the following command to defragment:
C:\Program Files\Exchsrvr\bin>eseutil /d X:\exchsrvr\mdbdata\sg1ms1.edb /tX:\exchsrvr\mdbdata\sg1ms1_temp.edb /o /p <Enter>
The command produces the following output:
Initiating DEFRAGMENTATION mode...
Database: F:\exchsrvr\mdbdata\sg1ms1.edb
Streaming File: F:\exchsrvr\mdbdata\sg1ms1.stm
Temp. Database: F:\exchsrvr\mdbdata\sg1ms1_temp.edb
Temp. Streaming File: F:\exchsrvr\mdbdata\sg1ms1_temp.stm
Defragmentation Status (% complete)
0 10 20 30 40 50 60 70 80 90 100
|-----|-----|-----|-----|-----|-----|------|------|------|------|
.................................................................................
Note:
It is REQUIRED that you immediately perform a full backup of this database. If you restore a backup made before the defragmentation, the database will be rolled back to the state it was in at the time of that backup.
Operation completed successfully in 13.110 seconds.
The actual defragmentation time depends on the size of the database file; Exchange 2000 can process 7 to 10 GB of data per hour. When defragmentation completes, the system has generated defragmented EDB and STM files under the temporary file name you specified.
5. Before mounting the new database files, verify their integrity by running the following command:
C:\Program Files\Exchsrvr\bin>eseutil /g X:\exchsrvr\mdbdata\sg1ms1_temp.edb /sX:\exchsrvr\mdbdata\sg1ms1_temp.stm <Enter>
The output is as follows:
Microsoft (R) Exchange Server (TM) Database Utilities Version 6.0
Copyright (C) Microsoft Corporation 1991-2000. All Rights Reserved.
Initiating INTEGRITY mode...
Database: priv1.edb
Streaming File: priv1.stm
Temp. Database: TEMPINTEG3976.EDB
Checking database integrity.
Scanning Status (% complete)
0 10 20 30 40 50 60 70 80 90 100
|-----|-----|-----|-----|-----|-----|------|------|------|------|
.................................................................................
Integrity Check successful.
Operation completed successfully in 9.62 seconds.
This operation also takes a long time; the typical speed is about 10 GB per hour.
6. Rename the files. Move the old EDB and STM files out of the Mdbdata folder, rename the defragmented temporary files to the names of the old EDB and STM files, and then mount the database.
7. If mounting the database fails, the quickest way to recover is to copy the old EDB and STM files back into the Mdbdata folder. The old EDB and STM files are not modified during the defrag process, so even if the defrag fails, you can revert to the state before the defrag.
For more details on defragmentation, refer to the following document:
192185 XADM: How to Defragment with the Eseutil Utility
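The throughput figures quoted in the steps above (7 to 10 GB per hour for defragmentation, about 10 GB per hour for the integrity check) allow a rough estimate of the maintenance window. This Python sketch is only an approximation; the function name is an assumption, and it treats the defragmented database as the same size as the original, which overstates the integrity check time.

```python
def offline_defrag_hours(db_gb: float,
                         defrag_gb_per_hour=(7.0, 10.0),
                         check_gb_per_hour: float = 10.0):
    """Return (best case, worst case) hours for defrag plus integrity check."""
    slow, fast = defrag_gb_per_hour
    check_hours = db_gb / check_gb_per_hour
    return db_gb / fast + check_hours, db_gb / slow + check_hours
```

For the 20 GB database used as an example earlier, this gives roughly 4 to 4.9 hours, not counting the full backups before and after.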
How to avoid corruption of Exchange Server database files
For database corruption, prevention is far more effective than remedy after the fact. Database corruption generally falls into two categories: physical corruption and logical corruption.
Physical corruption is usually caused by the failure of hardware such as disk media or controller cards. This type of corruption can cause data loss, and the only solution is to restore from the backup tape.
To ensure data consistency, Exchange Server writes a checksum computed from the page contents along with the data whenever it writes to the database (which is written in 4 KB pages). On read, the system recalculates the checksum and compares it with the stored one; if the two differ, the data read back has changed from what was originally written. Such changes are usually caused by disk failures, controller or bus transmission faults, and the like. To rule out transient interference, when a checksum mismatch occurs Exchange Server rereads the page from disk, up to 16 times. If the checksum still fails to match after 16 consecutive reads, Exchange Server considers the database physically corrupted and logs the following in the event log:
Event ID: 23
Source: EDB
Type: Error
Category: Database Page Cache
Description: MSExchangeIS ((455)) Direct read found corrupted page error -1018 ((1:251563) (0-2295758), 251563 379225672 381322824). Please restore the database from a previous backup.
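The checksum-and-reread behavior described above can be illustrated with a short Python sketch. It is not the ESE implementation: zlib.crc32 stands in for the real page checksum, and the function names, the exception class and the read_page callable are assumptions made for illustration.

```python
import zlib

PAGE_SIZE = 4096   # database pages are 4 KB, as stated in the text
MAX_READS = 16     # number of read attempts before declaring corruption


class PageCorruptError(Exception):
    """Raised when a page fails verification on every read attempt."""


def write_page(data: bytes):
    """Return the page together with the checksum stored alongside it."""
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly 4 KB")
    return data, zlib.crc32(data)  # stand-in checksum, not the ESE algorithm


def read_verified(read_page, stored_checksum: int) -> bytes:
    """Read a page via read_page(), retrying when the checksum mismatches."""
    for _ in range(MAX_READS):
        data = read_page()
        if zlib.crc32(data) == stored_checksum:
            return data
    raise PageCorruptError("checksum mismatch on 16 consecutive reads")
```

A transient fault that clears within a few rereads returns the page normally; only a persistent mismatch raises the error, which corresponds to the -1018 event shown above.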
In addition, when any of the following codes appears in the error description of an event log entry, you can generally assume that the database has been physically corrupted:
-1018 (JET_errReadVerifyFailure)
The data read from disk is not the same as the data that was written to disk.
-1022 (JET_errDiskIO)
The hardware, device driver, or operating system is returning errors.
-510 (JET_errLogWriteFail)
The log files are out of disk space or there is a hardware failure with the log file disk.
Physical corruption of the database often leads to data loss and Exchange Server downtime. The following suggestions help avoid it:
1. Use high-quality disks and disk controllers, and configure the hardware RAID system properly.
2. Do not use file-level tools or anti-virus software to scan database files and log files.
3. Avoid using the write cache (write-back caching) on the disk controller card.
4. Perform a full backup on a regular basis. A full backup protects the data and also uncovers physical corruption of the database early. During a full backup, the backup program reads every page of the database and recalculates its checksum; if a page is damaged, the administrator can identify the problem and act early.
When physical corruption occurs, we can take the following steps to recover:
1. If you have a full backup, be sure to restore from the backup first.
2. In the absence of a backup, eseutil /p can be used for manual repair. This is not recommended, however; recovery from backup is the best solution.
For more detailed information about physical corruption of the database, refer to Microsoft Knowledge Base article 314917, "Understanding and analyzing -1018, -1019, and -1022 Exchange database errors".
The other common type of database corruption is logical corruption. The contents of the database itself are intact, but problems in some internal views and associations cause logical corruption. The typical symptom is that most users work normally, while some users see Outlook freeze when they click a specific mailbox folder or message. This type of failure can generally be fixed with the Isinteg command.
For more information about the Exchange Server databases, read Microsoft Knowledge Base article 217987, "Overview of Exchange Server Database Architecture and Database Engine".