Starting from basic database principles, this article analyzes the Exchange Server store module to explain how the Exchange Server mail storage system works and how to maintain it. It is intended for professional IT staff with some Exchange Server management experience, and aims to give readers a deeper understanding of the Exchange Server mail system.
Hierarchical relationship between Information Store and Extensible Storage Engine
As we all know, the Information Store (IS) service is crucial in Exchange Server. This service handles operation requests against the mailbox and public folder databases.
Furthermore, the Exchange Server database system is managed by a database engine called the Extensible Storage Engine (ESE). Microsoft developed ESE specifically to store non-relational data, and it is used in many Microsoft systems: for example, the Active Directory database (NTDS), Windows DHCP, Windows WINS, and SRS are all backed by ESE databases.
We know that an Exchange Server database consists of EDB files, STM files, and many log files. Inside these files, Microsoft uses a data structure called the B+ tree. One of the tasks of the ESE engine is to convert the Information Store service's database requests into read and write operations on this internal structure. B+ trees allow fast access to data stored on disk, and one reason Microsoft chose them as the underlying structure of ESE is to maximize I/O performance when accessing data. The B+ tree structure is transparent to the Exchange Server store service: the store only needs to send requests to ESE, which performs the operations on these data structures.
In addition, as a database system, ESE is responsible for supporting transaction-level operations and maintaining the integrity and consistency of the entire database. For modern database systems, the properties of a transaction are usually summarized as ACID: atomicity, consistency, isolation, and durability.
We will discuss in detail how Exchange Server and ESE meet these requirements in later sections.
For the Information Store service, ESE encapsulates all the details of database operations; the IS simply calls the interfaces that ESE provides. In Exchange Server, the IS service runs as the store.exe process, and each Storage Group creates an ESE engine instance inside the store.exe process.
New features of the Exchange Server 2000/2003 Storage System
When Microsoft released Exchange Server 2000, the storage system of Exchange Server was greatly updated and improved.
From the perspective of the ESE engine, the version was upgraded from ESE97 in Exchange Server 5.5 to ESE98, with improvements in the following areas:
1. I/O performance was further optimized.
2. Checksum calculation was added for log files, further reducing the possibility of database errors.
3. Maintenance tools such as ESEUtil became faster.
Compared with the ESE engine behind the scenes, the changes to the Information Store are more visible, for example:
1. Each server supports multiple Storage Groups and stores, which is one of the biggest differences from 5.5.
2. The stm streaming file format was introduced into the database, improving the performance of Internet mail operations.
3. The Web Storage System was introduced, so users can access the database through multiple protocols.
Relationship between EDB files and STM files
In Exchange Server 5.5, the database consists only of files with the .edb extension. When Exchange Server 5.5 was released, Microsoft focused on mail transport inside the enterprise. The main protocol at the time was MAPI, Microsoft's proprietary mail protocol, and edb databases are specially optimized for it. Therefore, to support the Internet standard SMTP mail format, Exchange Server 5.5 has to convert every Internet message it processes into a format that the edb database can recognize, which causes a significant performance loss.
In Exchange Server 2000, Microsoft strengthened support for the Internet standard protocol SMTP, and a dedicated store for Internet-format mail was introduced: the stm file.
Messages in the MAPI format are based on Microsoft's RPC and binary standards, while messages in the Internet format consist of plain-text headers and MIME-encoded content streams. Because of these differences, the two formats cannot coexist in a single database file structure.
Therefore, in Exchange Server 2000, Microsoft uses the edb file and the stm file to save messages in these two formats, and establishes associations and references between the two files. A user's mail content actually spans the edb and stm files. It is worth mentioning that, in addition to the actual mail data, the edb file also stores information such as each user's mailbox structure and the content list and views of each folder. By contrast, the stm file stores only the streaming content.
We will discuss how the edb and stm files are used in the following two scenarios:
1. The user sends and reads email with Outlook over the MAPI protocol.
2. The user accesses the Exchange Server over Internet protocols such as SMTP and POP3.
Scenario 1:
After an email is submitted to the database from a MAPI client (usually Microsoft Office Outlook), its content is saved in the edb file.
When a user accesses an email through a MAPI client and the requested email is saved in the edb file, the email is opened directly and returned to the user. If the requested mail is saved in the stm file (that is, the mail is in Internet format), the Exchange Server database engine first converts the data in the stm file into a format that MAPI can recognize and then sends it to the client. This process is called "On-demand Conversion".
Scenario 2:
In this scenario the user connects to the mailbox with an SMTP/POP3 client (such as Outlook Express or FoxMail). When an email is submitted to the Exchange Server over SMTP, its content is saved in the stm file. As mentioned above, the edb file holds the list of folders and emails in the user's mailbox. Therefore, after the email is saved to the stm file, the database engine extracts some important information about it (usually the content of the mail header and the location of the email in the stm file) and saves it to the edb file. This process is called "Property Promotion". Thanks to it, the user can obtain a complete list of the mailbox content, and when a MAPI client needs to access mail in the stm file, the storage location of its parts in the stm file can be found. When a user reads mail over POP3 and the requested mail is in the edb file, a conversion from MAPI to Internet format (also an "On-demand Conversion") likewise happens quietly in the background.
From the description above, we know that these two files are closely related in a real Exchange Server environment. Never operate on them separately; always treat them as a whole. The edb file contains the list of store tables for each mailbox, so when a client needs to obtain folder content, the request always goes to the edb file. The two file formats support the two types of protocols respectively, which effectively reduces unnecessary format conversion.
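The interplay of the two files can be pictured with a small toy model. The sketch below is purely illustrative: the class `ToyStore`, its method names, and the dictionary layout are assumptions made for this example and have nothing to do with the real ESE on-disk format. It only shows the idea that Internet mail lands in the "stm" side, selected headers are promoted to the "edb" side, and the other format is produced on demand.

```python
from dataclasses import dataclass, field
from email import message_from_string


@dataclass
class ToyStore:
    """Hypothetical stand-in for one store (not the real file format)."""
    edb: dict = field(default_factory=dict)  # message id -> properties
    stm: list = field(default_factory=list)  # raw Internet-format messages

    def submit_smtp(self, msg_id, raw):
        # Internet mail is stored unchanged on the streaming side ...
        self.stm.append(raw)
        headers = message_from_string(raw)
        # ... and key headers plus its location are promoted to the
        # edb side ("property promotion").
        self.edb[msg_id] = {
            "subject": headers["Subject"],
            "from": headers["From"],
            "stm_index": len(self.stm) - 1,
        }

    def submit_mapi(self, msg_id, subject, sender, body):
        # MAPI mail lives entirely on the edb side.
        self.edb[msg_id] = {"subject": subject, "from": sender, "body": body}

    def folder_listing(self):
        # The listing needs only the edb side, whatever the mail format.
        return sorted((mid, p["subject"]) for mid, p in self.edb.items())

    def read_as_mapi(self, msg_id):
        props = self.edb[msg_id]
        if "stm_index" in props:
            # On-demand conversion: Internet format -> MAPI-style properties.
            parsed = message_from_string(self.stm[props["stm_index"]])
            return {"subject": props["subject"], "from": props["from"],
                    "body": parsed.get_payload()}
        return {k: props[k] for k in ("subject", "from", "body")}

    def read_as_internet(self, msg_id):
        props = self.edb[msg_id]
        if "stm_index" in props:
            return self.stm[props["stm_index"]]
        # On-demand conversion: MAPI-style properties -> Internet format.
        return (f"From: {props['from']}\nSubject: {props['subject']}"
                f"\n\n{props['body']}")
```

In this model a folder listing never touches the streaming side, which mirrors why the two files only make sense together: the "stm" content is unreachable without the promoted properties.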
Roles of Log files
Any discussion of Exchange Server mail storage has to cover its log files. I have heard Exchange Server administrators complain more than once that the log files grow every day and consume too much hard disk space.
Let's take a look at what these log files do. For each Storage Group, Exchange Server generates a series of corresponding log files. Each log file is 5 MB in size and has the extension .log. The file name prefix is E0x, where x is the number of the Storage Group that the log file belongs to [footnote: although the Storage Group properties contain a "Log File Prefix" text box, it cannot be changed]. The log file prefix of the first Storage Group is therefore E00, that of the second is E01, and so on. This way, when there are multiple Storage Groups, the administrator can avoid confusing the log files of different groups during maintenance. In addition to the consecutively numbered log files, we can also see files such as E0x.chk, Res1.log, and Res2.log.
Log files give many administrators a headache. So why did Microsoft introduce them into the Exchange Server database system? Consider the following aspects:
1. As an enterprise mail database system, it must ensure data security and integrity. If the system crashes, data loss must be kept to a minimum.
2. It must provide high mail throughput, and transactions on emails in the database must be recorded on the storage medium immediately (transaction durability).
3. In the event of a disaster, the database must be restorable to its state at the moment just before the disaster.
Now let's take a closer look. When I modify something in my mailbox, the content to be modified is first read into memory, and the actual modification takes place in memory. After the modification, the content must be written back to the storage medium before the modification counts as successfully completed.
At the database level, such a modification is called a "transaction". We know that to ensure database integrity and consistency, transactions are atomic: if a transaction succeeds, its changes are permanently saved; if it fails, the system must return to the state before the transaction started.
When the system has completed the modification in memory, the transaction is not yet complete. If the database goes down at this point, the stored content in the database remains unchanged. So how can we ensure that changes completed in memory are written to the database as soon as possible (to meet the durability requirement for database transactions)? Note the phrase "as soon as possible": the sooner the better. Writing directly to the edb file cannot be the fastest way, because the edb file is usually large, and when the I/O system performs random writes on a large file, it spends a lot of time waiting for the disk to locate the right tracks and sectors. On a busy system this becomes a bottleneck. Therefore, the database system uses log files. Once the changes in memory are complete, they are first written to a log file. Log files are small and are written far faster than the large edb file. After this write completes, the transaction is safely recorded on the storage medium. The Exchange Server database engine then writes the content of the log file to the database in the background. Because the transaction is already complete at this point, a completed transaction will not be lost even if the power fails or the machine goes down. This is the first role of a log file: to ensure that a transaction is saved to non-volatile storage immediately, which provides durability.
From the description above, we know that a running Exchange Server database consists of three parts:
-- Content that has been modified in memory (dirty pages).
-- Log file content that has not yet been written to the database files.
-- The edb and stm files.
The data in memory (dirty pages) is lost when the system loses power or crashes.
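The log-first write order described above is the general write-ahead logging technique. The following minimal sketch illustrates it under loud assumptions: `ToyWriteAheadStore`, the JSON record layout, and the file name `toy.log` are invented for this example and do not reflect how ESE formats its log files.

```python
import json
import os
import tempfile


class ToyWriteAheadStore:
    """Toy model: commits go to a small log first, the database later."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.dirty_pages = {}  # changes held in memory only
        self.database = {}     # stands in for the large edb file

    def commit(self, key, value):
        # 1. The change is made in memory.
        self.dirty_pages[key] = value
        # 2. It is appended to the log and forced to disk before the
        #    transaction counts as committed.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def flush_to_database(self):
        # 3. Later, in the background, the changes reach the database.
        self.database.update(self.dirty_pages)
        self.dirty_pages.clear()


def replay_log(log_path):
    """Rebuild committed changes from the log after a simulated crash."""
    recovered = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            recovered[record["key"]] = record["value"]
    return recovered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "toy.log")
        store = ToyWriteAheadStore(path)
        store.commit("inbox/1", "read")
        # Simulated crash: the dirty pages are gone, the log survives.
        print(replay_log(path))
```

The sequential append in `commit` is the cheap operation; the expensive random write into the large file is deferred to `flush_to_database`, which is the trade-off the text describes.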
Exchange Server uses a file named E0x.chk (the checkpoint file) to record which log files have already been written to the database. It works like a pointer.
We can use the command ESEUTIL /MK to view the content of this chk file.
    C:\...\Exchsrvr\BIN> ESEUtil /mk "C:\...\Exchsrvr\mdbdata\e00.chk"
    Microsoft (R) Exchange Server (TM) Database Utilities
    Version 6.0
    Copyright (C) Microsoft Corporation 1991-2000. All Rights Reserved.
    Initiating file dump mode...
    Checkpoint file: C:\program files\exchsrvr\mdbdata\e00.chk
    LastFullBackupCheckpoint: (0x0, 0)
    Checkpoint: (0x8, 26DA, 30)
    FullBackup: (0x0, 0)
    FullBackup time: 00/00/1900 00:00:00
    IncBackup: (0x0, 0)
    IncBackup time: 00/00/1900 00:00:00
    Signature: Create time: 03/28/2004 20:26:10 Rand: 6519986 Computer:
    Env (CircLog, Session, Opentbl, VerPage, Cursors, LogBufs, LogFile, Buffers)
        (Off, 202, 10100, 1365, 10100, 128, 10240, 40828)
    Operation completed successfully in 1.47 seconds.
In the command output, Checkpoint: (0x8, 26DA, 30) indicates the position up to which the log has been committed to the database file. 0x8 is the sequence number of the log file, which generally corresponds to E0x00008.log. The remaining two values identify the position inside that log file.
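As a small illustration of reading the checkpoint, the sketch below extracts the three values from a dump and derives the log file name from the sequence number. The helper names `parse_checkpoint` and `log_file_name` are hypothetical, and the code assumes that the dump has one field per line and that all three values are hexadecimal; neither assumption is confirmed by the text above.

```python
import re

# Matches only a line that begins with "Checkpoint:", so that the
# "LastFullBackupCheckpoint:" line is not picked up by mistake.
CHECKPOINT_RE = re.compile(
    r"^\s*Checkpoint:\s*\(0x([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+)\)",
    re.MULTILINE,
)


def parse_checkpoint(dump_text):
    """Return (log sequence number, second value, third value) as integers."""
    match = CHECKPOINT_RE.search(dump_text)
    if match is None:
        raise ValueError("no Checkpoint line found in the dump")
    # Assumption: all three fields are hexadecimal.
    return tuple(int(group, 16) for group in match.groups())


def log_file_name(prefix, sequence_number):
    """Build a name such as E0000008.log from prefix E00 and number 0x8."""
    return f"{prefix}{sequence_number:05X}.log"
```

For the dump shown above, `parse_checkpoint` yields a sequence number of 8, and `log_file_name("E00", 8)` gives `E0000008.log` for the first Storage Group.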
Next, let's take a look at the role of log files on system backup and recovery.
As mentioned above, Exchange Server must be able to restore the database to its state at the moment just before a disaster. Most systems are backed up once a week or once a day. How can the data written after the backup and before the disaster be protected? The answer is the log files. We know that every change to the database is first written to a log file and only then applied to the database. Now assume a system that is backed up at a fixed time every morning and then runs normally. If the system fails at noon and the administrator restores it from that morning's backup tape, the changes made between the backup and the failure are filled in from the log files. Specifically, after the backup has been restored, Exchange Server automatically scans the log folder associated with the store, and if newer logs exist, it writes them to the database in sequence. The changes made to the database between the morning backup and the failure at noon are therefore recovered. This is the second important role of log files: to ensure the completeness of system backup and recovery. The prerequisite is that circular logging is not enabled. With circular logging enabled, backing up your data becomes largely pointless, which shows how dangerous that option is.
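The roll-forward after a restore can be sketched as follows. This is a deliberately simplified model: the function `roll_forward`, the dictionary-based "database", and the numbering of logs by generation are assumptions for illustration, not the actual restore logic of Exchange Server.

```python
def roll_forward(backup, backup_generation, logs):
    """Apply logs newer than the backup, in order, until the chain breaks.

    backup: dict holding the database content at backup time.
    backup_generation: number of the last log already contained in the backup.
    logs: dict mapping a log number to a list of (key, value) changes.
    Returns the recovered database and the last log number applied.
    """
    database = dict(backup)
    generation = backup_generation + 1
    # Logs must be applied strictly in sequence; a missing log stops the
    # roll-forward, because later changes may depend on the missing ones.
    while generation in logs:
        for key, value in logs[generation]:
            database[key] = value
        generation += 1
    return database, generation - 1
```

If a log in the middle of the sequence has been deleted, everything after the gap is skipped in this model, which is the reason manually deleting log files is so risky.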
Someone may ask: what if the database files and the log files are damaged at the same time? The answer is to avoid this situation. First, database files are much more likely to be corrupted than log files. In addition, Microsoft recommends placing database files and log files on different disks. We will focus on this issue in the next article.
Administrators complain that log files grow every day and consume a large amount of hard disk space. The only reasonable solution is to run regular full or incremental backups of the Storage Group, because Exchange Server deletes all log files generated before the backup once a full or incremental backup completes. Many administrators manually delete log files or enable circular logging to reduce disk space consumption; this is a mistake. With incomplete log files, the system cannot be restored to its most recent state after a backup is restored. If your system performs a full backup once a week and you happen to have deleted some log files created after the backup, you may lose the data written since the backup when you need to recover. Remember, data is always more valuable than disk space.
Startup and shutdown of ESE database engine and Information Store service
When the ESE engine loads a database file, it checks a special flag in the file. This flag records whether the database file was closed normally the last time, and its state is described as "Consistent" or "Inconsistent". For a normally closed database file, all content in the log files and in memory has been committed to the database file; only then is the database marked "Consistent". Note that a running database is always "Inconsistent", because some log content has not yet been committed to the database file. A closed database that is marked "Inconsistent" is not necessarily damaged: "Inconsistent" only indicates that the log files still hold content that has not been written to the database file.
The command ESEUTIL /MH can be used to check the shutdown state of a database.
    C:\...\Exchsrvr\BIN> ESEUtil /mh "C:\...\Exchsrvr\mdbdata\priv1.edb"
    Microsoft (R) Exchange Server (TM) Database Utilities
    Version 6.0
    Copyright (C) Microsoft Corporation 1991-2000. All Rights Reserved.
    Initiating file dump mode...
    Database: C:\program files\exchsrvr\mdbdata\priv1.edb
    File Type: Database
    Format ulMagic: 0x89abcdef
    Engine ulMagic: 0x89abcdef
    Format ulVersion: 0x620,9
    Engine ulVersion: 0x620,9
    Created ulVersion: 0x620,9
    DB Signature: Create time: 03/28/2004 20:26:24 Rand: 6536656 Computer:
    CbDbPage: 4096
    Dbtime: 63139 (0-63139)
    State: Clean Shutdown <----- indicates the status when the database is closed
    Log Required: 0-0
    Streaming File: Yes
    Shadowed: Yes
    Last Objid: 574
    ... <Omitted> ...
    Operation completed successfully in 1.391 seconds.
The "Clean Shutdown" in the State field indicates that the database is in the Consistent State.
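When this check has to be repeated for several databases, the State field can be read out of a saved dump with a few lines of script. The sketch below is an assumption-laden illustration: the helper names `database_state` and `is_consistent` are invented here, and the code assumes the dump text has one field per line, as in the listing above.

```python
import re

# Captures the text after "State:", ignoring an optional "<-----" annotation
# such as the one added to the listing in this article.
STATE_RE = re.compile(r"^\s*State:\s*(.+?)\s*(?:<-+.*)?$", re.MULTILINE)


def database_state(dump_text):
    """Return the value of the State field from a saved header dump."""
    match = STATE_RE.search(dump_text)
    if match is None:
        raise ValueError("no State line found in the dump")
    return match.group(1)


def is_consistent(dump_text):
    """True when the State field reads 'Clean Shutdown'."""
    return database_state(dump_text) == "Clean Shutdown"
```

Any other State value is treated as "Inconsistent" by this helper, which matches the two-state description used in this article.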
When ESE loads database files, it mounts the store directly if the database file is "Consistent". For an "Inconsistent" database file, ESE runs a process called "Soft Recovery", in which the log content that was not committed to the database file in time is written into the database. After all the logs have been written, the database is marked "Consistent" and loaded normally.
At the beginning of Soft Recovery, ESE starts replaying log files from the location recorded in the checkpoint file (if the checkpoint file is damaged or missing, it starts with the oldest log file). As ESE replays data from the log files into the store, it uses the dbTime timestamp to decide whether each logged change still needs to be written to the database.
In this process, the Event Log contains the following records:
    Event Type: Information
    Event Source: ESE98
    Event Category: Logging and Recovery
    Event ID: 301
    Date: 10/17/2001
    Time: 5:52:11 AM
    User: N/A
    Computer: <SERVER_NAME>
    Description: Information Store (XXXX) The database engine has begun replaying logfile ...\e0014553.log.
We can also run "Soft Recovery" manually on a dismounted database that is in the "Inconsistent" state.
The command is "eseutil /r", followed by the log file prefix of the Storage Group (for example, E00). (We recommend running it after a power failure and restart: first run eseutil /mh to check the database state, and if it is "Inconsistent", run this command.)
From this we can see that Exchange Server is able to "self-repair" a database that was not shut down properly. ESE thus ensures that the database remains recoverable even after a sudden power failure, and it automatically performs state detection and recovery when the service restarts.
The ins and outs of the M drive
When Exchange Server 2000 was released, Microsoft introduced the concept of the "Web Storage System". Its core idea is to provide multiple ways to access the Exchange Server database. These methods include:
-- File System/IFS
-- HTTP WebDAV
-- Exoledb/ADO
-- CDO
Among them, IFS, the technology that provides file system access, is a controversial module. After Exchange Server 2000 is installed, an M drive appears. This M drive is a mapping from the database to the file system that Microsoft implements through IFS (Installable File System) technology. Developers can access Exchange Server mailboxes and emails through standard file operation APIs (such as CreateFile and OpenFile).
Open the M drive and you will see a folder named after your current domain. Inside it is a folder named mbx that contains all mailboxes. Under mbx, each mailbox folder is named after its user, and under each of these folders you can see the Inbox, Outbox, and other mailbox content. Each message is represented by an EML file.
ExIFS uses a special share name, \\.\BackOfficeStorage, to point to the database. You can run "dir \\.\BackOfficeStorage\domain.com\mbx" on the command line; the result is the same as browsing the M drive directly.
We can change the drive letter that Exchange Server maps by modifying the registry.
    HKLM\System\CurrentControlSet\Services\EXIFS\Parameters
    Name: DriveLetter
    Data Type: REG_SZ
    Value: the drive letter for IFS (without a trailing colon)
After changing the registry, restart the Information Store service to make the change take effect.
You can also use the following command-line tool to change the M drive mapping:
    subst X: \\.\BackOfficeStorage    Note: maps the Exchange store to drive X
    subst /d M:                       Note: deletes the mapping to drive M
If we remove the M drive mapping, we can still access the Exchange Server database through the share name \\.\BackOfficeStorage.
ExIFS runs as a hidden service in Windows. The following registry key defines the parameters of this service:
    HKLM\System\CurrentControlSet\Services\ExIFS\Parameters
Because this is a hidden service, we cannot control it through the Services control panel, but we can control it from the command line:
    NET START ExIFS    Note: starts the service
    NET STOP ExIFS     Note: stops the service
The following figure shows the ExIFS architecture.
ExIFS is implemented by the exifs.sys driver, which runs in Windows kernel mode.
We know that the file system and the Exchange Server store are two completely different architectures. Files in a file system carry only a small number of attributes, while emails in the store have many mail-specific attributes. In the store there are also very complex associations between emails (mailbox relationships, mailbox folder views, and so on). The EML files on the M drive therefore reflect only a subset of the attributes and relationships of the emails. Improper operations on the M drive often break the internal relationships of the database and damage it. A typical example is anti-virus software that scans the M drive, finds a suspected virus, and removes it. According to statistics from Microsoft technical support, this is one of the main causes of damage to Exchange Server store databases. Anti-virus software removes infected files (EML files) by brute force, which often breaks the associations and mail structure in the database and consequently damages the internal structure of the database files.
Another misconception about ExIFS is that backing up the M drive saves the state and all data of the Exchange Server. This is completely wrong. The M drive is only a mapping of the database content onto the file system, and the "files" on M are ultimately emails stored in the database. The associations and relationships between emails in the database are stripped away when they are mapped to the M drive, so a backup of the M drive cannot restore all the information in the database.