Best Practices for Deploying Full-Text Indexing

Source: Internet
Author: User

Introduction

Full-text indexing enables fast, powerful searching by indexing every word in a specified database. This article contains best practices for deploying Full-text indexing with Microsoft® Exchange Server.

Preparing Your Exchange Environment

Before you deploy Full-text indexing, prepare your Exchange environment: configure the server properly and make sure that the Exchange organization is stable.

Server Preparation

Before implementing Full-text indexing, configure your server for optimal performance, for example by adding enough memory and distributing large, frequently accessed files across multiple disks.

Server Configuration

Use a mirrored redundant array of independent disks (RAID) configuration. Microsoft recommends RAID-0+1, which provides optimal performance with redundancy. RAID-5 is not recommended for Full-text indexing.

Disk Space Requirements

The Microsoft Search service (MSSearch) requires that the disk containing the index (also known as a catalog) have 15% free disk space at all times. Depending on the types of files you store, the size of the catalog can vary between 10% and 30% of the database size. If you plan to maintain large amounts of data, also take the database growth rate into account.
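
The two figures above (15% free space, catalog size of 10% to 30% of the database) lend themselves to a quick capacity check. The following Python sketch is only an illustration: the function names are made up for this example, and the path passed to `check_catalog_disk` is whatever location you plan to use for the catalog.

```python
import shutil

MIN_FREE_PERCENT = 15               # free space MSSearch needs on the catalog disk
CATALOG_PERCENT_RANGE = (10, 30)    # catalog size as a percentage of database size


def estimate_catalog_size(database_bytes):
    """Return the (low, high) catalog size estimate in bytes."""
    low, high = CATALOG_PERCENT_RANGE
    return database_bytes * low // 100, database_bytes * high // 100


def has_enough_free_space(total_bytes, free_bytes):
    """Return True if at least 15% of the disk is free."""
    return free_bytes * 100 >= total_bytes * MIN_FREE_PERCENT


def check_catalog_disk(path):
    """Check the disk that holds `path` (a planned catalog location)."""
    usage = shutil.disk_usage(path)
    return has_enough_free_space(usage.total, usage.free)
```

For example, `estimate_catalog_size(20 * 1024**3)` gives the expected catalog size range for a 20 GB database, which you can compare with the free space on the target array while still leaving the 15% margin.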

Memory requirements

Add an additional 256 megabytes (MB) of RAM to the recommended configuration for the Exchange 2000 server. Microsoft recommends that you do not run Full-text indexing with less than [...] MB of RAM.

Placement of files

There are four main categories of full-text indexing files. You can optimize the performance of full-text indexing by arranging the disk locations of these files in the following ways:

Catalog: The catalog is the primary index. Each Exchange information store can have one catalog.
Place the catalog on the RAID-0+1 array. You specify the catalog location in System Manager when you create the catalog.

Property store: This is a database that contains the various properties of the items indexed in the catalog. There is only one property store per server.
Put the Full-text property store on the RAID array. By default, these files are installed on the drive on which Exchange Server resides. To move the property store, use the PSTOREUTL utility located in the Program Files\Common Files\System\MSSearch\Bin directory. The process of moving the property store is described in the "Start Full-Text indexing" section later in this article.

Temporary files: These files contain temporary information used by the Microsoft Search service.
Place the Full-text indexing temp directory on the RAID array. By default, these files are installed on the system drive, which typically does not have the I/O throughput of the RAID array. To move the temporary directory, use the SetTempPath.vbs utility located in the Program Files\Common Files\System\MSSearch\Bin directory. The "Start Full-Text indexing" section later in this article describes the process of moving the temporary files.

Note If you are in a cluster, the temporary directory must be on a local drive.

Gatherer logs: These files contain log information for the indexing service. Each catalog has a corresponding set of logs.
The gatherer logs can stay in the default location and can be moved if necessary. You can use the following registry key to specify the preferred location for the gatherer logs: HKLM\Software\Microsoft\Search\1.0\Gather\ExchangeServer_<instance>\<catalog name>\StreamLogsDirectory

Make sure that the directory you specify is valid. If it is not, Full-text indexing does not work.

Preparing Your Exchange 2000 organization

Verify that the Exchange 2000 server or topology is properly configured and functioning before you install Full-text indexing. If you change your Exchange organization after you install Full-text indexing, you need to completely repopulate the index. In addition, verify the following:

The Simple Mail Transfer Protocol (SMTP) address configuration is stable and functioning correctly. This configuration affects URLs, and objects are indexed by URL.
The server language is set correctly. To verify your language, open Control Panel, click Regional Settings, and check your system's language settings. Full-text indexing refers to the server language when it performs word breaking and stemming (for example, stemming lets a search for "travel" return "travels", "traveled", and "traveling"). Full-text indexing works best when the query language matches the language of the indexed file. Because the server language is sometimes used as the query language when the client language is unknown, it is best to match the server language to the language of most documents on the server.
All servers are working properly and the entire organization has stable connectivity. Test adequately to ensure that all servers in the organization are properly configured.

Start Full-Text indexing

Use Exchange System Manager to deploy Full-text indexing. Deployment includes the following tasks:

Create a Full-text index
Optimize Full-text indexing
Run a full population
Schedule an incremental population
Enable Full-text indexing queries
Notify users

To create a Full-text index

The initial index must be created before Full-text indexing can be used. In System Manager, browse to the information store that you want to index, right-click it, and then click Create Full-text Index. A dialog box prompts you to select the location of the catalog (that is, the index). Specify a location for the catalog on the RAID array.

Optimizing Full-text Indexing

Use the following steps to optimize Full-text indexing on an Exchange server. As mentioned earlier, you can improve system performance by placing frequently accessed files on a RAID array.

Move the property store.
When you create the first index on the server, Exchange creates a new property store database on the Exchange system drive. To improve performance, perform the following steps to move the property store database file to a RAID array. You need to do this only once per server, because all indexes on a server use the same property store:

Stop the Microsoft Search service and disable it.
Use the PSTOREUTL utility from a command prompt to move the database to a new drive (see the following example).
Re-enable and restart the Microsoft Search service.
Example:

"C:\Program Files\Common Files\System\MSSearch\Bin\pstoreutl.exe"
ExchangeServer_<servername> -M
"D:\exchsrvr\ExchangeServer_<servername>\ExchangeServer_<servername>.edb" -l
"D:\exchsrvr\ExchangeServer_<servername>"


In the previous example, drive C is the current location of the property store. Drive D is the destination where you want to move the property store.
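
If you script this step for several servers, it can help to generate the PSTOREUTL arguments rather than type them. The sketch below only builds the argument list in the same order as the example above and does not run anything. The helper name, the server name, and the destination folder are placeholders for illustration, and the utility path assumes the default installation location.

```python
from pathlib import PureWindowsPath

# Assumed default location of the utility; adjust to your installation.
PSTOREUTL = r"C:\Program Files\Common Files\System\MSSearch\Bin\pstoreutl.exe"


def build_pstoreutl_command(server_name, dest_root):
    """Build the argument list for moving the property store.

    Mirrors the example in this article: application name, -M <new .edb path>,
    -l <new log directory>. Both parameters are placeholders.
    """
    app = f"ExchangeServer_{server_name}"
    dest_dir = PureWindowsPath(dest_root) / app
    return [PSTOREUTL, app, "-M", str(dest_dir / f"{app}.edb"), "-l", str(dest_dir)]
```

Remember that the Microsoft Search service must be stopped and disabled before the resulting command is run, and re-enabled and restarted afterward.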

Move the temporary (\temp) directory.
From a command prompt, move the Microsoft Search temp directory (see the syntax in the following example).

As mentioned earlier, by default, the temporary files for the gatherer and filters (also known as temp files) are located on the Exchange system drive, which typically does not have the I/O throughput of the RAID array. To move the temporary directory, use the SetTempPath.vbs utility located in the Program Files\Common Files\System\MSSearch\Bin directory. You need to do this only once per server, because all indexes on a server use the same temporary directory.

Example:

cscript "C:\Program Files\Common Files\System\MSSearch\Bin\settemppath.vbs" d:\temp


Note If you are in a cluster, the temporary directory must be on a local drive.

Make a full population

After you create an index, you must run a full population (also known as a crawl) to populate the index with data. The resource usage setting for Full-text indexing is located on the Full-text Indexing tab of the server Properties dialog box. By default, it is set to Low. Microsoft recommends using the default setting. Higher settings do not produce a proportional improvement, and they may slow client access to the Exchange server.

If you use a low resource usage setting, the population runs in the background and can be done during business hours. The threads used during the population use idle processing time, and user activity takes precedence on the system. Because Full-text indexing uses only idle cycles, indexing does not significantly slow client access to the server. The usual visible effect of a population is CPU usage close to 100%.

To start a full population:

In Exchange System Manager, browse to the information store that you want to index, right-click it, and then click Start Full Population.
The initial full population may take a long time. With a typical Exchange 2000 configuration, population performance is typically 10 to 20 messages per second. Performance varies depending on the hardware configuration, the type and size of the messages, and the available server resources. As a result, the total time required for a full population ranges from a few minutes (for small databases) to several days (for large databases). The language of the documents on the server also affects the time required. For example, if most of the content on the server is in East Asian languages, the population can take more than five times as long as it would for Western-language documents.
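
As a rough planning aid, the throughput figures above can be turned into a time estimate. This is a back-of-the-envelope sketch, not a measurement: the function names are invented for this example, and the `language_factor` parameter is simply a multiplier you choose (for instance 5 for mostly East Asian content, following the "more than five times" remark above).

```python
def estimate_population_hours(message_count, messages_per_second, language_factor=1.0):
    """Estimate the duration of a full population in hours."""
    if messages_per_second <= 0:
        raise ValueError("messages_per_second must be positive")
    seconds = message_count / messages_per_second * language_factor
    return seconds / 3600


def estimate_range_hours(message_count, language_factor=1.0):
    """Return (best, worst) hours for the 10 to 20 messages/second range."""
    best = estimate_population_hours(message_count, 20, language_factor)
    worst = estimate_population_hours(message_count, 10, language_factor)
    return best, worst
```

For a store with 720,000 messages, `estimate_range_hours(720000)` suggests roughly 10 to 20 hours at the typical rates. Actual times depend on hardware, message sizes, and server load.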

To view the status of the population, expand the public folder store or mailbox store and click Full-text Indexing. During the initial population, the status is Crawling. To determine whether the population is complete, view the status or check the Microsoft Search messages in Event Viewer.

Note Do not stop the full population when it is still in progress. If you must stop the full population, but want to rerun it later, click Pause Population instead of clicking Stop Population.

To pause a full population:

In System Manager, browse to the mailbox or public folder store that you want to index, right-click it, and then click Pause Population.

Schedule an incremental population

Determine how often you want to run an incremental population to update the index. Because an incremental population runs in the background like a full population, regular updates do not significantly affect response time for users. A typical schedule runs an incremental update at the beginning of each hour. In this case, if an update lasts longer than an hour, the next incremental crawl begins at the start of the following hour.
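
The hourly behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: crawls are scheduled on the hour, and a scheduled start is skipped while the previous crawl is still running. The function name is made up for this example.

```python
from datetime import datetime, timedelta


def next_hourly_start(crawl_start, crawl_duration):
    """Return the start time of the next hourly incremental crawl.

    Assumes crawls start on the hour and that a scheduled start is
    skipped while the previous crawl is still in progress.
    """
    finished = crawl_start + crawl_duration
    candidate = crawl_start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while candidate < finished:
        candidate += timedelta(hours=1)
    return candidate
```

With a crawl that starts at 09:00 and runs for 75 minutes, the 10:00 slot is missed and the next crawl begins at 11:00.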

To set an incremental population schedule:

In System Manager, browse to the mailbox or public folder store that you want to index, right-click it, click Properties, and then click the Full-text Indexing tab.
In Update Interval, select an interval schedule. Typically, you do not need to set Rebuild Interval. Setting Update Interval is enough.

Enable Full-text indexing queries

After at least one incremental population has completed following the initial full population, enable the index for queries:

In System Manager, browse to the information store that you want to enable, right-click it, and then click Properties.
Click Full-text Indexing, and then select the This index is currently available for searching by clients check box.

Notify users

After you install Full-text indexing on a mailbox server, notify users and tell them what they might expect when they run Full-text indexing searches.

Here is a simple e-mail example that you can send to your users:

Dear users:

Full-text indexing (FTI) is now enabled for all mailboxes on the server. When you use the Advanced Find option in Outlook, you may notice some differences. The biggest difference is speed.

Note that Outlook performs an FTI search only when you use the Advanced Find option on the Tools menu. The Find option performs a traditional, character-based search. It does not use Full-text search.

To verify that FTI is turned on, search for a noise word (such as "the"). Because FTI excludes noise words, you should get 0 results immediately (assuming all network connections are normal). Noise words are very common words or word fragments. Full-text indexing filters out these words.

The following list describes the other differences between FTI and character-based searches in Outlook 2000:

You can get matches on attachments as well as on messages.
You can get matches for related words, which are determined by the stemmer for the selected language. The stemmer uses the language setting to determine which words are related to the search word. For example, the English stemmer considers "tester", "tested", and "tests" to be equivalent, but "testament" is equivalent only to "testaments".
You will not get matches for messages received since the last population (daily or hourly) completed.
Pattern matching is not supported. FTI searches only for complete words. A search for "test" does not match "testament", and a search for "mod" does not match "model". Wildcard characters are not supported.
Noise words are removed from the query. This means that a search for "Michael P" searches only for "Michael" because "P" is a noise word, and a search for "the truth" searches only for "truth".

Other features are unchanged:

To perform an AND query that finds items containing all of the words, just separate the words with a space. To perform an OR query that finds items containing any one of the words, separate the words in the query with commas.
You can enclose a phrase in quotation marks to search for the entire phrase. For example, a search for "ASP files" (with quotation marks) matches only items in which the two words appear consecutively.
Without quotation marks, a search for two words returns any item that contains both words. A search for ASP files (without quotation marks) returns items that contain both "ASP" and "files", or words that have a corresponding stem.

Thank you.
E-mail Support Group

Using Full-text Search

In the Outlook client, users can use Full-text indexing through the Advanced Find option on the Tools menu. The Find option performs only character-based searches.

Full-text indexing behavior

Users who are accustomed to character-based searches will notice differences when they use Full-text search. In particular, recently received messages do not appear in search results until they have been indexed.

Other differences between character-based search and Full-text search include:

Searches over a large scope are faster than character-based searches.
Searching for commonly used words is slower than searching for obscure words. In almost all cases, however, Full-text search is faster than the old "string lookup" style of search, and the cost to the server is much lower.
In addition to returning messages, the search can return attachments.
Full-text search returns related words, which are determined by the stemmer for the selected language. For example, the English stemmer treats "tester", "tested", and "tests" as equivalent, but "testament" is equivalent only to "testaments".
The search does not return messages that have been received since the last population completed.
Pattern matching is not supported, and only whole words are searched. Therefore, a search for "test" does not match "testament", and a search for "mod" does not match "model". Wildcard characters are not supported.
Noise words are removed from the query. This means that a query for "Michael P" searches only for "Michael" because "P" is a noise word, and a query for "the truth" searches only for "truth".

Search results that are obtained using Full-text indexing are not dynamic. This shows up in two ways:

If a new message arrives that matches the query, the search results view is not updated, because the new message has not been indexed yet. It is indexed during the next incremental or full population.
If a user deletes a message from a folder displayed in the search results, the actual message is deleted, but the view still displays the deleted message. Running the query again removes deleted messages from the results.

When Outlook encounters a comma, it runs an OR query. For example, a search for "section, particularly" (without quotation marks) finds all documents containing "section" or "particularly". To search for "section" followed by "particularly", enclose the phrase in quotation marks, that is, "section particularly".
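
The query rules in this section (space for AND, comma for OR, quotation marks for phrases, whole-word matching, noise words dropped) can be illustrated with a toy matcher. This is only a sketch for explaining the behavior to users; it is not how MSSearch is implemented. The noise-word list is a made-up sample, stemming is not modeled, and commas inside quotation marks are not handled.

```python
import re

# Hypothetical sample; the real noise-word list depends on the language.
NOISE_WORDS = {"the", "a", "an", "p"}


def tokenize(text):
    """Split text into lowercase whole words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Return OR alternatives; each is a list of AND terms (word tuples)."""
    alternatives = []
    for part in query.split(","):                       # comma means OR
        terms = []
        for phrase, word in re.findall(r'"([^"]*)"|(\S+)', part):
            words = [w for w in tokenize(phrase or word) if w not in NOISE_WORDS]
            if not words:
                continue                                # only noise words: dropped
            if phrase:
                terms.append(tuple(words))              # quoted: consecutive words
            else:
                terms.extend((w,) for w in words)       # unquoted: AND of words
        if terms:
            alternatives.append(terms)
    return alternatives


def contains_phrase(doc_words, phrase):
    """True if the words of `phrase` appear consecutively in doc_words."""
    n = len(phrase)
    return any(tuple(doc_words[i:i + n]) == phrase
               for i in range(len(doc_words) - n + 1))


def matches(document, query):
    """True if the document satisfies any OR alternative of the query."""
    doc_words = [w for w in tokenize(document) if w not in NOISE_WORDS]
    return any(all(contains_phrase(doc_words, term) for term in terms)
               for terms in parse_query(query))
```

In this model, `matches("My testament", "test")` is false because only whole words match, and a query that consists solely of noise words, such as `the`, matches nothing.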
Managing Full-text Indexing

Moving the catalog (index)

To move a catalog, use the Catutil utility (Catutil.exe) located in the Program Files\Common Files\System\MSSearch\Bin directory on the Exchange server. Before running the utility, stop all active Full-text index populations, and then stop and disable the Search service (MSSearch). For help with this utility, type catutil movecat /? at the command prompt.

Note: If you move the catalog using this method, the Index Location displayed in Exchange System Manager may not be updated. The catalog move is successful and the catalog works correctly, but Exchange System Manager may not display the new location of the catalog. This is only a display error and does not affect the normal operation of Full-text indexing.

Adding users to the Index server

When you add a user to an indexed server, you must run an incremental population if you want the new mailbox to be added to the index immediately.

Schedule an incremental population

Determine how often you want to run an incremental population to update the index. Because an incremental population runs in the background like a full population, regular updates do not significantly affect response time for users. A typical schedule runs an incremental update at the beginning of each hour. If an update lasts longer than one hour, the next incremental crawl begins at the start of the following hour.

When a new full population is required

The index must be fully repopulated in the following cases:

The word breaker has changed. (Full-text indexing uses the word breaker to identify where individual words begin and end in a given text.)
The noise words have changed.
A new document format filter has been added.
The schema file has changed.
The SMTP address of the store has changed.
You are performing disaster recovery.

During the population process, the index is still available for full-text queries. If you instead delete the catalog, re-create it, and run a new full population, the index cannot be used for queries until that population completes. This is necessary only if the existing catalog is corrupted.

If you find that the population is paused

If a population cannot continue, the Microsoft Search service (MSSearch) pauses it. To determine whether MSSearch or an administrator paused the population, check the event log. MSSearch always logs an event when a population is paused or stopped. For example, if the disk is too full to add to a catalog or log file, MSSearch pauses the population. Typically, you can resolve the problem (for example, free up space on a full drive) and resume the population. Documents added during a pause are not added to the index until the next population.

Note that disk space can be a problem even when the drive appears to have plenty of free space. MSSearch does not limit its own disk usage: it temporarily decompresses most of the catalog to merge in new results and then compresses it again.

Monitoring Full-text Indexing

Use System Monitor and Performance Logs and Alerts to monitor Full-text indexing.

Performance objects to monitor

There are five objects that can be monitored to facilitate the evaluation of Full-text indexing:

Microsoft Gatherer
Microsoft Gatherer Projects
Microsoft Search
Microsoft Search Catalogs
Microsoft Search Indexer Catalogs

Useful counters for these objects are described in the next section.

Population-related counters

Use the following counters in Microsoft Gatherer and Microsoft Gatherer Projects objects to monitor index fills:

Microsoft Gatherer: Documents Filtered - The number of documents that have been filtered, that is, the number of documents that have been indexed.
Microsoft Gatherer: Performance Level - The performance level ranges from 1 to 4 and is set in Exchange System Manager (1 = lowest, 2 = low, 3 = high, 4 = highest).
Microsoft Gatherer: System IO Traffic Rate - Displays the I/O rate that is used to decide whether to back off crawl processing. For more I/O diagnostics, see the Physical Disk counters.
Microsoft Gatherer: Reason to Back Off - A code that shows why the Gatherer service backed off the population:
0 - Up and running
1 - High I/O rate
4 - Back off on user activity (disabled by default on server installations)
5 - Low battery (the computer is running on battery rather than AC power)
6 - Low memory (less than 5 MB of memory left in the paging file)

Microsoft Gatherer Projects: Crawl in Progress Flag - The flag contains 0 or 1 to indicate whether a crawl is running: 0 = no crawl is running, 1 = a crawl is running.
Microsoft Gatherer Projects: Current Crawl Is Incremental - The flag contains 0 or 1 to indicate whether the current crawl is incremental: 0 = full crawl, 1 = incremental crawl.
Microsoft Gatherer Projects: Gatherer Paused Flag - The flag contains 0 or 1 to indicate whether the crawl is paused: 0 = not paused, 1 = paused.
Microsoft Gatherer Projects: URLs in History - This counter displays the total number of URLs (folders and documents) known to the Full-text index.
Microsoft Gatherer Projects: Waiting Documents - This counter displays the total number of documents waiting to be crawled. The number increases at the start of a crawl as new URLs are identified, and then decreases as the crawl proceeds.
Microsoft Search Indexer Catalogs: Merge Progress - This counter displays the percentage of the index merge that is complete.
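
When you log these counters, the numeric codes are easier to read if you translate them. The lookup tables below simply restate the values listed above. The function names are invented for this example, and how you collect the raw counter values (System Monitor export, log file, and so on) is left open.

```python
# Values as listed in this article for "Reason to Back Off".
BACK_OFF_REASONS = {
    0: "up and running",
    1: "high I/O rate",
    4: "back off on user activity",
    5: "low battery",
    6: "low memory",
}

# Values as listed in this article for "Performance Level".
PERFORMANCE_LEVELS = {1: "lowest", 2: "low", 3: "high", 4: "highest"}


def describe_back_off(code):
    """Return a readable description of a 'Reason to Back Off' value."""
    return BACK_OFF_REASONS.get(code, f"unknown code {code}")


def describe_performance_level(level):
    """Return a readable description of a 'Performance Level' value."""
    return PERFORMANCE_LEVELS.get(level, f"unknown level {level}")
```

For example, a logged value of 1 for Reason to Back Off translates to "high I/O rate", which points you toward the Physical Disk counters.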
Query-related counters

The following Microsoft Search Catalogs counters contain information about Full-text indexing queries. Use them to monitor queries for Full-text indexing.

The Microsoft Search object contains the totals of the values for the four separate catalogs in the Microsoft Search Catalogs object.

Microsoft Search Catalogs: Queries - This counter displays the total number of queries executed.
Microsoft Search Catalogs: Successful Queries - This counter displays the number of successfully completed queries.
Microsoft Search Catalogs: Results - This counter displays the number of rows returned that fit within the scope of the query.
Microsoft Search Catalogs: Failed Queries - This counter displays the number of failed queries (for example, queries that contain noise words).

General performance-related counters

Use the following counters to monitor general performance.

CPU usage

Use the following counters to monitor CPU usage. Keep in mind that CPU usage typically reaches 100% during a full population.

Processor: % Processor Time - The range of this counter is 0 to 100%.
Process: % Processor Time - The range of this counter is 0 to 100% multiplied by the number of processors. Choose from the following instances: "store" is the Exchange information store process, "mssearch" is the Search service, and "mssdmn" is the indexing process.

Disk usage

Use the following counters to monitor disk usage. You must have enough free disk space when you run the Full-text index. If you do not have enough disk space, you may cause serious problems. The catalog (index) may be corrupted and other system problems may occur.

Physical Disk: Current Disk Queue Length - This number should not exceed 1 or 2 per spindle in the disk system. A long disk queue indicates a bottleneck. The count should regularly return to 0. A queue length that rarely or never returns to 0 also indicates a bottleneck.
Physical Disk: Avg. Disk sec/Read - This counter displays the amount of time spent on each disk read. This value is typically about 10 milliseconds. If the disk is busy, the value is much higher.
Physical Disk: Avg. Disk sec/Write - This counter displays the amount of time spent on each disk write. Typically, this value is approximately 10 milliseconds, but a RAID array with write-back caching has write times of approximately 1 millisecond because the information remains in the controller's cache. When the disk becomes busy, the value increases.
Physical Disk: Disk Transfers/sec - This counter displays the total number of disk reads and writes per second. Most single spindles have a maximum of 100 to 150 transfers per second.

Memory usage

Use the following counters to monitor memory usage:

Memory: Available MBytes - This counter lists the available memory on the computer.
Process: Virtual Address Spaces - This counter displays reserved memory (including virtual allocations).
Process: Private Bytes - This counter displays the total amount of memory allocated for the process, for example, the database cache. The counter does not include handles and shared memory.
Process: Working Set - This counter displays the actual memory allocated in RAM (equal to the Memory Usage value in Task Manager). Pausing a Full-text indexing crawl causes its working set to be reduced to approximately 2 MB.

Paging

Use the following counters to monitor paging:

Memory: Pages/sec - This counter displays the number of pages hard-paged (to disk) per second. Note that a single disk read or write can handle multiple pages.
Memory: Page Writes/sec - This counter displays the number of disk writes per second for paging.
Memory: Page Reads/sec - This counter displays the number of disk reads per second for paging.
Process: Page Faults/sec - This counter displays the number of hard page faults (to disk) and soft page faults (in memory) per second.

If the Pages/sec counter has a high value (for example, more than 100), find the process that generates most of the page faults, because it probably needs corrective action. If it is the information store, check:

Database: Database Cache Size - This counter shows how large the database cache is. A small cache can result in a high paging rate because cached database pages are swapped back to disk more frequently.

Troubleshooting

Gatherer logs

Gatherer log files are generated during a population. These files contain log information for the indexing service. They are located in the \exchsrvr\ExchangeServer\GatherLogs directory. The file name extension is .gthr.

If a particular document cannot be indexed, a record is added to the gatherer log file, regardless of the cause. Each record lists the file name and an error number. To decode the error number, use the Gthrlog.vbs utility located in the \Program Files\Common Files\System\MSSearch\Bin directory. The syntax for the utility is as follows:

cscript Gthrlog.vbs <filename>


where filename is the name of the .gthr file. Run the Gthrlog.vbs utility at the command prompt. The results produced by the utility are displayed at the command prompt.
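
If a log directory contains many .gthr files, you may want to decode them in one pass. The sketch below only prepares one `cscript Gthrlog.vbs <filename>` command per log file, following the syntax above, and leaves running the commands to you. The function name and its parameters are made up for this example.

```python
from pathlib import Path


def build_gthrlog_commands(log_dir, script="Gthrlog.vbs"):
    """Return one cscript command per .gthr file in log_dir, sorted by name.

    `script` may be a full path to Gthrlog.vbs if it is not in the
    current directory.
    """
    log_files = sorted(Path(log_dir).glob("*.gthr"))
    return [["cscript", script, str(log_file)] for log_file in log_files]
```

Each returned list can be passed to a process launcher on the Exchange server, or printed and pasted at a command prompt.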

Full-text indexing rules for mixed languages

In mixed-language situations, the rules for Full-text indexing are complex. The following sections explain how the various language settings affect indexing behavior. Administrators can use these guidelines to determine the cause of search problems reported by users.

Language settings for a single message

The language settings of a single message affect indexing behavior in the following ways:

If the message is a MAPI message, it has a locale ID property, and Full-text indexing uses that value to determine which word breaker to use. This property value comes from the Office language settings on the client computer. If Full-text indexing cannot find a word breaker that matches the locale ID property, it uses Neutral <0>.
If the message was created with Distributed Authoring and Versioning (DAV), Full-text indexing uses the Accept-Language header to determine the correct locale.
If the message does not identify a locale (usually a message from the Internet), Full-text indexing uses the server's system locale. The server here is the Exchange 2000 server that stores the message and on which Full-text indexing is performed.
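
The three rules above amount to a small decision procedure, sketched here in Python. This is an illustration of the rules as described, not the actual MSSearch logic. The function and parameter names are invented, and the example assumes that the Accept-Language header has already been mapped to a locale ID. The locale IDs used in the usage note (1033 for U.S. English, 1041 for Japanese, 1037 for Hebrew) are the standard Windows values.

```python
NEUTRAL_LOCALE = 0  # the "Neutral <0>" fallback


def choose_word_breaker_locale(source, server_locale, available_locales,
                               message_locale=None, accept_language_locale=None):
    """Pick the locale whose word breaker is used to index a message.

    source: "mapi", "dav", or anything else (for example, Internet mail).
    available_locales: locale IDs that have a word breaker installed.
    """
    if source == "mapi" and message_locale is not None:
        # MAPI messages carry a locale ID from the client's Office settings.
        if message_locale in available_locales:
            return message_locale
        return NEUTRAL_LOCALE
    if source == "dav" and accept_language_locale is not None:
        # DAV messages take the locale from the Accept-Language header.
        return accept_language_locale
    # No locale on the message: fall back to the server's system locale.
    return server_locale
```

For example, a MAPI message tagged 1041 on a server with a Japanese word breaker is indexed as Japanese even if the server's system locale is 1033, while an Internet message on the same server is indexed with the 1033 word breaker.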
Language Settings for Attachments

The language settings for attachments affect indexing behavior in the following ways:

If the attachment is a Microsoft Office document, Full-text indexing uses the language settings that were used to create the document.

Language settings for servers running Microsoft Windows® 2000

The language settings of the server affect indexing behavior in the following ways:

If the message is a non-MAPI message (that is, an Internet message), its locale ID property is not set, and Full-text indexing uses the server's system locale setting to determine which word breaker to use.

Language settings for the client

The client's language settings affect indexing behavior in the following ways:

When you send a query from Outlook, the locale ID of the client is also sent. If the locale ID of the message does not match the locale ID of the query, the search results are unpredictable.

Note: The language of the Exchange server is irrelevant in this case. The client settings take precedence.

Full-text indexing behavior in mixed language environments

The following example illustrates the query behavior for content indexes with various language settings.

All U.S. language settings
If you use U.S. Outlook on a U.S. client to compose a message and send it to Exchange 2000 running on a Windows 2000 server with U.S. settings, the following occurs:

Full-text indexing uses the U.S. word breaker to index the message.
Queries from the U.S. client are handled as expected.

Hebrew client, U.S. Office settings, Hebrew Windows 2000
If you use Hebrew Outlook on a Hebrew client with Office set to United States to compose a message and send it to Exchange 2000 running on a server whose system locale is set to United States, the following occurs:

Full-text indexing uses the U.S. word breaker to index the message. Because of the Office settings, the locale ID property of the message defaults to United States.
Queries from the Hebrew client fail because the correct word breaker was not applied to the Hebrew document.

Japanese client, Japanese Office settings, United States Windows 2000
If you use Japanese Outlook on a Japanese client with Office set to Japanese to compose a message and send it to Exchange 2000 running on a server whose system locale is set to United States, the following occurs:

Full-text indexing uses the Japanese word breaker to index the message.
Queries from the Japanese client succeed because the message is indexed and queried with the same locale ID.

Queries during initialization

During the first few minutes after an Exchange server with Full-text indexing starts, users can access their own messages but cannot get query results. This is because the Search service (MSSearch) is loading the index and Exchange is loading the property store. Queries return results only after these procedures have completed.

Missing performance counters

Event messages similar to the following indicate that counters used by System Monitor and Performance Logs and Alerts are missing. If you receive one of the following messages, you must restore the counters by restarting MSSearch.

Performance monitoring for the Gatherer service cannot be initialized because the counters are not loaded or the shared memory object cannot be opened. This only affects availability of the performance counters. Rebooting the system may fix the problem.
Performance monitoring cannot be initialized for the Gatherer object because the counters are not loaded or the shared memory object cannot be opened. This only affects availability of the performance counters. Rebooting the system may fix the problem.
Performance monitoring for the Indexer object cannot be initialized because the counters are not loaded or the shared memory object cannot be opened. Stop and restart the Search service. If this error continues, reinstall the application.

Disk bottlenecks

To avoid disk bottlenecks, use the following guidelines:

Note For more information, see the previous section, "Performance objects to monitor."

Monitor Disk Queue Length.

The expected average queue length is slightly more than the number of spindles in the RAID array.
The queue length should occasionally drop to zero. If the queue is never empty, it indicates a problem.
The average time per disk write and disk read should be close to the expected latency. The system should take about 10 milliseconds per disk read or write. If you have a hard disk cache or a caching RAID controller configured, the time may be lower.
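
These guidelines can be applied to a set of logged samples with a short check like the one below. It is a sketch under stated assumptions: the per-spindle limit of 2 comes from the Current Disk Queue Length guidance earlier in this article, while the factor of 2 applied to the expected 10 millisecond latency is an arbitrary threshold chosen for this example. The function name and parameters are also invented.

```python
def disk_warnings(queue_samples, spindles, avg_read_ms, avg_write_ms,
                  expected_ms=10.0, per_spindle_limit=2, latency_factor=2.0):
    """Return a list of warnings for sampled disk counters."""
    if not queue_samples:
        raise ValueError("queue_samples must not be empty")
    warnings = []
    average_queue = sum(queue_samples) / len(queue_samples)
    if average_queue > spindles * per_spindle_limit:
        warnings.append("average queue length is high for this many spindles")
    if min(queue_samples) > 0:
        warnings.append("queue never drops to zero")
    # latency_factor is an assumed tolerance, not a documented limit.
    if max(avg_read_ms, avg_write_ms) > expected_ms * latency_factor:
        warnings.append("disk read/write time is well above the expected value")
    return warnings
```

An empty result means that none of the three warning signs appeared in the samples. It does not prove that the disk subsystem has spare capacity.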
Memory bottlenecks

A high paging rate may indicate a memory bottleneck. Check the performance counters listed earlier and monitor them for warning signs. In particular, check Memory: Page Writes/sec and Memory: Page Reads/sec.