Exchange Full-Text Search Overview

Source: Internet
Author: User
0. Preface

This article is mainly about applying the search features of Exchange Server and SharePoint Portal Server in a Windows 2000 Server environment, and it also touches on some concepts from SQL Server. It covers only basic concepts, based mainly on personal understanding and examples; for more detail, follow the links in the references. Please point out any mistakes or shortcomings in the article.

1. Basic concepts

I believe many people are in awe of full-text search and think Exchange must be very powerful if it can find text inside something like a PDF document. In fact, Exchange does almost none of this work itself (not quite nothing, as discussed below). Full-text search is implemented by an important Windows service, the MS Search Service; Exchange only needs to interact with MS Search to gain full-text search. This is why, among Microsoft's enterprise servers, not only Exchange but also SharePoint Portal Server and SQL Server support full-text search. Windows has another service, Index Service, that does similar work; with it, even the Windows file system and IIS can be searched by text, which is very convenient. I cannot say whether Index Service is a special case of MS Search Service or the other way round. I see no difference between the two in architecture or functionality, and this is one of the things that puzzles me.

To discuss full-text search we first need to be clear about some basic concepts, so that we know what the service does and at which level a particular feature should be developed. Full-text search is divided into two parts: full-text indexing (Full-Text Index) and full-text query (Full-Text Query).
We know that an index (INDEX) is a mechanism for improving query efficiency: it stores keywords together with the logical storage location of the corresponding records. The index tables of a database management system (DBMS) should be familiar, and full-text search works on the same principle.

Full-text index (Full-Text Index): the process of creating an index, that is, building the relationship between keywords and records. An index can be created in full or updated incrementally with modified information.
Full-text catalog (Full-Text Catalog): can be regarded as the organizational form in which keywords are stored.
Full-text query (Full-Text Query): uses the index to find the records that correspond to the given keywords.

In the Microsoft product family, services such as MS Search Service and Index Service (the rest of this article mainly describes the Search Service) create and maintain the index and the catalog, and can be called search engines. Products such as Exchange and SQL Server provide the storage space for the records and can collectively be called storage engines. A storage engine must support certain interfaces specific to MS Search before it can use the service. Full-text indexing is the process in which MS Search requests the data to be indexed from the storage engine, analyzes the keywords, and creates or maintains the index information. Full-text query is the process in which the storage engine sends a request to MS Search, MS Search returns the positions of the matching records based on the keywords, and the storage engine assembles those records and returns them to the caller.

So to support full-text search, two things are needed: a service that provides full-text retrieval, and a storage engine that supports full-text search. On the Windows Server platform with Exchange, both conditions are satisfied, so we can get started.

2. Development around full-text search

Given the basic structure of full-text search on the Windows platform, development work can be done in several areas.
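To make the split between indexing and querying from section 1 concrete, here is a minimal C# sketch of a keyword index that maps each keyword to the ids of the records containing it. It only illustrates the general idea. The names TinyFullTextIndex, AddRecord and Query are hypothetical, the word splitting is deliberately naive, and nothing here reflects how MS Search is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a keyword index; not the MS Search implementation.
public class TinyFullTextIndex
{
    // keyword -> ids of the records that contain it (case-insensitive keys)
    private readonly Dictionary<string, SortedSet<int>> postings =
        new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);

    // "Full-text indexing": split the record text into keywords and store where they occur.
    public void AddRecord(int recordId, string text)
    {
        char[] separators = { ' ', ',', '.', ';', ':', '!', '?', '\t', '\r', '\n' };
        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            SortedSet<int> ids;
            if (!postings.TryGetValue(word, out ids))
            {
                ids = new SortedSet<int>();
                postings[word] = ids;
            }
            ids.Add(recordId);
        }
    }

    // "Full-text query": look the keyword up in the index instead of scanning every record.
    public IList<int> Query(string keyword)
    {
        SortedSet<int> ids;
        return postings.TryGetValue(keyword, out ids)
            ? new List<int>(ids)
            : new List<int>();
    }
}
```

For example, after AddRecord(1, "Exchange full-text search") and AddRecord(2, "SQL Server search"), Query("search") returns the ids 1 and 2, while Query("sea") returns nothing because whole keywords, not substrings, are indexed.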
2-1. Full-text query

A full-text query uses a query statement to find the records in the storage engine that meet the conditions. This is probably the most familiar and most widely used part, and it is the focus of this article; a dedicated section below describes it in detail. Note that although the query process relies on MS Search and the index data it maintains, querying is a feature exposed by the storage engine: the application interacts with the storage engine, and the MS Search engine is transparent to us. The query syntax (basically a form of SQL) differs from one storage engine to another, but there is no essential difference.

2-2. Developing IFilters for MS Search

The part of full-text search that seems most magical is how it can extract text strings from binary or specially encoded files such as Word or PDF documents. A related question is whether MS Search supports every file format. It does not; even Microsoft cannot take on every format in the world. But MS Search provides a mechanism that makes support for any document format possible: the index filter, or IFilter.

As I understand it, an IFilter ultimately does two jobs. The first is to read a file of the specified format and extract its internal text content (ignoring formatting, graphics, and other embedded binary objects) along with document properties such as author and category. The second is word breaking (parsing words or phrases). Whether full-text retrieval achieves a high hit rate depends on whether the generated keywords are reasonable. I have always felt that the current IFilters do not support Chinese well; retrieving Chinese text often gives baffling results.

Windows ships with four IFilters by default: for plain text, for HTML documents, for Office documents, and for MIME files.
To support other document types, you must download the IFilter from the developer of that document type. For PDF, for example, download it from the Adobe website (http://www.adobe.com/support/downloads/product.jsp?product=1&platform=windows). If no IFilter exists for a document type your system needs to support, or the format is one you defined yourself, you need to develop a custom IFilter. That is best discussed when a specific case arises, so I will not go into it here.

2-3. Developing a storage engine that supports full-text search

Section 2-1 focuses on the user's needs and is the closest to application development; section 2-2 focuses on the stored content; this part concerns the store itself. Put simply, if your job is to develop a storage management system and you want to use an existing search engine (the Search Service), you need to deal with the search engine itself. As mentioned when the concepts were introduced, this interaction falls into two categories: creating or maintaining the index, and querying.

Viewed simply, the search engine consists of indexing support (Indexing Support) with the index engine (Index Engine), and querying support (Querying Support) with the search engine (Search Engine). When creating an index, the storage engine deals mainly with the Indexing Support part, and the Index Engine does the work of creating and maintaining the index. In a full-text query, the storage engine deals mainly with Querying Support, and the Search Engine carries out the query. "Support" here means implementing one set of interfaces and calling another. I have not done this in practice, so anything more I said would be empty talk; I will leave it there.

3. Full-text query guide

The following describes in more detail how to run full-text queries on the Exchange 2000 Server platform.

3-1. Configuration

By default, Exchange does not enable full-text search; some simple configuration is required.
Configuration means creating an index for a specified store (Public Store or Private Store). In System Manager, the Exchange management tool, this is easy to do in a few steps.

A) Select the specific Public Store and run "Create Full-Text Index". You need to choose a directory in which to store the index catalog.
B) A newly created full-text index has no content, so you need to run "Start Full Population". This is a time-consuming operation and does not take effect immediately. It is precisely because the index information is built ahead of time that records can be located quickly at query time.
C) On the "Full-Text Index" tab of the store's properties, set the "Update Interval" property to choose how often the index information is updated automatically. If you select "Always run", the index is updated in real time, at the cost of higher system overhead. You can also run "Start Incremental Population" to update the index information manually.
D) Also on the Full-Text Index tab, select the "This index is currently available for searching by clients" property so that queries use the index information just created.

I did not find a programming interface for this part, so it can only be configured manually. For more detail, see Microsoft's white paper.

3-2. Query syntax

As mentioned in 2-1, querying is provided by the storage engine, usually as an extension of the SELECT statement. The following uses Exchange Store SQL as an example to explain full-text queries.
Full-text query statements in Exchange are not complicated. Exchange Store SQL provides the following predicates; a WHERE clause that contains them performs a full-text search.

CONTAINS: matches keywords as complete words. The format is:

    CONTAINS(["propertyName" | *,] 'searchspec')

If the "propertyName" part is * (asterisk), the keywords are searched for in all properties. If "propertyName" is omitted, the keywords are searched for in the body of the message or document (the same applies below). The "searchspec" part contains the keywords, which may use the wildcard *, for example 'GOOD*'. It may also combine several keywords with the logical operators AND and OR. Each keyword must be enclosed in double quotes ("), otherwise a syntax error results.

FREETEXT: similar to CONTAINS, but loosely matches variant forms of each word or phrase in the keywords. The format is:

    FREETEXT(["propertyName",] 'searchspec')
It is important to note that matching is on variant forms of the keyword, not on substrings: you can find 'roses' through 'rose', but you cannot find 'Republican' through 'public'.

FORMSOF: this predicate must be nested inside a CONTAINS or FREETEXT predicate. It matches every variant form of the keyword, and the set of forms is determined by the search engine. The format is:

    FORMSOF(type, "string" [, "string"])

In Exchange, the value of the type parameter is always INFLECTIONAL.

RANK BY: this clause usually modifies a CONTAINS or FREETEXT predicate to weight how strongly a keyword match counts. The format is:

    RANK BY clause(mechanism, weight)

The clause parameter is WEIGHT or COERCION; no explanation of these two is given. The mechanism parameter describes the behavior, for example weight or multiply. The weight is a number between 0 and 1.

When a WHERE clause contains more than one CONTAINS or FREETEXT part, RANK BY is useful; the result can be regarded as a degree of match. After adding RANK BY, you can read the value of the "urn:schemas.microsoft.com:fulltextqueryinfo:rank" property to compare the degree of match between records. I do not know the maximum value of this property, but in the data I actually obtained, the maximum was 128.

A complete example:

    SELECT "DAV:href", "urn:schemas.microsoft.com:fulltextqueryinfo:rank"
    FROM SCOPE('DEEP TRAVERSAL OF ""')
    WHERE FREETEXT('"program" OR "software"') RANK BY WEIGHT(1.0)
       OR CONTAINS('FORMSOF(INFLECTIONAL, "Java") AND "VB"') RANK BY WEIGHT(0.5)

The above covers only what relates to full-text queries. For the complete Store SQL syntax, refer to the documentation in MSDN.
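Because every keyword must be double-quoted inside a single-quoted searchspec, assembling these statements by hand is error-prone. The following C# sketch builds a CONTAINS query string in the shape described above. The names StoreSqlBuilder, QuoteKeyword and BuildContainsQuery are hypothetical, the folder URL in the usage note is a placeholder, and doubling single quotes as an escape is an assumption taken from common SQL practice rather than from the Exchange documentation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for assembling Exchange Store SQL full-text queries.
public static class StoreSqlBuilder
{
    // Wraps one keyword in double quotes, as the searchspec syntax requires.
    public static string QuoteKeyword(string keyword)
    {
        if (string.IsNullOrEmpty(keyword))
            throw new ArgumentException("keyword must not be empty");
        if (keyword.Contains("\""))
            throw new ArgumentException("keyword must not contain a double quote");
        // Assumption: a single quote inside the searchspec is escaped by doubling it.
        return "\"" + keyword.Replace("'", "''") + "\"";
    }

    // Builds a SELECT over a folder scope with one CONTAINS predicate whose
    // keywords are joined by the given logical operator (AND or OR).
    public static string BuildContainsQuery(string folderUrl, string op, params string[] keywords)
    {
        if (op != "AND" && op != "OR")
            throw new ArgumentException("op must be AND or OR");
        if (keywords == null || keywords.Length == 0)
            throw new ArgumentException("at least one keyword is required");

        List<string> quoted = new List<string>();
        foreach (string keyword in keywords)
            quoted.Add(QuoteKeyword(keyword));
        string searchSpec = string.Join(" " + op + " ", quoted.ToArray());

        return "SELECT \"DAV:href\", \"urn:schemas.microsoft.com:fulltextqueryinfo:rank\" "
             + "FROM SCOPE('DEEP TRAVERSAL OF \"" + folderUrl + "\"') "
             + "WHERE CONTAINS('" + searchSpec + "')";
    }
}
```

Calling StoreSqlBuilder.BuildContainsQuery("http://server/public/docs/", "AND", "Java", "VB") produces a statement whose WHERE clause reads CONTAINS('"Java" AND "VB"').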
By the way, Exchange mainly provides collaboration services rather than document management, so its support for full-text search is not especially powerful or flexible. Another Microsoft server product, SharePoint Portal Server, has document management as its main function, and providing retrieval services is one of its important applications, so its full-text search is more powerful. In addition to the above, it provides terms such as NEAR, ISABOUT, and RANKMETHOD that give finer control over the query conditions. Refer to the relevant documentation for details.

3-3. ADO and WebDAV

The query can be executed in two ways, through ADO or through WebDAV. Here we only list implementation code to illustrate.

3-3-1. WebDAV

Send an HTTP SEARCH request to the URL to be queried, with the command parameters and data passed as an XML document in the prescribed format (see the WebDAV reference manual). An example follows:

    // Requires: using System; using System.Diagnostics; using System.Net; using System.Xml;
    private System.Xml.XmlDocument SendSearchRequest(System.String sUrl, System.String sQuery)
    {
        System.Net.HttpWebRequest oRequest = null;
        System.Net.HttpWebResponse oResponse = null;
        System.Net.NetworkCredential oCredential = null;
        System.IO.Stream oStream = null;
        System.Text.UTF8Encoding oEncoder = new System.Text.UTF8Encoding();
        System.Byte[] abData = null;
        System.Xml.XmlDocument xmlDoc = null;

        if (sUrl == null || sUrl == String.Empty)
            return null;
        if (sQuery == null || sQuery == String.Empty)
            return null;

        abData = oEncoder.GetBytes(sQuery);
        if (abData == null)
            return null;

        // Sample credentials (user name, password, domain); replace with real ones.
        oCredential = new NetworkCredential("Administrator", "Server", String.Empty);
        oRequest = (System.Net.HttpWebRequest)WebRequest.Create(sUrl);
        if (oRequest != null)
        {
            // Preparing search request
            oRequest.ProtocolVersion = HttpVersion.Version11;
            oRequest.Method = @"SEARCH";
            if (oCredential != null)
                oRequest.Credentials = oCredential.GetCredential(new System.Uri(sUrl), String.Empty);
            oRequest.ContentType = @"text/xml";
            oRequest.ContentLength = abData.Length;
            oStream = oRequest.GetRequestStream();
            oStream.Write(abData, 0, abData.Length);
            oStream.Close();

            // Waiting for response
            try
            {
                oResponse = (System.Net.HttpWebResponse)oRequest.GetResponse();
                oRequest = null;
            }
            catch (System.Exception e)
            {
                Trace.WriteLine("SendSearchRequest: " + e.Message);
            }

            if (oResponse != null)
            {
                oStream = oResponse.GetResponseStream();
                // Get data from stream
                if (oStream != null)
                {
                    try
                    {
                        xmlDoc = new XmlDocument();
                        if (xmlDoc != null)
                        {
                            xmlDoc.Load(oStream);
                        }
                    }
                    catch (System.Exception e)
                    {
                        Trace.WriteLine("SendSearchRequest: " + e.Message);
                    }
                    oStream.Close();
                    oStream = null;
                }
            }
        }
        oResponse = null;
        return xmlDoc;
    }
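The function above sends sQuery as the request body unchanged, so the caller has to wrap the SQL statement in the search request XML first. The C# sketch below shows one way to do that and to pull the DAV:href values out of the response. The class and method names are hypothetical. The searchrequest and sql element names in the DAV: namespace follow the usual shape of an Exchange WebDAV SEARCH body, and the multistatus layout in the usage note is likewise an assumption, so check both against the WebDAV reference for your server.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helpers around the SEARCH request and response bodies.
public static class WebDavSearchXml
{
    private const string DavNamespace = "DAV:";

    // Wraps a Store SQL statement in <D:searchrequest><D:sql>...</D:sql></D:searchrequest>.
    // Setting InnerText lets XmlDocument escape characters such as < and & in the statement.
    public static string BuildSearchBody(string sql)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("D", "searchrequest", DavNamespace);
        XmlElement sqlElement = doc.CreateElement("D", "sql", DavNamespace);
        sqlElement.InnerText = sql;
        root.AppendChild(sqlElement);
        doc.AppendChild(root);
        return "<?xml version=\"1.0\"?>" + doc.OuterXml;
    }

    // Collects the text of every href element in the DAV: namespace of a response document.
    public static IList<string> ReadHrefs(XmlDocument response)
    {
        List<string> hrefs = new List<string>();
        foreach (XmlNode node in response.GetElementsByTagName("href", DavNamespace))
            hrefs.Add(node.InnerText);
        return hrefs;
    }
}
```

A caller would pass BuildSearchBody(statement) as the sQuery argument of the function above and hand the returned XmlDocument to ReadHrefs to list the matching items.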
The arguments are the root path of the query (which can be thought of as the table name; here it is an HTTP URL) and the query statement (see the previous section). Because the function sends sQuery unchanged as the request body, the statement must already be wrapped in the XML format that the SEARCH request expects. The return value is an XmlDocument instance containing the query result.

3-3-2. ADO

Exchange provides two providers that can be called through ADO, ExOLEDB and MSDAIPP, but only MSDAIPP supports full-text retrieval. When MSDAIPP is used for full-text retrieval on the host where Exchange is installed, unpredictable errors such as hangs can occur, so WebDAV is recommended. An ADO example follows:

    // Requires a reference to the ADODB interop assembly and:
    // using System; using System.Diagnostics; using ADODB;
    private ADODB.RecordsetClass GetQueryResult(System.String sUrl, System.String sQuery)
    {
        ADODB.RecordsetClass rsResult = null;
        ADODB.ConnectionClass cnnExchange = null;
        ADODB.CommandClass cmdQuery = null;
        System.Object objAffectedRecords = null, objParams = null;

        if (sUrl == null || sUrl == String.Empty)
            return null;
        if (sQuery == null || sQuery == String.Empty)
            return null;

        // Open the connection through the MSDAIPP provider
        try
        {
            cnnExchange = new ConnectionClass();
            if (cnnExchange == null)
                return null;
            cnnExchange.Provider = "Provider=msdaipp.dso";
            cnnExchange.Open(sUrl, String.Empty, String.Empty, 0);
        }
        catch (System.Exception e)
        {
            Trace.WriteLine("GetQueryResult: create connection failed! " + e.Message);
            cnnExchange = null;
            return null;
        }

        // Run the query as a text command
        cmdQuery = new CommandClass();
        if (cmdQuery != null)
        {
            cmdQuery.ActiveConnection = cnnExchange;
            cmdQuery.CommandType = CommandTypeEnum.adCmdText;
            cmdQuery.CommandText = sQuery;
            try
            {
                rsResult = (ADODB.RecordsetClass)cmdQuery.Execute(out objAffectedRecords, ref objParams, 0);
            }
            catch (System.Exception e)
            {
                Trace.WriteLine("GetQueryResult: query data failed! " + e.Message);
            }
            cmdQuery = null;
        }
        cnnExchange = null;
        return rsResult;
    }
The arguments are the root path of the query (which can be thought of as a table name) and the query statement; the return value is a Recordset instance containing the query results. Note that the MSDAIPP provider does not support specifying a user name and password when opening a connection.

That concludes this brief introduction to full-text search in Exchange. The key points of IFilter and storage engine development will be elaborated separately.

Reference documentation

A. SQL Server Architecture (Full-Text Support) (http://msdn.microsoft.com/library/en-us/architec/8_ar_sa2_0ehx.asp)
B. Using Custom Filter with Index Service (http://msdn.microsoft.com/library/en-us/indexsrv/html/ixufilt_912d.asp)
C. Exchange Store SQL (http://msdn.microsoft.com/library/en-us/wss/wss/_exch2k_sql_web_storage_system_sql.asp)
