Search word segmentation means splitting the search phrase a user enters, such as "national belief ", into separate terms. If the phrase is not segmented, it is looked up as one single term and the search may return nothing, even though searching for "nation", "belief", or "national belief" on their own does return results. The search phrase therefore has to be segmented before the query is built.
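To make the problem concrete, here is a minimal sketch of a term index. It is not Lucene: `ToyTermIndex`, `add`, and `searchAll` are hypothetical names invented for this illustration, and the index simply stores each whitespace-separated word as its own term. Under those assumptions, a phrase passed in as one term matches nothing, while the same phrase split into terms does match.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a term index (NOT Lucene): maps each indexed term to document ids.
final class ToyTermIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index a document by storing each whitespace-separated word as its own term.
    void add(int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            }
        }
    }

    // Return the ids of documents that contain every one of the given terms.
    Set<Integer> searchAll(List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, new HashSet<>());
            if (result == null) {
                result = new HashSet<>(docs);   // first term: start from its documents
            } else {
                result.retainAll(docs);         // later terms: keep only shared documents
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```

After `index.add(1, "national belief")`, calling `searchAll` with the single term "national belief " returns an empty set, because no such term was ever indexed. Calling it with the two terms "national" and "belief" returns document 1.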
Two methods:
(1) Split on spaces with plain string processing
Original single-term search code:
// Search the text field for the original phrase as a single term
// Term t = new Term("text", phrase);
// Generate a query object
// Query q = new TermQuery(t);
After modification, the code searches for multiple terms separated by spaces:
// Combine one TermQuery per word; every word must match
BooleanQuery q = new BooleanQuery();
String[] str = phrase.split(" ");
for (int i = 0; i < str.length; i++) {
    q.add(new TermQuery(new Term("text", str[i])), BooleanClause.Occur.MUST);
}
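One caveat with splitting on a single space: Java's `String.split(" ")` returns empty strings when the phrase starts with a space or contains two spaces in a row, and each empty string would become a required clause that most likely matches nothing. The sketch below is one possible way to clean the input first. `QueryTerms` and `splitTerms` are hypothetical names introduced here, not part of Lucene or of the original code.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryTerms {
    // Split a raw search phrase on runs of whitespace and drop empty pieces.
    static List<String> splitTerms(String phrase) {
        List<String> terms = new ArrayList<>();
        if (phrase == null) {
            return terms;                       // no input, no terms
        }
        for (String piece : phrase.trim().split("\\s+")) {
            if (!piece.isEmpty()) {             // "".split(...) yields one empty piece
                terms.add(piece);
            }
        }
        return terms;
    }
}
```

The returned list can then be looped over in place of the `str` array above, adding one `TermQuery` per element.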
(2) Use IK or another Chinese word segmentation component
For the best search results, segment the search phrase with the same analyzer that was used to build the index.
First, import the analyzer-related classes.
import org.mira.lucene.analysis.IK_CAnalyzer; // analyzer used to segment the search phrase
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Token;
Main code:
// Segment the phrase with the IK analyzer (also requires: import java.io.StringReader;)
BooleanQuery q = new BooleanQuery();
StringReader sr = new StringReader(phrase);
IK_CAnalyzer ik = new IK_CAnalyzer();
TokenStream ts = ik.tokenStream("*", sr);
Token t = null;
// Add one required TermQuery per token until the stream is exhausted
while ((t = ts.next()) != null) {
    q.add(new TermQuery(new Term("text", t.termText())), BooleanClause.Occur.MUST);
}