This article first introduces the main classes Lucene uses to build and search an index, then focuses on several of Lucene's query types.
1. Analyzer: tokenizer
Lucene ships with several built-in analyzers, such as WhitespaceAnalyzer, which splits text on whitespace; StopAnalyzer, which additionally filters out stop words; and StandardAnalyzer, the most commonly used one.
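To make the difference between whitespace splitting and stop-word filtering concrete, here is a minimal sketch in plain Java. It is a toy illustration and not Lucene's analyzer API: the class name AnalyzerSketch, its methods and the stop-word list are all assumptions made up for this example, and Lucene's real analyzers differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: this is NOT Lucene's analyzer API.
class AnalyzerSketch {
    // Hypothetical stop-word list chosen for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "is"));

    // Whitespace-style analysis: split on runs of whitespace, keep tokens unchanged.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stop-word-style analysis: lowercase each token, then drop the stop words.
    static List<String> stopFilteredTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }
}
```

For the input "The power of Lucene", the first method keeps all four tokens, while the second keeps only "power" and "lucene".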
2. Document: document
A Document is the structure that wraps our source data. We split the source data into fields (Field) and put them into the Document; at search time we can also specify which fields to search.
3. Directory: index directory
This is an abstraction of where the index is stored. It can be a directory on the file system (FSDirectory), a region of memory (RAMDirectory), or a memory-mapped index (MMapDirectory). Keeping the index in memory avoids disk I/O, so choose the implementation that fits your needs.
4. IndexWriter: index writer, the class that maintains the index by adding, updating and deleting documents.
5. IndexReader: index reader, used to read the index in a specified directory.
6. IndexSearcher: index searcher, the class that searches the index for what the user entered.
Note that a search returns a TopDocs object holding document ids, not the documents themselves.
7. Query: query object. The query string must be wrapped in a Query before it can be handed to the searcher. The smallest unit of a query is the Term. Lucene offers many kinds of Query; choose among them according to your needs.
I. TermQuery:
To run a query such as "find the documents whose content field contains lucene", use TermQuery:
Term t = new Term("content", "lucene");
Query query = new TermQuery(t);
II. BooleanQuery: combines several queries with AND/OR relationships
To run a query such as "find the documents whose content field contains java or perl", create two TermQuery objects and join them with a BooleanQuery:
TermQuery termQuery1 = new TermQuery(new Term("content", "java"));
TermQuery termQuery2 = new TermQuery(new Term("content", "perl"));
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(termQuery1, BooleanClause.Occur.SHOULD);
booleanQuery.add(termQuery2, BooleanClause.Occur.SHOULD);
III. WildcardQuery: wildcard query
To run a wildcard query on a term, use WildcardQuery. The wildcard '?' matches exactly one arbitrary character and '*' matches zero or more arbitrary characters. For example, a search for 'use*' may find 'useful' or 'useless':
Query query = new WildcardQuery(new Term("content", "use*"));
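The matching rule for the two wildcard characters can be sketched in plain Java by translating the pattern into a regular expression. This only illustrates the semantics described above and is not how Lucene evaluates wildcard queries internally; WildcardSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Toy illustration of '?' and '*' semantics; not Lucene's implementation.
class WildcardSketch {
    // Translate a wildcard pattern into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // zero or more arbitrary characters
            } else if (c == '?') {
                regex.append('.');    // exactly one arbitrary character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

With this sketch, matches("use*", "useful") and matches("use*", "useless") are both true, while matches("use*", "abuse") is false because the whole term has to match.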
IV. PhraseQuery: matches terms that appear within a specified distance of each other
Suppose you are interested in the relationship between China and Japan and want to find articles in which '中' (as in China) and '日' (as in Japan) appear close together, within 5 positions of each other, ignoring anything farther apart. You can write:
PhraseQuery query = new PhraseQuery();
query.setSlop(5);
query.add(new Term("content", "中"));
query.add(new Term("content", "日"));
This may match text such as "Sino-Japanese cooperation ..." or "China and Japan ...", but not a sentence such as "a senior Chinese leader said that Japan ...", in which the two characters are more than five positions apart.
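The distance rule can be sketched in plain Java as a check on token positions. This is a deliberately simplified model, assuming the two terms must appear in order with at most `slop` other tokens between them; Lucene's actual slop is a more general edit distance on positions that also tolerates reordering at extra cost. SlopSketch and withinSlop are hypothetical names.

```java
import java.util.List;

// Simplified, in-order model of phrase slop; not Lucene's algorithm.
class SlopSketch {
    // True if `second` follows `first` with at most `slop` tokens in between.
    static boolean withinSlop(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // j - i - 1 is the number of tokens between positions i and j.
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a slop of 5, two terms separated by one other token match, while two terms separated by six other tokens do not.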
V. PrefixQuery: matches terms that begin with a given prefix
To search for terms that start with '中', use PrefixQuery:
PrefixQuery query = new PrefixQuery(new Term("content", "中"));
VI. FuzzyQuery: similarity search
FuzzyQuery searches for terms similar to a given term, using the Levenshtein (edit distance) algorithm. To search for terms similar to 'wuzza', write:
Query query = new FuzzyQuery(new Term("content", "wuzza"));
You may get 'fuzzy' and 'wuzzy'.
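The Levenshtein distance that underlies this kind of matching can be computed with a standard dynamic-programming routine. The following plain-Java sketch only shows the distance itself; it is not Lucene's implementation, and LevenshteinSketch is a hypothetical name.

```java
// Textbook edit-distance computation; not Lucene's implementation.
class LevenshteinSketch {
    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Here distance("wuzza", "wuzzy") is 1 and distance("wuzza", "fuzzy") is 2, which is why both terms count as close to the search term.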
VII. TermRangeQuery: range search
To search for documents whose time field lies between 20060101 and 20060130, use TermRangeQuery:
TermRangeQuery query2 = TermRangeQuery.newStringRange("time", "20060101", "20060130", true, true);
The two trailing true arguments include the lower and upper bounds respectively, so the range is a closed interval.
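A string range of this kind compares terms lexicographically, which matches chronological order here only because the dates are fixed-width yyyyMMdd strings. The plain-Java sketch below illustrates that comparison and the effect of the two inclusive flags; RangeSketch and inRange are hypothetical names, and String.compareTo is used as a stand-in for the term ordering, which is an assumption that holds for ASCII digits.

```java
// Toy illustration of an inclusive/exclusive string range check.
class RangeSketch {
    static boolean inRange(String value, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = value.compareTo(lower); // lexicographic comparison with the lower bound
        int hi = value.compareTo(upper); // lexicographic comparison with the upper bound
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }
}
```

Because the comparison is on strings, unpadded numbers do not behave numerically: "9" is not within the range "10" to "20".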
8. TopDocs: the result set returned by a search. It holds a number of ScoreDoc objects, and the doc member of each ScoreDoc is the document id.
To get the actual document, fetch it by this id: the searcher provides a method for retrieving a Document by id, and from the Document you can read the data.
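The two-step pattern of searching for ids first and fetching documents afterwards can be sketched with a toy in-memory index in plain Java. None of this is Lucene's API: TinyIndexSketch, add, search and doc are hypothetical names, and the whitespace tokenization is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index: search returns document ids, not documents.
class TinyIndexSketch {
    private final List<String> documents = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Store a document and index its lowercased, whitespace-separated tokens.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Step 1: the search yields only the ids of the matching documents.
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    // Step 2: the stored document is fetched by id.
    String doc(int docId) {
        return documents.get(docId);
    }
}
```

A caller would first call search("java") to obtain the ids and then call doc(id) for each id it actually wants to display, so documents that are never shown are never loaded.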