[Lucene] Apache Lucene full-text search engine architecture: search functionality

Source: Internet
Author: User

The previous section summarized how Lucene builds an index; this section briefly covers Lucene's search functionality. It has five parts: searching for a specific term, searching with QueryParser query expressions, searching within a numeric range, searching by string prefix, and multi-condition (combined) queries.

1. Search for a specific term

To search with Lucene you first need an index: Lucene has to build an index over the files in question before anything can be searched. The first section covered this in detail, and I reuse its example files and its index here without repeating the steps. With the index in place, how do we search it? The first method is a search for a specific term.

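import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class SearchTest {

    private Directory dir;
    private IndexReader reader;
    private IndexSearcher search;

    @Before
    public void setUp() throws Exception {
        dir = FSDirectory.open(Paths.get("D:\\lucene")); // the index is stored in D:\lucene
        reader = DirectoryReader.open(dir);              // get an IndexReader from the directory
        search = new IndexSearcher(reader);              // get an IndexSearcher from the IndexReader
    }

    @After
    public void tearDown() throws Exception {
        reader.close(); // close the IndexReader
    }

    // Search for a specific term
    @Test
    public void testTermQuery() throws Exception {
        String searchField = "contents";
        String q = "particular";
        Term term = new Term(searchField, q);
        Query query = new TermQuery(term);
        TopDocs hits = search.search(query, 10);
        System.out.println("Matching '" + q + "': found " + hits.totalHits + " documents in total");
        for (ScoreDoc score : hits.scoreDocs) {
            Document doc = search.doc(score.doc);
            System.out.println(doc.get("fullPath"));
        }
    }
}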

First initialize the IndexSearcher. The search looks for a specific string q in a specific field; in the program above, I search the contents field for the string particular. The contents field was created when the index was built, and so was the fullPath field printed in the last line of the loop. From the field name and the search string you build a Term, and from the Term a TermQuery. The search then prints the path of every file that contains the string.
Why is this called a search for a specific term? Because if I search for particul, the result is 0 hits. Lucene indexes text word by word, so a TermQuery has to name a whole indexed word exactly; part of a word matches nothing. The term is also not analyzed, so it must be written in the form that was indexed (lowercase, if the index was built with the standard analyzer). This makes a term search of limited practical use, because in practice a search for particul should be able to find particular. For that we use the query expression parser, QueryParser.
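
To make the "whole word only" behaviour concrete, here is a minimal sketch of a toy inverted index in plain Java. It is not how Lucene stores its index; the class `ToyInvertedIndex`, its methods, and the sample file names are hypothetical and exist only for illustration. It assumes words are lowercased at index time, as the standard analyzer would do.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each whole, lowercased word to the documents containing it.
// Illustration only; Lucene's real index structures are far more elaborate.
class ToyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split the text on non-alphanumeric characters and index every lowercased word.
    public void add(String docPath, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docPath);
            }
        }
    }

    // Exact lookup, similar in spirit to a TermQuery: no analysis, no partial matching.
    public Set<String> termQuery(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.txt", "In particular, Unicode support matters.");
        index.add("b.txt", "Nothing special here.");
        System.out.println(index.termQuery("particular")); // [a.txt]
        System.out.println(index.termQuery("particul"));   // [] (only part of a word)
        System.out.println(index.termQuery("Particular")); // [] (case differs from the indexed form)
    }
}
```

The lookup is a plain map access on the complete word, which is why neither a fragment nor a differently cased spelling finds anything.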

2. Search with QueryParser query expressions

First, let's look at how to use QueryParser.

// Also needs: import org.apache.lucene.analysis.Analyzer;
//             import org.apache.lucene.analysis.standard.StandardAnalyzer;
//             import org.apache.lucene.queryparser.classic.QueryParser;
@Test
public void testQueryParser() throws Exception {
    Analyzer analyzer = new StandardAnalyzer(); // standard analyzer: strips whitespace and stop words such as "is", "a", "the"
    String searchField = "contents";
    String q = "particular"; // could also use OR, AND, or a fuzzy term such as particul~
    QueryParser parser = new QueryParser(searchField, analyzer); // query parser
    Query query = parser.parse(q); // parse the query string into a Query object
    TopDocs docs = search.search(query, 10); // run the query and keep the top 10 hits in docs
    System.out.println("Matching '" + q + "': found " + docs.totalHits + " documents in total");
    for (ScoreDoc scoreDoc : docs.scoreDocs) { // go through each hit
        Document doc = search.doc(scoreDoc.doc); // scoreDoc.doc is the doc id; use it to load the document
        System.out.println(doc.get("fullPath")); // fullPath is a field we defined when building the index
    }
}

As the program shows, QueryParser is initialized with an analyzer, here the standard one, and, as before, you specify the field and the string to search for. This looks no different from the term search above, but the advantage of QueryParser is that the query string q has a syntax; the program above simply happens to query a single word.
If I change q to "particular OR unicode", Lucene returns all documents that contain particular or unicode. The words are case-insensitive because the analyzer lowercases them, but the operator must be written in uppercase, and OR can be omitted because it is the default operator. Similarly, if I change OR to AND, the query returns only documents that contain both particular and unicode. And what about the fuzzy search mentioned above, where I type particul and want to find particular? Define q as "particul~" and it works. QueryParser is what gets used most in practice; the official documentation covers the rest of its syntax.
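
Why does "particul~" find particular? A fuzzy term matches indexed words within a small edit distance of the typed word; as far as I know, the classic query parser allows at most 2 edits by default. The sketch below is a plain-Java Levenshtein distance, not Lucene's implementation (Lucene uses automata and also counts a transposition as one edit); `EditDistanceSketch` and `fuzzyMatch` are hypothetical names for illustration.

```java
// Classic Levenshtein edit distance: counts insertions, deletions and substitutions.
class EditDistanceSketch {

    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: does the indexed word lie within maxEdits of the typed word?
    public static boolean fuzzyMatch(String typed, String indexedWord, int maxEdits) {
        return distance(typed, indexedWord) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("particul", "particular"));      // 2 (append 'a' and 'r')
        System.out.println(fuzzyMatch("particul", "particular", 2)); // true
        System.out.println(fuzzyMatch("particul", "unicode", 2));    // false
    }
}
```

Note that a fuzzy term is not a prefix search: it would stop matching if more than two letters were missing. The query syntax also has a trailing wildcard, as in particul*, for matching by prefix.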

3. Search within a numeric range

This applies to a field of type int: you can search on that field for all documents whose value lies between two int values. To set this up, I reuse the index-building example from the previous section, because its documents have an id that can be indexed as an integer field. Here is how to search within a numeric range.

// Also needs: import org.apache.lucene.search.NumericRangeQuery; (Lucene 5.x API)
@Test
public void testNumericRangeQuery() throws Exception {
    // field name, lower bound, upper bound, include lower bound, include upper bound
    NumericRangeQuery<Integer> query = NumericRangeQuery.newIntRange("id", 1, 2, true, true);
    TopDocs hits = search.search(query, 10);
    System.out.println("Found " + hits.totalHits + " documents in total");
    for (ScoreDoc score : hits.scoreDocs) {
        Document doc = search.doc(score.doc);
        System.out.println(doc.get("id"));
        System.out.println(doc.get("city"));
        System.out.println(doc.get("desc"));
    }
}

First create a NumericRangeQuery object. The first parameter is the field name, the second and third are the lower and upper bounds of the range, and the last two say whether the lower bound and the upper bound themselves are included; they are usually set to true. The rest is the same as in the previous queries. With my index, the program above finds two records.
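
The effect of the two inclusive flags is easy to try out with a sorted map from the Java standard library. This is only an analogy under stated assumptions: `RangeSketch`, its sample ids and its city values are hypothetical, and Lucene's numeric index works differently inside.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Range lookup on a sorted map, mirroring the (min, max, minInclusive, maxInclusive)
// parameters of a numeric range query. Illustration only.
class RangeSketch {

    // Hypothetical stand-in for an index on an int field: id -> city.
    public static NavigableMap<Integer, String> sampleIndex() {
        NavigableMap<Integer, String> byId = new TreeMap<>();
        byId.put(1, "qingdao");
        byId.put(2, "nanjing");
        byId.put(3, "shanghai");
        return byId;
    }

    // Values whose key lies in the range; each flag decides whether that bound itself counts.
    public static List<String> range(NavigableMap<Integer, String> byId,
                                     int min, int max,
                                     boolean minInclusive, boolean maxInclusive) {
        return new ArrayList<>(byId.subMap(min, minInclusive, max, maxInclusive).values());
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> byId = sampleIndex();
        System.out.println(range(byId, 1, 2, true, true));   // [qingdao, nanjing]
        System.out.println(range(byId, 1, 2, false, true));  // [nanjing]
        System.out.println(range(byId, 1, 2, false, false)); // []
    }
}
```

With both flags true the range 1 to 2 covers two entries, which corresponds to the two records found above; switching a flag to false drops the record sitting exactly on that bound.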

4. Search by string prefix

This is similar to the numeric range search; only the condition and the initialization differ. To search by prefix, create a PrefixQuery from a Term that holds the field to search and the leading string, then search as before. The following finds all documents whose city field begins with s. The prefix is not analyzed, so its case must match the indexed value.

// Also needs: import org.apache.lucene.search.PrefixQuery;
@Test
public void testPrefixQuery() throws Exception {
    PrefixQuery query = new PrefixQuery(new Term("city", "s"));
    TopDocs hits = search.search(query, 10);
    System.out.println("Found " + hits.totalHits + " documents in total");
    for (ScoreDoc score : hits.scoreDocs) {
        Document doc = search.doc(score.doc);
        System.out.println(doc.get("id"));
        System.out.println(doc.get("city"));
        System.out.println(doc.get("desc"));
    }
}
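
A prefix search is cheap when the terms are kept in sorted order, because all terms sharing a prefix sit next to each other. The following plain-Java sketch shows the idea on a sorted set; it is not Lucene's code, and `PrefixSketch` and the sample city names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Prefix lookup over a sorted term dictionary. Illustration only.
class PrefixSketch {

    public static List<String> withPrefix(NavigableSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; matching terms are contiguous from there.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the block of matching terms, nothing further can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> cities = new TreeSet<>(
                Arrays.asList("nanjing", "qingdao", "shanghai", "shenzhen", "Suzhou"));
        System.out.println(withPrefix(cities, "s"));  // [shanghai, shenzhen]
        System.out.println(withPrefix(cities, "S"));  // [Suzhou]
        System.out.println(withPrefix(cities, "sh")); // [shanghai, shenzhen]
    }
}
```

The comparison is case-sensitive, so "s" and "S" select different terms, just as an unanalyzed prefix only matches values indexed with the same case.
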
5. Multi-Criteria Query (combination query)

A multi-condition query, also called a combined query, combines several query conditions into one query, which makes it more powerful. For example, to combine the two queries above so that the id is between 1 and 2 and the city begins with s, you can do this:

// Also needs: import org.apache.lucene.search.BooleanClause;
//             import org.apache.lucene.search.BooleanQuery;
@Test
public void testBooleanQuery() throws Exception {
    NumericRangeQuery<Integer> query1 = NumericRangeQuery.newIntRange("id", 1, 2, true, true);
    PrefixQuery query2 = new PrefixQuery(new Term("city", "s"));
    BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(query1, BooleanClause.Occur.MUST); // the id must be in the range
    booleanQuery.add(query2, BooleanClause.Occur.MUST); // and the city must start with s
    TopDocs hits = search.search(booleanQuery.build(), 10);
    System.out.println("Found " + hits.totalHits + " documents in total");
    for (ScoreDoc score : hits.scoreDocs) {
        Document doc = search.doc(score.doc);
        System.out.println(doc.get("id"));
        System.out.println(doc.get("city"));
        System.out.println(doc.get("desc"));
    }
}

The combined query uses BooleanQuery. Each condition is built exactly as before, with whichever class it would normally use, and is then simply added to the BooleanQuery through its Builder. BooleanClause.Occur.MUST means that the condition has to hold. This is very convenient: real queries often have many conditions, and this is the way to combine them.
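
Conceptually, combining clauses comes down to set operations on the documents each clause matches. The sketch below shows this with plain Java sets; it ignores scoring and is not how Lucene evaluates a BooleanQuery internally. `BooleanSketch` and the sample doc ids are hypothetical. Besides MUST, BooleanClause.Occur also has SHOULD and MUST_NOT, which correspond roughly to the other two operations.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Boolean combination of per-clause result sets (doc ids). Illustration only, no scoring.
class BooleanSketch {

    // MUST + MUST: keep only documents matched by both clauses (intersection).
    public static Set<Integer> must(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // SHOULD + SHOULD: documents matched by at least one clause (union).
    public static Set<Integer> should(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // MUST + MUST_NOT: documents matched by the first clause but not by the excluded one.
    public static Set<Integer> mustNot(Set<Integer> required, Set<Integer> excluded) {
        Set<Integer> result = new TreeSet<>(required);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> idInRange = new TreeSet<>(Arrays.asList(1, 2));   // hypothetical hits of the range clause
        Set<Integer> cityPrefix = new TreeSet<>(Arrays.asList(2, 3));  // hypothetical hits of the prefix clause
        System.out.println(must(idInRange, cityPrefix));    // [2]
        System.out.println(should(idInRange, cityPrefix));  // [1, 2, 3]
        System.out.println(mustNot(idInRange, cityPrefix)); // [1]
    }
}
```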

-Willing to share and progress together!
--My Blog home: http://blog.csdn.net/eson_15
