Lucene (01)

Source: Internet
Author: User

My blog address: http://www.cnblogs.com/tenglongwentian/

Lucene: the latest version is 6.2.1, which requires the official JDK 1.8 release.
I am using the last JDK 7 release here, so I use Lucene 5.5.3.

Create a Maven project. If you do not know how to create one, refer to the previous blog post.
Set <packaging>jar</packaging> and add the following dependencies:

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>5.5.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-queryparser -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>5.5.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-analyzers-common -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>5.5.3</version>
        </dependency>
    </dependencies>

Because I use JDK 7 and do not like adjusting the project's JDK version by hand every time I update the Maven project, I also add the following to the pom:

    <!-- Source code directory, plug-in management, and other configuration -->
    <build>
        <finalName>Lucene</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <!-- Specify the source and target versions -->
                    <!-- source: the Java language version used to compile the source code -->
                    <source>1.7</source>
                    <!-- target: the generated class files are compatible with this JVM version -->
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

That is all the configuration needed.

Create two classes:

Indexer

    import java.io.File;
    import java.io.FileReader;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class Indexer {

        private IndexWriter writer; // index writer instance

        /**
         * Constructor: instantiates the IndexWriter.
         *
         * @param indexDir folder in which the index is stored
         * @throws Exception
         */
        public Indexer(String indexDir) throws Exception {
            Directory dir = FSDirectory.open(Paths.get(indexDir));
            Analyzer analyzer = new StandardAnalyzer(); // standard analyzer
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            writer = new IndexWriter(dir, iwc);
        }

        /**
         * Closes the index writer.
         *
         * @throws Exception
         */
        public void close() throws Exception {
            writer.close();
        }

        /**
         * Indexes all files in the specified folder.
         *
         * @param dataDir folder containing the files to index
         * @return the number of documents in the index
         * @throws Exception
         */
        public int index(String dataDir) throws Exception {
            File[] files = new File(dataDir).listFiles();
            for (File f : files) {
                indexFile(f);
            }
            return writer.numDocs();
        }

        /**
         * Indexes a single file.
         *
         * @param f the file to index
         */
        private void indexFile(File f) throws Exception {
            System.out.println("index file: " + f.getCanonicalPath());
            Document doc = getDocument(f);
            writer.addDocument(doc);
        }

        /**
         * Builds the Document for a file and sets each of its fields.
         *
         * @param f the file to convert
         * @return the Document to add to the index
         * @throws Exception
         */
        private Document getDocument(File f) throws Exception {
            Document doc = new Document();
            // The file contents are indexed but not stored
            doc.add(new TextField("contents", new FileReader(f)));
            // The file name and full path are indexed and stored
            doc.add(new TextField("fileName", f.getName(), Field.Store.YES));
            doc.add(new TextField("fullPath", f.getCanonicalPath(), Field.Store.YES));
            return doc;
        }

        public static void main(String[] args) {
            String indexDir = "E:\\lucene";
            String dataDir = "E:\\lucene\\data";
            Indexer indexer = null;
            int numIndexed = 0;
            long start = System.currentTimeMillis();
            try {
                indexer = new Indexer(indexDir);
                numIndexed = indexer.index(dataDir);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try {
                    // indexer is still null if the constructor failed
                    if (indexer != null) {
                        indexer.close();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
            long end = System.currentTimeMillis();
            System.out.println("indexed " + numIndexed + " files in " + (end - start) + " milliseconds");
        }
    }

    String indexDir = "E:\\lucene";
    String dataDir = "E:\\lucene\\data";

Do not be puzzled by these paths. The drive letter is arbitrary. Create a folder in the root directory of any drive; it is best to use an English name without spaces (Chinese names have not been tested). Then create a data folder inside it and copy a few txt files into it. They will be used for testing later.
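
The Indexer assumes that the data folder already exists. For a missing folder, new File(dataDir).listFiles() returns null and the loop fails with a NullPointerException. As a minimal sketch, the folder can be checked with the JDK's java.nio.file API before indexing. The class name DataDirCheck and the method listTextFiles are hypothetical helpers for illustration; they are not part of Lucene or of the code above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or the original code.
public class DataDirCheck {

    /** Returns the .txt files directly inside dataDir, sorted by path. */
    public static List<Path> listTextFiles(Path dataDir) throws IOException {
        if (!Files.isDirectory(dataDir)) {
            throw new IOException("Data folder does not exist: " + dataDir);
        }
        List<Path> result = new ArrayList<>();
        // The glob keeps only entries whose names end in .txt
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dataDir, "*.txt")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location, the same one the Indexer uses.
        // Backslashes must be doubled in Java string literals.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "E:\\lucene\\data");
        for (Path p : listTextFiles(dataDir)) {
            System.out.println("will index: " + p);
        }
    }
}
```

Running it before the Indexer shows which files would be picked up, or fails with a readable message if the folder is missing.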
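Run the Indexer class. The console prints the path of each indexed file, followed by the number of indexed files and the elapsed time in milliseconds.

Afterwards, several unfamiliar files appear in the lucene folder. What they are will be explained later.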

Create another class:

Searcher

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class Searcher {

        public static void search(String indexDir, String q) throws Exception {
            Directory dir = FSDirectory.open(Paths.get(indexDir));
            IndexReader reader = DirectoryReader.open(dir);
            IndexSearcher is = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer();
            // Search the "contents" field written by the Indexer
            QueryParser parser = new QueryParser("contents", analyzer);
            Query query = parser.parse(q);
            long start = System.currentTimeMillis();
            TopDocs hits = is.search(query, 10); // return at most 10 hits
            long end = System.currentTimeMillis();
            System.out.println("matching " + q + ", total cost " + (end - start)
                    + " milliseconds, found " + hits.totalHits + " records");
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = is.doc(scoreDoc.doc);
                System.out.println(doc.get("fullPath"));
            }
            reader.close();
        }

        public static void main(String[] args) {
            String indexDir = "E:\\lucene";
            // String q = "LICENSE-2.0";
            String q = "Zygmunt Saloni";
            try {
                search(indexDir, q);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Run the Searcher class. It prints the query, the time taken, the number of hits, and the full path of each matching file.

Do not delete the special files generated by the first class. If you delete them and then run the second class, an error is reported.
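
Lucene records each commit of an index in a file whose name starts with "segments", so the presence of such a file is a rough sign that the Indexer has already run. The sketch below checks for it with plain JDK calls before searching. IndexFolderCheck and looksLikeIndex are hypothetical names, and the check is only a heuristic; Lucene's own DirectoryReader.indexExists(Directory) should be the more reliable test when the library is on the classpath.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Lucene or the original code.
public class IndexFolderCheck {

    /** Heuristic: true if indexDir contains an entry whose name starts with "segments". */
    public static boolean looksLikeIndex(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "segments*")) {
            return stream.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location, the same one used by Indexer and Searcher.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "E:\\lucene");
        if (looksLikeIndex(indexDir)) {
            System.out.println("Index found in " + indexDir + ", the Searcher can be run.");
        } else {
            System.out.println("No index in " + indexDir + ", run the Indexer first.");
        }
    }
}
```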

Let's try it out.

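Compared with String q = "Zygmunt Saloni", it turns out that the changed query has no effect. This is because of word segmentation: the text is cut into whole words, and a query term must match a whole word.

If you add a "-" to the query and run the second class again, the result is the same. Try it yourself.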
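
The effect of word segmentation can be illustrated with a toy inverted index in plain Java. This is only a sketch under simplifying assumptions: the tokenizer lower-cases the text and splits on every character that is not a letter or digit, which is much cruder than Lucene's StandardAnalyzer, and the class TokenMatchDemo and the sample sentences are made up for the example. Like the classic QueryParser with its default OR operator, the search returns documents that contain at least one query token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; Lucene's StandardAnalyzer follows more elaborate rules.
public class TokenMatchDemo {

    // token -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds every token of the text to the index under the given document name. */
    public void add(String docName, String text) {
        for (String token : tokenize(text)) {
            Set<String> docs = postings.get(token);
            if (docs == null) {
                docs = new TreeSet<>();
                postings.put(token, docs);
            }
            docs.add(docName);
        }
    }

    /** Returns the documents that contain at least one whole token of the query. */
    public Set<String> search(String query) {
        Set<String> result = new TreeSet<>();
        for (String token : tokenize(query)) {
            Set<String> docs = postings.get(token);
            if (docs != null) {
                result.addAll(docs);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TokenMatchDemo index = new TokenMatchDemo();
        index.add("a.txt", "Dictionary by Zygmunt Saloni"); // made-up sample text
        index.add("b.txt", "Apache LICENSE-2.0 text");      // made-up sample text
        System.out.println(index.search("Zygmunt Saloni")); // [a.txt]
        System.out.println(index.search("Zygmunt-Saloni")); // [a.txt]
        System.out.println(index.search("Salon"));          // []
    }
}
```

In this sketch the hyphen is just another separator, so both spellings of the name produce the same tokens, while the truncated word "Salon" is a different token from "saloni" and matches nothing.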

If you repost this article, please credit the source. Thank you.
