Lucene (01)
My blog address: http://www.cnblogs.com/tenglongwentian/
Lucene: the latest version is 6.2.1, which requires the official JDK 1.8.
I am still on JDK 7 here, so Lucene 5.5.3 (the last line that supports JDK 7) is used instead.
Create a maven project. If you do not know how to create a maven project, refer to the previous blog post.
Set the packaging to <packaging>jar</packaging>, then add the Lucene dependencies:
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>5.5.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-queryparser -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>5.5.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-analyzers-common -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>5.5.3</version>
        </dependency>
    </dependencies>
Because I use JDK 7 and do not like having to readjust the project's JDK version by hand every time the Maven project is updated, I also pin the compiler plugin:
    <!-- Source directory, plugin management, and other configuration -->
    <build>
        <finalName>Lucene</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <!-- Specify the source and target versions -->
                    <!-- source: the Java language level the compiler accepts for the source code -->
                    <source>1.7</source>
                    <!-- target: the JVM version the generated class files are compatible with -->
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
That is all for the pom configuration.
Create two classes:
Indexer
    import java.io.File;
    import java.io.FileReader;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class Indexer {
        private IndexWriter writer; // index writer instance

        /**
         * Constructor: instantiates the IndexWriter.
         * @param indexDir directory where the index is stored
         */
        public Indexer(String indexDir) throws Exception {
            Directory dir = FSDirectory.open(Paths.get(indexDir));
            Analyzer analyzer = new StandardAnalyzer(); // standard analyzer
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            writer = new IndexWriter(dir, iwc);
        }

        /** Close the index writer. */
        public void close() throws Exception {
            writer.close();
        }

        /** Index all files in the specified directory. */
        public int index(String dataDir) throws Exception {
            File[] files = new File(dataDir).listFiles();
            for (File f : files) {
                indexFile(f);
            }
            return writer.numDocs();
        }

        /** Index a single file. */
        private void indexFile(File f) throws Exception {
            System.out.println("index file: " + f.getCanonicalFile());
            Document doc = getDocument(f);
            writer.addDocument(doc);
        }

        /** Build the Document for a file and set each field. */
        private Document getDocument(File f) throws Exception {
            Document doc = new Document();
            doc.add(new TextField("contents", new FileReader(f)));
            doc.add(new TextField("fileName", f.getName(), Field.Store.YES));
            doc.add(new TextField("fullPath", f.getCanonicalPath(), Field.Store.YES));
            return doc;
        }

        public static void main(String[] args) {
            String indexDir = "E:\\lucene";
            String dataDir = "E:\\lucene\\data";
            Indexer indexer = null;
            int numIndexed = 0;
            long start = System.currentTimeMillis();
            try {
                indexer = new Indexer(indexDir);
                numIndexed = indexer.index(dataDir);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try {
                    indexer.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
            long end = System.currentTimeMillis();
            System.out.println("index: " + numIndexed + " files, " + (end - start) + " milliseconds");
        }
    }
String indexDir = "E:\\lucene"; String dataDir = "E:\\lucene\\data";
Do not be puzzled by these paths: the drive letter is arbitrary. Create a folder in the root of any drive (a path without spaces is safest; paths containing Chinese characters are untested), then copy a few txt files into the data folder; they will be used for testing later.
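If you prefer to prepare the test data in code instead of copying files by hand, a small stdlib-only sketch does the job. The paths and the sample file names here are just illustrations; adjust them to your machine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TestDataSetup {
    // Create the data directory (if missing) and write one sample txt file into it.
    public static Path writeSample(String dataDir, String fileName, String content) throws IOException {
        Path dir = Paths.get(dataDir);
        Files.createDirectories(dir);
        Path file = dir.resolve(fileName);
        Files.write(file, content.getBytes("UTF-8"));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample content; any plain text works for the later search test.
        writeSample("E:\\lucene\\data", "sample1.txt", "Zygmunt Saloni wrote about Polish grammar.");
        writeSample("E:\\lucene\\data", "sample2.txt", "Apache Lucene is a Java search library.");
    }
}
```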
Run this class and you can see the indexing output.
Afterwards you will find some strange-looking files in the lucene folder; what they are will be covered later.
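If you want to peek at what Lucene wrote without opening the folder manually, a stdlib-only sketch can list the index directory (the path `E:\lucene` is the one used in `Indexer` above):

```java
import java.io.File;

public class IndexDirLister {
    // Return the names of the files Lucene created in the index directory,
    // or an empty array if the directory does not exist.
    public static String[] listIndexFiles(String indexDir) {
        File dir = new File(indexDir);
        String[] names = dir.list();
        return names == null ? new String[0] : names;
    }

    public static void main(String[] args) {
        for (String name : listIndexFiles("E:\\lucene")) {
            System.out.println(name);
        }
    }
}
```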
Create another class:
Searcher
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class Searcher {
        public static void search(String indexDir, String q) throws Exception {
            Directory dir = FSDirectory.open(Paths.get(indexDir));
            IndexReader reader = DirectoryReader.open(dir);
            IndexSearcher is = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser parser = new QueryParser("contents", analyzer);
            Query query = parser.parse(q);
            long start = System.currentTimeMillis();
            TopDocs hits = is.search(query, 10);
            long end = System.currentTimeMillis();
            System.out.println("matching " + q + ", total cost " + (end - start) + " milliseconds, found " + hits.totalHits + " records");
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = is.doc(scoreDoc.doc);
                System.out.println(doc.get("fullPath"));
            }
            reader.close();
        }

        public static void main(String[] args) {
            String indexDir = "E:\\lucene";
            // String q = "LICENSE-2.0";
            String q = "Zygmunt Saloni";
            try {
                search(indexDir, q);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Run this class.
Do not delete the index files generated by the first class. If you are willful and delete them anyway, running the second class will report an error.
Let's try it out.
Compare this with String q = "Zygmunt Saloni": the query behaves differently than you might expect because of word segmentation. The analyzer cuts the text into individual terms rather than treating the phrase as one unit.
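To see what that segmentation means concretely, here is a rough stdlib-only approximation of the basic behavior of StandardAnalyzer (it is not the real analyzer; real Lucene analysis also handles stop words and more, so treat this purely as an illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SimpleTokenizerDemo {
    // Approximate StandardAnalyzer's basic behavior: split text on
    // non-letter/non-digit characters and lowercase each token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Zygmunt Saloni" is indexed as two separate terms, not one unit.
        System.out.println(tokenize("Zygmunt Saloni")); // [zygmunt, saloni]
        // Even "LICENSE-2.0" is cut apart at the hyphen and the dot.
        System.out.println(tokenize("LICENSE-2.0")); // [license, 2, 0]
    }
}
```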
Addendum: if you run the second class again, the result will be the same. Try it yourself.
Please indicate the source for reprinting. Thank you.