I. My original program.
My original program is quite simple: it is adapted directly from the SearchFiles and IndexFiles classes in the Lucene demo. The only difference is that it uses the smartcn tokenizer (SmartChineseAnalyzer).
Here are the parts of the code that I changed.
IndexChinese.java:
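Date start = new Date();
try {
    // Same as the demo's IndexFiles, but with SmartChineseAnalyzer as the analyzer.
    IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
            new SmartChineseAnalyzer(Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.LIMITED);
    indexDocs(writer, docDir);
    System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
    System.out.println("Optimizing...");
    // optimize() is commented out, so the message above is printed but no optimization runs.
    //writer.optimize();
    writer.close();
    Date end = new Date();
    System.out.println(end.getTime() - start.getTime() + " total milliseconds");
} catch (IOException e) {
    // Error handling as in the demo's IndexFiles.
    System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
}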
SearchChinese.java:
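// The query is analyzed with the same analyzer that was used for indexing.
Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_CURRENT);
BufferedReader in = null;
if (queries != null) {
    // Queries read from a file: FileReader uses the platform default charset.
    in = new BufferedReader(new FileReader(queries));
} else {
    // Queries typed on the console: the bytes are decoded as GBK.
    in = new BufferedReader(new InputStreamReader(System.in, "GBK"));
}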
Here I have specified that the query input is decoded as GBK.
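To make the effect of that GBK setting concrete, here is a minimal sketch. It is not part of my program: the class name QueryEncodingCheck and the helper decode are made up for illustration, and it assumes the JDK in use ships the GBK charset. It only shows that bytes decode back to the intended Chinese text when the charset given to the reader matches the one the bytes were written in, and that they do not when the two differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class QueryEncodingCheck {

    // Decodes raw bytes the way InputStreamReader would with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character word for "Chinese"; escapes keep
        // this source file independent of its own encoding.
        String query = "\u4e2d\u6587";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK

        // Case 1: the bytes really are GBK, so decoding as GBK restores the query.
        String matched = decode(query.getBytes(gbk), gbk);
        System.out.println("GBK bytes read as GBK match:   " + matched.equals(query));

        // Case 2: the bytes are UTF-8 but are read as GBK, so the text is garbled.
        String mismatched = decode(query.getBytes(StandardCharsets.UTF_8), gbk);
        System.out.println("UTF-8 bytes read as GBK match: " + mismatched.equals(query));
    }
}
```

If the decoded query differs from the text that was indexed, the analyzer receives different characters, so a check like this can help rule the input encoding in or out before looking at the index itself.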
Full of confidence, I ran it ... and found that Chinese queries returned nothing, while English searches worked normally.