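A conventional (forward) index maps each document to the words it contains, whereas an inverted index maps each word to the documents that contain it. Because everyday retrieval is usually driven by keywords, an inverted index fits this lookup pattern far better. That is why inverted indexes are used so widely today.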
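To make the "word to document" mapping concrete, here is a minimal sketch in plain Java. It does not use Lucene. The class name `InvertedIndexSketch`, the `build` method, and the sample documents are made up for illustration, and tokenization is assumed to be a simple whitespace split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene or of the book's code.
class InvertedIndexSketch {

    // Build a map from each term to the ids of the documents that contain it.
    static Map<String, List<Integer>> build(String[] docs) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            // Assumption: terms are separated by whitespace and compared in lower case.
            for (String term : docs[docId].toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> postings = index.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = { "a b c", "a c e", "e f" };
        Map<String, List<Integer>> index = build(docs);
        // A keyword lookup is now a single map access instead of a scan over all documents.
        System.out.println("a -> " + index.get("a")); // prints "a -> [0, 1]"
        System.out.println("e -> " + index.get("e")); // prints "e -> [1, 2]"
    }
}
```

A forward index would store `docId -> terms` instead, so answering "which documents contain e?" would require visiting every document.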
BuildIndex
import java.io.File;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.demo.FileDocument;
import org.apache.lucene.index.IndexWriter;

public class BuildIndex {
    public static void main(String[] args) {
        // Start time, used for timing
        Date start = new Date();
        try {
            // Create the index directory (true = create a new index)
            IndexWriter writer = new IndexWriter("C:\\IndexDir", new SimpleAnalyzer(), true);
            // The text file to index
            File file = new File("C:\\IndexData.txt");
            // Add the document to the index, then optimize the index
            System.out.println("adding " + file);
            writer.addDocument(FileDocument.Document(file));
            writer.optimize();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // End time
        Date end = new Date();
        // Print the elapsed time
        System.out.print(end.getTime() - start.getTime());
        System.out.println(" total milliseconds");
    }
}
DoSearch
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;

class DoSearch {
    public static void main(String[] args) {
        try {
            // Create an in-memory index directory
            Directory directory = new RAMDirectory();
            // Create the analyzer, used for tokenizing and so on
            Analyzer analyzer = new SimpleAnalyzer();
            // Index writer
            IndexWriter writer = new IndexWriter(directory, analyzer, true);
            // Build the index from the sample documents
            String[] docs = {
                "a b c d e",
                "a b c d e a b c d e",
                "a b c d e f g h i j",
                "a c e",
                "e c a",
                "a c e a c e",
                "a c e a b c"
            };
            for (int j = 0; j < docs.length; j++) {
                Document d = new Document();
                d.add(Field.Text("contents", docs[j]));
                writer.addDocument(d);
            }
            writer.close();
            // Create the searcher
            Searcher searcher = new IndexSearcher(directory);
            // Query strings (here a single phrase query)
            String[] queries = { "\"a c e\"" };
            // Result set, initially null
            Hits hits = null;
            // Create the QueryParser, which tokenizes the query string
            QueryParser parser = new QueryParser("contents", analyzer);
            // Build a Query object from each query string in turn
            for (int j = 0; j < queries.length; j++) {
                Query query = parser.parse(queries[j]);
                System.out.println("Query: " + query.toString("contents"));
                // Result set
                hits = searcher.search(query);
                // Print the total number of matching documents
                System.out.println(hits.length() + " total results");
                // Print the contents of the matching documents (at most 10)
                for (int i = 0; i < hits.length() && i < 10; i++) {
                    Document d = hits.doc(i);
                    System.out.println(i + " " + hits.score(i) + " " + d.get("contents"));
                }
            }
            searcher.close();
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }
}
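The query in DoSearch is the quoted phrase "a c e", so only documents in which those three terms appear consecutively should match. The sketch below approximates that rule in plain Java with no Lucene dependency, so it can be run without the old Lucene jar. The class `PhraseMatchSketch` and its methods are hypothetical names, whitespace splitting is assumed to be equivalent to the analyzer for this particular sample data, and no scoring or ranking is attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; it mimics exact phrase matching, not Lucene's scoring.
class PhraseMatchSketch {

    // Return true if the phrase tokens occur consecutively somewhere in the document tokens.
    static boolean containsPhrase(String[] doc, String[] phrase) {
        for (int start = 0; start + phrase.length <= doc.length; start++) {
            String[] window = Arrays.copyOfRange(doc, start, start + phrase.length);
            if (Arrays.equals(window, phrase)) {
                return true;
            }
        }
        return false;
    }

    // Return the positions (in the docs array) of all documents containing the phrase.
    static List<Integer> search(String[] docs, String phrase) {
        String[] phraseTokens = phrase.trim().split("\\s+");
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            if (containsPhrase(docs[i].trim().split("\\s+"), phraseTokens)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Same sample documents as in DoSearch.
        String[] docs = {
            "a b c d e", "a b c d e a b c d e", "a b c d e f g h i j",
            "a c e", "e c a", "a c e a c e", "a c e a b c"
        };
        System.out.println(search(docs, "a c e")); // prints "[3, 5, 6]"
    }
}
```

Under these assumptions, three of the seven sample documents contain the phrase; "e c a" holds the same terms but in the wrong order and is not matched.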
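These two parts are probably the most basic ones. Excerpted from the book <<征服ajax和lucene架構搜尋>>.
Come back to this for reference when it is needed. The complete PDF can be downloaded from Baidu Wenku (百度文庫):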
http://wenku.baidu.com/search?word=%D5%F7%B7%FEajax%20lucene&lm=0&od=0