Use Lucene 2.3.1 to index an Oracle 10g database

Source: Internet
Author: User

1. Main technologies used:
Lucene 2.3.1
IK_CAnalyzer 1.4: Chinese word segmentation analyzer
HtmlParser 1.6: HTML/text parser. Disadvantage: it cannot ignore the content of <!-- --> comments
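
One possible workaround for the comment limitation is to strip HTML comments from the raw markup before passing it to HtmlParser. The following is a minimal sketch using only the JDK. The class and method names (HtmlCommentStripper, stripComments) are hypothetical and are not part of HtmlParser, and a regular expression will not cope with every malformed page.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlParser: removes <!-- ... --> blocks
// from raw HTML before the text is extracted and indexed.
final class HtmlCommentStripper {
    // DOTALL lets '.' match newlines, so multi-line comments are removed too;
    // the reluctant quantifier stops at the first closing "-->".
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    static String stripComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hello<!-- hidden\nnote --> world</p>";
        System.out.println(stripComments(html)); // prints "<p>Hello world</p>"
    }
}
```

The cleaned string can then be handed to the parser in place of the original page source.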


2. Implementation approach:

Each category is indexed incrementally once a day. The indexed fields are: type, URL, text content, title, author, and time.
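
The daily incremental pass can be pictured as a "high-water mark" loop: pick up only the rows added since the last run, index them, then remember the largest ID seen. The sketch below models this in memory, assuming each source table has a numeric ID that only grows. The names IncrementalPass, Row, selectNew, and advance are hypothetical, and a real run would read the rows from Oracle rather than from a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one incremental pass.
final class IncrementalPass {
    // Minimal stand-in for a source row; only id and title are modelled.
    static final class Row {
        final long id;
        final String title;
        Row(long id, String title) { this.id = id; this.title = title; }
    }

    // Returns the rows whose id is greater than the last indexed id.
    static List<Row> selectNew(List<Row> all, long lastIndexedId) {
        List<Row> fresh = new ArrayList<Row>();
        for (Row r : all) {
            if (r.id > lastIndexedId) {
                fresh.add(r);
            }
        }
        return fresh;
    }

    // Returns the new high-water mark after the given rows have been indexed.
    static long advance(long lastIndexedId, List<Row> indexed) {
        long max = lastIndexedId;
        for (Row r : indexed) {
            max = Math.max(max, r.id);
        }
        return max;
    }
}
```

If a run finds no new rows, advance returns the previous mark unchanged, so the next run starts from the same point.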

3. Create a table in Oracle 10g:

-- Create table
CREATE TABLE IZ_SEARCH_ENGINE
(
  ID             NUMBER NOT NULL,
  INDEX_DIR      VARCHAR2(50),
  TYPE           VARCHAR2(500),                  -- type
  TYPE_DESC      VARCHAR2(50),                   -- type description
  TABLE_MAXVALUE VARCHAR2(50),                   -- maximum value of the table (substituted for #ID# in TABLE_SQLS)
  TABLE_SQLS     CLOB,                           -- SQL that selects the rows not yet indexed, e.g. select... from XXX where id > #ID#
  STATUS         VARCHAR2(20) DEFAULT 'offline', -- currently unused
  TYPE_TRUETYPE  VARCHAR2(50)                    -- temporarily unused
)
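
To illustrate how TABLE_SQLS and TABLE_MAXVALUE fit together, here is a minimal sketch of filling the #ID# placeholder before the statement is executed. The class and method names (SqlTemplate, bind) are hypothetical, and the column names in the usage example are made up for illustration. The value is parsed as a number first, on the assumption that TABLE_MAXVALUE always holds a numeric ID, so that nothing but digits is spliced into the SQL text.

```java
// Hypothetical helper: fills the #ID# placeholder of a TABLE_SQLS template
// with the value stored in TABLE_MAXVALUE.
final class SqlTemplate {
    static String bind(String template, String tableMaxValue) {
        // Throws NumberFormatException if the stored value is not a plain integer.
        long id = Long.parseLong(tableMaxValue.trim());
        return template.replace("#ID#", Long.toString(id));
    }

    public static void main(String[] args) {
        String sql = bind("select id, title from XXX where id > #ID#", "1200");
        System.out.println(sql); // prints "select id, title from XXX where id > 1200"
    }
}
```

An alternative design would store the statement with a ? bind variable and pass the value through a JDBC PreparedStatement, which avoids string substitution altogether.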

4. Key Java code for indexing:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
// IK_CAnalyzer comes from the IKAnalyzer 1.4 jar; import it from the package that jar provides.

String INDEX_DIR = "/home/xue24_index_book"; // index directory
// Open the index and set the word segmentation analyzer.
// The third argument (true) creates a new index and overwrites any existing one.
IndexWriter writer = new IndexWriter(INDEX_DIR, new IK_CAnalyzer(), true);
Document doc = new Document(); // create a new document
doc.add(new Field("type", "Community", Field.Store.YES, Field.Index.TOKENIZED)); // field: type
doc.add(new Field("title", "title", Field.Store.YES, Field.Index.TOKENIZED)); // field: title
writer.addDocument(doc); // add the document to the index
writer.optimize(); // optimize the index
writer.close(); // close the index

5. Key JSP code for searching:

String INDEX_DIR_BOOK = "/home/xue24_index/book";
String INDEX_DIR_BBS = "/home/xue24_index/bbs";

Searcher[] searchers = new Searcher[2];
searchers[0] = new IndexSearcher(INDEX_DIR_BOOK);
searchers[1] = new IndexSearcher(INDEX_DIR_BBS);

Searcher searcher = new MultiSearcher(searchers); // search both indexes as one
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(new String[] {"title", "content", "author"}, new IK_CAnalyzer());
Query query = queryParser.parse(keyword); // parse the keyword into a query

Hits hits = searcher.search(query); // search the index
out.println("Total found results: " + hits.length());
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    out.println("title: " + doc.get("title"));
}


6. Schedule the incremental indexing to run regularly, either with a Linux cron job or with the Quartz scheduler.
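
As an illustration of the scheduling step, the sketch below runs a task once a day at a fixed time. It uses neither cron nor Quartz: it is a JDK-only stand-in built on ScheduledExecutorService, and the names DailyIndexScheduler, delayUntilNext, and scheduleDaily are hypothetical. The task passed in is assumed to be whatever code performs the incremental indexing.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// JDK-only stand-in for cron or Quartz: runs a task once a day at a fixed time.
final class DailyIndexScheduler {
    // Time from 'now' until the next occurrence of 'runAt'
    // (later today, or tomorrow if that time has already passed).
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Schedules 'task' every 24 hours, starting at the next 'runAt'.
    // Note: scheduleAtFixedRate cancels later runs if the task throws,
    // so the task should catch and log its own exceptions.
    static ScheduledExecutorService scheduleDaily(Runnable task, LocalTime runAt) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = delayUntilNext(LocalDateTime.now(), runAt).getSeconds();
        pool.scheduleAtFixedRate(task, initialDelay, TimeUnit.DAYS.toSeconds(1), TimeUnit.SECONDS);
        return pool;
    }
}
```

A cron job has the advantage of surviving restarts of the Java process, while an in-process scheduler like this one (or Quartz) keeps the schedule alongside the application code.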
