Application of a SOLR (multicore) Search Project

Last Update: 2014-12-16 Source: Internet

Author: User

Feature implementation:

One: After the project starts, it automatically monitors all data models, queries the data they describe, and creates an index from it.

Two: It dynamically and automatically indexes incremental data and maintains the existing index.

This project builds indexes from data models, so its coupling is low and its extensibility is high. It differs from the usual full-text search project, which is tied to a particular business, for example the search engine systems of common e-commerce or consumer applications. In that kind of system, a business operation that inserts, updates, or deletes rows in a database table usually applies the corresponding change to the index at the same time. This project ... works only on the data, not on any particular business. It can dynamically index incremental data and maintain existing data under a variety of data models, and it can manage the data models of different projects.

The project uses the multicore feature of Solr. The idea is that each SolrCore corresponds to one data model scheme. After the project starts, for every scheme held in memory, it uses the scheme's last update time and add time together with the scheme's scheduled task (which sets how often to poll the database for new and changed data) to fetch the required data from the database and build indexes from it (index failures are recorded in a log) ... Because Solr exposes Lucene through an HTTP-based interface, there is no need to deal with Lucene's near-real-time search machinery such as NRTManager. If you disregard HTTP request latency, Solr's CommonsHttpSolrServer (renamed HttpSolrServer in 4.0) can completely replace direct use of Lucene, and the class is thread-safe.
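The one-core-per-scheme idea can be sketched roughly as below. This is only an illustration under assumptions: `CoreClient` is a stand-in for the SolrJ client class so that the sketch compiles with the JDK alone, and the class names, the base URL, and the rule that the core name equals the scheme name are all hypothetical. It shows one client per core being created once and then shared, which is reasonable only because the real client class is thread-safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the SolrJ HTTP client so this sketch needs only the JDK.
final class CoreClient {
    final String coreUrl;

    CoreClient(String coreUrl) {
        this.coreUrl = coreUrl;
    }
}

// Hypothetical cache: one client per SolrCore, keyed by scheme (core) name.
final class CoreClients {
    private final String baseUrl;
    private final Map<String, CoreClient> clients = new ConcurrentHashMap<>();

    CoreClients(String baseUrl) {
        // Drop a trailing slash so core URLs are built consistently.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    // Creates the client for a core on first use and reuses it afterwards.
    CoreClient forScheme(String coreName) {
        return clients.computeIfAbsent(
                coreName, name -> new CoreClient(baseUrl + "/" + name));
    }

    public static void main(String[] args) {
        CoreClients clients = new CoreClients("http://localhost:8983/solr/");
        System.out.println(clients.forScheme("products").coreUrl);
    }
}
```

In a real project the stand-in would be replaced by the SolrJ client pointed at the same per-core URL.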

The whole project is implemented in 2 sections:

Manage Data Model (Scheme) section: First, configure and manage data sources and tables: manage the table information of the remote databases the project connects to.

Second, configure and manage data models: manage the scheme job (data source SQL, primary key information, field configuration, whether each field is stored and indexed, field weight, tokenizer, add-time field, update-time field, scheduled task, batch fetch size, fetch frequency, request server information, and so on).

-----> Note: If a one-to-many multi-table query is involved, you must configure two primary keys. The server automatically combines them into a new composite primary key. The uniqueKey in schema.xml must be guaranteed unique, otherwise data is lost, because Solr overwrites an existing document when another one arrives with the same uniqueKey. A Solr document's uniqueKey is not the same thing as Lucene's scoreDocs[i].doc.
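The composite key from the note can be illustrated with a minimal sketch. The class name, the `_` separator, and the sample key values are assumptions made for the example, not the project's actual code. A one-to-many join returns the same parent key on several rows, so only the combination of both keys identifies a row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the value stored in the uniqueKey field.
final class CompositeKey {
    // Assumed separator; it must not occur inside either key value.
    static final String SEPARATOR = "_";

    static String of(Object parentId, Object childId) {
        return parentId + SEPARATOR + childId;
    }

    public static void main(String[] args) {
        // Rows of a one-to-many join: parent 7 has two children.
        long[][] rows = { {7, 101}, {7, 102} };

        // Keyed by the parent id alone, the second row replaces the first.
        Map<String, long[]> byParent = new LinkedHashMap<>();
        // Keyed by the composite id, both rows survive.
        Map<String, long[]> byComposite = new LinkedHashMap<>();
        for (long[] row : rows) {
            byParent.put(String.valueOf(row[0]), row);
            byComposite.put(of(row[0], row[1]), row);
        }
        System.out.println(byParent.size() + " vs " + byComposite.size()); // prints "1 vs 2"
    }
}
```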

Third, manage and monitor abnormal data: monitor the log of abnormal data.

This section is not described in detail. It can be built as simply or as elaborately as you like.

Build index section (details):

One: server initialization.

1: Load all scheme configurations. ----> Runs when the WebApplicationContext starts.

1.1---Obtain the list of all valid configured index schemes, List<Scheme>, by IP, port, and service name.

1.2---Assemble the data source SQL, primary key information, field configuration, field stored and indexed settings, weight, tokenizer, search scheme, add-time field, update-time field, batch fetch size, fetch frequency, request server information, and so on into a Scheme object.

1.3---Load the resulting List<Scheme> into memory as a singleton.
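The in-memory singleton could look something like the sketch below. It is a guess at the shape rather than the project's real classes: `SchemeInfo` is a hypothetical, heavily trimmed version of the Scheme object with only three of the fields listed above, and the enum-based singleton is one common Java idiom among several.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, trimmed-down scheme; the real object carries many more fields.
final class SchemeInfo {
    final String name;
    final String sourceSql;
    final int batchSize;

    SchemeInfo(String name, String sourceSql, int batchSize) {
        this.name = name;
        this.sourceSql = sourceSql;
        this.batchSize = batchSize;
    }
}

// Enum singleton: the JVM guarantees exactly one INSTANCE.
enum SchemeRegistry {
    INSTANCE;

    private final Map<String, SchemeInfo> schemes = new ConcurrentHashMap<>();

    // Called once at startup with the list assembled from the configuration.
    void load(List<SchemeInfo> loaded) {
        for (SchemeInfo scheme : loaded) {
            schemes.put(scheme.name, scheme);
        }
    }

    // Returns the scheme with this name, or null if none was loaded.
    SchemeInfo get(String name) {
        return schemes.get(name);
    }
}

final class SchemeRegistryDemo {
    public static void main(String[] args) {
        SchemeRegistry.INSTANCE.load(Arrays.asList(
                new SchemeInfo("products", "SELECT id, name FROM product", 1000)));
        System.out.println(SchemeRegistry.INSTANCE.get("products").batchSize); // prints 1000
    }
}
```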

2: Generate all the related Solr configuration files on the server

2.1---Build the solr.ftl, solrconfig.ftl, and schema.ftl templates --------> based on your business needs

2.2---Loop over the List<Scheme> and call the FreeMarker template engine.

2.3---Use the FreeMarker templates to generate the solr.xml, solrconfig.xml, schema.xml, and solrserver.xml (HttpSolrServer parameter information) required by each scheme.

Note the load order. Solr automatically scans these configuration files at startup, and it needs them in place to start properly.

-----------> Refer to one of my previous blog posts: http://blog.csdn.net/hu948162999/article/details/39891493
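The generation step above can be pictured with the following sketch. To stay runnable with only the JDK it does not use FreeMarker: a small `${name}` substitution stands in for the template engine. The class name, the two template fragments, and the model keys are all invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK stand-in for a template engine render call.
final class ConfigTemplates {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces every ${key} in the template with the matching model value.
    static String render(String template, Map<String, String> model) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = model.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + matcher.group(1));
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed fragments of a solr.xml and a schema.xml template.
        String coreEntry = "<core name=\"${core}\" instanceDir=\"${core}\"/>";
        String uniqueKey = "<uniqueKey>${key}</uniqueKey>";

        Map<String, String> model = new LinkedHashMap<>();
        model.put("core", "products");
        model.put("key", "id");

        System.out.println(render(coreEntry, model));
        System.out.println(render(uniqueKey, model));
    }
}
```

Failing fast on a missing value is a deliberate choice in this sketch: a half-filled configuration file is harder to diagnose once Solr tries to load it.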

Two: Execute the scheme job.

1: Use Quartz for job scheduling to run the scheme job

1.1---Activate the corresponding scheduled task.

1.2---Get the maximum insert time and update time in the current batch of data. ------> Compare them with the times recorded in the scheme.

1.3---Get the amount of incremental data and compare it with the batch size of the scheme job to calculate how many batches to run.

1.4---Start the thread pool to fetch the result set of each batch query and build the index for each batch.

1.5---Log index creation and update activity as well as index errors. ----------> used by the data model management section

1.6---When the scheduled job completes, record the latest insert and update times in the scheme job.
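The batch arithmetic and the thread pool in the steps above can be sketched as follows. This is an assumption-laden outline, not the project's code: the class and method names are hypothetical, Quartz is left out, and each task only reports its batch size where a real one would run the scheme SQL and send documents to the core.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of one scheme job run.
final class SchemeJobSketch {

    // Number of batches needed to cover `total` rows at `batchSize` rows each.
    static int batchCount(long total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (int) ((total + batchSize - 1) / batchSize);
    }

    // Runs one task per batch on a fixed thread pool; returns rows processed.
    static long runBatches(long total, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            int batches = batchCount(total, batchSize);
            for (int i = 0; i < batches; i++) {
                long offset = (long) i * batchSize;
                long size = Math.min(batchSize, total - offset);
                // Placeholder task: a real one would query this batch and index it.
                Callable<Long> task = () -> size;
                results.add(pool.submit(task));
            }
            long processed = 0;
            for (Future<Long> result : results) {
                processed += result.get();
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // The recorded time only moves forward: keep the later of the two values.
    static Instant advance(Instant stored, Instant batchMax) {
        return batchMax.isAfter(stored) ? batchMax : stored;
    }

    public static void main(String[] args) {
        System.out.println(batchCount(2500, 1000));    // prints 3
        System.out.println(runBatches(2500, 1000, 4)); // prints 2500
    }
}
```

Recording the new insert and update times only after every batch has returned means a failed run is retried from the old times on the next trigger.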

Two kinds of tokenizers:

ICTCLAS: The old version of the project used ICTCLAS (a lexical analysis system), a tokenizer developed in C by the Chinese Academy of Sciences. It is undeniably powerful, but one complaint: the database may contain data in very unusual formats, and when ICTCLAS processes such data it fails and stops the web container outright.

mmseg: A modified mmseg tokenizer with added synonym handling (this one is fine: on unrecognized input it throws an exception and does not stop the Tomcat server). You can choose between the 2 tokenizers in the model management module. The new version of this project uses this tokenizer.

This is a full-text retrieval system, originally divided into 3 systems: a data model system, a data indexing system, and the search part inside a business project system.

As for the specific implementation of the search part, refer to my next blog post.

Some recommended resources:

A classic algorithms book: http://download.csdn.net/detail/hu948162999/8262987

A book on design patterns: Big Talk Design Patterns (the file is too large to upload)

A friend's blog post on Java JVM memory management: http://blog.csdn.net/hu948162999/article/details/41948599 ----> this project is fairly memory-hungry.

A personal recommendation: Solr China, http://www.solr.cc/blog/. The site describes project experience in detail, especially search solutions for e-commerce projects.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.