III. Solr multi-core and IK configuration

Source: Internet
Author: User
Multi-core concept

Put simply, multi-core means running several index databases in one Solr instance. You can also think of each core as a separate "database table".

Consider a real scenario where multiple cores are useful. Suppose you need both product search and member search. You can do this without multiple cores, but a single index then holds many index files, the product index files are mixed with the member index files, and backup becomes awkward. With multiple cores, products and members are submitted through different URLs, the business separation is clear, the generated index files are not mixed together, and backup is easy.

Each index database (core) is accessed through its own, relatively independent URL.

Multi-core Configuration

Do you still remember Solr home? Since we are configuring multiple cores, let's create a fresh directory as Solr home and build it from scratch, which gives a deeper understanding (remember to update the Solr home path in Tomcat).

The Solr home path I use here is D:\Installed applications\solrindex. Decompress Solr and copy all files under solr-4.9.0\example\multicore to Solr home.

We can see two cores, core0 and core1, and a solr.xml. As the names suggest, core0 and core1 are two example cores. Their file structure is very simple, with only two files each, schema.xml and solrconfig.xml, so we can modify them or create new cores as needed, as long as the directory structure follows that of the example cores. Next comes solr.xml. This file tells Solr how many cores there are, what they are called, and where they are located.

The structure is as follows:

    <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
      <core name="core0" instanceDir="core0"/>
      <core name="core1" instanceDir="core1"/>

      <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
        <str name="urlScheme">${urlScheme:}</str>
      </shardHandlerFactory>
    </cores>

Ignore shardHandlerFactory for now. The part to modify is the core elements: name is the core name, and instanceDir is the core's path, which is relative to the current directory (Solr home) by default. It is best to keep the two consistent. That is, to add a core named core0, create a core0 folder under Solr home and put the configuration files in it. That folder is the core.

My solr.xml after modification is as follows:
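To make the name/instanceDir mapping concrete, here is a minimal Python sketch that reads a legacy Solr 4.x solr.xml and lists each core with its directory. The function name list_cores is made up for illustration, and the sample text assumes the <solr> wrapper element found in the stock example\multicore file.

```python
import xml.etree.ElementTree as ET

# Sample legacy (Solr 4.x) solr.xml. The <solr> wrapper is assumed from the
# stock example\multicore file; ${...} placeholders are treated as plain text.
SOLR_XML = """<solr persistent="false">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>"""


def list_cores(solr_xml_text):
    """Return (name, instanceDir) pairs for every <core> element."""
    root = ET.fromstring(solr_xml_text)
    return [(core.get("name"), core.get("instanceDir"))
            for core in root.iter("core")]


if __name__ == "__main__":
    for name, instance_dir in list_cores(SOLR_XML):
        print(name, "->", instance_dir)
```

Running a check like this after editing solr.xml is a quick way to spot a core whose name and instanceDir have drifted apart.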

    <cores adminPath="/admin/cores" host="${host:}" defaultCoreName="artist" hostPort="${port:8983}" hostContext="${hostContext:solr}">
      <core name="aritstcategory" instanceDir="aritstcategory"/>
      <core name="artist" instanceDir="artist"/>
      <core name="song" instanceDir="song"/>
      <core name="songartist" instanceDir="songartist"/>
      <core name="songcategory" instanceDir="songcategory"/>
      <core name="songmenu" instanceDir="songmenu"/>
      <core name="spaceaudio" instanceDir="spaceaudio"/>
      <core name="spacevideo" instanceDir="spacevideo"/>
      <core name="spaceavnum" instanceDir="spaceavnum"/>

      <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
        <str name="urlScheme">${urlScheme:}</str>
      </shardHandlerFactory>
    </cores>

The directory structure is as follows: one folder per core under Solr home, each named after its instanceDir. [screenshot not preserved]
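As a rough sketch, the layout implied by the solr.xml above (one folder per core under Solr home) could be created with a few lines of Python. The conf subfolder is an assumption based on how the stock example\multicore cores are organized, the function name create_core_dirs is hypothetical, and a temporary directory stands in for the real Solr home.

```python
import tempfile
from pathlib import Path

# Core names copied from the solr.xml above (spelling kept as in the config).
CORES = ["aritstcategory", "artist", "song", "songartist", "songcategory",
         "songmenu", "spaceaudio", "spacevideo", "spaceavnum"]


def create_core_dirs(solr_home, core_names):
    """Create <solr_home>/<core>/conf for each core and return those paths."""
    created = []
    for name in core_names:
        # Assumption: config files live in a conf/ subfolder, as in core0/core1.
        conf_dir = Path(solr_home) / name / "conf"
        conf_dir.mkdir(parents=True, exist_ok=True)
        created.append(conf_dir)
    return created


if __name__ == "__main__":
    # A throwaway directory is used here instead of the real Solr home.
    with tempfile.TemporaryDirectory() as solr_home:
        for conf_dir in create_core_dirs(solr_home, CORES):
            print(conf_dir.relative_to(solr_home))
```

Each conf folder would then receive that core's schema.xml and solrconfig.xml.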

You may have noticed the following configuration items:

    <cores adminPath="/admin/cores" host="${host:}" defaultCoreName="artist" hostPort="${port:8983}" hostContext="${hostContext:solr}">

adminPath is the URL path used for core administration.

host is the host name.

defaultCoreName is the core used by default when no core is specified.

hostPort is the access port (keep it consistent with the Tomcat port).

hostContext is the web application context, that is, the name of the Solr project under webapps.

In fact, this is a bit like configuring a Tomcat project.

Multi-core access

Start the Tomcat service and open: http://localhost:8983/solr

The admin page lists all configured cores. [screenshot not preserved]

Of course, you can also access each core through its own URL:

http://localhost:8983/solr/corename/select (replace corename with the name of the core; select is the query handler)

The defaultCoreName mentioned above determines which core is used when a request does not specify one.
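The way host, port, context, and core name combine into a per-core URL can be sketched in Python as follows. This assumes the usual Solr 4.x layout in which request handlers such as select sit under /<hostContext>/<core>/, and the helper name core_url is invented for this example.

```python
from urllib.parse import urlencode


def core_url(core, handler="select", host="localhost", port=8983,
             context="solr", **params):
    """Build http://<host>:<port>/<context>/<core>/<handler>[?query]."""
    base = f"http://{host}:{port}/{context}/{core}/{handler}"
    # urlencode percent-escapes characters such as '*' and ':' in the values.
    return f"{base}?{urlencode(params)}" if params else base


if __name__ == "__main__":
    print(core_url("artist"))
    print(core_url("song", q="*:*", wt="json"))
```

Because every core has its own URL prefix, products and members (or artists and songs here) never share a submission or query endpoint.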


Introduction and configuration of the tokenizer

Solr has no Chinese word segmentation configured by default. Its common default data types include string, long, and int. For more information, see my other blog post: 1. Solr Overview

I use the IK tokenizer, an open-source tokenizer developed in China, so I will focus on how to configure it.

Download

Download the "IK Analyzer 2012FF_hf1.zip" package. See http://zhengchao730.iteye.com/blog/1833000

Extract

Decompressed directory structure: [screenshot not preserved]

The package includes fairly detailed documentation, but I found that it does not describe the Solr configuration in any detail. So please read on.

Configuration

Step 1: Copy IKAnalyzer2012FF_u1.jar to the directory $TOMCAT_HOME\webapps\solr\WEB-INF\lib

Step 2: Copy IKAnalyzer.cfg.xml and stopword.dic to the $TOMCAT_HOME\webapps\solr\WEB-INF\classes directory. If the classes directory does not exist, create it.

Step 3: Configure the IK tokenizer in the schema.xml of each core:

    <fieldType name="text_ik" class="solr.TextField">
      <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
      <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>

With this in place, you can use the IK tokenizer.
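Steps 1 and 2 above are plain file copies, so they can be scripted. The sketch below is one possible way to do it in Python. The function name install_ik is hypothetical, the file names are assumed to match those in the steps, and the demo uses empty placeholder files in a temporary directory rather than a real Tomcat installation.

```python
import shutil
import tempfile
from pathlib import Path

JAR_NAME = "IKAnalyzer2012FF_u1.jar"
CLASSPATH_FILES = ("IKAnalyzer.cfg.xml", "stopword.dic")


def install_ik(ik_dir, webapp_dir):
    """Copy the IK jar to WEB-INF/lib and its config files to WEB-INF/classes."""
    ik_dir = Path(ik_dir)
    web_inf = Path(webapp_dir) / "WEB-INF"
    lib_dir = web_inf / "lib"
    classes_dir = web_inf / "classes"
    # Create the target folders if they are missing (classes often is).
    lib_dir.mkdir(parents=True, exist_ok=True)
    classes_dir.mkdir(parents=True, exist_ok=True)

    copied = [Path(shutil.copy2(ik_dir / JAR_NAME, lib_dir))]
    for name in CLASSPATH_FILES:
        copied.append(Path(shutil.copy2(ik_dir / name, classes_dir)))
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder files stand in for the unpacked IK package.
        ik_dir = Path(tmp) / "ik"
        ik_dir.mkdir()
        for name in (JAR_NAME, *CLASSPATH_FILES):
            (ik_dir / name).write_bytes(b"")
        for path in install_ik(ik_dir, Path(tmp) / "webapps" / "solr"):
            print(path.relative_to(tmp))
```

In a real setup, webapp_dir would point at the solr folder under Tomcat's webapps directory, and Tomcat would need a restart to pick up the new jar.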

Here, isMaxWordLength controls the granularity of word segmentation, and it can be set separately for indexing and for querying. We recommend setting isMaxWordLength to false for the index analyzer so that the finest-grained segmentation is used: the index is then more precise and queries can match as much as possible. Set isMaxWordLength to true for the query analyzer so that maximum-length segmentation is used and the query results better meet the user's needs.
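The difference between the two granularities can be illustrated with a toy dictionary-based segmenter in Python. This is only a conceptual sketch and not IK's actual algorithm: the tiny dictionary, the function names, and the greedy longest-match strategy are all assumptions made for illustration. The sample text 中华人民共和国 is "the People's Republic of China".

```python
# Toy dictionary; a real tokenizer such as IK ships a far larger one.
DICTIONARY = {"中华", "华人", "人民", "共和", "共和国", "人民共和国", "中华人民共和国"}


def fine_grained(text, dictionary):
    """Emit every dictionary word found at any position (finest granularity)."""
    max_len = max(map(len, dictionary))
    tokens = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def max_word_length(text, dictionary):
    """Greedy longest match, left to right; unknown characters become single tokens."""
    max_len = max(map(len, dictionary))
    tokens, pos = [], 0
    while pos < len(text):
        for end in range(min(len(text), pos + max_len), pos, -1):
            if text[pos:end] in dictionary or end == pos + 1:
                tokens.append(text[pos:end])
                pos = end
                break
    return tokens


if __name__ == "__main__":
    text = "中华人民共和国"
    print("index tokens:", fine_grained(text, DICTIONARY))
    print("query tokens:", max_word_length("中华人民", DICTIONARY))
```

In this toy model, indexing with the fine-grained mode stores short words such as 人民 alongside the full name, so a query that is segmented into 中华 and 人民 can still find the document.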

One more thing needs special attention with Solr 4.9: in the schema.xml of each core, the version in <schema name="example core zero" version="1.1"> must be changed from 1.1 to 1.5:

    <schema name="example core zero" version="1.5">
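Since the version change has to be repeated in every core's schema.xml, it may be convenient to script it. The following Python sketch rewrites only the version attribute on the opening schema tag. The function name bump_schema_version is hypothetical, and the regular expression assumes the attribute is written with double quotes.

```python
import re

# Matches the version attribute on the opening <schema ...> tag only, so an
# <?xml version="1.0" ?> declaration at the top of the file is left alone.
SCHEMA_VERSION = re.compile(r'(<schema\b[^>]*\bversion=")[^"]*(")')


def bump_schema_version(schema_text, new_version="1.5"):
    """Return schema_text with the <schema> version attribute replaced."""
    return SCHEMA_VERSION.sub(
        lambda m: m.group(1) + new_version + m.group(2),
        schema_text,
        count=1,
    )


if __name__ == "__main__":
    before = '<schema name="example core zero" version="1.1">'
    print(bump_schema_version(before))
```

Applied to each core, this would mean reading conf/schema.xml, passing the text through the function, and writing the result back.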

With this change, word-level queries work. For example, when you search for "the People's Republic of China" without word segmentation configured, the query is treated as a phrase match by default, so only documents containing the full phrase "the People's Republic of China" are returned. With word segmentation configured, documents containing parts of it, such as "the Chinese people" ..., can be matched as well.

Use and test the IK tokenizer

Then configure a field in schema.xml for testing, as shown below:

    <field name="artist_name" type="text_ik" indexed="true" stored="true"/>

Then open the Solr admin page, where you can see the effect of word segmentation. [screenshot not preserved]
