Solr cluster configuration

1 Overview

Solr is a text retrieval application service based on Lucene. Lucene is a text retrieval library written in Java and built on the inverted-index principle. SolrCloud is a distributed search solution based on Solr and ZooKeeper. When an index grows so large that a single machine can no longer meet the disk requirements, or queries become slow, a distributed index is needed. In a distributed index, the original large index is split into small indexes; Solr merges the results returned by these small indexes and returns them to the client.

2 Terminology

A SolrCloud cluster involves the following concepts:

Cluster: a set of Solr nodes managed logically as a single unit; the entire cluster uses the same schema and solrconfig.

Node: a JVM instance running Solr.

Collection: a complete logical index in a SolrCloud cluster, usually divided into one or more shards. All shards use the same config set. If there is more than one shard, the index is distributed.

Core: a Solr core. One Solr instance hosts one or more cores, and each core provides independent indexing and querying. Cores exist to improve management flexibility and resource sharing. In SolrCloud the configuration a core uses lives in ZooKeeper, whereas a traditional Solr core reads its configuration from a directory on disk.

Config Set: the set of configuration files a Solr core needs in order to provide service; each config set has a name. It must contain solrconfig.xml and schema.xml, and depending on how those two files are configured, other files may also be required. Config sets are stored in ZooKeeper; they can be uploaded or updated with the upconfig command (see the sketch after this list), and can be initialized or updated through Solr's bootstrap_confdir startup parameter.

Shard: a logical slice of a collection. Each shard consists of one or more replicas, one of which is chosen as leader by an election.

Replica: a copy of a shard. Each replica lives in a Solr core.

Leader: the replica of a shard that wins the election. Each shard has multiple replicas, and an election determines which one becomes leader; elections can take place at any time. When an index operation is performed, SolrCloud routes the request to the leader of the corresponding shard, and the leader distributes it to all of that shard's replicas.
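For reference, the upconfig command mentioned above is exposed by the zkcli script that ships with Solr (under scripts/cloud-scripts, with the exact location varying by version). A minimal sketch, assuming the paths and the config name clustercore used later in this article:

    rem upload the config set in test_core\conf to ZooKeeper under the name clustercore
    zkcli.bat -zkhost localhost:2181 -cmd upconfig ^
        -confdir G:\solr_cloud\solr_home1\solr\test_core\conf ^
        -confname clustercore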

3 Logical View

(The original post includes a diagram here illustrating how a collection maps onto shards, replicas, and cores.)

4 Cluster Configuration
4.1 Downloads

Solr Download: http://archive.apache.org/dist/lucene/solr/

Tomcat Download: http://tomcat.apache.org/

ZooKeeper Download: http://www.apache.org/dyn/closer.cgi/zookeeper/

4.2 Configuration

The cluster configuration builds on the single-machine configuration. Solr's basic setup is not repeated here; only the configuration the cluster requires is recorded.

For a standalone Tomcat configuration, refer to: http://blog.csdn.net/vtopqx/article/details/76165305

The steps below describe the Solr cluster configuration on Windows; the configuration on Linux is similar.

The deployment uses:

ZooKeeper x 1

Solr x 3

Tomcat x 3 (Tomcat 8 or later is required)

ZooKeeper itself is not clustered; a single ZooKeeper instance manages Solr.

The directory layout is as follows:
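The original post shows a screenshot of the layout at this point; a plausible sketch of it, based on the paths used throughout this article, would be:

    G:\solr_cloud\
        tomcat1\            Tomcat with the Solr WAR, port 8080
        tomcat2\            port 8081
        tomcat3\            port 8082
        solr_home1\solr\    solr home for node 1
        solr_home2\solr\    solr home for node 2
        solr_home3\solr\    solr home for node 3
        zookeeper-3.4.8\    single ZooKeeper instance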

4.2.1 Solr Configuration

1. Modify the solr_home reference in Solr's web.xml

Go to: G:\solr_cloud\tomcat1\webapps\solr\WEB-INF

Edit web.xml, uncomment the <env-entry> element, and set the corresponding solr_home path, for example as below.
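A minimal sketch of the uncommented element for the first node; the env-entry-name solr/home is the standard key in the Solr WAR's web.xml, and the path here is this article's solr_home1:

    <env-entry>
        <env-entry-name>solr/home</env-entry-name>
        <env-entry-value>G:\solr_cloud\solr_home1\solr</env-entry-value>
        <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>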


Configure web.xml under each of the remaining Tomcats with its corresponding solr_home path.

2. Modify solr.xml under solr_home to configure the Tomcat host and port

Go to: G:\solr_cloud\solr_home1\solr

Edit solr.xml and set the corresponding IP and port, for example as below.
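A minimal sketch of the relevant section, assuming the Solr 5.x-style solr.xml with a <solrcloud> block (older 4.x releases put host and hostPort attributes on the <cores> element instead); the first node would use Tomcat port 8080:

    <solr>
        <solrcloud>
            <str name="host">127.0.0.1</str>
            <int name="hostPort">8080</int>
            <str name="hostContext">solr</str>
            <int name="zkClientTimeout">30000</int>
        </solrcloud>
    </solr>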


Repeat the corresponding Tomcat port configuration for each of the remaining solr_home directories.

4.2.2 Tomcat Configuration

Configure Tomcat to connect to ZooKeeper.

Go to the Tomcat directory G:\solr_cloud\tomcat1\bin and edit catalina.bat.

Set up the leader node first by adding:

set JAVA_OPTS=-Dbootstrap_confdir=G:\solr_cloud\solr_home1\solr\test_core\conf -Dcollection.configName=clustercore -DzkRun -DzkHost=localhost:2181 -DnumShards=2


For each of the remaining (non-leader) Tomcats, add instead:

set JAVA_OPTS=-DzkRun -DzkHost=localhost:2181 -DnumShards=2

Here -DzkHost=localhost:2181 connects to ZooKeeper on its default port.

Parameter description:

-Dbootstrap_confdir: ZooKeeper needs a copy of the cluster configuration; this parameter tells SolrCloud where that configuration is located so it can serve as the common configuration for the entire cluster.

-Dcollection.configName: the name under which your configuration is uploaded to ZooKeeper. Keeping it the same as the core name you upload makes it easy to identify.

-DzkRun: launches an embedded ZooKeeper server inside Solr to manage the cluster's configuration.

-DzkHost: like the previous parameter, but lets you specify the IP and port of a ZooKeeper server to coordinate with.

-DnumShards=2: how many shards your data should be split into.

-Dbootstrap_conf=true: uploads all the configuration under solr home to ZooKeeper, so that every core is managed by the cluster.
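For example, a variant of the leader line above that uses -Dbootstrap_conf=true to upload every core's configuration instead of a single confdir (a sketch under the same assumed paths and ports):

    set JAVA_OPTS=-Dbootstrap_conf=true -DzkRun -DzkHost=localhost:2181 -DnumShards=2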

4.2.3 ZooKeeper Configuration

1. Go to the ZooKeeper root directory and create an empty data folder.

2. Configure zoo.cfg

Go to: G:\solr_cloud\zookeeper-3.4.8\conf (copy zoo_sample.cfg to zoo.cfg if it does not exist yet)

Set the dataDir path to the data directory created in the previous step, for example as below.
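A minimal sketch of zoo.cfg for this single-instance setup; clientPort 2181 matches the -DzkHost value used above, and forward slashes are accepted in paths on Windows:

    tickTime=2000
    dataDir=G:/solr_cloud/zookeeper-3.4.8/data
    clientPort=2181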

4.2.4 Start Service

1. Start the ZooKeeper service first:

Go to G:\solr_cloud\zookeeper-3.4.8\bin

Run: zkServer.cmd

2. Start each Tomcat in turn:

Go to each tomcat\bin

Run: startup.bat
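If you restart the cluster often, a small helper script can start everything in the right order. A hypothetical sketch (the script itself, its 5-second pause, and the paths are assumptions, not part of the original setup):

    @echo off
    rem Start ZooKeeper first, then each Tomcat/Solr node (paths assumed)
    start "zookeeper" cmd /c "cd /d G:\solr_cloud\zookeeper-3.4.8\bin && zkServer.cmd"
    rem give ZooKeeper a moment to come up before the Solr nodes try to connect
    timeout /t 5
    for %%T in (tomcat1 tomcat2 tomcat3) do (
        start "%%T" cmd /c "cd /d G:\solr_cloud\%%T\bin && startup.bat"
    )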

After each service has started successfully, open http://localhost:8080/solr/admin.html, or the admin page on any other Tomcat port.

Under live_nodes you can see three Solr nodes, which shows the cloud has been deployed successfully.

4.2.5 Creating a core

After starting the services, the Cloud > Graph page may appear empty. That is because the Solr instances have not created any core yet, so there is nothing to display. (The author notes being stuck on this for a long time during deployment, searching everywhere for why the graph would not appear; the relationship graph only showed up once a core was created.)

Open Core Admin to create a core; it will be stored in the solr_home corresponding to that Tomcat.


A note on a problem you might encounter:

If you create the core and immediately try to query data, you will get an error such as "no servers hosting shard: shardX".

This is because -DnumShards=2 was set earlier, so a core that exists on only one machine cannot serve the whole collection.


So:

Open the remaining Tomcat admin pages and create the core on each, making sure the core name is the same everywhere:

http://localhost:8081/solr/admin.html

http://localhost:8082/solr/admin.html
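Alternatively, the cores can be created over HTTP with Solr's CoreAdmin API instead of the UI. A sketch, assuming the core/collection name test_core used earlier and that shard2 is the missing shard:

    http://localhost:8081/solr/admin/cores?action=CREATE&name=test_core&collection=test_core&shard=shard2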

Once the cores are created, refresh the Graph page again; you can see that the three machines are now associated, and queries now return data as well.

4.3 Testing

Two simple tests follow:


4.3.1 Test Data

Log in to the Solr admin page on any one machine,

for example: http://localhost:8080/solr/admin.html

and add some data.

After adding data to the core on any one machine, you can see the same data on the other machines.
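As a concrete example, a document like the following could be submitted from the core's Documents page on one node (the name field is an assumption; use fields that exist in your schema), and then queried back from another node:

    { "id": "1", "name": "solrcloud test" }

    http://localhost:8082/solr/test_core/select?q=*:*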

4.3.2 Test Downtime

This test shuts machines down to observe how ZooKeeper handles the switchover:

1. Shut down the Tomcat on port 8080. After it stops, refreshing the cluster Graph page shows that node grayed out, while 8082 automatically takes over as leader; the leader-change messages can be seen in the Tomcat log on 8082.


2. Restart 8080. You can see that 8082 remains the leader, so after 8080 starts it rejoins only as a replica.

That completes the Solr cluster configuration, along with a few simple tests. Some additional parameters and terminology will be recorded in a later post.
