Building and using a SOLR cluster (a guide to building the zookeeper cluster)


1 What is Solrcloud

SolrCloud (Solr Cloud) is the distributed search solution provided by Solr. Use SolrCloud when you need large-scale, fault-tolerant, distributed indexing and retrieval. When a system's index data is small there is no need for SolrCloud; when the index is very large and search concurrency is very high, SolrCloud is needed to meet those requirements.

  SolrCloud is a distributed search scheme based on Solr and zookeeper. Its main idea is to use zookeeper as the configuration information center for the cluster.

It has several features:

1) Centralized configuration information

2) Automatic fault tolerance

3) near real-time search

4) Automatic load balancing when querying

1.1 What exactly is zookeeper?

As the name implies, zookeeper is a zoo administrator: it is the keeper of Hadoop (the elephant), Hive (the bee) and Pig (the pig). Distributed clusters such as Apache HBase and Apache Solr all use zookeeper. Zookeeper is a distributed, open-source coordination service and a sub-project of the Hadoop project.

1.2 What zookeeper can do

1. Configuration Management

Besides the code, an application has a lot of configuration, such as database connections. Usually we put these in configuration files and reference them from the code. A configuration file works fine when there is only one configuration item, one server, and it is rarely modified. But when there are many configuration items, many servers need the same configuration, and the configuration may change dynamically, configuration files are no longer suitable. We then need a way to manage the configuration centrally: modify it in one central place, and every party interested in the configuration gets the change. For example, we could put the configuration in a database and have every service that needs it read it from there. However, because so many services depend heavily on this configuration, the centralized configuration service must be highly reliable. We can generally use a cluster to provide it, but once a cluster is used for reliability, how do we keep the configuration consistent across the cluster? This is where a service that implements a consistency protocol comes in. Zookeeper is such a service: it uses Zab as its consistency protocol. Many open-source projects use zookeeper to maintain configuration. In HBase, the client connects to a zookeeper ensemble to obtain the necessary configuration of the HBase cluster before doing anything else. The open-source message queue Kafka uses zookeeper to maintain broker information. Alibaba's open-source SOA framework Dubbo also makes extensive use of zookeeper to manage configuration and implement service governance.
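As a rough illustration of the idea, here is a sketch using zookeeper's own zkCli.sh command-line client (which also appears later in this guide); the znode name /db.url and the JDBC URLs are invented for the example. A configuration value is written once to a znode and can be read by every client connected to the ensemble:

# write a configuration value once, centrally, as a znode
./zkCli.sh -server 192.168.25.154:2181 create /db.url "jdbc:mysql://10.0.0.1:3306/shop"
# any client connected to the ensemble reads the same value
./zkCli.sh -server 192.168.25.154:2181 get /db.url
# change it in one place; clients that set a watch on the znode are notified of the change
./zkCli.sh -server 192.168.25.154:2181 set /db.url "jdbc:mysql://10.0.0.2:3306/shop"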

2. Name Services

Name services are easy to understand. To access a system over the network we need the other side's IP address, but IP addresses are unfriendly to people, so we use domain names instead. A computer, however, cannot resolve a domain name by itself. What do we do? If every machine kept a local mapping from domain names to IP addresses, that would solve part of the problem, but what if the IP behind a domain name changes? That is why DNS exists: we only need to ask a well-known point and it tells us what IP a domain currently maps to. Our applications face many problems of this kind, especially when there are a great many services. Keeping every service address locally is very inconvenient to maintain; if we instead only need to ask a well-known access point that provides a unified entry, maintenance becomes much easier.

3. Distributed Lock

As the first point already noted, zookeeper is a distributed coordination service, so we can use it to coordinate the activities of multiple distributed processes. For example, in a distributed environment we often deploy the same service on every server of the cluster to improve reliability. If every server had to coordinate its work with all the others, programming would become very complex; if we let only one instance do the work, we have a single point of failure. A common practice is to use a distributed lock: at any moment only one service does the work, and when that service fails the lock is released and another service immediately takes over. Many distributed systems follow this design, which goes by the nicer name of leader election. The HBase master, for example, is chosen with this mechanism. Note, however, that a distributed lock is not the same as a lock within a single process, and it should be used more cautiously.
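The core mechanism can be seen with an ephemeral znode. Below is an annotated sketch of two interactive zkCli.sh sessions; the node name /master is invented, and real implementations (for example Curator's recipes) handle the races and watches properly. Only one session can create the node, and it disappears automatically when that session dies:

# session A: becomes leader by creating an ephemeral znode
create -e /master "server-a"
# session B: the same create fails because /master already exists, so B watches it and waits
create -e /master "server-b"
stat /master watch
# when session A crashes or closes, zookeeper deletes /master automatically;
# B is notified by its watch, retries create -e /master, and takes over as leader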

4. Cluster Management

In a distributed cluster, nodes come and go for all kinds of reasons: hardware failures, software failures, network problems. New nodes join and old nodes leave. The other machines in the cluster need to perceive these changes and make decisions based on them. For example, in a distributed storage system with a central control node responsible for allocating storage, the controller must know the current state of the cluster in order to assign new storage to the right nodes, so the cluster state has to be perceived dynamically. Likewise, in a distributed SOA architecture a service is provided by a cluster, and a consumer needs a mechanism to discover which nodes of that service are currently available (this is also called service discovery; Alibaba's open-source SOA framework Dubbo, for example, uses zookeeper as the underlying mechanism for service discovery). The open-source Kafka queue also uses zookeeper to manage consumers going online and offline.

2 Structure of the SOLR Cluster

3 SOLR Cluster Build-up

This tutorial installs everything on a single machine, so a pseudo-cluster is used. In a real production environment you only need to change the pseudo-cluster's IP addresses; the steps are the same.

The SolrCloud structure is as follows:

Three zookeeper nodes and four SOLR nodes are required.

The SOLR cluster is implemented as a pseudo-distributed setup: three zookeeper instances and four Tomcat instances, all of which can be simulated on a single virtual machine. At least 1 GB of memory is recommended for the virtual machine.

4 Zookeeper Cluster Construction

4.1 Prerequisites

Three zookeeper instances are required. Zookeeper is also developed in Java, so the JDK needs to be installed.

1. Linux system
2. JDK environment
3. Zookeeper

4.2 Zookeeper installation steps

Step one: Upload the Zookeeper installation package to the server

Step two: Unzip.

[root@localhost ~]# tar -zxf zookeeper-3.4.6.tar.gz
[root@localhost ~]#

Step three: Create a solrcloud directory under /usr/local/. Copy the extracted zookeeper folder into this directory three times, naming the copies zookeeper1, zookeeper2 and zookeeper3.

[root@localhost ~]# mkdir /usr/local/solrcloud
[root@localhost ~]# mv zookeeper-3.4.6 /usr/local/solrcloud/zookeeper1
[root@localhost ~]# cd /usr/local/solrcloud
[root@localhost solrcloud]# ll
total 4
drwxr-xr-x. 4096 zookeeper1
[root@localhost solrcloud]# cp -r zookeeper1/ zookeeper2
[root@localhost solrcloud]# cp -r zookeeper1/ zookeeper3
[root@localhost solrcloud]#

Step four: Configure zookeeper.

1. Create a data directory under each Zookeeper folder.

2. In the data folder, create a file named myid whose content is that zookeeper's number: 1, 2 or 3.

[root@localhost data]# echo 1 >> myid
[root@localhost data]# ll
total 4
-rw-r--r--. 1 root root 2 Sep 23:43 myid
[root@localhost data]# cat myid
1
[root@localhost data]#

Create the data directory and the myid file under the zookeeper2 and zookeeper3 folders in the same way:

[root@localhost solrcloud]# mkdir zookeeper2/data
[root@localhost solrcloud]# echo 2 >> zookeeper2/data/myid
[root@localhost solrcloud]# ll zookeeper2/data
total 4
-rw-r--r--. 1 root root 2 Sep 23:44 myid
[root@localhost solrcloud]# cat zookeeper2/data/myid
2
[root@localhost solrcloud]# mkdir zookeeper3/data
[root@localhost solrcloud]# echo 3 >> zookeeper3/data/myid
[root@localhost solrcloud]#

3. In the conf directory of zookeeper1, copy zoo_sample.cfg to zoo.cfg (repeat for zookeeper2 and zookeeper3).

4. Modify the configuration in zoo.cfg, as sketched below.
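Since all three instances run on one machine, each zoo.cfg needs its own dataDir and clientPort plus the same server list. Below is a minimal sketch for zookeeper1/conf/zoo.cfg; the quorum ports 2881-2883 and 3881-3883 are just one possible choice, and zookeeper2/zookeeper3 differ only in dataDir and in clientPort (2182 and 2183):

tickTime=2000
initLimit=10
syncLimit=5
# data directory created above; it holds the myid file
dataDir=/usr/local/solrcloud/zookeeper1/data
# client port: 2181 for zookeeper1, 2182 for zookeeper2, 2183 for zookeeper3
clientPort=2181
# the same three server lines go into every zoo.cfg; the number matches the myid content
server.1=192.168.25.154:2881:3881
server.2=192.168.25.154:2882:3882
server.3=192.168.25.154:2883:3883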

Step five: Start zookeeper. Go into the zookeeper1/bin directory.

Start zookeeper: ./zkServer.sh start
Stop: ./zkServer.sh stop
Check status: ./zkServer.sh status
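All three instances have to be running before the ensemble can elect a leader. A small helper script saves some typing; this is just a sketch that assumes the directory layout used above:

# start-all.sh: start every instance (replace "start" with stop or status as needed)
for i in 1 2 3
do
    /usr/local/solrcloud/zookeeper$i/bin/zkServer.sh start
done

Once all three are up, running zkServer.sh status against each instance shows one leader and two followers: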

[root@localhost solrcloud]# zookeeper1/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/solrcloud/zookeeper1/bin/../conf/zoo.cfg
Mode: follower
[root@localhost solrcloud]# zookeeper2/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/solrcloud/zookeeper2/bin/../conf/zoo.cfg
Mode: leader
[root@localhost solrcloud]# zookeeper3/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/solrcloud/zookeeper3/bin/../conf/zoo.cfg
Mode: follower
[root@localhost solrcloud]#

5 SOLR Instance Building

Step one: Create four Tomcat instances and change their ports to 8080-8083.
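For each Tomcat copy, the ports live in conf/server.xml. A sketch of the lines to change for tomcat1 follows; the directory name and the shutdown/AJP port numbers are assumptions, only the HTTP ports 8080-8083 come from the step above:

<!-- fragments of tomcat1/conf/server.xml: every copy needs its own set of ports -->
<!-- shutdown port, e.g. 8105 for tomcat1, 8205 for tomcat2, and so on -->
<Server port="8105" shutdown="SHUTDOWN">
<!-- HTTP connector: 8080, 8081, 8082, 8083 for the four instances -->
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<!-- AJP connector, e.g. 8109 for tomcat1, 8209 for tomcat2, and so on -->
<Connector port="8109" protocol="AJP/1.3" redirectPort="8443" />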

Step two: Unzip the solr-4.10.3.tar.gz package and copy solr.war from it into each Tomcat.

Step three: Start Tomcat once so that the war is unpacked, then add the logging-related jars from the example directory of solr-4.10.3 to the solr project.
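A sketch of this step for the first instance, to be repeated for tomcat2-4; the tomcat1 path is an assumption, and the war can be taken from example/webapps/solr.war (or from dist/solr-4.10.3.war renamed to solr.war):

# copy the war into Tomcat and start it once so it is unpacked into webapps/solr
cp solr-4.10.3/example/webapps/solr.war /usr/local/solrcloud/tomcat1/webapps/
/usr/local/solrcloud/tomcat1/bin/startup.sh
/usr/local/solrcloud/tomcat1/bin/shutdown.sh
# the war does not bundle the slf4j/log4j jars, so copy them from example/lib/ext
cp solr-4.10.3/example/lib/ext/*.jar /usr/local/solrcloud/tomcat1/webapps/solr/WEB-INF/lib/
# give log4j a configuration to read
mkdir -p /usr/local/solrcloud/tomcat1/webapps/solr/WEB-INF/classes
cp solr-4.10.3/example/resources/log4j.properties /usr/local/solrcloud/tomcat1/webapps/solr/WEB-INF/classes/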

Step four: Create a solrhome for each instance and modify the solrhome location specified in web.xml.
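The relevant piece of webapps/solr/WEB-INF/web.xml is the solr/home JNDI entry, which ships commented out. A sketch for the first instance, assuming its solrhome is /usr/local/solrcloud/solrhome1 (solrhome2-4 for the others):

<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <!-- each Tomcat's solr webapp points at its own solrhome -->
    <env-entry-value>/usr/local/solrcloud/solrhome1</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>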

6 SOLR Cluster Build

6.1 Step One

Upload the configuration files under solrhome to the zookeeper cluster, using the zkcli.sh client that ships with Solr.

Client script location: /root/solr-4.10.3/example/scripts/cloud-scripts

./zkcli.sh -zkhost 192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183 -cmd upconfig -confdir /usr/local/solrcloud/solrhome1/collection1/conf -confname myconf

The IP addresses and ports are those of the zookeeper cluster.

To check whether the configuration files were uploaded successfully:

[root@localhost bin]# ./zkCli.sh
Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /
[configs, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /configs
[myconf]
[zk: localhost:2181(CONNECTED) 2] ls /configs/myconf
[admin-extra.menu-top.html, currency.xml, protwords.txt, mapping-FoldToASCII.txt, _schema_analysis_synonyms_english.json, _rest_managed.json, solrconfig.xml, _schema_analysis_stopwords_english.json, stopwords.txt, lang, spellings.txt, mapping-ISOLatin1Accent.txt, admin-extra.html, xslt, synonyms.txt, scripts.conf, update-script.js, velocity, elevate.xml, admin-extra.menu-bottom.html, clustering, schema.xml]
[zk: localhost:2181(CONNECTED) 3]

6.2 Step Two

Modify the solr.xml file under each solrhome to specify the IP address and port that the instance runs on, as sketched below.
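In the solr.xml shipped with Solr 4.10, the host and port live in the <solrcloud> section. A sketch for the instance running in the Tomcat on port 8080 (the other solrhomes differ only in hostPort):

<!-- fragment of solrhome1/solr.xml, inside the <solr> element -->
<solrcloud>
  <!-- the address and port that other nodes use to reach this instance -->
  <str name="host">192.168.25.154</str>
  <int name="hostPort">8080</int>
  <str name="hostContext">solr</str>
  <int name="zkClientTimeout">30000</int>
</solrcloud>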

6.3 Step Three

In the bin directory of each Solr Tomcat, edit catalina.sh and add a -DzkHost option specifying the zookeeper server addresses:

JAVA_OPTS="-DzkHost=192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183"

The IP addresses and ports are those of the zookeeper cluster.

(You can use Vim's search to find where JAVA_OPTS is defined and add the option there.)

6.4 Step Four

Restart Tomcat

At this point there is one leader with several replica nodes, and the cluster has only one shard.
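To confirm that layout, the cluster state Solr keeps in zookeeper can be inspected; a sketch using zookeeper's zkCli.sh client as before:

# the live solr instances registered with zookeeper
./zkCli.sh -server 192.168.25.154:2181 ls /live_nodes
# the shards of every collection and which replica currently leads each shard
./zkCli.sh -server 192.168.25.154:2181 get /clusterstate.json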

6.5 Step Five

Create a collection with two shards, each shard having one leader and one replica.

Use the following command to create:

http://192.168.25.154:8080/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2

6.6 Step Six

Delete collection1:

http://192.168.25.154:8080/solr/admin/collections?action=DELETE&name=collection1

7 Use of SOLR clusters

Use SolrJ to manipulate the index library in the clustered environment.

7.1 SOLRJ Test
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.junit.Test;

public class SolrCloudTest {

    @Test
    public void testAddDocument() throws Exception {
        // Create a connection to the Solr cluster.
        // The parameter is the zookeeper address list, separated by commas.
        String zkHost = "192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183";
        CloudSolrServer solrServer = new CloudSolrServer(zkHost);
        // Set the default collection
        solrServer.setDefaultCollection("collection2");
        // Create a document object
        SolrInputDocument document = new SolrInputDocument();
        // Add fields to the document
        document.addField("id", "test001");
        document.addField("item_title", "Test Product");
        // Add the document to the index library
        solrServer.add(document);
        // Commit
        solrServer.commit();
    }

    @Test
    public void deleteDocument() throws SolrServerException, IOException {
        // Create a connection to the Solr cluster.
        // The parameter is the zookeeper address list, separated by commas.
        String zkHost = "192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183";
        CloudSolrServer solrServer = new CloudSolrServer(zkHost);
        // Set the default collection
        solrServer.setDefaultCollection("collection2");

        solrServer.deleteByQuery("*:*");
        solrServer.commit();
    }
}

7.2 SOLRJ and Spring integration

To modify the spring configuration file, add the configuration for the cluster version:

<!-- Cluster edition -->
<bean id="cloudSolrServer" class="org.apache.solr.client.solrj.impl.CloudSolrServer">
    <constructor-arg name="zkHost" value="192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183"></constructor-arg>
    <property name="defaultCollection" value="collection2"></property>
</bean>
