1 What is SolrCloud
SolrCloud (Solr Cloud) is the distributed search solution provided by Solr. Use it when you need large-scale, fault-tolerant, distributed indexing and retrieval. When a system's index data is small, there is no need for SolrCloud; when the index is very large or the search request concurrency is very high, SolrCloud is needed to meet these requirements.
SolrCloud is a distributed search scheme based on Solr and ZooKeeper. Its main idea is to use ZooKeeper as the configuration information center for the cluster.
It has several features:
1) Centralized configuration information
2) Automatic fault tolerance
3) Near real-time search
4) Automatic load balancing when querying
1.1 So what exactly is ZooKeeper?
As the name implies, ZooKeeper is a zoo keeper: it manages Hadoop (the elephant), Hive (the bee), and Pig (the pig). Distributed clusters such as Apache HBase and Apache Solr all use ZooKeeper. ZooKeeper is a distributed, open-source coordination service and a sub-project of the Hadoop project.
1.2 What ZooKeeper can do
1. Configuration management
In addition to the code, our applications have many configurations, such as database connections. Usually we introduce them through configuration files referenced in the code. A configuration file works well when there is only one configuration, one server, and it is rarely modified. But when there are many configurations, many servers need them, and the values may change dynamically, a configuration file becomes awkward. We then need a way to manage configuration centrally: modify it once in the central place, and everything interested in that configuration is updated. For example, we could put the configuration in a database and have every service that needs it read from that database. However, because many services depend heavily on this configuration, the service providing centralized configuration must be highly reliable. Generally we use a cluster to provide it, but once a cluster is used to improve reliability, how do we keep the configuration consistent across the cluster? This is where a service implementing a consensus protocol comes in. ZooKeeper is such a service; it uses Zab as its consensus protocol to provide consistency. Many open-source projects use ZooKeeper to maintain configuration. In HBase, for example, the client connects to ZooKeeper and obtains the necessary configuration information for the HBase cluster before it can go further. The open-source message queue Kafka uses ZooKeeper to maintain broker information, and Alibaba's open-source SOA framework Dubbo also makes extensive use of ZooKeeper to manage configuration and implement service governance.
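As a concrete illustration, here is a minimal sketch of a client reading a centrally managed value with the standard ZooKeeper Java API. The ensemble address matches the one used later in this tutorial; the znode path /config/db.url and its content are hypothetical:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConfigReader {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble
        ZooKeeper zk = new ZooKeeper("192.168.25.154:2181", 15000, new Watcher() {
            public void process(WatchedEvent event) {
                // Connection and data-change events arrive here
                System.out.println("event: " + event);
            }
        });
        // Read the value stored in the (hypothetical) znode /config/db.url;
        // passing true leaves a watch so we are notified when it changes
        byte[] data = zk.getData("/config/db.url", true, null);
        System.out.println("db.url = " + new String(data, "UTF-8"));
        zk.close();
    }
}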
2. Name service
A name service is easy to understand. For example, to access a system over the network we need to know its IP address, but IP addresses are unfriendly to people, so we use domain names instead. A machine cannot resolve a domain name by itself, though. What do we do? If every machine kept a local mapping from domain names to IP addresses, that would solve part of the problem, but what if the IP behind a domain name changes? So we have DNS: we only need to ask a well-known point, and it tells us what IP a domain currently maps to. The same kind of problem appears in our applications, especially when there are very many services. Saving every service address locally is very inconvenient to maintain; if instead we only need to access a well-known access point that provides a unified entry, maintenance becomes much easier.
3. Distributed lock
As the first point said, ZooKeeper is a distributed coordination service, so we can use it to coordinate activities among multiple distributed processes. For example, in a distributed environment, to improve reliability the same service is deployed on every server in the cluster. If every server in the cluster had to coordinate its work with all the others, programming would become very complex; if we let only one fixed server do the work, we have a single point of failure. A common practice is to use a distributed lock: at any moment only one service does the work, and when that service fails the lock is released and another service immediately takes over. Many distributed systems work this way, and the design has a nicer name: leader election. The HBase master, for example, uses this mechanism. Note, however, that a distributed lock differs from a lock within a single process, so use it more cautiously than an in-process lock. A sketch of the recipe follows.
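This is a minimal sketch of the classic ZooKeeper leader-election recipe: every candidate creates an ephemeral sequential znode, and the one holding the smallest sequence number is the leader. The /election path is hypothetical and assumed to already exist; error handling and the watch on the predecessor node are omitted:

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElection {
    public static boolean isLeader(ZooKeeper zk) throws Exception {
        // Each candidate creates an ephemeral sequential node under /election;
        // ZooKeeper deletes it automatically if this process's session expires
        String me = zk.create("/election/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        // The candidate that owns the smallest sequence number is the leader
        List<String> children = zk.getChildren("/election", false);
        Collections.sort(children);
        return me.endsWith(children.get(0));
    }
}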
4. Cluster management
In a distributed cluster, nodes come and go all the time for various reasons: hardware failures, software failures, network problems. New nodes join and old nodes leave, and the other machines in the cluster need to perceive this change and make decisions based on it. For example, in a distributed storage system with a central control node responsible for storage allocation, when new storage nodes join, allocation must be based on the current state of the cluster, so the cluster state has to be perceived dynamically. Similarly, in a distributed SOA architecture a service is provided by a cluster, and when a consumer accesses the service it needs a mechanism to discover which nodes currently provide it (this is also known as service discovery; for example, Alibaba's open-source SOA framework Dubbo uses ZooKeeper as its underlying service-discovery mechanism). The open-source Kafka queue likewise uses ZooKeeper to manage consumers going online and offline. A sketch of this pattern follows.
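A minimal sketch of this membership/service-discovery pattern with the ZooKeeper Java API (the /services/search path is hypothetical and assumed to already exist):

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ServiceDiscovery {
    // A provider registers itself under the (hypothetical) parent /services/search
    public static void register(ZooKeeper zk, String hostPort) throws Exception {
        // Ephemeral: the registration vanishes when the provider goes down
        zk.create("/services/search/" + hostPort, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // A consumer lists the currently available providers; passing true leaves
    // a watch so it is notified when nodes join or leave
    public static List<String> discover(ZooKeeper zk) throws Exception {
        return zk.getChildren("/services/search", true);
    }
}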
2 Structure of the Solr cluster
3 Solr cluster setup
The installation in this tutorial is done on a single machine, so a pseudo-cluster installation is used. In a real production environment, simply change the pseudo-cluster IPs; the steps are the same.
The SolrCloud structure diagram (not reproduced here) requires:
Three ZooKeeper nodes
Four Solr nodes
We use a pseudo-distributed implementation of the Solr cluster: three ZooKeeper instances and four Tomcat instances, all of which can be simulated on a single virtual machine. At least 1 GB of memory is recommended for the virtual machine.
4 ZooKeeper cluster setup
4.1 Prerequisites
Three ZooKeeper instances are needed. ZooKeeper is developed in Java, so the JDK must be installed.
1. Linux system
2. JDK environment
3. ZooKeeper
4.2 ZooKeeper installation steps
Step one: Upload the ZooKeeper installation package to the server.
Step two: Unzip.
[root@localhost ~]# tar -zxf zookeeper-3.4.6.tar.gz
[root@localhost ~]#
Step three: Create a solrcloud directory under /usr/local/. Copy the extracted ZooKeeper folder into this directory three times, naming the copies zookeeper1, zookeeper2, and zookeeper3.
[root@localhost ~]# mkdir /usr/local/solrcloud
[root@localhost ~]# mv zookeeper-3.4.6 /usr/local/solrcloud/zookeeper1
[root@localhost ~]# cd /usr/local/solrcloud
[root@localhost solrcloud]# ll
total 4
drwxr-xr-x. 4096 zookeeper1
[root@localhost solrcloud]# cp -r zookeeper1/ zookeeper2
[root@localhost solrcloud]# cp -r zookeeper1/ zookeeper3
[root@localhost solrcloud]#
Step four: Configure ZooKeeper.
1. Create a data directory under each ZooKeeper folder.
2. In the data directory, create a file named myid whose content is the number of this ZooKeeper instance: 1, 2, or 3.
[root@localhost data]# echo 1 >> myid
[root@localhost data]# ll
total 4
-rw-r--r--. 1 root root 2 Sep 23:43 myid
[root@localhost data]# cat myid
1
[root@localhost data]#
Create the data directory and the myid file in the zookeeper2 and zookeeper3 folders in the same way:
[root@localhost solrcloud]# mkdir zookeeper2/data
[root@localhost solrcloud]# echo 2 >> zookeeper2/data/myid
[root@localhost solrcloud]# ll zookeeper2/data
total 4
-rw-r--r--. 1 root root 2 Sep 23:44 myid
[root@localhost solrcloud]# cat zookeeper2/data/myid
2
[root@localhost solrcloud]# mkdir zookeeper3/data
[root@localhost solrcloud]# echo 3 >> zookeeper3/data/myid
[root@localhost solrcloud]#
3. Copy the zoo_sample.cfg file in zookeeper1's conf directory to zoo.cfg.
4. Modify the configuration in zoo.cfg, as sketched below.
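The tutorial does not reproduce the file contents, but a typical zoo.cfg for this pseudo-cluster would look roughly like the following for zookeeper1. The client ports 2181-2183 match the addresses used later in this tutorial; the quorum and election ports (2881-2883, 3881-3883) are assumptions you can change freely. For zookeeper2 and zookeeper3, adjust dataDir and clientPort (2182 and 2183) accordingly.

# zookeeper1/conf/zoo.cfg (sketch)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/solrcloud/zookeeper1/data
clientPort=2181
# ensemble members: server.<myid>=<ip>:<quorum port>:<election port>
server.1=192.168.25.154:2881:3881
server.2=192.168.25.154:2882:3882
server.3=192.168.25.154:2883:3883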
Step five: Start ZooKeeper. Enter the zookeeper1/bin directory. All three instances must be started; see the loop sketch below.
Start ZooKeeper: ./zkServer.sh start
Stop: ./zkServer.sh stop
Check status: ./zkServer.sh status
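A small shell loop (a sketch, assuming the /usr/local/solrcloud layout created above) starts all three instances and then checks their status:

for i in 1 2 3; do /usr/local/solrcloud/zookeeper$i/bin/zkServer.sh start; done
for i in 1 2 3; do /usr/local/solrcloud/zookeeper$i/bin/zkServer.sh status; done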
[root@localhost solrcloud]# zookeeper1/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/solrcloud/zookeeper1/bin/../conf/zoo.cfg
Mode: follower
[root@localhost solrcloud]# zookeeper2/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/solrcloud/zookeeper2/bin/../conf/zoo.cfg
Mode: leader
[root@localhost solrcloud]# zookeeper3/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/solrcloud/zookeeper3/bin/../conf/zoo.cfg
Mode: follower
[root@localhost solrcloud]#
5 Solr instance setup
Step one: Create four Tomcat instances and change their ports to 8080-8083.
Step two: Unzip the solr-4.10.3.tar.gz package and copy solr.war from it into each Tomcat.
Step three: Start Tomcat so it unpacks the war, then add the logging-related jars from the example directory under solr-4.10.3 to the Solr web application.
Step four: Create a solrhome and modify web.xml to point to the solrhome location, as sketched below.
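For Solr 4.x deployed as a war, the web application is pointed at the solrhome through the solr/home JNDI entry in the Solr web application's web.xml. A sketch (the path is this tutorial's solrhome1; each Tomcat instance points at its own solrhome):

<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/usr/local/solrcloud/solrhome1</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>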
6 Solr cluster configuration
6.1 Step one
Upload the configuration files under solrhome to the ZooKeeper cluster, using the ZooKeeper client upload tool.
Client command location: /root/solr-4.10.3/example/scripts/cloud-scripts
./zkcli.sh -zkhost 192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183 -cmd upconfig -confdir /usr/local/solrcloud/solrhome1/collection1/conf -confname myconf
The IP addresses and ports in the -zkhost parameter are those of the ZooKeeper cluster.
To see if the configuration file was uploaded successfully:
[root@localhost bin]# ./zkCli.sh
Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /
[configs, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /configs
[myconf]
[zk: localhost:2181(CONNECTED) 2] ls /configs/myconf
[admin-extra.menu-top.html, currency.xml, protwords.txt, mapping-FoldToASCII.txt, _schema_analysis_synonyms_english.json, _rest_managed.json, solrconfig.xml, _schema_analysis_stopwords_english.json, stopwords.txt, lang, spellings.txt, mapping-ISOLatin1Accent.txt, admin-extra.html, xslt, synonyms.txt, scripts.conf, update-script.js, velocity, elevate.xml, admin-extra.menu-bottom.html, clustering, schema.xml]
[zk: localhost:2181(CONNECTED) 3]
6.2 Step two
Modify the solr.xml file under each solrhome to specify the IP address and port number that the instance runs on, for example:
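In solrhome1 the host and hostPort entries of solr.xml would be set as follows (a sketch based on the default solr.xml shipped with Solr 4.10; solrhome2 through solrhome4 use ports 8081-8083):

<solr>
  <solrcloud>
    <str name="host">192.168.25.154</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>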
6.3 Step three
Modify the catalina.sh file in the bin directory of each Solr Tomcat to add a -DzkHost option specifying the ZooKeeper server addresses:
JAVA_OPTS="-DzkHost=192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183"
The IPs and port numbers are those of the ZooKeeper cluster. (You can use Vim's search to find where JAVA_OPTS is defined and add the option there.)
6.4 Step four
Restart Tomcat
At this point the cluster has only one shard, with one leader node and several replica nodes.
6.5 Step five
Create a collection with two shards, each shard having one leader and one replica (numShards is the number of shards, replicationFactor the number of copies of each shard).
Use the following command to create it:
http://192.168.25.154:8080/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2
6.6 Step six
Delete collection1:
http://192.168.25.154:8080/solr/admin/collections?action=DELETE&name=collection1
7 Using the Solr cluster
Use SolrJ to operate on the index library in the clustered environment.
7.1 SolrJ test
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.junit.Test;

public class SolrCloudTest {

    @Test
    public void testAddDocument() throws Exception {
        // Create a connection to the Solr cluster.
        // The parameter is the ZooKeeper address list, separated by commas.
        String zkHost = "192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183";
        CloudSolrServer solrServer = new CloudSolrServer(zkHost);
        // Set the default collection
        solrServer.setDefaultCollection("collection2");
        // Create a document object
        SolrInputDocument document = new SolrInputDocument();
        // Add fields to the document
        document.addField("id", "test001");
        document.addField("item_title", "test product");
        // Add the document to the index library
        solrServer.add(document);
        // Commit
        solrServer.commit();
    }

    @Test
    public void deleteDocument() throws SolrServerException, IOException {
        // Create a connection to the Solr cluster.
        // The parameter is the ZooKeeper address list, separated by commas.
        String zkHost = "192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183";
        CloudSolrServer solrServer = new CloudSolrServer(zkHost);
        // Set the default collection
        solrServer.setDefaultCollection("collection2");

        solrServer.deleteByQuery("*:*");
        solrServer.commit();
    }
}
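A query test could be added in the same style (a sketch using the same SolrJ 4.x API; the field names match the add test above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.junit.Test;

public class SolrCloudQueryTest {

    @Test
    public void testQueryDocument() throws Exception {
        String zkHost = "192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183";
        CloudSolrServer solrServer = new CloudSolrServer(zkHost);
        solrServer.setDefaultCollection("collection2");
        // Query all documents
        SolrQuery query = new SolrQuery("*:*");
        QueryResponse response = solrServer.query(query);
        SolrDocumentList results = response.getResults();
        System.out.println("total hits: " + results.getNumFound());
        for (SolrDocument doc : results) {
            System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("item_title"));
        }
    }
}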
7.2 SolrJ and Spring integration
Modify the Spring configuration file to add the cluster version of the bean configuration:
<!-- Cluster edition -->
<bean id="cloudSolrServer" class="org.apache.solr.client.solrj.impl.CloudSolrServer">
    <constructor-arg name="zkHost" value="192.168.25.154:2181,192.168.25.154:2182,192.168.25.154:2183"></constructor-arg>
    <property name="defaultCollection" value="collection2"></property>
</bean>
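With this bean defined, a service class can have the CloudSolrServer injected and use it directly (a sketch; the class and field names are hypothetical, and component scanning is assumed to be enabled):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ItemIndexService {

    @Autowired
    private CloudSolrServer cloudSolrServer;

    public void addItem(String id, String title) throws Exception {
        // Build and index a document, then commit
        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", id);
        document.addField("item_title", title);
        cloudSolrServer.add(document);
        cloudSolrServer.commit();
    }
}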