Solr: Getting Started

Source: Internet
Author: User
Tags solr zookeeper

Solr is a high-performance search server with advanced features such as faceting (grouping search results into categories with a count for each term). Solr is built on Lucene, a Java library that provides indexing, querying, spell checking, keyword highlighting, and tokenization. Both Solr and Lucene are managed by the Apache Software Foundation.

The Solr search server provides a ready-to-use search platform. This section introduces Solr, shows how to run it, and describes its architecture and features. It covers the following points:

    • Installing Solr
    • Running Solr
    • How Solr works
    • Solr home directory and configuration options
    • Solr scripts

1. Install Solr
(1) Make sure Java is installed

java -version

Output

java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)

Output like this means Java is installed.

(2) Install Solr

Installing Solr only requires extracting the archive.

unzip solr-5.4.1.zip

2. Run Solr
(1) Start the server

bin/solr start

This starts the server in the background, listening on port 8983 by default.

The bin/solr script lets you customize how Solr is started.

Script help

To see how to use the bin/solr script:

bin/solr -help

Output

Usage: solr COMMAND OPTIONS
       where COMMAND is one of: start, stop, restart, status, healthcheck, create, create_core, create_collection, delete, version

  Standalone server example (start Solr running in the background on port 8984):

    ./solr start -p 8984

  SolrCloud example (start Solr running in SolrCloud mode using localhost:2181 to connect to ZooKeeper, with 1g max heap size and remote Java debug options enabled):

    ./solr start -c -m 1g -z localhost:2181 -a "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1044"

Pass -help after any COMMAND to see command-specific usage information,
  such as:    ./solr start -help or ./solr stop -help

The script offers 10 commands. To see the usage details of any command, pass -help after it. Example: view the usage details for start

bin/solr start -help

(2) Add documents

Solr's schema describes how content is structured, but a new installation holds no content yet. Content has to be indexed into Solr.

bin/post is the command used to index documents; use -help to see its usage information. bin/post can post various types of content to Solr.

bin/post -c fy example/exampledocs/*.xml

(3) Query

Indexing is now complete and the documents can be queried. The simplest way is to build a URL that contains the query parameters. Example: query for video

http://localhost:8983/solr/fy/select?q=video

Example: query for video, but return only the id, name, and price fields of each document

http://localhost:8983/solr/fy/select?q=video&fl=id,name,price

Example: query for documents that have black in the name field. If no field is specified, the default field (set in the schema) is searched

http://localhost:8983/solr/fy/select?q=name:black

Range queries on a field are also possible. Example: query for a price between 0 and 400

http://localhost:8983/solr/fy/select?q=price:[0%20TO%20400]
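
Query URLs like the ones above can also be assembled programmatically, which avoids hand-encoding characters such as spaces and brackets. The following is a minimal sketch, assuming the local server and the fy collection used in this article; the helper name select_url is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Assumed base URL: the local server and the "fy" collection used in this article.
BASE = "http://localhost:8983/solr/fy/select"


def select_url(q, fl=None):
    """Build a /select URL for query q, optionally restricting the returned fields."""
    params = {"q": q}
    if fl:
        params["fl"] = ",".join(fl)
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return BASE + "?" + urlencode(params)


print(select_url("video"))
print(select_url("video", fl=["id", "name", "price"]))
print(select_url("name:black"))
print(select_url("price:[0 TO 400]"))
```

The encoded forms differ cosmetically from the hand-written URLs (for example, the colon becomes %3A), but they decode to the same parameters.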

Faceted browsing is one of Solr's key features. It allows users to narrow their search results. Example: an e-commerce website provides facets to narrow search results by manufacturer and price.

The facet information is in the third part of the query response. Example:

http://localhost:8983/solr/fy/select?q=*:*&facet=true&facet.field=districtname

The facet information in the response shows how many results each districtname has. You can use these counts to narrow your query by adding a further condition. The following request adds a facet query for Putuo:

http://localhost:8983/solr/fy/select?q=*:*&facet=true&facet.field=districtname&facet.query=Putuo
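
As a rough illustration of working with facets from code, the sketch below builds a facet request and converts facet counts into a dictionary. It assumes the JSON response writer's default layout, in which each entry under facet_fields is a flat list of alternating values and counts. The helper names, the sample counts, and the use of the fq (filter query) parameter for narrowing are assumptions made for the example.

```python
from urllib.parse import urlencode


def facet_url(base, field, fq=None):
    """Build a match-all query that facets on `field`, optionally filtered by fq."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.field", field)]
    if fq:
        params.append(("fq", fq))  # fq restricts the result set
    return base + "?" + urlencode(params)


def facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


base = "http://localhost:8983/solr/fy/select"
print(facet_url(base, "districtname"))
print(facet_url(base, "districtname", fq="districtname:Putuo"))

# Hypothetical counts, only to show the shape of the data.
print(facet_counts(["Putuo", 12, "Xuhui", 7]))
```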

3. How Solr works

Solr adds search capability to an application through the following steps:

    1. Define a schema. The schema tells Solr about the content of the documents it will index. Solr's schema is powerful and extensible, and lets you tailor Solr's behavior to your application
    2. Deploy Solr to the application server
    3. Add documents to Solr
    4. Provide search functionality in the application

Sharding is a scaling technique in which a collection is divided into logical partitions (called shards) to increase the number of documents the cluster can hold. Incoming queries are distributed to every shard, and the results are merged. Another technique is to increase the "replication factor" of the collection, which lets you add servers that hold additional copies of it and handle high query concurrency by spreading requests across them. Sharding and replication are not mutually exclusive and can be used together.
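
The scatter-and-merge idea behind sharding can be sketched in a few lines. This is a toy model, not Solr's implementation: the CRC32-based routing and the function names are assumptions chosen for illustration, and Solr's real document router uses its own hashing scheme.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id (illustrative only)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index(docs, num_shards):
    """Distribute documents across num_shards in-memory lists."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards


def search(shards, predicate):
    """Send the query to every shard, then merge the partial results."""
    hits = []
    for shard in shards:
        hits.extend(doc for doc in shard if predicate(doc))
    return sorted(hits, key=lambda doc: doc["id"])


docs = [{"id": f"doc{i}", "price": i * 100} for i in range(10)]
shards = index(docs, num_shards=2)
print([len(s) for s in shards])
print([d["id"] for d in search(shards, lambda d: d["price"] <= 400)])
```

Whichever shard a document lands on, the merged result is the same as a search over a single index, which is the property a distributed query relies on.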

4. Solr home directory and configuration options

When Solr runs in an application server, it needs access to a home directory. The home directory contains important configuration information and is where the index is stored. The layout of the home directory differs slightly between standalone mode and SolrCloud mode.
The important parts are as follows:

Standalone mode

<solr-home-directory>/
    solr.xml
    core_name1/
        core.properties
        conf/
            solrconfig.xml
            schema.xml
        data/
    core_name2/
        core.properties
        conf/
            solrconfig.xml
            schema.xml
        data/

SolrCloud mode

<solr-home-directory>/
    solr.xml
    core_name1/
        core.properties
        data/
    core_name2/
        core.properties
        data/

Each file functions as follows:

    • solr.xml specifies configuration options for the Solr server instance
    • Per Solr core
      • core.properties defines specific properties for each core, such as its name, the collection it belongs to, the location of the schema, and other parameters
      • solrconfig.xml controls high-level behavior. For example, you can specify a different storage directory for the index data
      • schema.xml describes the documents Solr will index. It defines the field types and fields. Field type definitions are powerful and include information about how Solr processes field values and query values
      • data/ is the directory containing the index files

Note that the SolrCloud layout does not include a conf directory for each core, and therefore has no solrconfig.xml or schema.xml. This is because the configuration files are stored in ZooKeeper so that they can be propagated across the cluster.
If you are running SolrCloud with the embedded ZooKeeper, you will also see zoo.cfg and zoo.data, which are ZooKeeper's configuration and data files respectively.
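
To make the standalone layout concrete, the sketch below creates it in a temporary directory with empty placeholder files and then lists the cores by looking for core.properties. Solr also recognizes cores by their core.properties files, but this is a simplified one-level scan; the function names are hypothetical.

```python
import os
import tempfile


def make_standalone_home(root, core_names):
    """Create the standalone layout with empty placeholder files."""
    open(os.path.join(root, "solr.xml"), "w").close()
    for name in core_names:
        os.makedirs(os.path.join(root, name, "conf"))
        os.makedirs(os.path.join(root, name, "data"))
        for parts in (("core.properties",),
                      ("conf", "solrconfig.xml"),
                      ("conf", "schema.xml")):
            open(os.path.join(root, name, *parts), "w").close()


def find_cores(solr_home):
    """Return the sub-directories of solr_home that contain core.properties."""
    cores = []
    for entry in sorted(os.listdir(solr_home)):
        marker = os.path.join(solr_home, entry, "core.properties")
        if os.path.isfile(marker):
            cores.append(entry)
    return cores


with tempfile.TemporaryDirectory() as home:
    make_standalone_home(home, ["core_name1", "core_name2"])
    print(find_cores(home))
```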


5. Solr scripts
(1) Starting and stopping
A. Start and restart

The start and restart commands have many options that let you run in SolrCloud mode, use a sample configuration, start with a non-default hostname or port, or point to a local ZooKeeper.

bin/solr start [options]
bin/solr start -help
bin/solr restart [options]
bin/solr restart -help

When using the restart command, you must pass all the parameters that were passed at startup. Solr is stopped before it is started again. If no node is running, restart skips the stop step and starts Solr directly.

Available parameters

Parameter: -a "<string>"
Description: Start Solr with additional JVM parameters, such as those starting with -X. If you pass JVM parameters that start with "-D", you can omit the -a option.
Example: bin/solr start -a "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1044"

Parameter: -cloud
Description: Start Solr in SolrCloud mode, which also launches the embedded ZooKeeper instance included with Solr. Can be abbreviated to -c.
Example: bin/solr start -c

Parameter: -d <dir>
Description: Define the server directory; the default is $SOLR_HOME/server. This option is rarely set. Typically, each instance uses the same server directory and a unique Solr home directory.
Example: bin/solr start -d newServerDir

Parameter: -e <name>
Description: Start Solr with a sample configuration. These examples help you get started with Solr faster or try out a specific feature. Available options are: cloud, techproducts, dih, schemaless.
Example: bin/solr start -e schemaless

Parameter: -f
Description: Start Solr in the foreground. This option cannot be used together with the -e option.
Example: bin/solr start -f

Parameter: -h <hostname>
Description: Start Solr with a custom hostname; the default is 'localhost'.
Example: bin/solr start -h search.mysolr.com

Parameter: -m <memory>
Description: Use the given value as both the minimum and maximum heap size for the JVM.
Example: bin/solr start -m 1g

Parameter: -noprompt
Description: Suppress prompts and accept all defaults. For example, the cloud example starts an interactive session that guides you through several options; to accept the defaults for all of them, add this option.
Example: bin/solr start -e cloud -noprompt

Parameter: -p <port>
Description: Start Solr on the specified port.
Example: bin/solr start -p 8655

Parameter: -s <dir>
Description: Set the solr.solr.home system property. Solr creates core directories under this directory. This lets you run multiple instances on the same host while reusing the same server directory set with the -d option. If set, the specified directory must contain a solr.xml file, unless solr.xml exists in ZooKeeper. The default is server/solr. This parameter is ignored when running examples (-e), because solr.solr.home depends on which example is run.
Example: bin/solr start -s newHome

Parameter: -V
Description: Print verbose messages from the start script.
Example: bin/solr start -V

Parameter: -z <zkHost>
Description: Start Solr with a custom ZooKeeper connection string. This option can only be used together with the -c option, that is, in SolrCloud mode. If it is not provided, Solr launches the embedded ZooKeeper instance.
Example: bin/solr start -c -z server1:2181,server2:2181

To see how the default settings work, note that the following two commands are equivalent:

bin/solr start
bin/solr start -h localhost -p 8983 -d server

It is not necessary to specify every option, because the unspecified ones fall back to their defaults.
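
The way unspecified options fall back to defaults can be modeled as a simple dictionary merge. This sketch only covers the three defaults named above (hostname, port, server directory); the function names are hypothetical and this is not how the bin/solr script itself is written.

```python
# Defaults taken from the equivalent commands shown above.
DEFAULTS = {"-h": "localhost", "-p": "8983", "-d": "server"}


def effective_options(given):
    """Overlay user-supplied options on top of the defaults."""
    merged = dict(DEFAULTS)
    merged.update(given)
    return merged


def command_line(options):
    """Render the options as a bin/solr start command, flags in sorted order."""
    parts = ["bin/solr", "start"]
    for flag in sorted(options):
        parts += [flag, options[flag]]
    return " ".join(parts)


print(command_line(effective_options({})))
print(command_line(effective_options({"-p": "8984"})))
```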

B. Setting Java system properties

Use -D to pass properties to the JVM.

Example: set the automatic soft-commit interval to 3 seconds

bin/solr start -Dsolr.autoSoftCommit.maxTime=3000

C. SolrCloud mode

bin/solr start -c (or -cloud)

If you specify a ZooKeeper connection string, such as "-z 192.168.1.4:2181", Solr connects to ZooKeeper and joins the cluster. If you start in cloud mode without the -z option, Solr launches the embedded ZooKeeper server, which listens on the Solr port + 1000. For example, if Solr runs on port 8983, the embedded ZooKeeper listens on port 9983.
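
The port rule is easy to capture in a small helper, which can be handy when scripting against several local nodes. A minimal sketch; the function name and the localhost default are assumptions.

```python
def zk_connect_string(solr_port=8983, zk_host=None, host="localhost"):
    """Return the ZooKeeper address a SolrCloud node would use.

    An explicit zk_host (the -z value) wins; otherwise the embedded
    ZooKeeper is assumed to listen on the Solr port + 1000.
    """
    if zk_host:
        return zk_host
    return f"{host}:{solr_port + 1000}"


print(zk_connect_string())
print(zk_connect_string(solr_port=7574))
print(zk_connect_string(zk_host="192.168.1.4:2181"))
```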

D. Running with a sample configuration

bin/solr start -e <name>

    • cloud: starts a 1-4 node SolrCloud cluster on a single machine.
    • techproducts: starts Solr in standalone mode. Because it does not use schemaless mode, fields must be defined explicitly in schema.xml. The configuration files are in $SOLR_HOME/server/solr/configsets/sample_techproducts_configs
    • dih: starts Solr in standalone mode with the DataImportHandler (DIH) enabled and several example dataconfig.xml files
    • schemaless: starts Solr in standalone mode using a managed schema (explained in a later article) with a minimal configuration. Solr runs in schemaless mode, creating fields at run time and guessing their types. The configuration files are in $SOLR_HOME/server/solr/configsets/data_driven_schema_configs

Note: the run-in-foreground option (-f) cannot be used with the -e option, because the script needs to perform additional tasks.

E. Stop

The stop command sends a STOP request to a running Solr node so that it can shut down gracefully. The command waits 5 seconds for Solr to stop and then forcibly kills the process (kill -9).

bin/solr stop [options]
bin/solr stop -help

Available parameters

Parameter: -p <port>
Description: Stop Solr running on the specified port. If you are running multiple instances, or running in SolrCloud mode, you need to specify the ports in separate requests or use the -all option.
Example: bin/solr stop -p 8983

Parameter: -all
Description: Stop all running Solr instances.
Example: bin/solr stop -all

Parameter: -k <key>
Description: Stop key, used to protect against stopping Solr inadvertently; the default is "solrrocks".
Example: bin/solr stop -k solrrocks

(2) View information

A. Version

bin/solr version

B. Status

The status command displays basic JSON-formatted information about each running Solr node. It uses the SOLR_PID_DIR environment variable to locate the Solr process ID files and, through them, the running Solr instances. SOLR_PID_DIR defaults to the bin directory.

bin/solr status

Output

Found 1 Solr nodes:

Solr process 975 running on port 8983
{
  "solr_home":"/Users/hs/package/solr/server/solr",
  "version":"5.4.1 1725212 - jpountz - 2016-01-18 11:51:45",
  "startTime":"2016-02-14T02:06:05.35Z",
  "uptime":"0 days, 0 hours, 0 minutes, seconds",
  "memory":"MB (%7.3) of 490.7 MB"}

C. Health check

The healthcheck command requires Solr to be running in SolrCloud mode. The health report provides information about the state of every replica of every shard in a collection, including the number of committed documents and the current status.

bin/solr healthcheck [options]
bin/solr healthcheck -help

Available parameters

Parameter: -c <collection>
Description: Name of the collection to run the health check against.
Example: bin/solr healthcheck -c gettingstarted

Parameter: -z <zkhost>
Description: ZooKeeper connection string; the default is localhost:9983. If Solr is not running on port 8983, you need to specify the connection string; by default the ZooKeeper port is the Solr port + 1000.
Example: bin/solr healthcheck -z localhost:2181

Here is an example of a health check

bin/solr healthcheck -c gettingstarted -z localhost:9983

Output

{
  "collection":"gettingstarted",
  "status":"healthy",
  "numDocs":0,
  "numShards":2,
  "shards":[
    {
      "shard":"shard1",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node2",
          "url":"http://10.8.204.89:8983/solr/gettingstarted_shard1_replica1/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 0 hours, 4 minutes, seconds",
          "memory":"85.6 MB (%17.5) of 490.7 MB",
          "leader":true},
        {
          "name":"core_node3",
          "url":"http://10.8.204.89:7574/solr/gettingstarted_shard1_replica2/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 0 hours, 4 minutes, seconds",
          "memory":"41.4 MB (%8.4) of 490.7 MB"}]},
    {
      "shard":"shard2",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node1",
          "url":"http://10.8.204.89:8983/solr/gettingstarted_shard2_replica1/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 0 hours, 4 minutes, seconds",
          "memory":"85.9 MB (%17.5) of 490.7 MB",
          "leader":true},
        {
          "name":"core_node4",
          "url":"http://10.8.204.89:7574/solr/gettingstarted_shard2_replica2/",
          "numDocs":0,
          "status":"active",
          "uptime":"0 days, 0 hours, 4 minutes, seconds",
          "memory":"41.4 MB (%8.4) of 490.7 MB"}]}]}
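
Because the health report is JSON, it can be inspected from a script, for example to flag replicas that are not active. The sketch below assumes the report uses lowercase keys such as shards, replicas, status, and leader. The embedded sample is a hypothetical, trimmed report (including the "down" status value), not real output.

```python
import json

# Hypothetical, trimmed report in the general shape of the healthcheck output.
REPORT = """
{"collection": "gettingstarted", "numShards": 1,
 "shards": [{"shard": "shard1", "replicas": [
   {"name": "core_node2", "status": "active", "leader": true},
   {"name": "core_node3", "status": "down"}]}]}
"""


def replica_states(report):
    """Yield (shard, replica, status, is_leader) for every replica."""
    for shard in report["shards"]:
        for replica in shard["replicas"]:
            yield (shard["shard"], replica["name"], replica["status"],
                   replica.get("leader", False))


def inactive(report):
    """Return (shard, replica) pairs whose status is not 'active'."""
    return [(s, r) for s, r, status, _ in replica_states(report)
            if status != "active"]


report = json.loads(REPORT)
for row in replica_states(report):
    print(row)
print("inactive:", inactive(report))
```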

(3) Collections and cores
A. Create

Note: the create command should be run by the same user that started Solr.

The create command detects the mode Solr is running in (standalone or SolrCloud) and creates a core or a collection accordingly.

bin/solr create options
bin/solr create -help

Available parameters

Parameter: -c <name>
Description: Name of the core or collection to create.
Example: bin/solr create -c mycollection

Parameter: -d <confdir>
Description: Configuration directory; the default is data_driven_schema_configs.
Example: bin/solr create -d basic_configs

Parameter: -n <configName>
Description: Configuration name. Defaults to the name of the core or collection.
Example: bin/solr create -n basic

Parameter: -p <port>
Description: Solr port. Needed when you run multiple standalone instances.
Example: bin/solr create -p 8983

Parameter: -s <shards> or -shards
Description: Number of shards to split the collection into; the default is 1. Applies only in SolrCloud mode.
Example: bin/solr create -s 2

Parameter: -rf <replicas> or -replicationFactor
Description: Number of copies of each document in the collection; the default is 1 (no replication).
Example: bin/solr create -rf 2

(4) Configuration directories and SolrCloud

Before a collection is created in SolrCloud, the configuration directory it uses must be uploaded to ZooKeeper. The main decision you need to make is whether a configuration directory in ZooKeeper should be shared across multiple collections. The following examples show how configuration directories work in SolrCloud.

First, if neither the -d nor the -n option is provided, the default configuration ($SOLR_HOME/server/solr/configsets/data_driven_schema_configs/conf) is uploaded to ZooKeeper under the same name as the collection. For example, the following command causes the data_driven_schema_configs configuration to be uploaded to ZooKeeper as /configs/contacts

bin/solr create -c contacts

If you create another collection

bin/solr create -c contacts2

another copy of data_driven_schema_configs is uploaded to ZooKeeper, under /configs/contacts2. Changes made to the configuration of the contacts collection do not affect contacts2. In short, the default behavior is to create a unique copy of the configuration directory for each collection.

Use the -n option to override the name given to the configuration directory in ZooKeeper. Example

bin/solr create -c logs -d basic_configs -n basic

This uploads the server/solr/configsets/basic_configs/conf directory to ZooKeeper as /configs/basic

Note that the -d option is used here to specify a different configuration directory. Solr ships with several built-in configurations under server/solr/configsets. However, you can also provide the path to your own configuration directory. Example:

bin/solr create -c mycoll -d /tmp/myconfigs

This uploads /tmp/myconfigs to ZooKeeper as /configs/mycoll. Again, the configuration directory is named after the collection unless you override the name with -n

Other collections can share a configuration by using the -n option. Example: create a new collection that shares the basic configuration created earlier

bin/solr create -c logs2 -n basic
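
The naming rules above can be summarized as a small decision function: the ZooKeeper name comes from -n if given and from the collection name otherwise, and the uploaded directory comes from -d if given and from the default otherwise. This is a sketch of the behavior described in the text, not Solr's actual code; the function name and the existing parameter (the set of configuration names already in ZooKeeper) are hypothetical.

```python
DEFAULT_CONFDIR = "data_driven_schema_configs"


def resolve_config(collection, confdir=None, confname=None, existing=()):
    """Return (zookeeper_path, directory_to_upload) for a create request.

    directory_to_upload is None when a configuration of that name is already
    in ZooKeeper and is simply shared.
    """
    name = confname or collection          # -n overrides the collection name
    path = "/configs/" + name
    if name in existing:
        return path, None                  # shared: nothing is uploaded
    return path, confdir or DEFAULT_CONFDIR  # -d overrides the default


print(resolve_config("contacts"))
print(resolve_config("logs", confdir="basic_configs", confname="basic"))
print(resolve_config("mycoll", confdir="/tmp/myconfigs"))
print(resolve_config("logs2", confname="basic", existing={"basic"}))
```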

A. Data-driven schema and shared configurations

The schema in data_driven_schema_configs can change as data is indexed. It is therefore not recommended to share this configuration between collections, unless you are certain that all of them should inherit the schema changes caused by indexing data into one of them.

B. Delete

The delete command detects the mode Solr is running in and deletes a core or a collection accordingly.

bin/solr delete [options]
bin/solr delete -help

In SolrCloud mode, the delete command checks whether the configuration directory is in use by another collection; if not, the directory is removed from ZooKeeper.

Available parameters

Parameter: -c <name>
Description: Name of the core or collection to delete.
Example: bin/solr delete -c mycoll

Parameter: -deleteConfig <true|false>
Description: Remove the configuration directory from ZooKeeper; the default is true. If the configuration directory is in use by another collection, it is not removed even if you pass true.
Example: bin/solr delete -deleteConfig false

Parameter: -p <port>
Description: Solr port. Useful when running multiple standalone instances.
Example: bin/solr delete -p 8983
