Distributed Search Engine Elasticsearch: Installation and Configuration


Introduction to the Distributed Search Engine Elasticsearch

Elasticsearch is an open-source distributed search engine based on Lucene, with distributed multi-user capability. Elasticsearch is developed in Java and provides a RESTful interface, supporting real-time search and high-performance computing. It also scales very well: nodes can be added without restarting the service, and it works with essentially zero configuration. On the other hand, reference material is still scarce, versions are updated quickly, bugs exist, and the APIs are numerous and varied.

Concepts and Design

Index

An index is where Elasticsearch stores its data. If you are familiar with relational databases, you can think of an index as a table in a relational database. Compared with a relational database, however, Elasticsearch can quickly and easily perform full-text retrieval over the data in an index without having to store the original data. If you are familiar with MongoDB, you can think of an Elasticsearch index as a collection in MongoDB; if you are familiar with CouchDB, you can think of it as a database in CouchDB.
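
As a quick illustration, creating and listing indices takes only a couple of REST calls; the index name blog here is made up, and the host and port assume a default local installation:

    # create an index named "blog" (hypothetical name)
    curl -XPUT 'http://localhost:9200/blog'
    # list the indices that currently exist
    curl -XGET 'http://localhost:9200/_cat/indices?v'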

Document

A document is the primary entity stored in Elasticsearch. A document consists of fields (the counterpart of the columns in a row of a relational database), and Elasticsearch allows a field to appear more than once; such a field is called a multivalued field. Each field has a type (string, numeric, date, and so on). Field types can also be compound: a field can contain other sub-documents or arrays. Field types are important in Elasticsearch because they tell the search engine how to perform operations such as comparison and sorting. Fortunately, Elasticsearch can determine field types automatically. Unlike a relational database, an Elasticsearch document does not need a fixed structure; different documents can have different sets of fields, and you do not need to know the fields of a document when developing your program. Of course, you can also define the document structure explicitly through a schema mapping.
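
For example, a document with a multivalued field and a date field can be indexed without defining any schema first; this is only a sketch, and the index name, type name, and field names are assumptions:

    # index a document of type "article" into the "blog" index with id 1
    curl -XPUT 'http://localhost:9200/blog/article/1' -d '{
        "title": "First post",
        "tags": ["elasticsearch", "lucene"],
        "published": "2015-03-01"
    }'
    # Elasticsearch infers the field types (string, multivalued string, date) automatically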

Document type

In Elasticsearch, a single index can store objects used for many different purposes. For example, a blog built on Elasticsearch can store both articles and comments. Document types help us differentiate between these objects easily. Note that each document type can have a different structure. In practice, dividing documents into different types is a significant aid to data manipulation. There are a few limitations to keep in mind, one of which is that different document types cannot define different field types for a field of the same name.
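
As a sketch of this idea (all names are illustrative), an article and a comment can live in the same index under different types:

    # a comment document stored in the same "blog" index under a different type
    curl -XPUT 'http://localhost:9200/blog/comment/1' -d '{
        "article_id": 1,
        "author": "reader1",
        "body": "Nice introduction"
    }'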

Nodes and clusters

Elasticsearch can work as a standalone search service. However, to handle large datasets and provide fault tolerance, Elasticsearch can also run on multiple servers that work together. These servers are collectively referred to as a cluster, and each server in the cluster is called a node. Large amounts of data can be split up and distributed across nodes through index sharding (dividing the index into smaller pieces). Higher availability and better performance can be achieved through replicas (copies of index shards).
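
Once several nodes have joined the same cluster, its state can be checked through the REST interface; for instance (assuming a node is reachable on localhost:9200):

    # show cluster name, number of nodes, and overall status (green/yellow/red)
    curl -XGET 'http://localhost:9200/_cluster/health?pretty'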

Sharding

When you need to store a large number of documents, a single node may not be enough because of limits such as RAM and disk capacity. Another problem is that the computing power of a single node may not be sufficient for the desired complex functionality. In these cases, the data can be split into parts, each of which is a separate Apache Lucene index called a shard. Each shard can be stored on a different node in the cluster. When a query hits an index that consists of multiple shards, Elasticsearch sends the query to every relevant shard and merges the results. This process is transparent to the application, which does not need to know that the shards exist.
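
The number of primary shards is fixed when an index is created; as a sketch (the index name logs and the shard count are arbitrary examples):

    # create an index split into 3 primary shards, each with 1 replica
    curl -XPUT 'http://localhost:9200/logs' -d '{
        "settings": { "number_of_shards": 3, "number_of_replicas": 1 }
    }'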

Replicas

To increase query throughput or achieve high availability, you can enable shard replicas. A replica shard is an exact copy of the original shard, which is called the primary shard. All modifications to the index go to the primary shard, and each primary shard can have zero or more replica shards. When a primary shard is lost, for example when the server holding its data becomes unavailable, the cluster can promote a replica shard to be the new primary shard.
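
Unlike the shard count, the number of replicas can be changed on a live index; a minimal sketch, reusing the hypothetical logs index from above:

    # raise the replica count of the "logs" index to 2
    curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
        "index": { "number_of_replicas": 2 }
    }'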

Installing and configuring Elasticsearch clusters

Elasticsearch can be installed in several ways and supports multiple platforms (including Windows), with corresponding installation packages (tar.gz, zip, RPM for CentOS, DEB for Ubuntu). The examples here use the elasticsearch-1.4.4 version on CentOS 6.5.

tar.gz Installation and Configuration

This method is straightforward, but the instance runs from the command line rather than as a system service, which makes it harder to manage. On the other hand, this approach lets you run multiple instances on a single host.

    1. Unzip the elasticsearch-1.4.4.tar.gz to get the elasticsearch-1.4.4 directory.

    2. Enter the elasticsearch-1.4.4/config directory and edit elasticsearch.yml: find cluster.name, remove the leading #, and change it to cluster.name: supconit (see the sketch after this list). This is the name of the cluster, and all nodes must use the same value; Elasticsearch automatically discovers and connects nodes with the same cluster name on the same network segment, and together they form the cluster.

    3. In the same elasticsearch.yml, find node.name, remove the leading #, and change it to node.name: node1 (pick any name you like), but each node's name must be different.

    4. Copy the elasticsearch-1.4.4 directory to the other hosts and modify node.name on each of them.

    5. For other configuration, you can edit elasticsearch.in.sh in the elasticsearch-1.4.4/bin directory (mainly the JVM parameters). elasticsearch.yml also has many other parameters (cluster, shard, replica, and other settings), and logging is configured in logging.yml.
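
After steps 2-4, the relevant lines of elasticsearch.yml on the first node would look roughly like this (the cluster name supconit comes from the steps above; node names are examples):

    cluster.name: supconit
    node.name: node1      # use node2, node3, ... on the other hosts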

RPM Installation and Configuration

This installation method is also very convenient and the instance runs as a system service, but the installed files are spread across several directories, which makes configuration less convenient, and a host can only run one instance.

    1. Run sudo rpm -ivh elasticsearch-1.4.4.noarch.rpm.

    2. After the installation succeeds, the Elasticsearch program directory is /usr/share/elasticsearch; run sudo chkconfig --add elasticsearch to register the service.

    3. Enter the /etc/elasticsearch directory and edit elasticsearch.yml: find cluster.name, remove the leading #, and change it to cluster.name: supconit. This is the name of the cluster, and all nodes must use the same value; Elasticsearch automatically discovers and connects nodes with the same cluster name on the same network segment, and together they form the cluster.

    4. In the same elasticsearch.yml, find node.name, remove the leading #, and change it to node.name: node1 (pick any name you like), but each node's name must be different.

    5. Copy elasticsearch-1.4.4.noarch.rpm to the other nodes and repeat steps 1-4.

    6. For other configuration, you can edit elasticsearch.in.sh in the bin directory under the program directory (/usr/share/elasticsearch), mainly the JVM parameters. elasticsearch.yml also has many other parameters (cluster, shard, replica, and other settings), and logging is configured in logging.yml.
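
Put together, the RPM installation on each node looks roughly like this (the commands follow the steps above; adjust the package path as needed):

    sudo rpm -ivh elasticsearch-1.4.4.noarch.rpm      # install the package
    sudo chkconfig --add elasticsearch                # register the service
    sudo vi /etc/elasticsearch/elasticsearch.yml      # set cluster.name and node.name as in steps 3-4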

Run Elasticsearch
    • $ ./elasticsearch (run in the foreground)

    • $ ./elasticsearch -d (run as a background process)

    • $ sudo service elasticsearch start (RPM installation, start as a service)


Close Elasticsearch
    • When running in the foreground, stop it with the Ctrl+C key combination.

    • When running in the background, stop it with kill -9 <process id>, or shut down the entire cluster through the REST API with curl -XPOST http://<host ip>:9200/_cluster/nodes/_shutdown, or shut down a single node with curl -XPOST http://<host ip>:9200/_cluster/nodes/<node identifier, such as Bjkhlujigopojhih>/_shutdown.

    • For the RPM installation, shut down the service with sudo service elasticsearch stop.

Plug-ins


Site Plugins (Web-based)


    1. bigdesk plugin: a plugin for monitoring Elasticsearch status.

    2. elasticsearch-head plugin: a very convenient client for all kinds of operations on Elasticsearch, such as different kinds of queries, index views, node status views, and so on.


Plug-in Installation

There are two ways to install plugins: offline and online. Online installation is more convenient, just one command, but many corporate environments run on an intranet with no Internet access, so offline installation is necessary. There is very little information online about offline installation, so this should be useful (it took me a long time to figure out). Both methods require going to the bin directory under the home directory and executing the plugin script.

    • $ ./plugin -install mobz/elasticsearch-head (online installation of the head plugin)

    • $ ./plugin -install head -url file:/downloads/elasticsearch-head-master.zip (offline installation; the file: URL after -url is the path to the plugin archive, which you download from GitHub yourself; the address is not given here because it changes, so just search for it on GitHub)

After the installation is complete, open http://localhost:9200/_plugin/head in a browser to see the plugin details; replace "head" with "bigdesk" to open the bigdesk page.

Word-Segmentation Plugin ik: Installation

Installing elasticsearch-analysis-ik is not as convenient; it cannot be installed with the plugin command and is relatively troublesome.

1. Download https://github.com/medcl/elasticsearch-analysis-ik.

2. Unzip it, enter the directory, and run mvn clean package, which generates the target directory.

3. Copy the config/ik directory to the config directory of your Elasticsearch home directory (for the RPM installation, copy it to the /etc/elasticsearch directory).

4. Edit config/elasticsearch.yml (for the RPM installation, edit /etc/elasticsearch/elasticsearch.yml) and add the following at the end of the file (do not use tab characters; only spaces are allowed):

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik

5. Create an analysis-ik directory under the plugins directory of the Elasticsearch home directory (create the plugins directory yourself if it does not already exist). Copy the file elasticsearch-analysis-ik-1.2.9.jar generated by the build (located in the target directory) into the new analysis-ik directory.

6. Copy all the jar packages under target/releases to the lib directory under the Elasticsearch home directory.
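
Put together, the build and copy steps look roughly like this for the tar.gz layout; the Elasticsearch home path is a placeholder, and for the RPM installation the config files go to /etc/elasticsearch instead:

    cd elasticsearch-analysis-ik                       # the unpacked source downloaded from GitHub
    mvn clean package                                  # produces the target directory
    ES_HOME=/path/to/elasticsearch-1.4.4               # placeholder for your install directory
    cp -r config/ik $ES_HOME/config/                   # ik dictionaries and settings
    mkdir -p $ES_HOME/plugins/analysis-ik
    cp target/elasticsearch-analysis-ik-1.2.9.jar $ES_HOME/plugins/analysis-ik/
    cp target/releases/*.jar $ES_HOME/lib/             # dependency jars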

Test
    • Create a new index named test: curl -XPUT http://localhost:9200/test

    • Create a mapping for the index:

curl -XPOST http://localhost:9200/test/test/_mapping -d '{
    "test": {
        "properties": {
            "content": {
                "type": "string",
                "boost": 8.0,
                "term_vector": "with_positions_offsets",
                "analyzer": "ik",
                "include_in_all": true
            }
        }
    }
}'

    • Test command

curl 'http://localhost:9200/test/_analyze?analyzer=ik&pretty=true' -d '{"text": "这是我的第一个elasticsearch集群"}'

    • Test results

{   "tokens"  : [ {     "token"  :  "text",      "Start_offset"  : 4,     "End_offset"  : 8,      "type"  :  "中文版",     "position"  : 1  }, {      "token"  :  "This is",     "Start_offset"  : 11,      "End_offset"  : 13,     "type"  :  "Cn_word",      "position"  : 2  }, {     "token"  :  "Me",      "Start_offset"  : 13,     "End_offset"  : 14,      "type"  :  "Cn_char",     "position"  : 3   }, {     "token"  :  "first",     "Start_offset"  :  15,    "End_offset"  : 18,     "type"  :  "Cn_word",     " Position " : 4  }, {     token"  :  "first",      "Start_offset"  : 15,     "End_offset"  : 17,      "type"  :  "Cn_word",     "position"  : 5  }, {      "token"  :  "one",     "Start_offset"  : 16,      "End_offset"  : 18,     "type"  :  "Cn_word",      "position"  : 6  }, {     "token"  :  "one" ,     "Start_offset"  : 16,     "End_offset"  : 17,      "type"  :  "Type_cnum",     "position"  : 7   }, {     "token"  :  ",    "  : 17,     "Start_offset" End_offset " : 18,    " type " : " COUNT ",    " Position " : 8  }, {     token"  :  "Elasticsearch",      "Start_offset"  : 18,     "End_offset"  : 31,      "type"  :  "中文版",     "position"  : 9  },  {     "token"  :  "cluster",     "Start_offset"  : 31,      "End_offset"  : 33,     "type"  :  "Cn_word",      "position"  : 10  } ]}

