Elasticsearch Overview and Single-Machine Elasticsearch Installation on Linux

Original address: http://blog.csdn.net/w12345_ww/article/details/52182264. Copyright belongs to the original author

Over the past two days a project required me to use Elasticsearch. Searching online, I found that Elasticsearch can be installed in two ways: on a single machine or as a cluster. This post focuses on the single-machine installation. I have tested the steps myself and they work, so I am recording them here to share with colleagues.

I. Overview of Elasticsearch

Elasticsearch is a Lucene-based search server. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, released as open source under the terms of the Apache license, and is a popular enterprise-class search engine. Designed for cloud computing, it offers real-time search and is stable, reliable, fast, and easy to install and use.

When we build a website or application and want to add search, we quickly find that getting search to work is hard. We want our search solution to be fast; we want zero configuration and a completely free search model; we want to index data simply by sending JSON over HTTP; we want our search server to always be available; we want to start with one server and scale to hundreds; we want real-time search; we want simple multi-tenancy; and we want a solution built for the cloud. Elasticsearch is designed to solve all of these problems and more. (Source: Baidu Encyclopedia)

A few important core concepts

1. Near real-time

Elasticsearch is a near real-time search platform: there is a slight delay (normally about one second) between indexing a document and that document becoming searchable.
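The delay can be pictured with a toy model. The sketch below is not Elasticsearch code: the class `ToyIndex` and its methods are hypothetical names invented for illustration. It only shows the idea that newly indexed documents sit in a buffer and become searchable after a periodic refresh.

```python
class ToyIndex:
    """Toy model of near real-time search (hypothetical, not the Elasticsearch API)."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        """Accept a document; it is not searchable until the next refresh."""
        self._buffer.append(doc)

    def refresh(self):
        """Make buffered documents searchable (Elasticsearch does this periodically)."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        """Return searchable documents whose text contains the word."""
        return [d for d in self._searchable if word in d["text"]]


idx = ToyIndex()
idx.index({"text": "hello elasticsearch"})
before = idx.search("hello")   # nothing yet: the document is still buffered
idx.refresh()
after = idx.search("hello")    # now the document is found
print(len(before), len(after))  # prints: 0 1
```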

2. Clusters and nodes

A cluster is one or more nodes organized together that jointly hold all of your data. Typically a cluster is made up of one or more servers, and each independent server is a node. A cluster generally has one master node, and the remaining nodes are slave nodes. Both the master node and the slave nodes take part in storing and managing the data.

3. Indexes (Index)

An index is a collection of documents that share somewhat similar characteristics. For example, you have probably looked for a book in a library: every book has an index, and if we think of each page of the book as a document, the index is a collection of those documents. An index is identified by a name, and we use that name whenever we index, search, update, or delete the documents in it. Within a cluster you can define as many indexes as you want.

4. Types (Type)

In an index, you can define one or more types. A type is a logical category/partition of the index, and its semantics are entirely up to you. Typically, you define a type for documents that share a common set of fields. Continuing the example above, we could define the text in the book as one type and the data in the book as another type.

5. Documents (Document)

A document is the basic unit of information that can be indexed. Documents are expressed in JSON, a ubiquitous data-interchange format on the Internet.
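As a small illustration, the sketch below builds a JSON document and the REST path under which it would be stored. The index name `library`, the type name `page`, the id, and the field names are all made up for this example; the `/{index}/{type}/{id}` path layout is that of the 1.x REST API used in this post.

```python
import json


def document_path(index, doc_type, doc_id):
    """Build the REST path /{index}/{type}/{id} for a document."""
    return "/{}/{}/{}".format(index, doc_type, doc_id)


# Hypothetical document: one page of a book.
doc = {
    "book": "Some Title",
    "page_number": 12,
    "text": "Elasticsearch is built on Lucene.",
}

path = document_path("library", "page", 1)
body = json.dumps(doc, sort_keys=True)

# Indexing the document would mean sending this body with an HTTP PUT to this path.
print("PUT", path)
print(body)
```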

6. Sharding and Replication

An index can store an amount of data that exceeds the hardware limits of a single node. For example, an index of 1 billion documents occupying 1 TB of disk space may not fit on the disk of any single node, or a single node may be too slow to serve search requests against it on its own.

To solve this problem, Elasticsearch provides the ability to divide the index into multiple parts, which are called shards. When you create an index, you can specify the number of shards you want. Each shard itself is a fully functional and independent "index" that can be placed on any node in the cluster.

Sharding is important for two reasons:
It allows you to split/scale your content volume horizontally.
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), which improves performance/throughput.

In a network/cloud environment, failures can occur at any time. When a shard or node goes offline or disappears for whatever reason, a failover mechanism is very useful and highly recommended. For this purpose, Elasticsearch allows you to create one or more copies of a shard, which are called replica shards, or simply replicas.

Replication is important for two reasons:
It provides high availability when a shard or node fails. For this reason, a replica shard is never placed on the same node as the original/primary shard it was copied from.
It lets you scale your search volume/throughput, because searches can run on all replicas in parallel.
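The placement rule can be sketched with a toy allocator. This is not Elasticsearch's real allocation algorithm; the function `allocate` and the node names are hypothetical. It places primaries round-robin and puts each replica on a different node than its primary, leaving the replica unassigned when no other node exists, which is the situation on a single-machine installation.

```python
def allocate(nodes, num_primaries, num_replicas):
    """Toy shard allocator (hypothetical; not Elasticsearch's real algorithm).

    Returns (assigned, unassigned): assigned maps node -> list of shard labels,
    unassigned lists the replica labels that could not be placed.
    """
    assigned = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        primary_pos = shard % len(nodes)
        assigned[nodes[primary_pos]].append("P{}".format(shard))
        for r in range(1, num_replicas + 1):
            label = "R{}.{}".format(shard, r)
            if r < len(nodes):
                # Offset by r so every copy of a shard lands on a distinct node.
                assigned[nodes[(primary_pos + r) % len(nodes)]].append(label)
            else:
                # Not enough nodes: a copy never shares a node with its primary.
                unassigned.append(label)
    return assigned, unassigned


two_nodes, left_two = allocate(["node-a", "node-b"], 5, 1)
one_node, left_one = allocate(["node-a"], 5, 1)
print(two_nodes, left_two)
print(one_node, left_one)
```

With two nodes every replica finds a home; with one node all five replicas stay unassigned.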

In summary, each index can be divided into multiple shards. An index can also be replicated zero times (meaning no replicas) or more. Once replicated, each index has primary shards (the original shards that serve as the replication source) and replica shards (copies of the primary shards). The number of shards and replicas can be specified when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards afterwards.
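One way to see why the number of primary shards is fixed: a document is routed to a shard by hashing a routing value (by default the document id) and taking the remainder modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash; it is not the hash function Elasticsearch uses, and `pick_shard` is a hypothetical helper. Changing the shard count changes the remainder, so existing documents would no longer be found where they were stored.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Return hash(routing) % num_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = ["doc-{}".format(i) for i in range(1000)]

with_5 = [pick_shard(d, 5) for d in doc_ids]
with_6 = [pick_shard(d, 6) for d in doc_ids]

# Count the documents that would map to a different shard after the change.
moved = sum(1 for a, b in zip(with_5, with_6) if a != b)
print("documents that would move:", moved, "of", len(doc_ids))
```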

By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and 5 replica shards (1 complete replica), for a total of 10 shards per index.
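The arithmetic above, and the settings that control it, can be sketched as follows. `number_of_shards` and `number_of_replicas` are the index settings supplied when an index is created; the helper `total_shards` is a hypothetical name used only for this example.

```python
import json


def total_shards(number_of_shards, number_of_replicas):
    """Primaries plus one full set of copies per replica."""
    return number_of_shards * (1 + number_of_replicas)


# Request body that would be sent when creating an index with the defaults described above.
settings_body = json.dumps(
    {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
)

print(settings_body)
print(total_shards(5, 1))  # 10
```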

II. Single-machine Elasticsearch installation on Linux

Preparation:

Operating system: CentOS 7, IP: 192.168.18.228

1. Install the JDK, set the environment variables, and make them take effect, then confirm that the JDK installed successfully. (The details are omitted here; this step is described at length in my earlier post "Hadoop+hive installation Configuration".)

2. Add a new user (as root); in this example the user is named es:

useradd es
passwd es

Then switch to the es user.

3. Create a new folder; all the files used below go into this folder.

4. Copy the downloaded elasticsearch-1.7.3.tar.gz into the folder created in step 3 and extract it.

How do you upload it? See my earlier post "Hadoop+hive installation Configuration": use the scp command to upload elasticsearch-1.7.3.tar.gz from your local machine to the folder created in step 3.

Extract:

tar -zxvf elasticsearch-1.7.3.tar.gz

After extraction, grant the ordinary user es permissions on the extracted files.

5. Enter the Elasticsearch bin directory and run the elasticsearch script to start the server. Elasticsearch can also be started in the background.

6. Testing

(1) Check with jps that the Elasticsearch process is listed.

(2) Query the server over HTTP at its address. The address 192.168.18.228 can sometimes be replaced with localhost. (I also changed the network setting in elasticsearch.yml, under Elasticsearch's config directory, to 192.168.18.228. I did not verify whether this is required; if the page does not appear, set the network address in the YML file to 192.168.18.228.)

If the server responds, Elasticsearch has been installed successfully.
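The same check can be scripted. The sketch below assumes the default HTTP port 9200 (the port is not stated above) and that the root endpoint returns a JSON object containing a `version.number` field, as 1.x releases do. `parse_root_response` and `check_node` are hypothetical helper names, and the sample response is illustrative rather than captured output.

```python
import json
import urllib.error
import urllib.request


def parse_root_response(body):
    """Return the version number from the root endpoint's JSON, or None if absent."""
    try:
        info = json.loads(body)
    except ValueError:
        return None
    if not isinstance(info, dict):
        return None
    version = info.get("version")
    if isinstance(version, dict):
        return version.get("number")
    return None


def check_node(host="192.168.18.228", port=9200, timeout=5):
    """Query http://host:port/ and return the version string, or None on failure."""
    url = "http://{}:{}/".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_root_response(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return None


# Illustrative sample only; a real response contains more fields.
sample = '{"status": 200, "cluster_name": "elasticsearch", "version": {"number": "1.7.3"}}'
print(parse_root_response(sample))  # 1.7.3
```

Calling `check_node()` (or `check_node(host="localhost")` on the server itself) returns the version string when the node is reachable and `None` otherwise.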

7. Install the elasticsearch-servicewrapper plug-in, which makes the Elasticsearch service easier to manage:

Download the service folder from https://github.com/elastic/elasticsearch-servicewrapper and put it in the ES bin directory. How do you upload a folder? As in the previous blog post, use the scp command.

Appendix: service commands

./elasticsearch console    run in the foreground
./elasticsearch start      run in the background
./elasticsearch install    start automatically at system boot
./elasticsearch remove     cancel automatic start at system boot
./elasticsearch stop       stop the service

8. The next two plug-ins, head and bigdesk, are for cluster management. This example is only a single-machine installation, so there is no need to install them; if you need them, refer to the following link:

http://blog.csdn.net/a806267365/article/details/51020633

The following link is also useful for further study and discussion:

http://blog.csdn.net/sinat_28224453/article/details/51134978

This article also draws on the following link:

http://blog.csdn.net/cnweike/article/details/33736429
