Elasticsearch Study Notes 01: Introduction, Installation, Configuration, and Core Concepts

Source: Internet
Author: User

I. Introduction

Elasticsearch is an open-source, distributed, RESTful search engine built on Lucene. Designed for cloud computing, it provides real-time search and is stable, reliable, fast, and easy to install and use. Data is indexed as JSON over HTTP.

Lucene is only a library: to take advantage of its functionality you have to work in Java and integrate Lucene into your own program. Worse, Lucene is very complex, and understanding how it works takes a lot of study.

Elasticsearch uses Lucene as its internal engine, but for full-text search you only need its unified API and do not have to understand how the underlying Lucene works. Elasticsearch can therefore be seen as a wrapper around Lucene.

II. Installation

1. Install the JDK

Elasticsearch (ES for short) uses Lucene as its internal engine. Lucene is written in pure Java, so ES runs on the JVM.

Assuming the JDK installation package is located in the /home/test/java directory, install the JDK on Linux as follows:

    • Installing the JDK

Create a java directory under /usr/ as root: sudo mkdir /usr/java

rpm -ivh /home/test/java/jdk-7u67-linux-i586.rpm (this command installs the JDK into the /usr/java/ directory)

    • Configure the JDK environment variables (requires administrator privileges):

sudo vi /etc/profile

#JAVA_HOME

export JAVA_HOME=/usr/java/jdk1.7.0_67

export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin

    • Verifying the installation

java -version

If the command prints the Java version information, the installation was successful.

2. Download Elasticsearch

ES download page: http://www.elasticsearch.org/download. The latest version at the time of writing is 1.4.1.

3. Install Elasticsearch

Put the installation package in the /home/test/ directory and unpack elasticsearch-1.4.1.tar.gz: tar -zxvf /home/test/elasticsearch-1.4.1.tar.gz

4. Run Elasticsearch

After unpacking, the /home/test/elasticsearch-1.4.1/ directory contains three folders: bin, config, and lib.

Run the elasticsearch script in the bin folder to start ES: ./elasticsearch

5. Notes

This version of ES (1.4.1) requires JDK 1.7 or later.

If you use the Java API to work with ES, the client and the server should run the same JDK version.
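
The JDK 1.7 requirement can be checked from a script before starting ES. The following is a minimal sketch, assuming the version banner has the form printed by `java -version` (for example `java version "1.7.0_67"`); the function names are hypothetical and not part of any ES tooling.

```python
import re
import subprocess


def parse_java_version(banner):
    """Return (major, minor) parsed from a `java -version` banner, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(banner, minimum=(1, 7)):
    """True if the banner reports a version at or above `minimum`."""
    version = parse_java_version(banner)
    return version is not None and version >= minimum


def installed_java_banner():
    """Read the banner of the local JVM; `java -version` writes to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


# Illustrative banner matching the JDK installed in these notes.
print(meets_minimum('java version "1.7.0_67"'))
```

On a machine with Java installed, `meets_minimum(installed_java_banner())` performs the same check against the local JVM.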

III. Configuration

1. Configuration files

The configuration files are located in the config directory. The main ones are elasticsearch.yml (the core configuration of ES) and logging.yml (logging configuration; ES uses log4j for logging).

2. Configuration items in detail

This section explains the meaning of each configuration item in elasticsearch.yml.

################################### Cluster ###################################
# Cluster name; the default is elasticsearch. Nodes on the same network segment that share a cluster name automatically form a cluster.
#cluster.name: elasticsearch
#################################### Node #####################################
# Node name; if not set, a name is generated automatically when ES starts.
#node.name: "Franz Kafka"


# Whether the node is eligible to become master. The default is true, and only nodes with this set to true can be elected master. The first eligible node to start is elected master.
#node.master: true

# Each node can define custom attributes, which can later be used to filter shard allocation.
#node.rack: rack314


# By default, multiple nodes can be started from the same installation path. To allow only one node per installation, set the following.
#node.max_local_storage_nodes: 1


#################################### Index ####################################
# Number of primary shards per index; the default is 5.
#index.number_of_shards: 5


# Number of replicas of each primary shard; the default is 1.
#index.number_of_replicas: 1


# To disable distribution (for example, on a single development machine), set the following.
#index.number_of_shards: 1
#index.number_of_replicas: 0


# These two settings directly affect how index and search operations perform in the cluster. Assuming you have enough machines to hold the shards and replicas:
# 1) more shards improve indexing performance and let a large index be distributed across machines;
# 2) more replicas improve search performance and cluster resilience.
# For a given index, number_of_shards can be set only once, while number_of_replicas can be increased or decreased at any time using the Index Update Settings API.
#################################### Paths ####################################
# Location of the configuration files, i.e. elasticsearch.yml and logging.yml.
#path.conf: /path/to/conf

# Location of the index data allocated to this node. Multiple paths can be given, separated by commas.
#path.data: /path/to/data

# Location of temporary files.
#path.work: /path/to/work


# Location of the log files.
#path.logs: /path/to/logs

# Location where plugins are installed.
#path.plugins: /path/to/plugins


#################################### Plugin ###################################
# Mandatory plugins: if any plugin in this list is not installed, the node will not start.
#plugin.mandatory: mapper-attachments,lang-groovy


################################### Memory ####################################
# Elasticsearch performs poorly when the JVM starts swapping, so the JVM should be protected from swapping.
# Set bootstrap.mlockall to true to lock the process memory and prevent it from being swapped out.
#bootstrap.mlockall: true


############################## Network and HTTP ###############################
# By default, Elasticsearch binds to the 0.0.0.0 address and listens on a port in the range 9200-9300 for HTTP traffic
# and on a port in the range 9300-9400 for node-to-node communication. To bind to a specific IP address instead:
#network.bind_host: 192.168.0.1


# Address that other nodes use to connect to this node. If not set, it is determined automatically.
#network.publish_host: 192.168.0.1


# Custom port for node-to-node communication.
#transport.tcp.port: 9300


# Whether to compress traffic between nodes; compression is off by default.
#transport.tcp.compress: true


# Custom port for the HTTP listener.
#http.port: 9200


# Maximum allowed length of HTTP request content.
#http.max_content_length: 100mb


# Disable HTTP completely.
#http.enabled: false


################################### Gateway ###################################
# The gateway persists the cluster state across full cluster restarts; every change to the cluster state is stored.
# When the cluster first starts, it reads the state from the gateway. The default (and recommended) gateway type is local.
#gateway.type: local


# Start the recovery process only after N nodes have started.
#gateway.recover_after_nodes: 1


# How long to wait before starting recovery once those N nodes are up.
#gateway.recover_after_time: 5m


# Number of nodes expected in the cluster. Once this many nodes are up, recovery starts immediately without waiting for recover_after_time.
#gateway.expected_nodes: 2


############################## Recovery Settings ##############################
# Number of concurrent recoveries allowed on a node, in two cases. The first is during initial recovery:
#cluster.routing.allocation.node_initial_primaries_recoveries: 4

# The second is when nodes are added or removed and shards are rebalanced:
#cluster.routing.allocation.node_concurrent_recoveries: 2


# Throughput limit during recovery; 20mb by default.
#indices.recovery.max_bytes_per_sec: 20mb


# Maximum number of concurrent streams opened when recovering a shard from a peer node.
#indices.recovery.concurrent_streams: 5
################################## Discovery ##################################
# Minimum number of master-eligible nodes that a node must see to consider the cluster operational. For clusters of more than three nodes, a value of 2-4 is suggested.
#discovery.zen.minimum_master_nodes: 1


# Timeout when pinging other nodes during discovery; increase it on slow networks.
#discovery.zen.ping.timeout: 3s


# Set to false to disable multicast discovery, so that this node only finds the nodes listed explicitly below (multicast is enabled by default in this version).
#discovery.zen.ping.multicast.enabled: false

# Initial list of master nodes to contact for discovery when a new node starts.
#discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

################################## Slow Log ###################################

# Shard-level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms

################################# GC Logging ##################################

#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms

#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s

################################## Security ###################################

# Uncomment if you want to enable JSONP as a valid return transport on the
# HTTP server. Enabling it may pose a security risk, so leaving it disabled
# unless you need it is recommended (it is disabled by default).
#http.jsonp.enable: true
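
The Index section of the listing above says that the shard count is fixed once an index exists, while the replica count can be changed later through the Index Update Settings API. The sketch below works through the arithmetic for the default values and builds the JSON body such an update would carry (sent as PUT to /<index>/_settings). The function names are hypothetical, and nothing is sent to a server.

```python
import json


def total_shard_copies(number_of_shards, number_of_replicas):
    """Primary shards plus all replica copies for one index."""
    return number_of_shards * (1 + number_of_replicas)


def replica_update_body(number_of_replicas):
    """JSON body for changing number_of_replicas on an existing index."""
    return json.dumps({"index": {"number_of_replicas": number_of_replicas}})


# Defaults from the listing: 5 primary shards, 1 replica of each.
print(total_shard_copies(5, 1))   # 5 primaries + 5 replicas = 10
# Single-node development settings: 1 shard, no replicas.
print(total_shard_copies(1, 0))   # 1
print(replica_update_body(2))
```

With the defaults, a cluster needs room for ten shard copies per index, which is why the listing suggests 1 shard and 0 replicas for a single-machine setup.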

IV. Core Concepts
    • Cluster

A cluster consists of one or more nodes that together hold all of your data and jointly provide indexing and search capabilities. A cluster is identified by a unique name, which defaults to "elasticsearch". A node joins a cluster only by specifying that cluster's name.

    • Node

A node is a server in a cluster; as part of the cluster it stores your data and participates in the cluster's indexing and search capabilities. Like a cluster, a node is identified by a name, which by default is a random name assigned at startup. This name matters for administration, because it lets you determine which servers on your network correspond to which nodes in the Elasticsearch cluster.

In a cluster, you can have any number of nodes. Also, if no other Elasticsearch node is running on the network, starting a node creates, by default, a new single-node cluster named "elasticsearch".
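
As the number of nodes grows, the discovery.zen.minimum_master_nodes setting from the configuration section has to be chosen to match. A common rule of thumb, assumed here rather than taken from these notes, is a majority of the master-eligible nodes, (n / 2) + 1. The helper below is a hypothetical sketch of that arithmetic only.

```python
def quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 3, 5):
    print(nodes, quorum(nodes))   # 1 -> 1, 3 -> 2, 5 -> 3
```

For clusters of four to seven master-eligible nodes this gives 3 or 4, in line with the 2-4 range mentioned for clusters of more than three nodes.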

    • Index

An index is a collection of documents that have somewhat similar characteristics. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when indexing, searching, updating, and deleting documents in it. An index is comparable to a database in a relational system.

    • Type

Within an index, you can define one or more types. A type is a logical category or partition of an index, and its semantics are entirely up to you. Typically, a type is defined for documents that share a common set of fields. A type is comparable to a table in a relational database.

    • Document

A document is the basic unit of information that can be indexed. For example, you can have a document for a customer, another for a product, and another for an order. Documents are expressed in JSON (JavaScript Object Notation), and you can store as many documents as you want within an index/type. Note that although a document physically resides in an index, it must also be assigned to a type within that index. A document is comparable to a row in a table.
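
To make the index/type/document relationship concrete, the sketch below builds (without sending) the HTTP request that would store one JSON document under a given index and type. It assumes a node listening on http://localhost:9200, the default HTTP port from the configuration section. The index name "customers", the type "profile", the document fields, and the function name are all hypothetical.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build a PUT request for /<index>/<type>/<id> carrying a JSON document."""
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    url = "{}/{}/{}/{}".format(host, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("customers", "profile", 1, {"name": "John Doe"})
print(request.get_method(), request.full_url)
# With a node running, urllib.request.urlopen(request) would send it.
```

The URL path mirrors the hierarchy described above: the index first, then the type, then the document ID.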

    • Shards and replicas

An index can store an amount of data that exceeds the hardware limits of a single node. For example, an index of 1 billion documents occupying 1 TB of disk space may not fit on the disk of any single node, or a single node may be too slow to serve search requests on its own. To solve this problem, Elasticsearch lets you divide an index into multiple pieces called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional and independent "index" that can be placed on any node in the cluster.
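
How a document ends up on a particular shard can be sketched in a few lines. Elasticsearch's documentation describes the rule as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The example below uses CRC32 purely as a stand-in hash, which is an assumption for illustration and not the function ES uses, and the function name is hypothetical.

```python
import zlib


def pick_shard(routing_key, number_of_shards):
    """Map a routing key to a shard number in [0, number_of_shards)."""
    digest = zlib.crc32(routing_key.encode("utf-8"))  # stand-in hash
    return digest % number_of_shards


doc_ids = ["1", "2", "3", "4", "5", "6"]

# Placement of each document with 5 primary shards.
print({doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids})

# Documents whose shard would differ if the index had 6 shards instead.
print([d for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6)])
```

Because the shard number depends on the shard count, changing the count would send existing documents to different shards, which is consistent with number_of_shards being settable only once per index.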

    • Summary: comparison with a relational database

Relational database -> Database -> Table -> Row      -> Column
Elasticsearch       -> Index    -> Type  -> Document -> Field

V. Links to Related Materials

http://www.elasticsearch.org/guide/en/elasticsearch/

http://www.elasticsearch.cn/

http://learnes.net/getting_started/installing_es.html
