A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
In Elasticsearch, a document belongs to a type, and an index can contain a variety of types. A rough analogy to a traditional relational database makes the relationship clearer:
Relational database ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
Elasticsearch is a distributed, scalable, real-time search and analytics engine. It lets you search, analyze, and explore your data, often in ways you did not anticipate at the start of a project. Elasticsearch exists to give new life to raw data that would otherwise sit unused on disk.
Whether you need full-text search, real-time analytics on structured data, or a combination of both, this guide will help you understand the most basic concepts and start learning Elasticsearch from the ground up. From there, we will gradually explore more advanced search techniques, which you can learn at your own pace.
Elasticsearch is more than full-text search. We will also introduce structured search, analytics, query filtering, geolocation, autocomplete, and suggestions for what you may have meant to search for. We will also look at how to model your data to get better performance out of Elasticsearch, and how to configure and monitor your cluster in a production environment.
Elasticsearch is an open source, distributed, RESTful search engine built on Lucene. Designed for the cloud, it offers real-time search and is stable, reliable, fast, and easy to install and use. Data is indexed as JSON over HTTP.
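As a rough illustration of indexing JSON over HTTP, the sketch below builds (but does not send) a PUT request using only Python's standard library. The address `localhost:9200` is the one used later in this article; the index name `blog`, the type name `post`, the document ID, and the field names are hypothetical and chosen only for illustration.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build a PUT request that would index `document` at /index/type/id."""
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical index, type, ID, and fields, chosen only for illustration.
request = build_index_request(
    "http://localhost:9200", "blog", "post", "1",
    {"title": "Hello Elasticsearch", "views": 0},
)
print(request.get_method(), request.full_url)
# To send it to a running node: urllib.request.urlopen(request)
```

The URL path follows the index, type, document ID hierarchy, so the same helper can address any document once those three values are known.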
When we build a website or application and want to add search, we quickly find that getting search right is hard. We want our search solution to be fast; we want zero configuration and a completely free search schema; we want to index data simply by sending JSON over HTTP; we want our search server to be always available; we want to start with one machine and scale to hundreds; we want real-time search; we want simple multi-tenancy; and we want a solution built for the cloud. Elasticsearch is designed to solve all these problems and more.
Installation
Taking Windows and ES version 0.19.7 as an example:
① Download elasticsearch-0.19.7.zip.
② Extract it to a directory and set that directory as the ES_HOME environment variable.
③ Install the JDK and set the JAVA_HOME environment variable.
④ On Windows, run %ES_HOME%\bin\elasticsearch.bat to start ES.
Distributed search with Elasticsearch: setting up stand-alone and server environments
First, go to http://www.elasticsearch.org/download/ and download the latest Elasticsearch release package. At the time of writing the latest version is 0.19.1; the author is very diligent, so ES is updated frequently and bugs are fixed quickly. After downloading and unpacking, there are three directories: bin contains the startup scripts, config contains the configuration files, and lib contains the dependency libraries. To install a plugin, create a new plugins folder and put the plugin in it.
1. Stand-alone environment:
Running Elasticsearch stand-alone is very simple: on Linux run bin/elasticsearch directly, and on Windows run bin/elasticsearch.bat. Running an Elasticsearch cluster on a LAN is also simple: as long as cluster.name is set to the same value on every node and the machines are on the same network segment, the ES nodes discover each other automatically on startup and form a cluster.
2. Server environment:
On a server you can use the elasticsearch-servicewrapper plugin. It takes a parameter that specifies whether ES runs in the background or the foreground, and it supports starting, stopping, and restarting the ES service (the default ES script can only be stopped with Ctrl+C). To use it, download the service folder from https://github.com/elasticsearch/elasticsearch-servicewrapper and place it in the ES bin directory. The available commands are:
console: run ES in the foreground
start: run ES in the background
stop: stop ES
install: install ES as a service so that it starts automatically when the server boots
remove: remove the service so that ES no longer starts automatically at boot
In the service directory there is an elasticsearch.conf configuration file, which mainly sets parameters for the Java runtime environment. The more important ones are the following:
# ES home path; the default value can be left as is
set.default.ES_HOME=<Path to ElasticSearch Home>
# Startup wait timeout (in seconds)
# Shutdown wait timeout (in seconds)
# Ping timeout (in seconds)
To install a plugin, take the head plugin as an example:
With an Internet connection, run %ES_HOME%\bin\plugin -install mobz/elasticsearch-head directly.
Without a connection, download the elasticsearch-head zipball of the master branch and extract its contents to the %ES_HOME%\plugins\head\_site directory (this plugin is a site-type plugin).
Once installation is complete, restart the service and open http://localhost:9200/_plugin/head/ in the browser.
ES concepts
cluster: Represents a cluster. A cluster contains multiple nodes, one of which is the master node; the master is chosen by election. The master/slave distinction matters only inside the cluster. One of the ideas behind ES is decentralization, which literally means there is no central node. This is the view from outside the cluster: seen from outside, an ES cluster is one logical whole, and communicating with any single node is equivalent to communicating with the whole cluster.
shards: Represents index shards. ES can split a complete index into multiple shards, so that a large index can be divided and distributed across different nodes to form a distributed search. The number of shards can only be specified when the index is created and cannot be changed afterwards.
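One way to see why the shard count is fixed is to look at how a document might be routed to a shard: hash its ID and take the result modulo the number of shards. The sketch below assumes that scheme and uses `zlib.crc32` purely as a stand-in hash; it is not the hash function Elasticsearch itself uses, and the function name `shard_for` is hypothetical.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard by hashing the document ID modulo the shard count.

    zlib.crc32 is a stand-in hash used only for illustration; it is an
    assumption, not the hash function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [str(i) for i in range(100)]
with_5 = {d: shard_for(d, 5) for d in doc_ids}
with_6 = {d: shard_for(d, 6) for d in doc_ids}

# Documents whose shard would change if the shard count went from 5 to 6.
moved = [d for d in doc_ids if with_5[d] != with_6[d]]
print("documents that would land on a different shard:", len(moved))
```

Under this kind of routing, changing the shard count sends existing document IDs to different shards, so documents already stored could no longer be found where the routing expects them.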
replicas: Represents index replicas. ES can keep multiple replicas of an index. Replicas serve two purposes: first, they improve fault tolerance, since a shard that is damaged or lost on one node can be recovered from a replica; second, they improve query efficiency, since ES automatically load-balances search requests.
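A minimal sketch of how shard and replica counts might be expressed when creating an index, and how many shard copies the cluster ends up holding. The settings keys `number_of_shards` and `number_of_replicas` are the standard index settings; the values 5 and 1 and the helper names are assumptions chosen for illustration.

```python
import json


def create_index_body(number_of_shards, number_of_replicas):
    """Return a JSON body for creating an index with the given layout."""
    return json.dumps({
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    })


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard has `number_of_replicas` copies besides itself."""
    return number_of_shards * (1 + number_of_replicas)


print(create_index_body(5, 1))
print(total_shard_copies(5, 1))  # 5 primaries + 5 replica copies = 10
```

Unlike the shard count, the replica count only controls how many extra copies exist, which is why it relates to fault tolerance and search throughput rather than to how the index is split.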
recovery: Represents data recovery, also called data redistribution. When a node joins or leaves, ES redistributes index shards according to machine load; data recovery also takes place when a node is restarted.
river: Represents a data source for ES, and is also a way of synchronizing data from other storage systems (such as databases) into ES. A river is an ES service that exists as a plugin; it reads data from the source and indexes it into ES. The official rivers are CouchDB, RabbitMQ, Twitter, and Wikipedia.
gateway: Represents the persistent storage of ES indexes. By default, ES first keeps the index in memory and persists it to disk when memory is full. When the ES cluster is shut down and restarted, index data is read back from the gateway. ES supports several gateway types: the local file system (default), a distributed file system, Hadoop HDFS, and Amazon's S3 cloud storage service.
discovery.zen: Represents the automatic node discovery mechanism of ES. ES is a peer-to-peer system: it first looks for existing nodes by broadcasting, then nodes communicate with each other over multicast; point-to-point interaction is also supported.
transport: Represents how ES nodes or clusters interact with clients. By default, interaction uses the TCP protocol; other transports are also supported (integrated via plugins), including HTTP (JSON format), Thrift, servlet, memcached, and ZeroMQ.
Distributed search with Elasticsearch: integrating Chinese word segmentation
Officially, Elasticsearch provides only the smartcn Chinese analyzer, whose results are not very good. Fortunately, medcl (one of the earliest people in China to study ES) has written two Chinese analyzers, IK and mmseg. Their use is described below; it is essentially the same for both. First install the plugin from the command line:
To install the IK plugin:
Download the IK configuration and dictionary files to the config directory
To install the mmseg plugin:
Download the mmseg configuration and dictionary files to the config directory
IK analyzer configuration: add the following to the elasticsearch.yml file
The two forms have the same meaning
mmseg analyzer configuration, also in the elasticsearch.yml file
mmseg also has some more customizable parameters, set as follows
Once the plugins are installed, they are loaded when ES starts.
You can specify an analyzer when defining the mapping for an index.
indexAnalyzer is the analyzer used at index time; searchAnalyzer is the analyzer used at search time.
The Java mapping code is as follows:
Once this is defined, operations on the index will tokenize text with the specified analyzer.
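The original Java code is not reproduced here. As a stand-in, the following Python sketch builds the kind of JSON mapping such code would produce, with separate index-time and search-time analyzers on one field. The type name `page` and field name `title` are hypothetical, the analyzer name `ik` assumes the IK plugin is installed under that name, and the camelCase property names are assumed to be the spelling accepted by the ES version discussed in this article.

```python
import json


def analyzer_mapping(doc_type, field, index_analyzer, search_analyzer):
    """Build a mapping with separate index-time and search-time analyzers."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",
                    "indexAnalyzer": index_analyzer,
                    "searchAnalyzer": search_analyzer,
                }
            }
        }
    }


# "page" and "title" are hypothetical names; "ik" assumes the IK plugin is
# installed and registered under that analyzer name.
mapping = analyzer_mapping("page", "title", "ik", "ik")
print(json.dumps(mapping, indent=2))
```

Using the same analyzer for both settings keeps the terms produced at search time consistent with the terms stored at index time.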
IK analyzer plugin project: https://github.com/medcl/elasticsearch-analysis-ik
mmseg analyzer plugin project: https://github.com/medcl/elasticsearch-analysis-mmseg
If the configuration seems like too much trouble, you can also download a preconfigured ES distribution from https://github.com/medcl/elasticsearch-rtf
Basic usage of Elasticsearch