Elasticsearch, a distributed and scalable real-time search and analytics engine: installation, configuration, and Chinese word segmentation

Source: Internet
Author: User

http://fuxiaopang.gitbooks.io/learnelasticsearch/content/ (English)

In Elasticsearch, a document belongs to a type, and types live inside an index. A rough analogy with a traditional relational database helps:

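Relational DB  ⇒ Databases ⇒ Tables ⇒ Rows      ⇒ Columns
Elasticsearch  ⇒ Indices   ⇒ Types  ⇒ Documents ⇒ Fields

An Elasticsearch cluster can contain multiple indices (databases), which in turn contain many types (tables). These types hold many documents (rows), and each document has many fields (columns).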
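
As a rough illustration of this analogy, the nesting can be written out as a plain Python structure. This is only a sketch of the mental model, not how Elasticsearch stores data; the index, type, and field names are borrowed from the curl examples later in this article, and the helper function is hypothetical.

```python
# Mental model only: cluster -> index -> type -> document -> field.
# The names "films" and "md" mirror the curl examples in this article.
cluster = {
    "films": {                      # index    (~ database)
        "md": {                     # type     (~ table)
            "1": {"name": "ma da jia si jia", "tag": "good"},  # document (~ row)
            "2": {"name": "hei yi ren", "tag": "good"},
        }
    }
}


def get_field(cluster, index, doc_type, doc_id, field):
    """Hypothetical helper: walk the hierarchy down to one field (~ column)."""
    return cluster[index][doc_type][doc_id][field]


print(get_field(cluster, "films", "md", "2", "name"))
```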

Elasticsearch is a distributed, scalable, real-time search and analytics engine. It lets you search, analyze, and explore your data, often in ways you did not anticipate at the start of a project. It exists to give new life to raw data that would otherwise sit unused on a hard disk.

Whether you need full-text search, real-time analytics on structured data, or a combination of both, this guide introduces the most basic concepts so that you can start learning Elasticsearch from the ground up. After that, we gradually move on to more advanced search techniques, which you can pick up at your own pace.

Elasticsearch is not just about full-text search. We also cover structured search, analytics, query filtering, geolocation, autocomplete, and did-you-mean suggestions. We then look at how to model data to get better performance out of Elasticsearch, and how to configure and monitor a cluster in production.

Elasticsearch is an open source, distributed, RESTful search engine built on Lucene. Designed for the cloud, it offers real-time search and is stable, reliable, fast, and easy to install and use. It supports indexing data as JSON over HTTP.

When we build a website or application and want to add search, we quickly find that search is hard. We want the search solution to be fast; we want zero configuration and a completely free search schema; we want to index data simply as JSON over HTTP; we want the search server to be always available; we want to start with one machine and scale to hundreds; we want real-time search; we want simple multi-tenancy; and we want a solution built for the cloud. Elasticsearch aims to solve all these problems and more.

Installation

Taking Windows and ES version 0.19.7 as an example:

① Download elasticsearch-0.19.7.zip

② Extract it to a directory and set that directory as the ES_HOME environment variable

③ Install the JDK and set the JAVA_HOME environment variable

④ On Windows, run %ES_HOME%\bin\elasticsearch.bat to start ES

Distributed search with Elasticsearch: setting up stand-alone and server environments

First, go to http://www.elasticsearch.org/download/ and download the latest Elasticsearch package; at the time of writing this was 0.19.1. The author is very diligent: ES is updated frequently and bugs are fixed quickly. The unpacked download contains three directories: bin holds the run scripts, config holds the settings files, and lib holds the dependencies. To install a plugin, create a plugins folder and put the plugin in it.

1. Stand-alone environment:

Running a stand-alone Elasticsearch is very simple: on Linux run bin/elasticsearch directly, and on Windows run bin/elasticsearch.bat. Running an Elasticsearch cluster on a LAN is also simple: as long as cluster.name is set to the same value on every node and the machines are on the same network segment, the ES nodes find each other automatically at startup and form a cluster.

2. Server environment:

On a server you can use the elasticsearch-servicewrapper plugin. It accepts a parameter that specifies whether ES runs in the foreground or the background, and it supports starting, stopping, and restarting the ES service (the default ES script can only be stopped with Ctrl+C). To use it, download the service folder from https://github.com/elasticsearch/elasticsearch-servicewrapper and place it in the ES bin directory. The commands are:
bin/service/elasticsearch followed by one of:
console    run ES in the foreground
start      run ES in the background
stop       stop ES
install    install ES as a service that starts automatically when the server boots
remove     remove the service so that ES no longer starts at boot

The service directory contains an elasticsearch.conf configuration file, which mainly sets Java runtime parameters. The most important parameters are the following:

# ES home path; the default value can be left as it is
set.default.ES_HOME=<Path to ElasticSearch Home>

# Minimum memory allocated to ES
set.default.ES_MIN_MEM=256

# Maximum memory allocated to ES
set.default.ES_MAX_MEM=1024

# Startup timeout (in seconds)
wrapper.startup.timeout=300

# Shutdown timeout (in seconds)
wrapper.shutdown.timeout=300

# Ping timeout (in seconds)
wrapper.ping.timeout=300
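
A quick sanity check of such a file can be scripted. The sketch below is an illustration only and is not part of the service wrapper: it parses key=value lines in the style shown above and checks that the minimum memory does not exceed the maximum. The function names and the sample text are hypothetical.

```python
def parse_conf(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments.

    Keys are lower-cased so that lookups do not depend on letter case.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip().lower()] = value.strip()
    return settings


def check_memory(settings):
    """Return True when the configured minimum memory is <= the maximum."""
    low = int(settings["set.default.es_min_mem"])
    high = int(settings["set.default.es_max_mem"])
    return low <= high


sample = """
# sample values taken from the listing above
set.default.ES_MIN_MEM=256
set.default.ES_MAX_MEM=1024
wrapper.startup.timeout=300
"""

conf = parse_conf(sample)
print(check_memory(conf))
```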

Installing plugins

Taking the head plugin as an example:

With an Internet connection, run %ES_HOME%\bin\plugin -install mobz/elasticsearch-head directly

Without a connection, download the zipball of the elasticsearch-head master branch and extract its contents to the %ES_HOME%\plugins\head\_site directory (head is a site-type plugin)

Once installation is complete, restart the service and open http://localhost:9200/_plugin/head/ in a browser

ES concepts

Cluster

Represents a cluster. A cluster has multiple nodes, one of which is the master node; the master is chosen by election. The master/slave distinction exists only inside the cluster. One of the ideas behind ES is decentralization, which literally means there is no central node. This is the view from outside the cluster: seen from outside, the ES cluster is one logical whole, and communicating with any one node is equivalent to communicating with the entire cluster.

Shards

Represents index shards. ES can split a complete index into multiple shards, so that a large index can be broken up and distributed across different nodes, forming a distributed search. The number of shards can only be specified when the index is created and cannot be changed afterwards.
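
One way to see why the shard count is fixed is to look at how a document is assigned to a shard. The sketch below assumes the commonly described rule shard = hash(routing) % number_of_primary_shards; the hash used here (CRC32) is a stand-in chosen for illustration and is not the hash Elasticsearch uses, and the function name is hypothetical.

```python
import zlib


def pick_shard(routing_value, number_of_shards):
    """Assumed routing rule: hash(routing) modulo the primary shard count.

    zlib.crc32 is only a deterministic stand-in hash for this sketch.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


doc_ids = [str(i) for i in range(1000)]
with_5 = {d: pick_shard(d, 5) for d in doc_ids}
with_6 = {d: pick_shard(d, 6) for d in doc_ids}

# Count documents that would land on a different shard if the count changed.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the modulus is part of the rule, changing the shard count would send many existing documents to a different shard than the one that holds them, which is consistent with the count being fixed at creation time.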

Replicas

Represents index replicas. ES can keep multiple replicas of an index. Replicas serve two purposes. The first is fault tolerance: when a shard on some node is corrupted or lost, it can be recovered from a replica. The second is query efficiency: ES automatically load-balances search requests.
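
Shard and replica counts are supplied as index settings when the index is created. The sketch below only builds the HTTP request and does not send it. It assumes a node listening on localhost:9200 and the number_of_shards and number_of_replicas setting names; the index name films and the helper name are illustrative.

```python
import json
import urllib.request


def build_create_index_request(index, shards, replicas, host="http://localhost:9200"):
    """Build (but do not send) a PUT request that creates an index."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("films", shards=5, replicas=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(req)
```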

Recovery

Represents data recovery, also called data redistribution. When a node joins or leaves the cluster, ES redistributes index shards according to machine load; data recovery also takes place when a node is restarted.

River

Represents a data source for ES; it is also a way to synchronize data from other storage systems (such as databases) into ES. A river is an ES service that exists as a plugin: it reads the data from the river's source and indexes it into ES. The official rivers are CouchDB, RabbitMQ, Twitter, and Wikipedia.

Gateway

Represents the persistent storage of ES indexes. By default ES first keeps the index in memory and persists it to disk when memory is full. When the ES cluster is shut down and restarted, the index data is read back from the gateway. ES supports several gateway types: the local file system (the default), a distributed file system, Hadoop HDFS, and Amazon's S3 cloud storage service.

Discovery.zen

Represents the automatic node discovery mechanism of ES. ES is a peer-to-peer (P2P) system: it first looks for existing nodes by broadcasting, and the nodes then communicate through the multicast protocol; point-to-point interaction is also supported.

Transport

Represents the way ES nodes interact with each other inside the cluster, or with clients. By default they interact over TCP; ES also supports HTTP (JSON format), Thrift, servlet, memcached, ZeroMQ, and other transport protocols (integrated via plugins).

Distributed search with Elasticsearch: integrating Chinese word segmentation

Officially, Elasticsearch provides only smartcn as a Chinese analyzer, and its results are not very good. Fortunately medcl (one of the earliest people in China to study ES) has written two Chinese analyzers, ik and mmseg. The use of each is described below; they work in much the same way. First install the plugin from the command line.
To install the IK plugin:

    bin/plugin -install medcl/elasticsearch-analysis-ik/1.1.0

Download the IK configuration and dictionary files to the config directory:

    cd config
    wget http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip --no-check-certificate
    unzip ik.zip
    rm ik.zip

To install the mmseg plugin:

    bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.0

Download the mmseg configuration and dictionary files to the config directory:

    cd config
    wget http://github.com/downloads/medcl/elasticsearch-analysis-mmseg/mmseg.zip --no-check-certificate
    unzip mmseg.zip
    rm mmseg.zip

Analyzer configuration

To configure the IK analyzer, add the following to the elasticsearch.yml file:

    index:
      analysis:
        analyzer:
          ik:
            alias: [ik_analyzer]
            type: org.elasticsearch.index.analysis.IkAnalyzerProvider

Or

    index.analysis.analyzer.ik.type: "ik"

The two forms mean the same thing.
To configure the mmseg analyzer, likewise edit the elasticsearch.yml file:

    index:
      analysis:
        analyzer:
          mmseg:
            alias: [news_analyzer, mmseg_analyzer]
            type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider

Or

    index.analysis.analyzer.default.type: "mmseg"

The mmseg analyzer also has some more specialized parameters, set as follows:

    index:
      analysis:
        tokenizer:
          mmseg_maxword:
            type: mmseg
            seg_type: "max_word"
          mmseg_complex:
            type: mmseg
            seg_type: "complex"
          mmseg_simple:
            type: mmseg
            seg_type: "simple"

Once the plugin has been installed and configured, it is loaded when ES starts.
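
To check that an analyzer is actually loaded, one option is to ask ES to tokenize a sample string through the _analyze endpoint. The sketch below only builds the request and does not send it. It assumes a node on localhost:9200 and the early-version form of the endpoint in which the analyzer is a query parameter and the raw text is the request body; the index name and the helper name are illustrative.

```python
import urllib.parse
import urllib.request


def build_analyze_request(index, analyzer, text, host="http://localhost:9200"):
    """Build (but do not send) a request to the _analyze endpoint."""
    query = urllib.parse.urlencode({"analyzer": analyzer})
    return urllib.request.Request(
        url=f"{host}/{index}/_analyze?{query}",
        data=text.encode("utf-8"),  # assumption: raw text to tokenize goes in the body
        method="GET",
    )


req = build_analyze_request("films", "ik", "中华人民共和国")
print(req.get_method(), req.full_url)
# To send it to a running node: urllib.request.urlopen(req)
```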

Defining the mapping

You can specify the analyzer when you add a mapping to an index:

    {
      "page": {
        "properties": {
          "title": {
            "type": "string",
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik"
          },
          "content": {
            "type": "string",
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik"
          }
        }
      }
    }

indexAnalyzer is the analyzer used at index time; searchAnalyzer is the analyzer used at search time.
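
A mapping of this shape can also be generated and submitted over HTTP. The sketch below only builds the request and does not send it. It assumes a node on localhost:9200 and the PUT /{index}/{type}/_mapping endpoint of the ES versions discussed here; the index name pages and the helper names are hypothetical.

```python
import json
import urllib.request


def analyzed_string(analyzer):
    """Field definition that uses one analyzer for both indexing and searching."""
    return {"type": "string", "indexAnalyzer": analyzer, "searchAnalyzer": analyzer}


def build_put_mapping_request(index, doc_type, fields, analyzer,
                              host="http://localhost:9200"):
    """Build (but do not send) a PUT request that defines a type mapping."""
    mapping = {
        doc_type: {"properties": {name: analyzed_string(analyzer) for name in fields}}
    }
    return urllib.request.Request(
        url=f"{host}/{index}/{doc_type}/_mapping",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_put_mapping_request("pages", "page", ["title", "content"], "ik")
print(req.get_method(), req.full_url)
print(json.dumps(json.loads(req.data.decode("utf-8")), indent=2))
```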

The Java mapping code is as follows:

    import org.elasticsearch.common.xcontent.XContentBuilder;
    import org.elasticsearch.common.xcontent.XContentFactory;

    // Build the mapping for type "page" with two analyzed string fields.
    XContentBuilder content = XContentFactory.jsonBuilder().startObject()
        .startObject("page")
          .startObject("properties")
            .startObject("title")
              .field("type", "string")
              .field("indexAnalyzer", "ik")
              .field("searchAnalyzer", "ik")
            .endObject()
            .startObject("code")
              .field("type", "string")
              .field("indexAnalyzer", "ik")
              .field("searchAnalyzer", "ik")
            .endObject()
          .endObject()
        .endObject()
      .endObject();

Once the mapping is defined, operations on the index tokenize text with the specified analyzer.

Appendix:

IK analyzer plugin project: https://github.com/medcl/elasticsearch-analysis-ik

mmseg analyzer plugin project: https://github.com/medcl/elasticsearch-analysis-mmseg

If you find the configuration troublesome, you can also download a preconfigured ES distribution from: https://github.com/medcl/elasticsearch-rtf

Basic usage of Elasticsearch
Key points:
1. An index plays the role of a database
2. A type plays the role of a database table
3. Do not use a browser for client operations; use curl. Otherwise Java heap errors will appear ...

curl: -X is followed by the RESTful method: GET, POST, ...
-d is followed by the data (d = data to send)

1. Create:

Create a new record with a specified ID (both PUT and POST appear to work):
$ curl -XPOST localhost:9200/films/md/2 -d '
{"name": "hei yi ren", "tag": "good"}'

Create a new record with an automatically generated ID:
$ curl -XPOST localhost:9200/films/md -d '
{"name": "ma da jia si jia3", "tag": "good"}'
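
The same two requests can be built from Python's standard library. This is a minimal sketch that constructs the requests without sending them; it assumes a node on localhost:9200, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc, doc_id=None,
                        host="http://localhost:9200"):
    """Build (but do not send) a POST request that indexes one document.

    With doc_id the record gets that ID; without it the server generates one.
    """
    url = f"{host}/{index}/{doc_type}"
    if doc_id is not None:
        url += f"/{doc_id}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


with_id = build_index_request("films", "md", {"name": "hei yi ren", "tag": "good"}, doc_id=2)
auto_id = build_index_request("films", "md", {"name": "ma da jia si jia3", "tag": "good"})
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
# To send either one to a running node: urllib.request.urlopen(with_id)
```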

2. Query:
2.1 Query all indexes and types:
$ curl localhost:9200/_search?pretty=true

2.2 Query all types under one index:
$ curl localhost:9200/films/_search

2.3 Query all records of one type under one index:
$ curl localhost:9200/films/md/_search?pretty=true

2.4 Query with a parameter:
$ curl localhost:9200/films/md/_search?q=tag:good
{"took": 7, "timed_out": false, "_shards": {"total": 5, "successful": 5, "failed": 0}, "hits": {"total": 2, "max_score": 1.0, "hits": [{"_index": "film", "_type": "md", "_id": "2", "_score": 1.0, "_source":
{"name": "hei yi ren", "tag": "good"}},{"_index": "film", "_type": "md", "_id": "1", "_score": 0.30685282, "_source":
{"name": "ma da jia si jia", "tag": "good"}}]}}

2.5 Query with JSON parameters (note the query and term keywords):
$ curl localhost:9200/film/_search -d '
{"query": {"term": {"tag": "bad"}}}'
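
The two query styles, a q= parameter in the URL and a JSON body with query and term, can be sketched in Python as follows. The code only builds the URL and the request and sends nothing; it assumes a node on localhost:9200, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

HOST = "http://localhost:9200"  # assumed local node


def build_uri_search_url(index, doc_type, field, value):
    """URL for a parameter query such as _search?q=tag:good (URL-encoded)."""
    query = urllib.parse.urlencode({"q": f"{field}:{value}"})
    return f"{HOST}/{index}/{doc_type}/_search?{query}"


def build_term_search_request(index, field, value):
    """Build (but do not send) a search request with a JSON term query body."""
    body = {"query": {"term": {field: value}}}
    return urllib.request.Request(
        url=f"{HOST}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


url = build_uri_search_url("films", "md", "tag", "good")
req = build_term_search_request("films", "tag", "bad")
print(url)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

The colon in tag:good is percent-encoded by urlencode; the server decodes it back before parsing the query string.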

3. Update
$ curl -XPUT localhost:9200/films/md/1 -d {... (data) ...}

4. Delete. To delete the whole index:
$ curl -XDELETE localhost:9200/films
