Elasticsearch is an open-source system, built on Java and Lucene, that combines search-engine and NoSQL-database features and has risen rapidly over the last two years. Having studied it recently, I feel that its architecture and the way its open-source ecosystem has been built offer a lot to learn from, so I have organized my notes into this article to share. The code and architecture analysis in this article is based on the latest stable release line, Elasticsearch 2.x.
As the name suggests, Elasticsearch is an elastic search engine. "Elastic" implies distribution: a single machine can only be scaled up so far, so an elastic scale-out mechanism is added, and that is the meaning of "Elastic" here. The search and storage functionality is provided mainly by Lucene, which acts as the storage engine; Elasticsearch wraps it with indexing, query, and distribution-related interfaces.
Key concepts in Elasticsearch
Cluster: a group of nodes that share the same cluster name.
Node: a single Elasticsearch instance in a cluster.
Index: equivalent to the database concept in a relational database; a cluster can contain multiple indexes. This is a logical concept.
Primary shard: an index can be split into multiple shards, distributed across different nodes in the cluster; each primary shard holds a subset of the index's data and corresponds to one Lucene index.
Replica shard: each primary shard can have zero or more replicas.
Type: equivalent to the table concept in a database; mappings are defined per type, and an index can contain multiple types.
Mapping: equivalent to the schema in a database, used to constrain the types of fields; Elasticsearch can also create a mapping automatically from the data.
Document: equivalent to a row in a database.
Field: equivalent to a column in a database.
Allocation: the process of assigning a shard to a node, for either a primary shard or a replica; for a replica it also includes copying data from the primary shard.
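To make the concepts concrete, here is a sketch of how shards, replicas, types, mappings, and fields appear together in a 2.x index-creation request body, written as a plain Python dict; the index, type, and field names are hypothetical.

```python
# Hypothetical body for "PUT /my_index", illustrating the concepts above;
# the names "user", "name", and "age" are made up for this sketch.
create_index_body = {
    "settings": {
        "number_of_shards": 3,     # primary shards; each shard is one Lucene index
        "number_of_replicas": 1,   # replica shards per primary shard
    },
    "mappings": {                  # a mapping constrains field types, per type
        "user": {                  # type: analogous to a table
            "properties": {        # fields: analogous to columns
                "name": {"type": "string"},
                "age": {"type": "integer"},
            }
        }
    },
}
```

Each document indexed into this index's `user` type is then a row-like JSON object whose fields are checked against the mapping.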
Search engine features
In addition to Lucene's own search functionality, Elasticsearch adds several extensions on top of it.
1. Scripting support. Elasticsearch extends Lucene's scoring mechanism so that complex custom scoring algorithms are easy to implement. By default, only sandboxed scripting languages (such as Lucene expressions and mustache) are enabled; Groovy scripting must be enabled explicitly. Groovy's security mechanism controls permissions through a class whitelist set via java.security.AccessControlContext, while the 1.x versions used a whitelist filter of their own; a flaw in that restriction policy led to a remote code execution vulnerability.
2. The _all field. By default, Elasticsearch generates an _all field that concatenates the values of all other fields. This allows you to search without specifying a field and makes cross-field retrieval easy.
3. Suggesters. Through an extended indexing mechanism, Elasticsearch can implement auto-complete suggestions like Google's, as well as spelling-correction suggestions for search terms.
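As a sketch of the first and third extensions, here are hypothetical 2.x request bodies: a function_score query whose score is adjusted by a sandboxed Lucene-expression script, and a completion-suggester request. Field names such as `popularity` and `title_suggest` are assumptions for this example.

```python
# 1. Custom scoring: a function_score query with a script_score clause,
#    using the sandboxed Lucene expression language (enabled by default).
custom_score_query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "elasticsearch"}},
            "script_score": {
                "script": {
                    "lang": "expression",                        # sandboxed language
                    "inline": "_score * doc['popularity'].value",  # boost by a numeric field
                }
            },
        }
    }
}

# 3. Auto-completion: a 2.x "_suggest" request against a field of type
#    "completion" (the field must be declared as such in the mapping).
suggest_request = {
    "title-suggest": {
        "text": "elas",                            # the user's partial input
        "completion": {"field": "title_suggest"},  # the completion field to query
    }
}
```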
NoSQL Database
Elasticsearch can be used as a database, relying primarily on the following features:
The original document is stored in the index by default and can be retrieved; this relies mainly on Lucene's stored-fields functionality.
The translog provides real-time read capability and full data durability (data is not lost even if the process exits abnormally). Because Lucene buffers documents in the IndexWriter, data in that buffer would be lost if the process crashed, so Elasticsearch uses the translog to guarantee that data is not lost. When a document is read directly by ID, Elasticsearch first tries to read it from the translog before falling back to the index; even data that has not yet been flushed from the buffer to the index can therefore be read in real time. By default, Elasticsearch fsyncs the translog on every write request, with an additional scheduled check (every 5 seconds by default). If the business scenario requires higher write throughput, the translog-related settings can be tuned.
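For example, a settings-update body along these lines (a sketch; the values are illustrative, not recommendations) trades per-request durability for throughput by moving the fsync to a background interval:

```python
# Hypothetical "PUT /my_index/_settings" body using the 2.x translog options.
translog_settings = {
    "index": {
        "translog": {
            "durability": "async",   # fsync on a timer instead of per request
            "sync_interval": "5s",   # how often the background fsync runs
        }
    }
}
```

With async durability, writes acknowledged within the last sync interval can be lost on a crash, so this trade-off is only appropriate when the data can be replayed from elsewhere.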
Powerful aggregations: Kibana, in the Elasticsearch ecosystem, relies mainly on aggregations to implement data analysis and visualization.
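A sketch of the kind of aggregation request Kibana issues: bucket documents per hour, then count the top values of a status field inside each bucket (the field names are assumptions for this example):

```python
# Hypothetical "_search" body: aggregation results only, no hits returned.
agg_request = {
    "size": 0,                         # skip returning matching documents
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "@timestamp", "interval": "hour"},
            "aggs": {                  # sub-aggregation inside each hour bucket
                "top_status": {"terms": {"field": "status", "size": 5}}
            },
        }
    },
}
```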
Typical application scenario one: cloud analytics business
Solution:
Set the number of shards for each index individually based on its size, and make full use of types to merge small indexes.
Store all fields except analyzed (tokenized) fields as doc values.
Deploy master nodes, data nodes, and client nodes separately.
Set the fielddata memory footprint and other memory usage limits conservatively.
Set an expiry time for fielddata.
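The doc-values point above can be sketched as a type mapping (field names hypothetical) in which only the full-text field stays analyzed and everything else is stored on disk as doc values, which in 2.x is the default for not_analyzed fields:

```python
# Hypothetical type mapping: doc values for everything except the analyzed
# full-text field, keeping aggregations off the fielddata heap.
report_mapping = {
    "report": {
        "properties": {
            "message": {"type": "string"},   # analyzed for search; no doc values
            "status": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,          # aggregate from disk, not from heap
            },
            "bytes": {"type": "long", "doc_values": True},
        }
    }
}
```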
Typical application scenario two: logging business
Solution:
Use dynamic mapping to handle unknown fields automatically.
Distribute bulk imports across all nodes.
Store all fields as doc values to reduce memory consumption.
Use templates to create day-level and hour-level indexes automatically.
Group nodes into SSD and SATA tiers, and migrate cold data automatically on a schedule.
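The template point can be sketched as a 2.x index template (names and values are illustrative) that matches day-level index names and applies settings and mappings automatically as each day's index is created:

```python
# Hypothetical "PUT /_template/logs" body for day-level indexes such as
# "logs-2016.01.01"; every new index matching the pattern inherits it.
log_template = {
    "template": "logs-*",                   # applies to any matching index name
    "settings": {"number_of_shards": 5},
    "mappings": {
        "_default_": {                      # defaults for every type in the index
            "_all": {"enabled": False},     # logs are queried per field; save space
            "dynamic_templates": [{
                "strings_not_analyzed": {   # dynamic mapping rule for new string fields
                    "match_mapping_type": "string",
                    "mapping": {"index": "not_analyzed", "doc_values": True},
                }
            }],
        }
    },
}
```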
That concludes this brief introduction to Elasticsearch.