A brief introduction to Elasticsearch

Source: Internet
Author: User

Elasticsearch is an open-source system, built on Java and Lucene, that combines search-engine and NoSQL-database features and has risen quickly over the last two years. After studying it recently, I found much to learn from both its architecture and the way its open-source ecosystem has been built, so I have organized my notes into this article. The code and architecture analysis here is based primarily on Elasticsearch 2.x, the latest stable version.

As the name suggests, Elasticsearch is an elastic search engine. Elasticity first implies a distributed system, since a single machine cannot scale elastically; adding a mechanism for flexible scaling on top of that gives the meaning of "Elastic" here. Its search and storage functions are provided mainly by Lucene, which acts as its storage engine; on top of Lucene, Elasticsearch wraps the interfaces for indexing, querying, and distribution.

Key concepts in Elasticsearch

Cluster: a group of nodes that share a common cluster name.

Node: an Elasticsearch instance in the cluster.

Index: equivalent to a database in a relational database system. A cluster can contain multiple indexes. This is a logical concept.

Primary shard: a subset of an index. An index can be split into multiple shards, which are distributed across different cluster nodes. Each shard corresponds to one Lucene index.

Replica shard: each primary shard can have one or more replicas.

Type: equivalent to a table in a database. A mapping is defined per type, and one index can contain multiple types.

Mapping: equivalent to a schema in a database. It constrains the types of fields, but Elasticsearch can also create a mapping automatically based on the data.

Document: equivalent to a row in a database.

Field: equivalent to a column in a database.

Allocation: the process of assigning shards to a node, covering both primary shards and replicas. For a replica, it also includes copying the data from the primary shard.
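
As a rough illustration of how the concepts above fit together, the following sketch builds an index-creation request body in the Elasticsearch 2.x style and counts how many shard copies the cluster would have to allocate. The type name "article", the field names, and the shard counts are made up for the example; the body is only constructed and printed, not sent to a cluster.

```python
import json


def index_body(primary_shards, replicas_per_primary):
    """Build a 2.x-style create-index body (hypothetical type and fields)."""
    return {
        "settings": {
            "number_of_shards": primary_shards,          # primary shards
            "number_of_replicas": replicas_per_primary,  # replicas per primary
        },
        "mappings": {
            # "article" is a type (like a table); its mapping is the schema.
            "article": {
                "properties": {
                    # Fields are like columns; every document is one row.
                    "title": {"type": "string"},
                    "views": {"type": "long"},
                }
            }
        },
    }


def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary plus its replicas must be allocated to some node."""
    return primary_shards * (1 + replicas_per_primary)


body = index_body(5, 1)
print(json.dumps(body, indent=2))
print(total_shard_copies(5, 1))  # 5 primaries + 5 replicas = 10
```

With 5 primary shards and 1 replica each, allocation has to place 10 shard copies across the nodes.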

Search Engine

In addition to supporting Lucene's own search functions, Elasticsearch adds some extensions on top of it.

1. Scripting support. Groovy is the default scripting language in Elasticsearch, and scripts extend Lucene's scoring mechanism so that complex custom scoring algorithms are easy to support. By default, only sandboxed scripting languages (such as Lucene expressions and mustache) are enabled; Groovy must be explicitly enabled in the configuration before it can be used. Groovy's security mechanism controls permissions through java.security.AccessControlContext together with a class whitelist. The 1.x versions instead used their own whitelist filter, and a flaw in that restriction policy led to a remote code execution vulnerability.

2. The _all field. By default, an _all field is generated by concatenating the values of all other fields. This lets you search without specifying a field and makes cross-field retrieval easier.

3. Suggester. Through an extended indexing mechanism, Elasticsearch can provide Google-style auto-complete suggestions as well as spelling-correction suggestions for search terms.
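
The idea behind the _all field can be sketched with a toy model. This is not how Elasticsearch implements it (the real field is analyzed and indexed by Lucene); the function names and sample documents are invented purely to show why a combined field allows searching without naming a field.

```python
def build_all_field(doc):
    """Join the values of every field into one combined text value."""
    return " ".join(str(value) for value in doc.values())


def search_all(docs, term):
    """Return the ids of documents whose combined text contains the term."""
    term = term.lower()
    hits = []
    for doc_id, doc in docs.items():
        # Naive whitespace tokenization stands in for a real analyzer.
        tokens = build_all_field(doc).lower().split()
        if term in tokens:
            hits.append(doc_id)
    return hits


docs = {
    "1": {"title": "Lucene in Action", "author": "Erik"},
    "2": {"title": "Search basics", "author": "Lucene Fan"},
    "3": {"title": "Cooking", "author": "Anna"},
}

# "lucene" appears in the title of doc 1 and in the author of doc 2.
print(search_all(docs, "lucene"))
```

The query matches documents 1 and 2 even though the term sits in a different field in each, which is the cross-field retrieval the _all field is meant to simplify.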

NoSQL Database

Elasticsearch can be used as a database, relying primarily on the following features:

The original data is saved in the index by default and can be retrieved. This relies primarily on Lucene's stored-field functionality.

The translog provides real-time reads and full data durability (data is not lost even if the server fails unexpectedly). Lucene keeps an IndexWriter buffer, so if the process dies abnormally, the data in that buffer is lost. Elasticsearch therefore uses the translog to guarantee that no data is lost. When a document is read directly by ID, Elasticsearch first tries the translog and only then reads from the index. As a result, reads are real-time even for data that is still in the buffer and has not yet been flushed to the index. By default, the translog is fsynced once per write request, and a scheduled task also checks it periodically (every 5 seconds by default). If the business scenario requires higher write throughput, the translog-related settings can be tuned.

Strong aggregation capabilities. Kibana, part of the Elasticsearch ecosystem, relies mainly on aggregations for data analysis and visualization.
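
To make the translog read path described above more concrete, here is a toy in-memory model. It is a deliberately simplified sketch, not Elasticsearch's implementation: the class and method names are invented, a plain dict stands in for the on-disk translog, and "flush" here simply means moving buffered documents into the committed index and clearing the translog.

```python
class ToyShard:
    """Toy model of one shard: translog + indexing buffer + committed index."""

    def __init__(self):
        self.translog = {}  # stands in for the durable log of recent writes
        self.buffer = {}    # in-memory indexing buffer, lost on a crash
        self.index = {}     # documents already committed to the index

    def write(self, doc_id, doc):
        # Every write is recorded in the translog as well as the buffer.
        self.translog[doc_id] = doc
        self.buffer[doc_id] = doc

    def get(self, doc_id):
        # Get-by-ID checks the translog first, then the committed index.
        if doc_id in self.translog:
            return self.translog[doc_id]
        return self.index.get(doc_id)

    def flush(self):
        # Commit buffered documents, after which the translog can be cleared.
        self.index.update(self.buffer)
        self.buffer.clear()
        self.translog.clear()

    def crash_and_recover(self):
        # The buffer is lost with the process; it is rebuilt from the translog.
        self.buffer = dict(self.translog)


shard = ToyShard()
shard.write("1", {"msg": "hello"})
realtime = shard.get("1")     # served from the translog before any flush
shard.crash_and_recover()
after_crash = shard.get("1")  # still readable because the translog survived
shard.flush()
after_flush = shard.get("1")  # now served from the committed index
print(realtime, after_crash, after_flush)
```

In Elasticsearch 2.x the fsync behaviour mentioned above is governed by the index.translog.durability and index.translog.sync_interval settings; which values suit a given workload is a trade-off between durability and write throughput.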

Typical application scenario one: cloud analytics business

Solution:

Set the number of shards for each index separately according to its size, and make full use of types to merge indexes

Store all fields except analyzed (tokenized) fields as doc values

Deploy master nodes, data nodes, and client nodes separately

Set conservative limits on the fielddata memory footprint and on other memory usage

Set the fielddata validity period
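
The node separation and fielddata limits in this solution might translate into per-role settings along the following lines. This is a sketch under assumptions: the percentages and the 10m expiry are illustrative values rather than recommendations from the original article, and the setting names (node.master, node.data, indices.fielddata.cache.size, indices.fielddata.cache.expire, indices.breaker.fielddata.limit) should be checked against the documentation for the exact version in use.

```python
def node_settings(role):
    """Return elasticsearch.yml-style settings for a dedicated node role."""
    roles = {
        "master": {"node.master": True, "node.data": False},
        "data": {"node.master": False, "node.data": True},
        # A client node neither holds data nor is master-eligible.
        "client": {"node.master": False, "node.data": False},
    }
    settings = dict(roles[role])
    if role == "data":
        # Illustrative, conservative fielddata limits (assumed values).
        settings["indices.fielddata.cache.size"] = "20%"
        settings["indices.fielddata.cache.expire"] = "10m"
        settings["indices.breaker.fielddata.limit"] = "30%"
    return settings


for role in ("master", "data", "client"):
    print(role, node_settings(role))
```

Only data nodes carry the fielddata limits here, since they are the nodes that load fielddata into memory.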

Typical application scenario two: Casio business

Solution:

Automatically match unknown fields using dynamic mapping

Distribute bulk imports of data across all nodes

Store all fields as doc values to reduce memory consumption

Use templates to automatically create indexes at the day and hour levels

Group nodes into SSD and SATA tiers, and periodically migrate cold data automatically
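
Several of these points can be sketched together: time-based index names, a 2.x-style index template with a dynamic template, and the selection of indexes old enough to move to the SATA tier. Everything specific here is an assumption made for illustration: the "logs" prefix, the date formats, the 7-day threshold, and the custom node attribute name box_type with values "ssd" and "sata". The bodies are only built as dictionaries, not sent to a cluster.

```python
from datetime import datetime, timedelta


def index_name(prefix, ts, hourly=False):
    """Name of the day-level or hour-level index that a timestamp falls into."""
    fmt = "%Y.%m.%d.%H" if hourly else "%Y.%m.%d"
    return prefix + "-" + ts.strftime(fmt)


def index_template(prefix):
    """2.x-style template applied to every index matching prefix-*."""
    return {
        "template": prefix + "-*",
        # New indexes start on nodes tagged with the (assumed) ssd attribute.
        "settings": {"index.routing.allocation.require.box_type": "ssd"},
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "strings_not_analyzed": {
                            # Unknown string fields become doc-values fields.
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                                "doc_values": True,
                            },
                        }
                    }
                ]
            }
        },
    }


def cold_indexes(prefix, days, now, max_age_days):
    """Daily indexes older than max_age_days, candidates for the SATA tier."""
    cutoff = now - timedelta(days=max_age_days)
    return [index_name(prefix, day) for day in days if day < cutoff]


# Settings update that would relocate an index to SATA-tagged nodes.
COLD_SETTINGS = {"index.routing.allocation.require.box_type": "sata"}

now = datetime(2016, 3, 10, 12)
days = [datetime(2016, 3, 1), datetime(2016, 3, 9), datetime(2016, 3, 10)]
print(index_name("logs", now))
print(index_name("logs", now, hourly=True))
print(cold_indexes("logs", days, now, max_age_days=7))
```

A periodic job could apply COLD_SETTINGS to each index returned by cold_indexes, which is one way to realize the automatic migration of cold data.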
