An open source distributed search engine---Elasticsearch (to be continued)

Source: Internet
Author: User
Tags: http, post

Today we introduce Elasticsearch, an open source distributed search engine.

First, Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, released as open source under the terms of the Apache license, and is the second most popular enterprise search engine. Designed for cloud computing, it offers real-time search and is stable, reliable, fast, and easy to install and use.

When we build a website or application and want to add search functionality, we quickly find that search is hard to get right. We want our search solution to be fast; we want zero configuration and a completely free search schema; we want to index data simply by sending JSON over HTTP; we want our search server to always be available; we want to start with one server and scale out to hundreds; we want real-time search; we want simple multi-tenancy; and we want a solution built for the cloud. Elasticsearch is designed to solve all of these problems and more. Getting started with Elasticsearch is easy: it ships with many sensible defaults that let beginners sidestep complicated search theory, and it works out of the box. Even people with little knowledge of search can use it for many tasks.

Second, Elasticsearch has several core concepts. Understanding these concepts from the outset will be of great help to the whole learning process.

near real-time (NRT): Elasticsearch is a near real-time search platform. This means that there is a slight delay (usually about 1 second) between indexing a document and that document becoming searchable.

cluster: A cluster is a group of one or more nodes that together hold all of your data and jointly provide indexing and search functionality. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only join a cluster by specifying that cluster's name. It is good practice to set this name explicitly in a production environment, while the default is fine for testing and development.

node: A node is a single server in your cluster. As part of the cluster, it stores your data and takes part in the cluster's indexing and search capabilities. A node joins a specific cluster by being configured with that cluster's name. By default, every node is set up to join a cluster called "elasticsearch", which means that if you start several nodes on your network and they can discover each other, they will automatically form and join a single cluster named "elasticsearch". A cluster can contain as many nodes as you want. If no other Elasticsearch nodes are currently running on your network, starting a single node will by default create a new single-node cluster called "elasticsearch" and join it.
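The rule that nodes sharing a cluster name end up in the same cluster can be sketched with a toy model. This is only an illustration of the grouping idea, not Elasticsearch's actual discovery mechanism; the node names and the "logging-prod" cluster name are hypothetical.

```python
from collections import defaultdict

DEFAULT_CLUSTER_NAME = "elasticsearch"  # default name described in the text


def group_nodes_by_cluster(nodes):
    """Group (node_name, cluster_name) pairs by cluster name.

    A cluster_name of None stands for "not configured", so the node
    falls back to the default cluster name.
    """
    clusters = defaultdict(list)
    for node_name, cluster_name in nodes:
        clusters[cluster_name or DEFAULT_CLUSTER_NAME].append(node_name)
    return dict(clusters)


# Hypothetical nodes, for illustration only.
nodes = [
    ("node-1", None),            # no name configured: joins the default cluster
    ("node-2", None),            # same default name: same cluster as node-1
    ("node-3", "logging-prod"),  # explicit name: forms a separate cluster
]
clusters = group_nodes_by_cluster(nodes)
print(clusters)
```

In this model, forgetting to set a name in production would put the node into the default cluster alongside any test nodes, which is why setting the name explicitly is recommended.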

index: An index is a collection of documents that have somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used whenever we index, search, update, or delete the documents in that index. In a cluster, you can define as many indexes as you want.
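The lowercase rule for index names can be checked before a name is ever sent to the server. The helper below is a minimal sketch that covers only the rule stated above; Elasticsearch may enforce additional naming restrictions that are not modeled here, and the function name is hypothetical.

```python
def is_lowercase_index_name(name):
    """Return True if the name is non-empty and contains no uppercase characters."""
    return bool(name) and name == name.lower()


# Candidate index names modeled on the examples in the text.
for candidate in ["customers", "product_catalog", "Orders"]:
    print(candidate, is_lowercase_index_name(candidate))
```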

type: In an index, you can define one or more types. A type is a logical category or partition of your index, and its semantics are entirely up to you. Typically, you define a type for documents that share a common set of fields. For example, suppose you run a blogging platform and store all your data in one index. In this index, you can define one type for user data, another for blog data, and another for comment data.

document: A document is the basic unit of information that can be indexed. For example, you can have a document for a customer, a document for a product, and a document for an order. A document is represented in JSON (JavaScript Object Notation), a ubiquitous format for data interchange on the Internet. In an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must also be indexed into, and so assigned to, a type within that index.
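To make the index/type/document relationship concrete, the following sketch builds (but does not send) an HTTP request that would store one JSON document. It assumes a local node on port 9200 and the index/type/ID path layout that goes with the type model described above; the index name, type name, ID, and field values are all hypothetical, and newer Elasticsearch releases may use a different path layout.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on port 9200


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would index one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical customer document; index, type and ID are illustrative.
customer = {"name": "Jane Doe", "city": "Hangzhou"}
request = build_index_request("customers", "customer", 1, customer)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Nothing is sent over the network here; passing the request to `urllib.request.urlopen` would require a running node.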

shards and replicas (shards & replicas): An index can store an amount of data that exceeds the hardware limits of a single node. For example, an index with 1 billion documents might occupy 1 TB of disk space; no single node may have that much disk space, or a single node may be too slow to serve search requests on its own. To solve this problem, Elasticsearch lets you divide an index into multiple parts, called shards. In a network or cloud environment, failures can occur at any time, so a failover mechanism for when a shard or node goes offline or disappears for any reason is very useful and highly recommended. For this purpose, Elasticsearch allows you to create one or more copies of each shard, which are called replica shards, or simply replicas.
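The arithmetic behind the 1 TB example can be worked through in a few lines. The sketch below assumes 5 primary shards and 1 replica per primary (illustrative values), treats 1 TB as 1000 GB, and assumes data is spread evenly across shards, which real indexes only approximate.

```python
def shard_layout(primary_shards, replicas_per_primary, index_size_gb):
    """Estimate shard counts and sizes, assuming evenly distributed data."""
    copies_per_shard = 1 + replicas_per_primary  # the primary plus its replicas
    return {
        "total_shard_copies": primary_shards * copies_per_shard,
        "gb_per_primary": index_size_gb / primary_shards,
        "total_disk_gb": index_size_gb * copies_per_shard,
    }


# Assumed values: 5 primaries, 1 replica each, a 1 TB (1000 GB) index.
layout = shard_layout(primary_shards=5, replicas_per_primary=1, index_size_gb=1000)
print(layout)
```

Under these assumptions each primary shard holds about 200 GB, so no node needs the full 1 TB, while the replicas double the total disk used across the cluster.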

Third, any programming language can communicate with Elasticsearch through its RESTful API on port 9200, using your favorite web client. In fact, you can even talk to Elasticsearch from the command line with the curl command (provided, of course, that you know how to use curl).

Elasticsearch provides official clients for several languages, such as Java, Python, .NET, and PHP. What we introduce here, however, is interacting with Elasticsearch through its RESTful API.

curl -XGET 'http://localhost:9200/_count?pretty' -d '{ "query": { "match_all": {} } }'

Description

-XGET is the appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE;

http://...:9200 is the protocol, hostname, and port of any node in the cluster;

_count is the path of the request;

pretty is an optional query string parameter; it makes Elasticsearch pretty-print the JSON response so that it is easier to read;

-d sends the data that follows as the body of the HTTP request, the way POST data is transmitted;

The section in {} is the JSON-formatted request body (we will use this form from here on);

query is the keyword in the JSON request body that introduces the query definition;

match_all is the type of query in the JSON request body; it matches all documents.
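The same request can be expressed with Python's standard library. This is a minimal sketch, assuming a node at http://localhost:9200 as in the curl command; the Content-Type header is an addition that the curl command above does not send (some Elasticsearch versions expect it), and the function names are hypothetical.

```python
import json
import urllib.request


def build_count_request(base_url="http://localhost:9200"):
    """Build (but do not send) the equivalent of the curl _count request."""
    query = {"query": {"match_all": {}}}  # same JSON body as the curl example
    return urllib.request.Request(
        f"{base_url}/_count?pretty",
        data=json.dumps(query).encode("utf-8"),
        method="GET",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


count_request = build_count_request()
print(count_request.get_method(), count_request.full_url)
print(count_request.data.decode("utf-8"))
# With a node running locally: print(send(count_request))
```

Building the request separately from sending it makes each part of the curl command (method, URL, path, query string, body) easy to inspect without a running cluster.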

Elasticsearch returns an HTTP status code such as 200 OK and a response body in JSON format (except for HEAD requests). The curl request above returns a response in the following JSON format:

{
    "count": 0,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    }
}
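A client would typically parse this response and check that no shard failed. The sketch below works on the sample response as a string, using the lowercase count key, so it runs without a cluster; the helper name and the shape of its summary are hypothetical.

```python
import json

# Sample response body from the text, as a JSON string.
response_text = '{"count": 0, "_shards": {"total": 5, "successful": 5, "failed": 0}}'


def summarize_count_response(text):
    """Extract the document count and whether every shard answered successfully."""
    payload = json.loads(text)
    shards = payload["_shards"]
    all_ok = shards["failed"] == 0 and shards["successful"] == shards["total"]
    return {"count": payload["count"], "all_shards_ok": all_ok}


summary = summarize_count_response(response_text)
print(summary)
```

Here the count is 0 because no documents have been indexed yet, and all 5 shards that were queried responded successfully.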

  
