Elasticsearch study notes

Last Update:2015-09-29 Source: Internet

Author: User

Tags solr


Before introducing how to use Elasticsearch, let's talk about why you would use it. Anyone learning about search engines has inevitably heard of Lucene; Solr and Elasticsearch are both built on it. There are many articles about Sphinx, but it is too intrusive to the database (it works as a plug-in). Elasticsearch is one of the most popular distributed search engines at the moment. I have also played with Solr a little, and there are many articles about it as well. I also hope that studying Elasticsearch further will improve my understanding of distributed systems. Readers who want to go deeper can consider learning ELK (Elasticsearch, Logstash, Kibana).

Recommendation: "Elasticsearch-definitive-guide"

After reading this book it is easy to get started. The book is simple and easy to understand, and nothing in it is particularly difficult.

One of its examples is written very well and can serve as an overall guide for learning. Here is an excerpt:

The first thing we need to do is store employee data, where each document represents one employee. The act of storing data in Elasticsearch is called indexing, but before we index a document we need to decide where it should be stored.

In Elasticsearch, a document belongs to a type, and types live inside an index. We can draw a simple comparison with a traditional relational database:
Relational DB -> Databases -> Tables -> Rows      -> Columns
Elasticsearch -> Indices   -> Types  -> Documents -> Fields
An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type can contain multiple documents (rows), and each document can contain multiple fields (columns).
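To make the analogy concrete, here is a minimal sketch in Python that models the hierarchy as nested dictionaries. It only illustrates the containment relationship and is not how Elasticsearch stores data internally. The names `cluster`, `put_document` and `get_document`, and the sample values `megacorp` and `employee`, are illustrative assumptions.

```python
from collections import defaultdict

# cluster -> index -> type -> document id -> document (a dict of fields)
cluster = defaultdict(lambda: defaultdict(dict))


def put_document(index, doc_type, doc_id, fields):
    """Store a document under index/type/id, replacing any existing one."""
    cluster[index][doc_type][doc_id] = dict(fields)


def get_document(index, doc_type, doc_id):
    """Return the stored document, or None if it does not exist."""
    return cluster.get(index, {}).get(doc_type, {}).get(doc_id)


put_document("megacorp", "employee", "1", {"first_name": "John", "age": 25})
# Storing under the same id again replaces the earlier document.
put_document("megacorp", "employee", "1", {"first_name": "Jane", "age": 32})

print(get_document("megacorp", "employee", "1"))
```

The second `put_document` call replaces the first document rather than failing, which is the same overwrite behaviour the notes describe for indexing a document that already exists.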

Distinguishing the meanings of "index"
You may have noticed that the term index has several meanings in Elasticsearch, so it is important to distinguish them here:

Index (noun): as mentioned above, an index is like a database in a traditional relational database. It is the place where related documents are stored. The plural of index is indices or indexes.

Index (verb): to "index a document" means to store a document in an index (noun) so that it can be retrieved or queried. This is much like the INSERT keyword in SQL, except that if the document already exists, the new document overwrites the old one.

Inverted index: a traditional database adds an index, such as a B-tree index, to a particular column to speed up retrieval. Elasticsearch and Lucene use a data structure called an inverted index for the same purpose.

By default, all fields in a document are indexed (with an inverted index), so that they are searchable.
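As a rough illustration of the idea, the sketch below builds a tiny inverted index in Python: a mapping from each term to the set of documents that contain it. The tokenizer here is a simple assumption (lowercase, split on non-alphanumerics) and is far cruder than Lucene's analyzers; the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents containing every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "Quick thinking wins",
}
inverted = build_inverted_index(documents)
print(sorted(inverted["brown"]))
print(sorted(search_all_terms(inverted, "quick brown")))
```

Looking up a term is a single dictionary access regardless of how many documents exist, which is why this structure suits full-text search better than scanning every row.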

With the meanings of index understood, the next step is to get to work. Install Elasticsearch:

Install the JDK.
Install Elasticsearch:
1. Unzip the archive and run \bin\elasticsearch.bat.
2. Open http://localhost:9200/ in a browser; you should see a response from the node.
3. If you do, the installation was successful.
Install the elasticsearch-head plugin: https://github.com/mobz/elasticsearch-head
This plugin is mainly used to manage index data and status. Its documentation is detailed and the installation is very simple: from the command line, go to the bin directory and run the plugin installer.
When you're done, open http://localhost:9200/_plugin/head/ to see the management interface.
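The browser check in step 2 can also be scripted. The following is a minimal sketch using only the Python standard library; it assumes the node answers on http://localhost:9200/ with a JSON body that has `name` and `version.number` fields, and falls back to "unknown" if they are absent. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def summarize_node_info(info):
    """Build a one-line summary from the JSON returned by the root endpoint."""
    version = info.get("version", {}).get("number", "unknown")
    name = info.get("name", "unknown")
    return f"node {name!r} is running Elasticsearch {version}"


def check_node(url="http://localhost:9200/", timeout=5):
    """Fetch the root endpoint and return a summary; raises if unreachable."""
    with urlopen(url, timeout=timeout) as response:
        return summarize_node_info(json.load(response))
```

Calling `check_node()` while Elasticsearch is running should return a summary string; if the node is not up, `urlopen` raises an error instead.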

With the installation finished, you can start writing some code. I use NEST as the client; you can choose a client that suits your own language.

In general, the search feature we want is full-text search; otherwise there would be no need for a search engine:

    // Requires: using System; using Nest;
    var node = new Uri("http://localhost:9200");
    var settings = new ConnectionSettings(node, defaultIndex: "my-application");
    var client = new ElasticClient(settings);

    var person = new Person
    {
        Id = "2",
        Firstname = "test a genius in Chinese",
        Lastname = "haha, are you a genius? "
    };

    // Index the document, overriding the default index, type and id.
    var index = client.Index(person, i => i
        .Index("sample-index")
        .Type("sample-type")
        .Id("1-should-not-be-the-id")
        .Refresh()
        .Ttl("1m")
    );

    // query_string is just one of the most commonly used queries. Choose
    // Operator.Or or Operator.And according to your full-text search
    // requirements; the query below is the simplest full-text search call.
    var searchResults = client.Search<Person>(s => s
        .Index("sample-index")
        .Type("sample-type")
        .Query(q => q.QueryString(qs => qs.Query("genius").DefaultOperator(Operator.And)))
        .From(0)   // pagination: offset of the first result
        .Size(10)  // pagination: page size
    );

    foreach (var item in searchResults.Documents)
    {
        Console.WriteLine("Item:" + Newtonsoft.Json.JsonConvert.SerializeObject(item));
    }
    Console.ReadLine();

Result of running it: (screenshot not preserved)

The example above only uses QueryString. Of course there are far more query types than this one, and I won't go through them here; I recommend reading the documentation. NEST documentation: http://nest.azurewebsites.net/nest/quick-start.html
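For readers using a client in another language, the NEST search call above corresponds roughly to a `query_string` request body with `from` and `size`. The Python sketch below builds that body; `build_search_body` and its 1-based `page` parameter are assumptions made for illustration. It also makes explicit that `from` is an offset into the results, not a page number.

```python
import json


def build_search_body(text, page=1, page_size=10, operator="AND"):
    """Build a query_string search body with from/size pagination.

    `page` is 1-based; Elasticsearch's `from` is a 0-based offset.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {
            "query_string": {
                "query": text,
                "default_operator": operator,
            }
        },
        "from": (page - 1) * page_size,
        "size": page_size,
    }


print(json.dumps(build_search_body("genius", page=3), indent=2))
```

With the default page size of 10, page 3 starts at offset 20, so the printed body carries `"from": 20` and `"size": 10`.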

Elasticsearch tries hard to hide the complexity of distributed systems. The following operations happen automatically under the hood:

Partitioning your documents into different containers, or shards, which can live on one or more nodes.
Distributing the shards evenly across the nodes to load-balance indexing and search.
Replicating each shard to prevent data loss due to hardware failure.
Routing a request received by any node in the cluster to the node that holds the relevant data.
Seamlessly rebalancing and migrating shards whenever nodes are added or removed.
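The partitioning and routing steps above can be sketched in a few lines of Python. Document routing is commonly described as `hash(routing) % number_of_primary_shards`; in this sketch CRC32 stands in for the real hash function, and the round-robin allocator is a deliberate simplification of the actual shard allocator. Both function names are hypothetical.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Pick a primary shard: hash(routing) % number_of_primary_shards."""
    # CRC32 stands in for Elasticsearch's real hash function (assumption).
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


def allocate_round_robin(number_of_shards, nodes):
    """Spread shard numbers across nodes one at a time, in order."""
    allocation = {node: [] for node in nodes}
    for shard in range(number_of_shards):
        allocation[nodes[shard % len(nodes)]].append(shard)
    return allocation


print(allocate_round_robin(5, ["node-1", "node-2"]))
for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", shard_for(doc_id, 5))
```

Because the shard is derived from the routing value and the number of primary shards, the same document id always maps to the same shard, which is what lets any node forward a request to the right place.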

The default shard layout looks like this: (screenshot not preserved)
Clearly, all of the shards shown are on the same node, and the nodes together form the cluster. A node can hold both primary shards and replica shards, and nodes are linked to one another through the shards they hold (a primary on one node can have its replica on another).
There are also a number of key cluster features, such as cluster health monitoring (cluster health has three states: green, yellow, or red) and limits on the JVM heap size consumed while a query executes.
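The three health states follow from shard assignment: green means every primary and replica is assigned, yellow means all primaries are assigned but some replicas are not, and red means at least one primary is unassigned. The Python sketch below encodes that rule; the `cluster_health` function and the dictionary shape are illustrative assumptions, and the example assumes 5 primaries with 1 replica each on a single node.

```python
def cluster_health(shards):
    """Derive green/yellow/red from a list of shard copies.

    Each shard copy is a dict with keys "primary" (bool) and "assigned" (bool).
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"      # at least one primary shard is unavailable
    if any(not s["assigned"] for s in shards):
        return "yellow"   # all primaries are up, some replicas are not
    return "green"        # every primary and replica is assigned


# One node, 5 primaries with 1 replica each: a replica is never placed on the
# same node as its primary, so the replicas stay unassigned.
single_node = (
    [{"primary": True, "assigned": True} for _ in range(5)]
    + [{"primary": False, "assigned": False} for _ in range(5)]
)
print(cluster_health(single_node))
```

This is why a single-node setup with replicas configured typically reports yellow: the data is fully searchable, but there is no second node to hold the replica copies.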
I can't go into the distributed side in detail here; the book mentioned above covers it thoroughly. As for performance, our company's workload has not reached that order of magnitude, so I can't offer a meaningful comparison for now. In practice, either Solr or ES is fine; in the end you have to weigh the trade-offs and optimize for your own case.


