Developing C++ interfaces for Elasticsearch

Last Update:2018-08-14 Source: Internet

Author: User

Tags elastic search


I. What is Elasticsearch?

Elasticsearch is a leading open source full-text search engine that can quickly store, search, and analyze massive amounts of data. Stack Overflow, GitHub, and others use it.

Elasticsearch is written in Java and uses Lucene internally for indexing and searching. Its goal is to make full-text retrieval simple by hiding the complexity of Lucene behind a simple, consistent set of RESTful APIs.

ES provides client APIs: https://www.elastic.co/guide/en/elasticsearch/client/index.html

Multiple languages included:

Note: There is no C++ interface, and we need to operate ES from C++.

II. Installation and Deployment

1. Server Selection

Server IP: 10.3.246.224

System: Linux, 64-bit

Disk space: none (df -h showed that the disk was full)

2. Freeing Up Disk Space

du -sh /* | sort -hr: find the largest top-level folders (-h sorts human-readable sizes such as 41G correctly); on this system the 15scpp folder took up 41G

du -sh /15scpp/* | sort -hr: find the largest folder inside 15scpp, which was 15scpp_testbin at 31G

After confirming that this folder is no longer needed, delete it with rm.

PS: On Linux, if you delete a large file with rm while a process still has the file open and has not closed its file handle, the kernel does not release the file's disk space. Find the process that is using the file and kill it, and then the space is released.
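The behavior described in the PS can be observed directly. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the struct and function names are hypothetical and exist only for this illustration. It creates a temporary file, removes its name with unlink() (which is what rm does), and then reads the data back through the descriptor that is still open.

```cpp
#include <stdlib.h>    // mkstemp
#include <sys/stat.h>  // fstat
#include <unistd.h>    // write, read, lseek, unlink, close
#include <string>

// Hypothetical result type for this demonstration.
struct UnlinkDemo {
    long linksAfterUnlink;  // link count reported by fstat() after unlink()
    std::string dataRead;   // bytes read back through the still-open descriptor
};

// Writes payload to a temp file, removes the file name, then reads the
// content back through the open descriptor before finally closing it.
UnlinkDemo demoUnlinkWhileOpen(const std::string& payload) {
    UnlinkDemo out{-1, ""};
    char path[] = "/tmp/es_unlink_demo_XXXXXX";
    const int fd = mkstemp(path);
    if (fd < 0) return out;

    const ssize_t written = write(fd, payload.data(), payload.size());
    unlink(path);  // same effect as rm: the name is gone, the handle is not

    struct stat st;
    if (fstat(fd, &st) == 0) out.linksAfterUnlink = static_cast<long>(st.st_nlink);

    if (written == static_cast<ssize_t>(payload.size()) &&
        lseek(fd, 0, SEEK_SET) == 0) {
        std::string buf(payload.size(), '\0');
        const ssize_t n = read(fd, &buf[0], buf.size());
        if (n > 0) out.dataRead.assign(buf.data(), static_cast<size_t>(n));
    }
    close(fd);  // only now can the kernel reclaim the file's blocks
    return out;
}
```

Until close() runs, the data remains readable and the blocks stay allocated, which is why df can keep reporting a full disk after a large file has been removed.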

3. JDK8 Installation

Elasticsearch requires a Java 8 environment.

java -version: check whether Java is installed and which version is current

Download jdk-8u181-linux-x64.tar.gz, unpack it, install it, and set the environment variables; the details are not repeated here.

1. Download ES

Download from https://www.elastic.co/downloads/elasticsearch:

elasticsearch-6.3.2.tar.gz

Unpack it with tar.

2. Create a new user

For security reasons, Elasticsearch does not allow the root user to run it directly, so create a new user:

adduser pwrd-es

passwd pwrd-es Pwrd-es

3. Modify the installation file permissions

chown -R pwrd-es:pwrd-es /home/elasticsearch/elasticsearch-6.3.2

For security reasons the root user is not allowed to run the program, but other users have no permission to operate on the files, so the ownership of the installation directory must be changed.

4. Modify the ES configuration

vim config/elasticsearch.yml

cluster.name: pwrd-es  # Elasticsearch automatically discovers Elasticsearch nodes on the same network segment; this attribute distinguishes clusters, and nodes with the same cluster.name form one cluster
node.name: node-1  # node name; by default a name is picked at random from a name list; must not be repeated within the cluster
node.master: true  # whether the node is eligible to be elected master; default true. By default the first machine in the cluster is the master, and if it goes down a new master is elected
node.data: true  # whether the node stores index data; default true
path.data: /home/elasticsearch/log_export/data  # where data is stored
path.logs: /home/elasticsearch/log_export/logs  # where logs are stored
network.host: 10.3.246.224  # set to 0.0.0.0 so that curl can also use localhost
http.port: 9200  # listening port
index.number_of_shards: 5  # default number of shards per index; the default is 5
index.number_of_replicas: 1  # default number of replicas per index; the default is 1

Note that since Elasticsearch 5.0, index-level settings such as the last two can no longer be placed in elasticsearch.yml; set them per index or through an index template instead.

5. Modify the system maximum virtual memory

vim /etc/sysctl.conf

vm.max_map_count = 655360

sysctl -p

6. Switch user and run the program

Switch user: su pwrd-es

./elasticsearch

7. Verifying the installation

Enter 10.3.246.224:9200 in the browser, or run:

curl 'http://localhost:9200/?pretty'

If a JSON document is returned, the installation succeeded.

PS: pretty formats the JSON so that the returned result is easier to read.

III. Basic Concepts

1. Node and Cluster

Elasticsearch is essentially a distributed database that allows multiple servers to work together, and each server can run multiple Elasticsearch instances.

A single Elasticsearch instance is called a node. A group of nodes forms a cluster.

The cluster name is set by the cluster.name property and uniquely identifies a cluster: different clusters have different cluster.name values, and all nodes with the same cluster name automatically form one cluster. When a node starts, it looks for the master node with the same cluster name on the current LAN. If it finds one, it joins that cluster; if not, it becomes the master node itself.

2. Index

The basic ES structure is index/type/id -> document (data is typically stored as JSON).

The index is therefore the top-level unit of data management in Elasticsearch, roughly equivalent to a single database.

ES indexes all fields and, after processing, writes an inverted index. When looking up data, it consults the index directly.

PS: The name of each index (that is, each database) must be lowercase.

You can view the indexes on the current node with the following command:

curl -X GET 'http://localhost:9200/_cat/indices?v'

3. Document

A single record in an index is called a document. Many documents form an index.

A document is expressed in JSON format, as follows:

{
  "name": "Zhang San",
  "age": 18,
  "sex": "Male"
}

PS: To check that JSON is written and formatted correctly, you can use an online JSON tool: http://www.bejson.com/

4. Type

A type can be used to classify documents, for example:

china/beijing/id-i -> doc-n

china/shanghai/id-j -> doc-m

Different types under the same index should have a similar structure.

However, Elasticsearch 6.x allows only one type per index, and 7.x will remove types completely.

We are currently deploying 6.3.2, the latest version.

5. Shards

When an index holds more data than the disk of a single node can provide, ES uses sharding to spread the data across different nodes in the cluster. When you query an index that spans multiple shards, ES sends the query to each relevant shard and combines the results; the application does not need to know that the shards exist. In other words, the process is transparent to the user.
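The Elasticsearch documentation describes document placement as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. A minimal sketch of that idea follows; pickShard is a hypothetical name, and std::hash is only a stand-in for the hash function ES uses internally, so the shard numbers it produces will not match a real cluster.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Picks a primary shard for a document from its routing key.
// std::hash is an assumption for illustration, not the hash ES uses.
std::size_t pickShard(const std::string& routingKey,
                      std::size_t numPrimaryShards) {
    if (numPrimaryShards == 0) return 0;  // guard against division by zero
    return std::hash<std::string>{}(routingKey) % numPrimaryShards;
}
```

Because the result depends on the number of primary shards, the same key always maps to the same shard only as long as that number stays fixed, which is one reason the shard count is chosen when the index is created.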

6. Replicas

To improve query throughput or to achieve high availability, you can use shard replicas.

A replica is an exact copy of a shard, and each shard can have zero or more replicas. ES can hold several copies of the same shard; one of them, a special shard called the primary shard, is the one chosen to handle operations that change the index.

When the primary shard is lost, for example because the node that holds it becomes unavailable, the cluster promotes a replica to be the new primary shard.

7. ES data architecture concepts vs. MySQL

Note that the type concept is about to be deprecated.

IV. Introduction to CRUD

1. Create an index

curl -X PUT "http://10.3.246.224:9200/testes/"

Returned data:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "testes"
}

2. Add Data

curl -X PUT "http://10.3.246.224:9200/testes/songs/1" -d '{"name": "Deck the Halls", "year": "2018", "month": "8"}'

This command returns an error. Specify the Content-Type header as follows:

curl -H "Content-Type: application/json" -X PUT "http://10.3.246.224:9200/testes/songs/1" -d '{"name": "Deck the Halls", "year": "2018", "month": "8"}'
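The same header requirement applies to any client code that talks to ES. The following is a minimal sketch of assembling this PUT request in C++; HttpRequest and makeJsonPut are hypothetical names introduced for this example and are not part of ES or of any HTTP library. Sending the request is left to whatever HTTP layer is in use.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain description of an HTTP request.
struct HttpRequest {
    std::string method;
    std::string url;
    std::vector<std::pair<std::string, std::string>> headers;
    std::string body;
};

// Builds a PUT request that carries a JSON body. The Content-Type header is
// always attached, because the request fails without it (see the text above).
HttpRequest makeJsonPut(const std::string& host, const std::string& path,
                        const std::string& jsonBody) {
    HttpRequest req;
    req.method = "PUT";
    req.url = "http://" + host + path;
    req.headers.emplace_back("Content-Type", "application/json");
    req.body = jsonBody;
    return req;
}
```

Keeping the header inside the builder means callers cannot forget it, which is the mistake the first curl command above makes.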

Returned result:

{
  "_index": "testes",
  "_type": "songs",
  "_id": "1",  // the id can be omitted, in which case the system generates one
  "_version": 1,
  "result": "created",  // the document was added successfully
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

3. Reading data

curl -X GET "http://localhost:9200/testes/songs/1?pretty"

Returned data:

{
  "_index": "testes",
  "_type": "songs",
  "_id": "1",
  "_version": 1,
  "found": true,  // the document was found
  "_source": {  // the stored data
    "name": "Deck the Halls",
    "year": "2018",
    "month": "8"
  }
}

4. Updating data

a) Update a single field:

curl -H "Content-Type: application/json" -X POST "http://10.3.246.224:9200/testes/songs/1/_update?pretty" -d '{"doc": {"name": "Qqqddd"}}'

Returned data:

{
  "_index": "testes",
  "_type": "songs",
  "_id": "1",
  "_version": 2,
  "result": "updated",  // the update succeeded
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

b) Update the whole document:

curl -H "Content-Type: application/json" -X POST "http://10.3.246.224:9200/testes/songs/1/_update?pretty" -d '{"doc": {"name": "QDDD", "year": "2018", "month": "8"}}'

The returned data is the same as above.

c) The add-data command also works for updates: just change the data.

curl -H "Content-Type: application/json" -X PUT "http://10.3.246.224:9200/testes/songs/1" -d '{"name": "Deck the Halls", "year": "2020", "month": "8"}'

5. Delete data

curl -X DELETE "http://localhost:9200/testes/songs/1"

Returned data:

{
  "_index": "testes",
  "_type": "songs",
  "_id": "1",
  "_version": 2,
  "result": "deleted",  // the delete succeeded
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

Note: Deleting a document does not remove it immediately; the document is only marked as deleted. ES cleans it up in the background as you continue to index more data.

V. C++ API development for ES

1. ES does not have a C++ interface, and we need to operate ES from C++

There are two ways:

The first is to embed another language for which ES provides an interface, for example by embedding the Python API in C++. This may introduce new problems at compile time.

The second is to construct the HTTP requests from Part IV ourselves, either with the libcurl library or with the httpproxy that already exists in our system.

To construct HTTP requests without disrupting the consistency of the existing system, we build on httpproxy and wrap it in a C++ interface for external callers.

2. Design ideas based on business requirements

/* ES is a document-oriented, index-based elastic search engine.
 * The data structure in ES is index/type/id, which maps to a single document.
 * The data structure designed for posts in ES is therefore:
 *   uid/tiezi/tid -> JSON{uid, tid, content, timestamp}
 * Although uid and tid are stored redundantly, this design makes it easy to operate on all of one user's data, because the uid is the index; it also takes advantage of the high performance of index-level operations.
 */
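The design above can be sketched as a small record type plus two helpers: one that produces the document path and one that produces the JSON body. This is only an illustration under stated assumptions: Post, escapeJson, toJson, and documentPath are hypothetical names, and the production interface takes a protobuf message rather than a plain struct.

```cpp
#include <string>

// Hypothetical post record mirroring the four fields named in the design.
struct Post {
    std::string uid;
    std::string tid;
    std::string content;
    std::string timestamp;
};

// Escapes quotes, backslashes, and common whitespace escapes for a JSON
// string. Other control characters are not handled in this sketch.
std::string escapeJson(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Serializes the post as the JSON document stored in ES.
std::string toJson(const Post& p) {
    return "{\"uid\":\"" + escapeJson(p.uid) +
           "\",\"tid\":\"" + escapeJson(p.tid) +
           "\",\"content\":\"" + escapeJson(p.content) +
           "\",\"timestamp\":\"" + escapeJson(p.timestamp) + "\"}";
}

// Returns the index/type/id path: the uid is the index, "tiezi" the type,
// and the tid the document id.
std::string documentPath(const Post& p) {
    return "/" + p.uid + "/tiezi/" + p.tid;
}
```

Because the uid doubles as the index name, it has to satisfy the index naming rule mentioned earlier, namely that index names are lowercase.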

3. C++ API

1) Add

/* Function: add protobuf data to ES, where it is stored as JSON
 * uid: user ID
 * tid: post ID
 * msg: post structure, that is, (uid, tid, content, timestamp)
 */
bool AddDocument(const std::string &uid, const std::string &tid, const google::protobuf::Message *msg);

The HTTP request formed after encapsulation:

curl -H "Content-Type: application/json" -X PUT "http://10.3.246.224:9200/uid/tiezi/tid" -d 'msg.json_str()'
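One possible shape for such a wrapper is sketched below. It is not the actual implementation: EsClient and Transport are hypothetical names, the transport callback stands in for httpproxy (or libcurl), and the document is passed as an already serialized JSON string instead of a protobuf message so that the example has no external dependencies.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical transport hook: receives method, URL, and body, and returns
// the HTTP status code. In the setup described here it would wrap httpproxy.
using Transport = std::function<int(const std::string& method,
                                    const std::string& url,
                                    const std::string& body)>;

class EsClient {
public:
    EsClient(std::string baseUrl, Transport transport)
        : baseUrl_(std::move(baseUrl)), transport_(std::move(transport)) {}

    // Sends PUT {baseUrl}/{uid}/tiezi/{tid} with the serialized post as the
    // body. Returns true when the transport reports a 2xx status.
    bool addDocument(const std::string& uid, const std::string& tid,
                     const std::string& json) const {
        if (uid.empty() || tid.empty() || !transport_) return false;
        const std::string url = baseUrl_ + "/" + uid + "/tiezi/" + tid;
        const int status = transport_("PUT", url, json);
        return status >= 200 && status < 300;
    }

private:
    std::string baseUrl_;
    Transport transport_;
};
```

Injecting the transport keeps the request construction separate from the HTTP layer, so the same class can be exercised with a fake transport in tests and with the real one in production.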

2) Delete

/* Function: delete all of a user's data from ES
 * uid: user ID
 */
bool DeleteAllByUid(const std::string &uid);

The HTTP request formed after encapsulation:

curl -H "Content-Type: application/json" -X DELETE "http://localhost:9200/uid" -d '{"query": {"match_all": {}}}'

/* Function: delete one specific post of a user from ES
 * uid: user ID
 * tid: post ID
 */
bool DeleteDocumentByUidTid(const std::string &uid, const std::string &tid);

The HTTP request formed after encapsulation:

curl -X DELETE "http://localhost:9200/uid/tiezi/tid"

/* Function: delete from ES all of a user's posts before a given point in time, that is, all posts whose timestamp is earlier than that time
 * uid: user ID
 * beforetimes: all data before this time is deleted
 */
bool DeleteDocumentByUidBeforeTimes(const std::string &uid, const std::string &beforetimes);

The HTTP request formed after encapsulation:

curl -H "Content-Type: application/json" -X POST "http://localhost:9200/uid/tiezi/_delete_by_query" -d '{"query": {"range": {"timestamp.keyword": {"gte": "2016-07-09 11:18:21", "lte": "2018-08-17 11:18:21", "format": "yyyy-MM-dd HH:mm:ss"}}}}'

PS: timestamp is a field in the data. When matching on a time range, pay attention to the spaces in the values.
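For the "everything before a given time" case, only an upper bound is needed. The following is a minimal sketch of building such a request body; buildDeleteBeforeQuery is a hypothetical helper, the use of the "lt" bound is an assumption of this example rather than something shown in the request above, and the inputs are assumed to contain no characters that need JSON escaping.

```cpp
#include <string>

// Builds a _delete_by_query body matching documents whose `field` value is
// strictly earlier than beforeTime. The optional format is passed through
// as the range query's "format" parameter when it is not empty.
std::string buildDeleteBeforeQuery(const std::string& field,
                                   const std::string& beforeTime,
                                   const std::string& format = "") {
    std::string range = "\"lt\":\"" + beforeTime + "\"";
    if (!format.empty()) {
        range += ",\"format\":\"" + format + "\"";
    }
    return "{\"query\":{\"range\":{\"" + field + "\":{" + range + "}}}}";
}
```

Building the body in one place also avoids the unbalanced braces and stray spaces that are easy to introduce when the JSON is typed by hand.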

3) Query

/* Function: query posts whose content contains the query words combined with a logical OR, and filter the returned results so that only uid and tid are returned
 * Note: if the query is the phrase "running swimming", a post matches as long as its content contains at least one of the words; posts containing "running", "swimming", or "running ... swimming ..." are all returned
 * containswords: query word or phrase, such as "sport" or "running swimming"
 * uid_tid: the uid and tid pairs returned by a successful query
 */
int SearchAllByContainWords_Or(const std::string &containswords, std::vector<struct uid_tid> &uid_tid);

The HTTP request formed after encapsulation:

curl -H "Content-Type: application/json" -X POST "http://localhost:9200/_search" -d '{"query": {"match": {"content": {"query": "swimming running", "operator": "or"}}}, "_source": ["uid", "tid"]}'

PS: The operator "or" selects a logical OR. The query value can hold several words, and a document matches when its content field contains "swimming" or "running". The _source list names the data fields uid and tid to control what is returned; trimming the results this way reduces useless data transfer when the data volume is large.

/* Function: query posts whose content contains the query words combined with a logical AND, and filter the returned results so that only uid and tid are returned
 * Note: if the query is the phrase "running swimming", a post matches only if its content contains every word, that is, both "running" and "swimming"
 * containswords: query word or phrase, such as "sport" or "running swimming"
 * uid_tid: the uid and tid pairs returned by a successful query
 */
int SearchAllByContainWords_And(const std::string &containswords, std::vector<struct uid_tid> &uid_tid);

The HTTP request formed after encapsulation:

curl -H "Content-Type: application/json" -X POST "http://localhost:9200/_search" -d '{"query": {"match": {"content": {"query": "swimming running", "operator": "and"}}}, "_source": ["uid", "tid"]}'

PS: Same as above, except that the operator is "and", which selects a logical AND.
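The two search requests differ only in the operator, so a single builder can serve both functions. A minimal sketch follows; MatchOperator and buildMatchQuery are hypothetical names, and the query words are assumed to contain no characters that need JSON escaping.

```cpp
#include <string>

// Logical operator applied between the words of the match query.
enum class MatchOperator { Or, And };

// Builds the _search body: a match query on the "content" field with the
// chosen operator, returning only the uid and tid fields of each hit.
std::string buildMatchQuery(const std::string& words, MatchOperator op) {
    const std::string opName = (op == MatchOperator::And) ? "and" : "or";
    return "{\"query\":{\"match\":{\"content\":{\"query\":\"" + words +
           "\",\"operator\":\"" + opName +
           "\"}}},\"_source\":[\"uid\",\"tid\"]}";
}
```

An enum is used instead of a free-form string so that a caller cannot pass an operator value that ES would reject.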

For a similar C++ API, see: https://github.com/QHedgeTech/cpp-elasticsearch

VI. References

1. ES reference manual

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

2. "Elasticsearch: The Definitive Guide" (Chinese edition)

https://www.elastic.co/guide/cn/elasticsearch/guide/current/intro.html

3. Chinese community

https://elasticsearch.cn/

4. English community

https://discuss.elastic.co/c/elasticsearch/

VII. Plug-ins

1. Head plugin

Installing the plugin directly in Chrome is simpler than installing it from the command line on Linux.

2. IK analyzer (word segmentation) plugin

VIII. Other

ES also provides powerful data analysis features, and there are many plug-ins such as IK. When the ES service is accessed from a browser, the web interface also makes it convenient to manipulate data. My understanding is still superficial, so I leave these topics open here and hope to explore them in more depth when there is an opportunity. You also need to be very careful with JSON formatting and with spaces inside JSON data.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
