I. What is Elasticsearch?
Elasticsearch is the leading open-source full-text search engine; it can quickly store, search, and analyze massive amounts of data. Stack Overflow, GitHub, and many other sites use it.
Elasticsearch is written in Java and uses Lucene internally for indexing and searching, but it makes full-text retrieval simple: it hides Lucene's complexity behind a simple, consistent set of RESTful APIs.
ES provides client APIs: https://www.elastic.co/guide/en/elasticsearch/client/index.html
Clients are available for multiple languages.
Note: there is no C++ client, and we need to operate ES from C++.
II. Installation and Deployment
1. Server selection
Server IP: 10.3.246.224
OS: 64-bit Linux
Disk space: none (df -h showed the disk was full)
2. Freeing disk space
du -sh /* | sort -nr: find the largest top-level folders on the system; the /15scpp folder accounted for 41 GB.
du -sh /15scpp/* | sort -nr: find the largest folder inside 15scpp; 15scpp_testbin accounted for 31 GB.
After confirming the folder was unneeded, delete it with rm.
PS: On Linux, if you rm a large file while some process still holds it open without closing the file handle, the kernel will not release the file's disk space. Find the process using the file (e.g. with lsof), kill it, and the space is reclaimed.
3. JDK 8 installation
Elasticsearch requires a Java 8 environment.
java -version shows whether Java is installed and which version is current.
Download jdk-8u181-linux-x64.tar.gz, extract and install it, and set the environment variables; the details are not repeated here.
1. Download ES
Download from https://www.elastic.co/downloads/elasticsearch:
elasticsearch-6.3.2.tar.gz
Extract it with tar.
2. Create a new user
For security reasons Elasticsearch does not allow the root user to run it directly, so create a new user:
adduser pwrd-es
passwd pwrd-es (then enter a password for pwrd-es)
3. Change ownership of the installation files
chown -R pwrd-es:pwrd-es /home/elasticsearch/elasticsearch-6.3.2
Root is not allowed to run the installation, but the new user has no permission on the files, so change the owner of the installation directory.
4. Modify the ES configuration
vim config/elasticsearch.yml
cluster.name: pwrd-es          # Elasticsearch automatically discovers nodes on the same network segment; this property distinguishes clusters, and nodes with the same cluster.name form one cluster
node.name: node-1              # node name; by default a random name is picked from a list; must not repeat within the cluster
node.master: true              # whether this node is eligible to be elected master (default true); the first machine in the cluster becomes master, and if it goes down a new master is elected
node.data: true                # whether this node stores index data (default true)
path.data: /home/elasticsearch/log_export/data    # data directory
path.logs: /home/elasticsearch/log_export/logs    # log directory
network.host: 10.3.246.224     # set to 0.0.0.0 so that curl can use localhost
http.port: 9200                # listening port
index.number_of_shards: 5      # default number of shards per index (default 5)
index.number_of_replicas: 1    # default number of replicas per index (default 1)
Note: since ES 5.x, index-level settings such as index.number_of_shards are generally no longer accepted in elasticsearch.yml; if startup fails with such an error, set them per index at creation time instead.
5. Raise the system's maximum virtual memory map count
vim /etc/sysctl.conf
vm.max_map_count = 655360
sysctl -p
6. Switch user and run the program
Switch user: su pwrd-es
./elasticsearch
7. Verify the installation
Open 10.3.246.224:9200 in a browser,
or
curl 'http://localhost:9200/?pretty'
If you see JSON data returned, the installation succeeded.
PS: ?pretty formats the returned JSON so that it is easier to read.
III. Basic Concepts
1. Node and Cluster
Elasticsearch is essentially a distributed database that allows multiple servers to work together; each server can run multiple Elasticsearch instances.
A single Elasticsearch instance is called a node; a group of nodes forms a cluster.
The cluster name is configured by the cluster.name property, which uniquely identifies a cluster: different clusters have different cluster.name values, and all nodes with the same name automatically form one cluster. When a node starts, it looks on the local network for a master node with the same cluster name; if it finds one, it joins that cluster, and if not, it becomes the master itself.
2. Index
The basic data hierarchy in ES is index/type/id -> document (data is typically stored as JSON).
So the index is the top-level unit of data management in Elasticsearch, a synonym for a single database.
ES analyzes all fields and writes them into an inverted index; when searching, it looks up the index directly instead of scanning the data.
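As a toy illustration of the idea (not ES's actual implementation, which is built on Lucene), an inverted index maps each term to the set of documents containing it, so a search becomes a lookup rather than a scan:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Toy inverted index: term -> set of document ids containing that term.
class InvertedIndex {
public:
    void Add(int doc_id, const std::string &content) {
        std::istringstream in(content);
        std::string term;
        while (in >> term)               // split on whitespace (real ES uses analyzers)
            index_[term].insert(doc_id);
    }
    // Return the ids of documents containing the term: a single map lookup,
    // with no scan over document contents.
    std::set<int> Search(const std::string &term) const {
        auto it = index_.find(term);
        return it == index_.end() ? std::set<int>() : it->second;
    }
private:
    std::map<std::string, std::set<int>> index_;
};
```

Real analyzers also lowercase, strip punctuation, and (with plugins like IK) segment Chinese text, but the lookup principle is the same.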
PS: every index (i.e. database) must have a lowercase name.
You can view the current node's indices with the following command:
curl -XGET 'http://localhost:9200/_cat/indices?v'
3. Document
A single record in an index is called a document; many documents form an index.
A document is expressed in JSON format, as follows:
{
"Name": "Zhang San",
"Age": 18,
"Sex": "Male"
}
PS: to make sure the JSON is written and formatted correctly, you can use an online JSON tool such as http://www.bejson.com/
4. Type
A type can be used to classify documents within an index, for example: china/beijing/id-i -> doc-n
china/shanghai/id-j -> doc-m
Different types under the same index should have similar structures.
However, Elastic 6.x only allows one type per index, and 7.x will remove types completely.
We are currently deploying the latest 6.3.2 release.
5. Shard
When one index holds too much data, more than the disk space a single node can provide, ES's sharding feature stores the massive data in shards across different nodes of the cluster. When you query an index spread over multiple shards, ES sends the query to every relevant shard and merges the results; the application never knows the shards exist. In other words, the process is transparent to the user.
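ES assigns a document to a shard by the formula shard = hash(routing) % number_of_primary_shards, where routing defaults to the document id. A minimal sketch (using std::hash as a stand-in for the murmur3 hash ES actually uses):

```cpp
#include <cassert>
#include <functional>
#include <string>

// Pick the shard for a document: hash the routing value (by default the
// document id) modulo the number of primary shards. This is also why the
// number of primary shards cannot be changed after index creation: documents
// would hash to different shards than the ones they were stored in.
int PickShard(const std::string &routing, int num_primary_shards) {
    std::size_t h = std::hash<std::string>{}(routing);  // stand-in for ES's murmur3
    return static_cast<int>(h % num_primary_shards);
}
```

Because the function is deterministic, both indexing and lookups for the same id always land on the same shard.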
6. Replica
To improve query throughput or achieve high availability, you can use shard replicas.
A replica is an exact copy of a shard; each shard can have zero or more replicas. Among the identical copies of a shard, the one chosen to perform index (write) operations is a special shard called the primary shard.
When the primary shard is lost, for example when the node holding it becomes unavailable, the cluster promotes a replica to be the new primary shard.
7. ES data concepts vs. MySQL
Roughly: index <-> database, type <-> table, document <-> row, field <-> column.
But as noted above, type is being phased out.
IV. Introduction to CRUD
1. Create an index
curl -XPUT "http://10.3.246.224:9200/tests/"
The returned data:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "tests"
}
2. Add data
curl -XPUT "http://10.3.246.224:9200/tests/songs/1" -d '{"name": "deck the halls", "year": "2018", "month": "8"}' will fail with an error; you need to specify the Content-Type header as follows:
curl -H "Content-Type: application/json" -XPUT "http://10.3.246.224:9200/tests/songs/1" -d '{"name": "deck the halls", "year": "2018", "month": "8"}'
The returned result:
{
  "_index": "tests",
  "_type": "songs",
  "_id": "1",           // the id can also be omitted and generated by the system
  "_version": 1,
  "result": "created",  // means the add succeeded
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
3. Read data
curl -XGET http://localhost:9200/tests/songs/1?pretty
The returned data:
{
  "_index": "tests",
  "_type": "songs",
  "_id": "1",
  "_version": 1,
  "found": true,     // the lookup succeeded
  "_source": {       // the target data
    "name": "deck the halls",
    "year": "2018",
    "month": "8"
  }
}
4. Update data
a) Update a single field:
curl -H "Content-Type: application/json" -XPOST "http://10.3.246.224:9200/tests/songs/1/_update?pretty" -d '{"doc": {"name": "qqqddd"}}'
The returned data:
{
  "_index": "tests",
  "_type": "songs",
  "_id": "1",
  "_version": 2,
  "result": "updated",  // the update succeeded
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
b) Update the whole document:
curl -H "Content-Type: application/json" -XPOST "http://10.3.246.224:9200/tests/songs/1/_update?pretty" -d '{"doc": {"name": "qddd", "year": "2018", "month": "8"}}'
The returned data is the same as above.
c) The add-data command can also update a document: just change the data:
curl -H "Content-Type: application/json" -XPUT "http://10.3.246.224:9200/tests/songs/1" -d '{"name": "deck the halls", "year": "2020", "month": "8"}'
5. Delete data
curl -XDELETE "http://localhost:9200/tests/songs/1"
The returned data:
{
  "_index": "tests",
  "_type": "songs",
  "_id": "1",
  "_version": 2,
  "result": "deleted",  // the delete succeeded
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
Note: deleting a document does not take effect immediately; the document is only marked as deleted. ES purges deleted documents in the background as you continue indexing and segments are merged.
V. C++ API Development for ES
1. ES has no C++ interface, and we need to operate ES from C++
There are two approaches:
First, embed another language in the C++ code and use an interface ES does provide, e.g. embed the Python API; this may introduce new problems at build time.
Second, construct the HTTP requests shown in part IV ourselves, either with the libcurl library or with the httpproxy that already exists in our system.
To construct the HTTP requests without disrupting the consistency of the existing system, and to wrap them in a C++ interface for external use, we build on httpproxy.
2. Design based on business requirements
/* ES is a document-oriented, index-based elastic search engine, and the
 * data hierarchy in ES is index/type/id, each path mapping to one document.
 * So the data layout for posts in ES is:
 *   uid/tiezi/tid -> JSON{uid, tid, content, timestamp}
 * Although uid and tid are stored redundantly, this design makes it easy to
 * work on all of one user's data, because uid is the index, which also
 * leverages the index's high processing performance.
 */
3. C++ API
1) Add
/* Function: add protobuf data to ES; inside ES it is stored as JSON.
 * uid: user id
 * tid: post id
 * msg: the post structure, i.e. (uid, tid, content, timestamp)
 */
bool AddDocument(const std::string &uid, const std::string &tid, const google::protobuf::Message *msg);
The HTTP request formed after encapsulation:
curl -H "Content-Type: application/json" -XPUT "http://10.3.246.224:9200/uid/tiezi/tid" -d 'msg.JSON_STR()'
2) Delete
/* Function: delete all of a user's data from ES.
 * uid: user id
 */
bool DeleteAllByUid(const std::string &uid);
The HTTP request formed after encapsulation:
curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/uid/_delete_by_query" -d '{"query": {"match_all": {}}}'
/* Function: delete one specific document of a user from ES.
 * uid: user id
 * tid: post id
 */
bool DeleteDocumentByUidTid(const std::string &uid, const std::string &tid);
The HTTP request formed after encapsulation:
curl -XDELETE "http://localhost:9200/uid/tiezi/tid"
/* Function: delete all of a user's posts from before a given point in time,
 * i.e. all posts whose timestamp is earlier than the given time.
 * uid: user id
 * beforeTimes: all data from before this time will be deleted
 */
bool DeleteDocumentByUidBeforeTimes(const std::string &uid, const std::string &beforeTimes);
The HTTP request formed after encapsulation:
curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/uid/tiezi/_delete_by_query" -d '{"query": {"range": {"timestamp.keyword": {"gte": "2016-07-09 11:18:21", "lte": "2018-08-17 11:18:21", "format": "yyyy-MM-dd HH:mm:ss"}}}}'
PS: timestamp is a field in the data; when matching a time range, pay attention to the spaces in the date format.
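The body of this _delete_by_query can be assembled like the following sketch (the function name is hypothetical and plain string concatenation is used for brevity; a real implementation should use a JSON library to handle escaping):

```cpp
#include <cassert>
#include <string>

// Build the _delete_by_query body that removes every document whose
// timestamp is at or before `beforeTimes` ("yyyy-MM-dd HH:mm:ss" format).
std::string BuildDeleteBeforeQuery(const std::string &beforeTimes) {
    return std::string("{\"query\": {\"range\": {\"timestamp.keyword\": ") +
           "{\"lte\": \"" + beforeTimes + "\", " +
           "\"format\": \"yyyy-MM-dd HH:mm:ss\"}}}}";
}
```

Note that this variant uses only lte, matching the "delete everything before a point in time" requirement; the gte/lte pair in the curl example above deletes a closed interval instead.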
3) Query
/* Function: query posts whose content contains the given words in OR logic,
 * filtering the returned result so that only uid and tid come back.
 * Note: e.g. if the query phrase is "running swimming", a post matches as long
 * as its content contains at least one of the words: posts containing
 * "running", "swimming", or "running ... swimming ..." are all returned.
 * containsWords: the query word or phrase, e.g. "sport", "running swimming"
 * uid_tid: the n (uid, tid) pairs the query returns on success
 */
int SearchAllByContainWords_OR(const std::string &containsWords, std::vector<struct uid_tid> &uid_tid);
The HTTP request formed after encapsulation:
curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/_search" -d '{"query": {"match": {"content": {"query": "swimming running", "operator": "or"}}}, "_source": ["uid", "tid"]}'
PS: the operator is "or", i.e. OR logic: a document matches if its content field contains swimming or running. _source lists the fields uid and tid to control the returned result; cropping the result when the data volume is large reduces useless data transfer.
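Since the two query functions differ only in the operator value, a shared helper can build the request body. This is a hypothetical sketch (again, a real implementation should use a JSON library for escaping):

```cpp
#include <cassert>
#include <string>

// Build the _search body used by SearchAllByContainWords_OR / _AND:
// a match query on the content field, with the operator ("or" vs "and")
// as the only difference, and _source trimmed to uid and tid.
std::string BuildMatchQuery(const std::string &words, const std::string &op) {
    return std::string("{\"query\": {\"match\": {\"content\": ") +
           "{\"query\": \"" + words + "\", \"operator\": \"" + op + "\"}}}, " +
           "\"_source\": [\"uid\", \"tid\"]}";
}
```

For example, BuildMatchQuery("swimming running", "or") produces exactly the body shown in the curl command above.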
/* Function: query posts whose content contains the given words in AND logic,
 * filtering the returned result so that only uid and tid come back.
 * Note: e.g. if the query phrase is "running swimming", a post matches only
 * if its content contains every word, i.e. both running and swimming.
 * containsWords: the query word or phrase, e.g. "sport", "running swimming"
 * uid_tid: the n (uid, tid) pairs the query returns on success
 */
int SearchAllByContainWords_AND(const std::string &containsWords, std::vector<struct uid_tid> &uid_tid);
The HTTP request formed after encapsulation:
curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/_search" -d '{"query": {"match": {"content": {"query": "swimming running", "operator": "and"}}}, "_source": ["uid", "tid"]}'
PS: same as above, except the operator is "and", i.e. AND logic.
A similar C++ API interface for reference: https://github.com/QHedgeTech/cpp-elasticsearch
VI. References
1. ES reference manual
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
2. "Elasticsearch: The Definitive Guide"
https://www.elastic.co/guide/cn/elasticsearch/guide/current/intro.html
3. Chinese community
https://elasticsearch.cn/
4. English community
https://discuss.elastic.co/c/elasticsearch/
VII. Plugins
1. head plugin
A plugin installed directly in Chrome, which is simpler than installing it from the command line on Linux.
2. IK word-segmentation plugin
VIII. Other
ES also provides data analysis; the functionality is very powerful, and there are many plugins such as IK. If you access the ES service in a browser, the web interface also makes it very convenient to manipulate data. My understanding is still superficial, so I leave these topics open here in the hope of studying them more deeply later. Also be very careful with JSON formatting and with the spaces inside JSON data.
Developing C++ interfaces for Elasticsearch