Elasticsearch Notes (ii): Curl operations, ES plugins, cluster installation and core concepts

Source: Internet
Author: User
Tags: elasticsearch, kibana, sqoop

[TOC]

Curl Operations: Introduction

Curl is an open-source command-line tool for transferring files using URL syntax; with curl it is easy to issue common GET/POST requests. Think of it simply as a tool for accessing URLs from the command line. CentOS ships with curl by default; if it is missing, install it with yum.

curl
    -X  specify the HTTP request method (HEAD, GET, POST, PUT, DELETE)
    -d  specify the data to send
    -H  specify HTTP request header fields
Creating an index library with curl (either PUT or POST works):
    curl -XPUT http://<ip>:9200/index_name/
Example:
    curl -XPUT 'http://localhost:9200/bigdata'
Curl operation (i): Index library creation and query

Create an index library:

curl -XPUT http://uplooking01:9200/bigdata
Returns: {"acknowledged":true}

View index library information:

curl -XGET http://uplooking01:9200/bigdata
This returns a JSON string. For a well-formatted result:
curl -XGET 'http://uplooking01:9200/bigdata?pretty'

Add several index records to the index library:

curl -XPOST http://uplooking01:9200/bigdata/product/1 -d'{
    "name":"hadoop",
    "author":"Doug Cutting",
    "version":"2.9.4"
}'
curl -XPOST http://uplooking01:9200/bigdata/product/2 -d'{
    "name":"hive",
    "author":"facebook",
    "version":"2.1.0",
    "url":"http://hive.apache.org"
}'
curl -XPUT http://uplooking01:9200/bigdata/product/3 -d'{
    "name":"hbase",
    "author":"apache",
    "version":"1.1.5",
    "url":"http://hbase.apache.org"
}'

Query all the data below an index library:

curl -XGET http://uplooking01:9200/bigdata/product/_search
Well-formatted:
curl -XGET 'http://uplooking01:9200/bigdata/product/_search?pretty'
{
  "took": 1,                   ----> time taken, in milliseconds
  "timed_out": false,          ----> whether the request timed out
  "_shards": {                 ----> shard info (like partitions in Kafka; an index library has several shards)
    "total": 5,                ----> 5 shards by default
    "successful": 5,           ----> 5 responded normally
    "failed": 0                ----> total - successful = failed
  },
  "hits": {                    ----> query result set
    "total": 2,                ----> number of records matched
    "max_score": 1.0,          ----> highest score among the records
    "hits": [ {                ----> the concrete result array
        "_index": "bigdata",   ----> index library of this result
        "_type": "product",    ----> type within the index library
        "_id": "2",            ----> id of this result
        "_score": 1.0,         ----> score of this result
        "_source": {           ----> the indexed document itself
          "name": "hive",
          "author": "facebook",
          "version": "2.1.0",
          "url": "http://hive.apache.org"
        }
      }, {
        "_index": "bigdata",
        "_type": "product",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "name": "hadoop",
          "author": "Doug Cutting",
          "version": "2.9.4"
        }
      } ]
  }
}
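To make the response shape concrete, here is a minimal Python sketch (not ES client code; the sample dict just mirrors the annotated response above) that pulls the commonly used fields out of a `_search` result:

```python
# Minimal sketch of reading an ES _search response; the dict below
# mirrors the annotated example response above.
response = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_index": "bigdata", "_type": "product", "_id": "2", "_score": 1.0,
             "_source": {"name": "hive", "author": "facebook",
                         "version": "2.1.0", "url": "http://hive.apache.org"}},
            {"_index": "bigdata", "_type": "product", "_id": "1", "_score": 1.0,
             "_source": {"name": "hadoop", "author": "Doug Cutting",
                         "version": "2.9.4"}},
        ],
    },
}

# The match count lives under hits.total; the documents themselves
# live under hits.hits[*]._source.
total = response["hits"]["total"]
names = [hit["_source"]["name"] for hit in response["hits"]["hits"]]
```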
Differences between PUT and POST
    • PUT is an idempotent method; POST is not. So PUT is suited to updates, while POST is more appropriate for creating new resources.

    • PUT and DELETE operations are idempotent. Idempotent means the result is the same no matter how many times the operation is performed. For example, if you modify an article with PUT and then repeat the same request, each operation leaves the same result; DELETE behaves the same way.

    • POST is not idempotent. A common example is the repeated-POST problem: when we issue the same POST request several times, several resources get created.

    • Note also that creation can use either POST or PUT. The difference is that POST acts on a collection resource (/articles), while PUT acts on a specific resource (/articles/123). For example, many resources use a database auto-increment primary key as their identity; when the identity of the created resource can only be provided by the server, you must use POST.

      • Points to note when ES creates index libraries and indexes

      1) The index library name must be all lowercase, must not begin with an underscore, and must not contain commas.

      2) If the id of the indexed data is not specified explicitly, ES generates a random id automatically; this requires using POST:

      curl -XPOST http://localhost:9200/bigdata/product/ -d '{"author": "Doug Cutting"}'

An example illustrating idempotent operations:
    StringBuilder sb = new StringBuilder();
    sb.append("aaaa");             // not idempotent: repeating it keeps appending
    String str = sb.substring(0);  // idempotent: repeating it returns a new but identical object each time
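The same contrast can be sketched in Python (a local illustration only, with hypothetical `put`/`post` helpers; this is not how ES stores documents):

```python
# Idempotent vs. non-idempotent, illustrated locally.
store = {}

def put(doc_id, doc):
    # PUT-style: overwrite by id; repeating the call changes nothing further.
    store[doc_id] = doc

def post(collection, doc):
    # POST-style: append to a collection; each call creates another resource.
    collection.append(doc)

put("1", {"name": "hadoop"})
put("1", {"name": "hadoop"})      # second call: same state as after the first

posted = []
post(posted, {"name": "hadoop"})
post(posted, {"name": "hadoop"})  # second call: a second resource exists
```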
Curl Operations (ii): Advanced query, update, delete, and bulk operations

Queries:

Query only selected fields of _source:
    curl -XGET 'http://uplooking01:9200/bigdata/product/_search?_source=name,author&pretty'
Return only the _source of a document:
    curl -XGET 'http://uplooking01:9200/bigdata/product/1?_source&pretty'
Conditional queries:
    curl -XGET 'http://uplooking01:9200/bigdata/product/_search?q=name:hbase&pretty'
        finds results whose name is hbase
    curl -XGET 'http://uplooking01:9200/bigdata/product/_search?q=name:h*&pretty'
        finds results whose name starts with h

Paging query:

curl -XGET 'http://uplooking01:9200/bank/acount/_search?pretty&from=0&size=5'
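The `from`/`size` parameters behave like a slice over the ranked hit list; a tiny Python sketch (local simulation, not ES itself):

```python
# from/size paging behaves like a slice over the ranked hits.
hits = [f"doc{i}" for i in range(12)]   # stand-in for a ranked result list

def page(hits, from_=0, size=5):
    # from_ = offset of the first hit to return, size = page length
    return hits[from_:from_ + size]

first = page(hits, from_=0, size=5)     # hits 0..4
second = page(hits, from_=5, size=5)    # hits 5..9
```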

Update:

Either POST or PUT works:
    curl -XPOST http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs -d'{"name":"sqoop", "author":"apache"}'
    curl -XPOST http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs -d'{"version":"1.4.6"}'
But these are full updates: think of them as deleting the old document and creating a new one with the same id.
Partial update (must use POST):
    Use _update, and put the fields to change inside "doc" in the body:
    curl -XPOST http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs/_update -d'
    {"doc":{"name":"sqoop", "author":"apache"}}'
Note:
    ES can update a document with PUT or POST; if a document with the given id already exists, an update is performed.
    When performing an update, ES first marks the old document as deleted and then adds the new one. The old document does not disappear immediately, but it can no longer be accessed; ES cleans up documents marked as deleted in the background as more data is added.
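The full-vs-partial distinction can be sketched locally in Python (an illustration of the semantics described above, not ES internals):

```python
# Full update replaces the whole _source; partial update (_update with
# "doc") merges only the listed fields into the existing _source.
existing = {"name": "sqoop", "author": "apache", "version": "1.4.5"}

def full_update(new_source):
    # PUT/POST to the document URL: the old document is discarded entirely.
    return dict(new_source)

def partial_update(source, doc):
    # POST to .../_update with {"doc": {...}}: merge into the old document.
    merged = dict(source)
    merged.update(doc)
    return merged

replaced = full_update({"version": "1.4.6"})          # name/author are gone
merged = partial_update(existing, {"version": "1.4.6"})  # name/author kept
```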

Delete:

Ordinary delete, by primary key:
   curl -XDELETE http://localhost:9200/bigdata/product/3/
Note: if the document exists, the response has found:true and successful:1, and _version is incremented by 1.
   If the document does not exist, found is false, but _version is still incremented. This is part of internal housekeeping, somewhat like an SVN revision number: it guarantees that the order of different operations across nodes is marked correctly.
   Note: a deleted document does not vanish immediately; it is only marked as deleted. ES removes it in the background as you add more indexed data.

Bulk Operation-bulk:

The bulk API lets us execute multiple requests at once.
Format:
   action: [index|create|update|delete]
   metadata: _index, _type, _id
   request body: _source (not needed for delete)
   {action:{metadata}}\n
   {request body}\n
   {action:{metadata}}\n
   {request body}\n
Example:
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"[email protected]","city":"Dante","state":"TN"}
Difference between create and index:
    If the document already exists, create fails with a "document already exists" error, while index succeeds.
Loading the data from a file:
curl -XPOST http://uplooking01:9200/bank/acount/_bulk --data-binary @/home/uplooking/data/accounts.json
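Assembling that newline-delimited body programmatically is easy to get wrong (each line must be its own JSON object, and the body must end with a newline). A small Python sketch, assuming index actions only:

```python
import json

# Sketch: building a _bulk request body (NDJSON) for index actions:
# one action line, then one source line, each terminated by a newline.
def bulk_body(index_actions):
    lines = []
    for doc_id, source in index_actions:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action + metadata
        lines.append(json.dumps(source))                      # request body
    return "\n".join(lines) + "\n"   # the bulk body must end with a newline

body = bulk_body([
    ("1", {"account_number": 1, "firstname": "Amber"}),
    ("6", {"account_number": 6, "firstname": "Hattie"}),
])
```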

Additional notes on bulk operations:

You can inspect the index libraries with:
    curl 'http://localhost:9200/_cat/indices?v'
A bulk request may declare /_index or /_index/_type in the URL.
How much data can one bulk request carry?
    Bulk loads the data to be processed into memory, so the amount is limited. The best batch size is not a fixed number; it depends on your hardware, your document size and complexity, and your indexing and search load. A common recommendation is 1000-5000 documents per request; if your documents are large, reduce the batch accordingly. A request size of 5-15 MB is suggested, and by default it cannot exceed 100 MB; this limit can be changed in the ES configuration file:
    http.max_content_length: 100mb
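A sketch of splitting documents into batches under both limits suggested above (document count and approximate byte size); the thresholds are the article's recommendations, not hard ES constants:

```python
import json

# Sketch: chunk documents into bulk batches, capped by a document count
# and an approximate serialized byte size (per the guidance above).
def batches(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    batch, size = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or size + doc_bytes > max_bytes):
            yield batch          # current batch is full; start a new one
            batch, size = [], 0
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch              # flush the final partial batch

chunks = list(batches([{"n": i} for i in range(2500)], max_docs=1000))
```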
Curl Operation (iii): ES version control
    • Common relational databases are used (pessimistic concurrency control (PCC))

      Lock this line before reading a data, and then make sure that only the thread that reads the data can modify this row of data.

    • ES is used (Optimistic concurrency control (OCC))

      ES does not block access to a data, however, if the underlying data changes in the interval between reading and writing, the update fails, and the program decides how to handle the conflict. It can re-read new data to update it, or directly feed the situation back to the user.

    • How ES implements version control (using ES's internal version number)

      1: First GET the document to be modified and note its version (_version) number:

      curl -XGET http://localhost:9200/bigdata/product/1

      2: Pass that version number when performing the update:

      curl -XPUT 'http://localhost:9200/bigdata/product/1?version=1' -d '{"name": "hadoop", "version": 3}' (overwrite)

      curl -XPOST 'http://localhost:9200/bigdata/product/1/_update?version=3' -d '{"doc": {"name": "apachehadoop", "latest_version": 2.6}}' (partial update)

      3: The update fails if the version number passed does not match the current version of the document being updated.

    • How ES implements version control (using an external version number)

      If your database already has a version number, or a timestamp that can serve as one, you can append version_type=external to the ES request URL to use those numbers.

      Note: the version number must be an integer greater than 0 and less than 9223372036854775807 (the largest positive long in Java).

      When handling external version numbers, ES no longer checks whether _version equals the value given in the request; instead it checks whether the current _version is smaller than the given value, and the request succeeds if it is.
Example:
     curl -XPUT 'http://localhost:9200/bigdata/product/20?version=10&version_type=external' -d '{"name": "flink"}'
     Note: the quotes around the URL must not be omitted here, or the command fails.
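The external-version rule can be sketched as a simple predicate (a local illustration of the check described above, not ES source code):

```python
# Sketch of the version_type=external rule: the write succeeds only when
# the supplied version is strictly greater than the stored one, and the
# version must be a positive value that fits in a Java long.
MAX_LONG = 9223372036854775807

def accept_external_write(current_version, supplied_version):
    if not 0 < supplied_version < MAX_LONG:
        raise ValueError("external version out of range")
    return supplied_version > current_version

ok = accept_external_write(current_version=5, supplied_version=10)        # accepted
conflict = accept_external_write(current_version=10, supplied_version=10) # rejected
```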
ES Plugins

ES itself is a relatively small service; the strength of its functionality comes from the richness of its plugins. There are many ES plugins for management and performance improvement; below are a few commonly used ones.

Bigdesk Plugin
Offline installation:
    bin/plugin install file:/home/uplooking/soft/bigdesk-master.zip
Uninstall:
    bin/plugin remove bigdesk
Online installation:
    bin/plugin install hlstudio/bigdesk
Access (web):
    http://uplooking01:9200/_plugin/bigdesk
Elasticsearch-head Plugin
Offline installation:
    bin/plugin install file:/home/uplooking/soft/
Online installation:
    bin/plugin install mobz/elasticsearch-head
Access:
    http://uplooking01:9200/_plugin/head/
Elasticsearch Kibana

Kibana is one of the components in ELK (Elasticsearch, Logstash, Kibana).

Configuration:
    server.port: 5601
    server.host: "uplooking01"
    elasticsearch.url: "http://uplooking01:9200"
    elasticsearch.username: "jack"
    elasticsearch.password: "uplooking"
Access:
    http://uplooking01:5601
Common chart types:
    line chart
    pie chart
    bar chart
ES cluster installation
If all nodes are on the same LAN, only one setting needs to match: cluster.name must be identical on every node, so that newly started machines can be found through the discovery mechanism. To form a cluster across different LANs, disable the discovery mechanism and list the node hostnames explicitly. (In practice, with version 2.3.0, automatic discovery did not work, so to configure the cluster the mechanism was disabled and the node hostnames configured manually.)
cluster.name: bigdata-08-28
node.name: hadoop
path.data: /home/uplooking/data/elasticsearch
path.logs: /home/uplooking/logs/elasticsearch
network.host: uplooking01
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["uplooking01", "uplooking02", "uplooking03"]

Cluster status of Elasticsearch:

Green:  all primary shards and replica shards are available
Yellow: all primary shards are available, but not all replica shards are
Red:    not all primary shards (and their replicas) are available
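The three colors follow directly from shard availability; a minimal sketch of the rule (a local illustration of the definitions above, not the actual health API):

```python
# Deriving the cluster color from shard availability, per the
# definitions above.
def cluster_color(all_primaries_ok, all_replicas_ok):
    if all_primaries_ok and all_replicas_ok:
        return "green"
    if all_primaries_ok:
        return "yellow"   # every primary assigned, some replicas missing
    return "red"          # at least one primary shard unavailable
```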
Elasticsearch Core Concepts

Cluster

Represents a cluster. A cluster contains multiple nodes, among them a primary node; the primary can be elected, and the master/slave distinction exists only inside the cluster. One of ES's concepts is decentralization: literally, there is no central node as seen from outside the cluster, because from the outside an ES cluster is a logical whole, and communicating with any one node is equivalent to communicating with the entire ES cluster.

The primary node's responsibility is to manage cluster state, including the state of shards and replicas, and to discover and remove nodes.

You can automatically compose a cluster by starting multiple ES nodes within the same network segment.

By default, ES automatically discovers nodes within the same network segment and automatically makes up the cluster.

View status of the cluster:

http://<ip|host>:9200/_cluster/health?pretty
Shards

Represents index shards. ES can split a complete index into multiple shards; the advantage is that a large index can be split up and distributed to different nodes, forming a distributed search. The number of shards can only be specified before the index is created and cannot be changed afterwards.

You can specify when you create an index library:

curl -XPUT 'localhost:9200/test1/' -d'{"settings":{"number_of_shards":3}}'
# By default an index library has 5 shards: index.number_of_shards: 5
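Why can't the shard count change after creation? Documents are routed to a shard by hashing the routing key modulo the shard count, so changing the count would send existing ids to different shards. A simplified Python sketch (using crc32 for determinism; real ES uses a murmur3 hash, but the modulo is the point):

```python
import zlib

# Simplified routing rule: shard = hash(routing_key) % number_of_shards.
# (Real ES uses murmur3; crc32 is used here only for a stable demo.)
def shard_for(doc_id, number_of_shards):
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

# Every id maps to a fixed shard for a given shard count...
routes = [shard_for(str(i), 5) for i in range(10)]
# ...which is why changing number_of_shards after indexing would make
# lookups miss: the same id would hash to a different shard.
```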
Replicas

Represents index replicas. ES can configure replicas for an index. Replicas serve two purposes: first, to improve the system's fault tolerance, so that when a shard on a node is corrupted or lost it can be recovered from a replica; second, to improve ES's query efficiency, since ES automatically load-balances search requests across replicas.

You can specify when you create an index library:

curl -XPUT 'localhost:9200/test2/' -d'{"settings":{"number_of_replicas":2}}'
# By default each shard has 1 replica: index.number_of_replicas: 1
Recovery

Represents data recovery, or data redistribution. When a node joins or leaves, ES redistributes index shards according to machine load; data recovery also happens when a node restarts.

Gateway

Represents persistent storage for ES indexes. By default ES stores the index in memory first and persists it to disk when memory is full. When an ES cluster is shut down and restarted, index data is read back from the gateway. ES supports several gateway types: the local file system (the default), distributed file systems, Hadoop HDFS, and Amazon's S3 cloud storage service.

Discovery.zen

Represents ES's automatic node-discovery mechanism. ES is a peer-to-peer system: it first finds existing nodes by broadcast, then nodes communicate with each other via the multicast protocol; point-to-point interaction is also supported.

If nodes on different network segments are to form an ES cluster, disable automatic discovery:
    discovery.zen.ping.multicast.enabled: false
Set the list of nodes a newly started node can discover:
    discovery.zen.ping.unicast.hosts: ["master:9200","slave01:9200"]
Transport

Represents the way ES nodes interact with each other and with clients. By default the TCP protocol is used internally, while the HTTP protocol (JSON format), thrift, servlet, memcached, ZeroMQ, and other transport protocols are also supported (integrated via plugins).

