Elasticsearch data for long-term preservation scenarios

Source: Internet
Author: User
Tags: kibana

Elasticsearch data is stored on the hard disk. When access logs are very large, Kibana becomes slow at rendering charts, and limited disk space makes it impossible to keep every log file. What if we want to extract each day's important site metrics, such as daily traffic, and still be able to visualize them?

First, the specific procedure

Before extracting the data, you need to know exactly what data you want. For example, suppose I want the site's hourly PV (page views); this can certainly be obtained in Kibana.

[Figure 1: hourly PV histogram in Kibana]

This is the hourly PV queried in Kibana. Next, we copy its query JSON:

[Figure 2: the query JSON copied from Kibana]

Paste that JSON into a file named test, then run the following query to display the results:

curl -XPOST 'http://192.168.10.49:9200/_search' -d @test

{"took": 940, "timed_out": false, "_shards": {"total": 211, "successful": 211, "failed": 0}, "hits" ...

The data in the returned result can then be stored in an array for your database, or converted back to JSON and written directly into ES.

This method essentially uses an Elasticsearch query to move data somewhere else: you send a fixed query JSON and get back JSON in a fixed format, so you can reliably dig out exactly the data you want.
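To make that "fixed format in, fixed format out" idea concrete, here is a minimal sketch; the sample response below is invented to match the shape of the article's aggregation, and grep stands in for real JSON parsing:

```shell
# A made-up response fragment shaped like the hourly-PV aggregation result:
cat > /tmp/resp.json <<'EOF'
{"aggregations":{"2":{"buckets":[
  {"key_as_string":"2016-09-09T10:00:00","doc_count":5312,"3":{"value":420}},
  {"key_as_string":"2016-09-09T11:00:00","doc_count":6101,"3":{"value":455}}
]}}}
EOF
# Because the response format is fixed, the per-hour PV can be pulled out
# mechanically (a real script should use a JSON parser instead of grep):
grep -o '"doc_count":[0-9]*' /tmp/resp.json
# prints "doc_count":5312 and "doc_count":6101, one per line
```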


Second, the PHP code that implements the above

class.php

<?php
# Export data from ES.
# Two parameters:
#   $url       - the ES URL to export data from (differs per index/type)
#   $post_data - the request body, in JSON format
function export($url, $post_data) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
    $arr = curl_exec($ch);
    curl_close($ch);
    return json_decode($arr, true);   # decode into an associative array
}

# Import an array of data into ES.
# Two parameters:
#   $url       - the exact location to import into, e.g. http://IP:9200/index/type/ID
#                (the ID must be unique; deriving it from the timestamp works well)
#   $post_data - the data array to import into ES
function import($url, $post_data) {
    $json = json_encode($post_data);
    $ci = curl_init();
    curl_setopt($ci, CURLOPT_PORT, 9200);
    curl_setopt($ci, CURLOPT_TIMEOUT, 2000);
    curl_setopt($ci, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ci, CURLOPT_FORBID_REUSE, 0);
    curl_setopt($ci, CURLOPT_CUSTOMREQUEST, 'PUT');
    curl_setopt($ci, CURLOPT_URL, $url);
    curl_setopt($ci, CURLOPT_POSTFIELDS, $json);
    $response = curl_exec($ci);
    unset($post_data);   # free the array
    unset($json);        # free the encoded data
    curl_close($ci);
}
?>

vim access_info.php

<?php
include("class.php");

# ES URL to export the data from
$url = "http://192.168.10.49:9200/_search";

# Start time of the query: 16 days ago at 00:00:00, as a Unix timestamp
$begin = date("Y-m-d", strtotime("-16 day"));
$start_time = strtotime($begin . " 00:00:00");

# End time of the query: right now, converted the same way
$end_time = strtotime(date("Y-m-d H:i:s"));

# Fill the start and end times into the query JSON; the JSON is the .txt file
# under ./lib/ that has the same name as this script
$post_data = str_replace('end_time', $end_time,
             str_replace('start_time', $start_time,
             file_get_contents('lib/' . str_replace('.php', '.txt', basename($_SERVER['PHP_SELF'])))));

# Query ES; the result comes back as an array
$arr = export($url, $post_data);

# Pick the values we want out of the array and assemble a new one
$array = $arr['aggregations']['2']['buckets'];
foreach ($array as $key => $value) {
    $data['@timestamp'] = $value['key_as_string'];
    $data['request_PV'] = $value['doc_count'];
    $data['request_IP'] = $value['3']['value'];
    # $Time is the document ID for the import into ES; it must be unique
    # (different types may reuse the same ID)
    $Time = strtotime($data['@timestamp']);
    # The import URL was truncated in the original; per import()'s comment it
    # takes the form http://IP:9200/index/type/ID, with $Time as the ID
    $urls = "";
    # Call import() to load the data into ES
    import($urls, $data);
}
?>

The following file is stored under ./lib/ and must have the same name (with a .txt extension) as the PHP script that executes it.

vim lib/access_info.txt

{
  "size": 0,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        # "time_zone": "Asia/Shanghai"  -- this line must be deleted, otherwise the times are off
        "min_doc_count": 1,
        "extended_bounds": {
          "min": start_time,   # start_time is replaced with the actual timestamp
          "max": end_time
        }
      },
      "aggs": {
        "3": {
          "cardinality": {
            "field": "geoip.ip"
          }
        }
      }
    }
  },
  "highlight": {
    "pre_tags": ["@kibana-highlighted-field@"],    # Kibana's default tags; they were mangled in this copy
    "post_tags": ["@/kibana-highlighted-field@"],
    "fields": {
      "*": {}
    },
    "require_field_match": false,
    "fragment_size": 2147483647
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": start_time,
                  "lte": end_time,
                  "format": "epoch_second"   # changed from milliseconds to seconds
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}
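The start_time and end_time placeholders in this template are plain text, so the str_replace substitution that access_info.php performs can also be sketched in shell (the tiny stand-in template and the use of sed are my own illustration, not the article's code):

```shell
# A tiny stand-in for lib/access_info.txt, containing the two placeholders:
printf '{"range":{"@timestamp":{"gte":start_time,"lte":end_time}}}' > /tmp/tpl.txt

START=1472688000   # e.g. the timestamp of 16 days ago at 00:00:00
END=1474070400     # e.g. the timestamp of "now"
sed -e "s/start_time/$START/g" -e "s/end_time/$END/g" /tmp/tpl.txt
# prints {"range":{"@timestamp":{"gte":1472688000,"lte":1474070400}}}
```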

Based on the code above, we can pull the important data out of ES on a regular schedule. The extracted numbers are aggregate results rather than perfectly precise values, but they reflect the site's trends, and the queries are very fast! If you want to keep important data long-term, you can use this method to store it in a database as well.
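To run this on a schedule, a cron entry is enough; the paths and the frequency below are my assumptions (once a day shortly after midnight fits the script's day-based time window):

```
# crontab -e: run the export/import script daily at 00:10
10 0 * * * /usr/bin/php /path/to/access_info.php >> /var/log/es_export.log 2>&1
```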

The above is my personal approach to the long-term preservation of ES result data; if there is a better way, I hope we can discuss it together!

This article is from the "Tranquility Zhiyuan" blog, please be sure to keep this source http://irow10.blog.51cto.com/2425361/1853507
