Processing real-time logs in Linux to generate another real-time log

Source: Internet
Author: User
I. Background

1. Knowledge points

This post covers the following points:

Fetching HTTP content with curl;

Running a PHP file from the shell;

Running shell commands from PHP (through the exec function);

Implementing the tail -f command in PHP;

Passing a string that contains spaces as a single parameter (by enclosing it in double quotation marks).

2. Business Process

The task behind this post is to read the real-time log "/data3/im-log/nginx.im.imp.current/nginx.im.imp.current_current" and generate from it the real-time log required for the job fair.

The business process is as follows:
(1) Get the mapping between enterprises and IM client IDs from http://bj.baidu.com/jobfairs/jobfairs_im_port.php?action=getims.
The response format is as follows:
{"status": 1, "ret": {"company_id": {"im_account": [im_id], "name": [company_name]}}, "msg": ""}
The obtained data is as follows:
{"status":1,"ret":{"2028107":{"im_account":["31669394","50000098"],"name":["Baidu"]},"2028098":{"im_account":["50029298","50000098","31669376","31669394","50006271"],"name":["sogou"]}},"msg":""}

The first problem I ran into is that my development environment and http://bj.baidu.com are not in the same network segment. The service behind that URL has the IP address 10.3.20.201, so the host name has to be mapped to that address. Accessing http://bj.baidu.com/jobfairs/jobfairs_im_port.php?action=getims is then equivalent to accessing http://10.3.20.201/jobfairs/jobfairs_im_port.php?action=getims.
An obvious question is why we do not simply access http://10.3.20.201/jobfairs/jobfairs_im_port.php?action=getims directly. The answer is that the service derives the user's city from the URL: the host name bj.baidu.com carries the city code bj, and that information is lost when only the IP address is used.

The solution is to connect to the IP address with curl and send the Host header explicitly:

    curl -H "Host: bj.ganji.com" "http://10.3.20.201/jobfairs/jobfairs_im_port.php?action=getims"

Reference link (use of the curl command): http://blog.csdn.net/lianxiang_biancheng/article/details/7575370

(2) If fromuserid or touserid in a log record is the IM client ID of an enterprise, the message belongs to that enterprise.

(3) Finally, generate the log in the required format. The log fields are as follows:
time, enterprise ID, enterprise name, enterprise IM ID, applicant IM ID, who sent the message (0: enterprise, 1: applicant), content
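As a rough illustration of the record layout above, the following sketch assembles one tab-separated output line with printf. The function name format_record and all sample values are hypothetical; the key=value field names follow the ones used in jobfairs.php.

```shell
#!/bin/sh
# Hypothetical helper: join the seven fields of one output record with tabs.
# $1 time, $2 enterprise ID, $3 enterprise name, $4 enterprise IM ID,
# $5 applicant IM ID, $6 sender flag (0: enterprise, 1: applicant), $7 content
format_record() {
    printf '%s\tcompanyid=%s\tcompanyName=%s\tcompanydingdongid=%s\tpersonaldingdongid=%s\twhosend=%s\t%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Sample values are made up for the demonstration.
record=$(format_record "10:15:02" "2028107" "Baidu" "31669394" "40000001" "0" "msgcontent=hello there")
printf '%s\n' "$record"
```

Quoting every argument keeps a message that contains spaces inside the last field.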

II. Three implementation methods

1. The first method: the shell reads each record and passes it to PHP for matching and output.

(1) start.sh is the startup file, as follows:

    #!/bin/sh
    # Kill any jobfairs processes that are still running
    pids=`ps aux | grep jobfairs | grep -v "grep" | awk '{print $2}'`
    if [ "$pids" != "" ]; then
        echo $pids
        kill -9 $pids
    fi
    sh jobfairs.sh > /home/Baidu/log/jobfairs.log

(2) jobfairs.sh fetches the HTTP content, reads the real-time log, and repeats the HTTP request every two minutes, as follows:

    #!/bin/sh
    logfile="/data3/im-log/nginx.im.imp.current/nginx.im.imp.current_current"
    hours=`date +%H`
    start_time=`date +%s`

    # Stop running the program after 17:00
    while [ $hours -lt 17 ]
    do
        res=`curl -s -H "Host: bj.baidu.com" "http://10.3.20.201/jobfairs/jobfairs_im_port.php?action=getims"`
        #echo $res
        len=${#res}
        if [ $len = 0 ]; then
            echo "failed! Request error!"
            exit
        fi
        # Extract the status value from the JSON response
        status=`echo $res | sed -e 's/.*status"://' -e 's/,.*//'`
        if [ $status != 1 ]; then
            echo "failed! Request status: "$status
            exit
        fi
        # Extract the ret object (company ID => IM accounts and name)
        ret=`echo $res | sed -e 's/.*ret"://' -e 's/,"msg.*//'`
        # Example value:
        #ret='{"2028098":{"im_account":["3658247660","192683241","197488883","108963206","197305001"],"name":["9\u811a\u732b\u521b\u65b0\u79d1\u6280\u6709\u9650\u516c\u53f8"]}}'

        # Keep fields 1, 3, 4 and 11: time, fromuserid, touserid, msgcontent
        tail -f $logfile | grep sendmsgok | grep "spamreasons=\[\]" | awk -F"\t" '{printf("%s\t%s\t%s\t%s\n", $1, $3, $4, $11);}' | while read line
        do
            /usr/local/webserver/php/bin/php jobfairs.php "$ret" "$line"
            # After 120s, stop generating logs and repeat the HTTP request
            # to refresh the company information
            end_time=`date +%s`
            if [ $(expr $end_time - $start_time) -ge 120 ]; then
                #echo `date +%T`" "`date +%D`
                #echo "120s is done!"
                break
            fi
        done
        start_time=`date +%s`
        hours=`date +%H`
    done
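The sed-based extraction used in jobfairs.sh can be tried on a fixed string without calling the service. The sketch below assumes the response keys are the lowercase status, ret and msg, and uses a shortened version of the sample data from the background section.

```shell
#!/bin/sh
# Sample response, shortened from the example data; assumes lowercase keys.
res='{"status":1,"ret":{"2028107":{"im_account":["31669394","50000098"],"name":["Baidu"]}},"msg":""}'

# Drop everything up to status": and everything from the next comma on.
status=$(printf '%s\n' "$res" | sed -e 's/.*status"://' -e 's/,.*//')

# Drop everything up to ret": and the trailing ,"msg... part.
ret=$(printf '%s\n' "$res" | sed -e 's/.*ret"://' -e 's/,"msg.*//')

echo "status=$status"
echo "ret=$ret"
```

This relies on the field order status, ret, msg and would break if the server reordered the keys; a real JSON parser would likely be more robust where one is available.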

This script also shows how to pass a string that contains spaces as a single parameter.
The scenario is as follows: the fields of a record are separated by tabs, and one of them, msgcontent, holds the message content, which often contains spaces. The shell splits an unquoted variable on whitespace, so if $line is passed as is, PHP receives msgcontent as several separate arguments. The solution is to pass the whole record as one string by enclosing it in double quotation marks (that is, $line becomes "$line"). After receiving the string, PHP splits it into fields with explode("\t", $line). The call looks like this:

    /usr/local/webserver/php/bin/php jobfairs.php "$ret" "$line"
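The effect of the double quotation marks can be seen in isolation with a small sketch. The helper count_args and the sample record are hypothetical; the point is only how many arguments the called program receives.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

# A tab-separated record whose last field contains spaces (made-up sample).
line=$(printf '10:15:02\tfromuserid=1\ttouserid=2\tmsgcontent=hello big world')

unquoted=$(count_args $line)    # the shell splits on spaces and tabs
quoted=$(count_args "$line")    # the record arrives as one argument

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 6, quoted: 1"
```

With the quoted form, the tabs inside the record survive, so the receiving program can still split on them.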

(3) jobfairs.php matches each line of the real-time log against the enterprise data and outputs it in the required log format:

    <?php
    $ret = $_SERVER["argv"][1];
    $arr = json_decode($ret, true); // decode the JSON string into an array
    foreach ($arr as $key => $value) {
        $name = $value["name"][0]; // enterprise name
        foreach ($value["im_account"] as $v) { // Dingdong IDs of the enterprise
            $userid[$v] = $key;
            $compname[$v] = $name;
            //echo $key."\t".$v."\t".$name."\n";
        }
    }

    $line = $_SERVER["argv"][2]; // get one log record
    $logarr = explode("\t", $line);
    //echo $line."\n";

    // get each field
    $time = $logarr[0];
    $fromuserid = $logarr[1];
    $touserid = $logarr[2];
    $msgcontent = $logarr[3];

    // fromuserid and touserid are key=value pairs; keep the value
    $fuiarr = explode('=', $fromuserid);
    $tuiarr = explode('=', $touserid);
    $fui = $fuiarr[1];
    $tui = $tuiarr[1];

    $output = $time."\t";
    if (isset($userid[$fui])) { // fromuserid is the Dingdong ID of an enterprise
        //echo $line."\n";
        $output .= "companyid={$userid[$fui]}\t";
        $output .= "companyName={$compname[$fui]}\t";
        $output .= "companydingdongid=$fui\t";
        $output .= "personaldingdongid=$tui\t";
        $output .= "whosend=0\t";
        $output .= $msgcontent;
        echo $output."\n";
    } else if (isset($userid[$tui])) { // touserid is the Dingdong ID of an enterprise
        //echo $line."\n";
        $output .= "companyid={$userid[$tui]}\t";
        $output .= "companyName={$compname[$tui]}\t";
        $output .= "companydingdongid=$tui\t";
        $output .= "personaldingdongid=$fui\t";
        $output .= "whosend=1\t";
        $output .= $msgcontent;
        echo $output."\n";
    }
    ?>

2. The second method: PHP executes shell commands and matches the output.

Note: this method cannot generate a real-time log. tail -f keeps running and never exits, so PHP's exec never gets a result back. This method is therefore only suitable for reading and processing a fixed piece of text.

The shell command tail -n 1000 is run through exec to obtain the last 1000 lines, which are then processed. In addition, PHP's curl extension is used to fetch the HTTP response. The file name is jobfairs2.php.
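Because tail -n terminates, the same filter pipeline can be checked against a fixed file. The sketch below builds a temporary file with made-up lines; only the field positions (1, 3, 4 and 11) and the sendmsgok and spamreasons=[] markers come from the article, while the helper sample_line and the filler fields are hypothetical.

```shell
#!/bin/sh
log=$(mktemp)

# Hypothetical helper: emit one 11-field, tab-separated sample line.
# $1 time, $2 event name, $3 spamreasons value, $4 message text
sample_line() {
    printf '%s\t%s\tfromuserid=31669394\ttouserid=40000001\tf5\tf6\tf7\tf8\tf9\tspamreasons=%s\tmsgcontent=%s\n' \
        "$1" "$2" "$3" "$4"
}

sample_line "10:15:02" "sendmsgok"   "[]"   "hello there" >> "$log"
sample_line "10:15:03" "sendmsgfail" "[]"   "dropped"     >> "$log"
sample_line "10:15:04" "sendmsgok"   "[ad]" "spam"        >> "$log"

# Bounded read: tail -n exits, so the result can be captured.
filtered=$(tail -n 1000 "$log" | grep sendmsgok | grep 'spamreasons=\[\]' |
    awk -F'\t' '{printf("%s\t%s\t%s\t%s\n", $1, $3, $4, $11);}')

printf '%s\n' "$filtered"
rm -f "$log"
```

Only the first sample line passes both grep filters, and awk keeps its time, fromuserid, touserid and msgcontent fields.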

    <?php
    //error_reporting(E_ALL & ~E_NOTICE);
    $host = array("Host: bj.baidu.com");
    $data = 'user=xxx&QQ=xxx&id=xxx&post=XXX';
    $url = 'http://10.3.20.201/jobfairs/jobfairs_im_port.php?action=getims';
    $res = curl_post($host, $data, $url);
    $arr = json_decode($res, true);
    $status = $arr["status"];
    if ($status != 1) {
        echo "request failed!";
        exit;
    }

    // get the returned enterprise information
    $ret = $arr["ret"];
    foreach ($ret as $key => $value) {
        $name = $value["name"][0];
        // map each IM ID to its enterprise ID with a hash
        foreach ($value["im_account"] as $v) {
            $userid[$v] = $key;
            $compname[$v] = $name;
        }
    }

    $logfile = "/data3/im-log/nginx.im.imp.current/nginx.im.imp.current_current";
    // tail -n 1000 gets the last 1000 records, which are saved in $log
    $shell = "tail -n 1000 $logfile | grep sendmsgok | grep 'spamreasons=\[\]' | ";
    $shell .= "awk -F'\t' '{print $1,$3,$4,$11;}'";
    exec($shell, $log); // save the lines printed by the shell command into an array

    // process each record
    foreach ($log as $line) {
        // match the required fields
        $flag = preg_match("/([0-9]+:[0-9]+:[0-9]+).*fromuserid=([0-9]+).*touserid=([0-9]+).*msgcontent=(.*)/", $line, $matches);
        if ($flag == 0) { // matching failed
            continue;
        }
        //echo $line."\n";
        $time = $matches[1];
        $fui = $matches[2];
        $tui = $matches[3];
        $msgcontent = $matches[4];

        // check whether fromuserid or touserid belongs to a company
        $output = $time."\t";
        // use the hash to decide whether the IM ID belongs to an enterprise
        if (isset($userid[$fui])) {
            //echo $line."\n";
            $output .= "companyid={$userid[$fui]}\t";
            $output .= "companyName={$compname[$fui]}\t";
            $output .= "companydingdongid=$fui\t";
            $output .= "personaldingdongid=$tui\t";
            $output .= "whosend=0\t";
            $output .= $msgcontent;
            echo $output."\n";
        } else if (isset($userid[$tui])) {
            //echo $line."\n";
            $output .= "companyid={$userid[$tui]}\t";
            $output .= "companyName={$compname[$tui]}\t";
            $output .= "companydingdongid=$tui\t";
            $output .= "personaldingdongid=$fui\t";
            $output .= "whosend=1\t";
            $output .= $msgcontent;
            echo $output."\n";
        }
    }

    /**
     * Submit a request
     * @param $host array  the Host header to send, e.g. array("Host: bj.ganji.com");
     * @param $data string the data to be submitted, e.g. 'user=xxx&QQ=xxx&id=xxx&post=XXX'
     * @param $url  string the URL to request, e.g. 'http://192.168.1.12/XXX/API/';
     */
    function curl_post($host, $data, $url) {
        $ch = curl_init();
        $res = curl_setopt($ch, CURLOPT_URL, $url);
        //var_dump($res);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_POST, 0);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $host);
        $result = curl_exec($ch);
        curl_close($ch);
        if ($result == NULL) {
            return 0;
        }
        return $result;
    }
    ?>

Reference link (PHP functions for executing shell commands): http://blog.csdn.net/a600423444/article/details/6059548

3. The third method: PHP implements the tail -f command to read the log file in real time.

The file name is jobfairs3.php. Here PHP uses the file offset to achieve the effect of tail -f. However, this approach is not suitable when the path refers to a different file from one read to the next. As in the second method, PHP's curl extension is used to fetch the HTTP response.
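The offset idea can be sketched in the shell as well: remember how many bytes have been consumed, and on the next pass read only what lies beyond that point. This is a minimal illustration on a temporary file with made-up content, not the author's implementation.

```shell
#!/bin/sh
log=$(mktemp)
printf 'old line 1\nold line 2\n' > "$log"

# Remember how many bytes have been consumed so far.
offset=$(wc -c < "$log")

# Simulate the writer appending two new lines.
printf 'new line 1\nnew line 2\n' >> "$log"

# Read only the bytes after the saved offset.
# tail -c +N starts output at byte N (1-based), hence offset + 1.
new_data=$(tail -c +"$((offset + 1))" "$log")

# Advance the offset for the next pass.
offset=$(wc -c < "$log")

printf '%s\n' "$new_data"
rm -f "$log"
```

The saved offset is only meaningful while the path keeps referring to the same file, which is the limitation noted above.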

    <?php
    //error_reporting(E_ALL & ~E_NOTICE);
    // PHP obtains the HTTP content through curl
    $host = array("Host: bj.ganji.com");
    $data = 'user=xxx&QQ=xxx&id=xxx&post=XXX';
    $url = 'http://10.3.20.201/jobfairs/jobfairs_im_port.php?action=getims';
    $res = curl_post($host, $data, $url);

    // decode the JSON string into an array
    $arr = json_decode($res, true);
    $status = $arr["status"];
    if ($status != 1) {
        echo "request failed!";
        exit;
    }

    // get the returned enterprise information
    $ret = $arr["ret"];
    $userid = array();
    $compname = array();
    foreach ($ret as $key => $value) {
        $name = $value["name"][0];
        // map each IM ID to its enterprise ID with a hash
        foreach ($value["im_account"] as $v) {
            $userid[$v] = $key;
            $compname[$v] = $name;
        }
    }

    $logfile = "/data3/im-log/nginx.im.imp.current/nginx.im.imp.current_current";
    tail_f($logfile, $userid, $compname);

    // Use PHP to implement the shell tail -f command
    function tail_f($logfile, $userid, $compname) {
        $size = filesize($logfile);
        $ch = fopen($logfile, 'r');
        $i = 0;
        while (1) {
            clearstatcache();
            $tmp_size = filesize($logfile);
            if (0 < ($len = $tmp_size - $size)) {
                $i = 0;
                // seek to the offset where the previous read stopped
                fseek($ch, $size, SEEK_SET);
                $content = fread($ch, $len);
                $linearr = explode("\n", $content);
                foreach ($linearr as $line) {
                    //echo $line."\n";
                    if (preg_match("/sendmsgok.*spamreasons=\[\]/", $line)) {
                        matchcompany($line, $userid, $compname);
                    }
                }
            } else {
                $i++;
                if ($i > 60) {
                    echo PHP_EOL.'The file in 60s without change, so exit!';
                    break;
                }
                sleep(1);
                continue;
            }
            $size = $tmp_size;
        }
        fclose($ch);
    }

    // Check whether a record belongs to one of the enterprises;
    // if so, output the combined record
    function matchcompany($line, $userid, $compname) {
        $flag = preg_match("/([0-9]+:[0-9]+:[0-9]+).*fromuserid=([0-9]+).*touserid=([0-9]+).*msgcontent=(.*)\tchannel=.*/", $line, $matches);
        if ($flag == 0) {
            return;
        }
        //echo $matches[0]."\t".$matches[1]."\t".$matches[2]."\t".$matches[3]."\t".$matches[4]."\n";
        $time = $matches[1];
        $fui = $matches[2];
        $tui = $matches[3];
        $msgcontent = $matches[4];

        // check whether fromuserid or touserid belongs to a company
        $output = $time."\t";
        if (isset($userid[$fui])) {
            //echo $line."\n";
            $output .= "companyid={$userid[$fui]}\t";
            $output .= "companyName={$compname[$fui]}\t";
            $output .= "companydingdongid=$fui\t";
            $output .= "personaldingdongid=$tui\t";
            $output .= "whosend=0\t";
            $output .= $msgcontent;
            echo $output."\n";
        } else if (isset($userid[$tui])) {
            //echo $line."\n";
            $output .= "companyid={$userid[$tui]}\t";
            $output .= "companyName={$compname[$tui]}\t";
            $output .= "companydingdongid=$tui\t";
            $output .= "personaldingdongid=$fui\t";
            $output .= "whosend=1\t";
            $output .= $msgcontent;
            echo $output."\n";
        }
    }

    /**
     * Submit a request
     * @param $host array  the Host header to send, e.g. array("Host: bj.ganji.com");
     * @param $data string the data to be submitted, e.g. 'user=xxx&QQ=xxx&id=xxx&post=XXX'
     * @param $url  string the URL to request, e.g. 'http://192.168.1.12/XXX/API/';
     */
    function curl_post($host, $data, $url) {
        $ch = curl_init();
        $res = curl_setopt($ch, CURLOPT_URL, $url);
        //var_dump($res);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_POST, 0);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $host);
        $result = curl_exec($ch);
        curl_close($ch);
        if ($result == NULL) {
            return 0;
        }
        return $result;
    }
    ?>

III. Summary

In the end, the first method was adopted. start.sh is added to the crontab with crontab -e, using the following entry:

    30 9 * * * cd /home/Baidu/zhaolincheung/jobfairs; sh start.sh

The second method does not fit this business case: the log has to be read in real time, but PHP's exec cannot return the output of tail -f.
The third method works only for a single fixed log file that keeps growing. The log file nginx.im.imp.current_current, however, is a symbolic link that points to different files over time.

If tail -f is implemented by hand in this situation, the underlying file may change and the saved offset no longer applies to the new file, which leads to incorrect reads. Therefore this method is not used.
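One possible way to notice that a symbolic link has been repointed is to compare its target between reads, as in the sketch below. The file names log.1, log.2 and current are made up, and this check is an assumption about how such a reader could be extended, not something the original scripts do.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'a\n' > "$dir/log.1"
printf 'b\n' > "$dir/log.2"
ln -s "$dir/log.1" "$dir/current"

# Remember which file the link points to before reading.
before=$(readlink "$dir/current")

# Simulate rotation: repoint the link to a new file.
rm "$dir/current"
ln -s "$dir/log.2" "$dir/current"

after=$(readlink "$dir/current")

if [ "$before" != "$after" ]; then
    rotated=yes   # a follower would reopen the file and reset its offset to 0
else
    rotated=no
fi
echo "rotated=$rotated"
rm -rf "$dir"
```

In the adopted first method the question does not arise in the same way, because tail is restarted on every two-minute cycle.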
