Shell: quickly search logs in large log files by time period


Problem Description:

In a high-traffic online service, the logging system produces an enormous volume of logs, with files at every level easily reaching tens of gigabytes. Quickly searching such large files is a problem ops people run into all the time, and the most common task is querying the logs for a given time period. For example: an access failed today at around 9:00 in the morning, and we need to find the corresponding log entries to work out the reason for the failure.

Common approaches and their drawbacks:

1. If the file is small, say within 100 MB, it is convenient to match with grep, awk, or sed. But when the file is very large, searching this way is very inefficient and can run for dozens of minutes or even hours.

2. Use Hadoop for big-data processing: queries are fast and efficient, but it requires building a Hadoop environment and a huge amount of computing resources. It feels like killing a chicken with a sledgehammer.

3. Find the offset manually by time: use the tail -c size filename command to extract a piece of the log and check whether its timestamps fit. If they do not, adjust the size parameter until you land in the period just before 9:00, then match with grep, awk, or sed until you find the target log and press Ctrl+C to stop. This really can save a lot of time, and it is the method I used to use; a sketch follows this list.
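
A minimal sketch of that manual probing, where the file name, byte counts, and timestamps are all made-up examples:

tail -c 2000000000 access.log | head -n 1    # first timestamp is 09:13:02 -- too late
tail -c 4000000000 access.log | head -n 1    # first timestamp is 08:41:55 -- early enough
tail -c 4000000000 access.log | grep "09:00:0"    # now match from here, Ctrl+C when done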

Solution:

Obviously, manual lookup is not an ideal solution for a programmer. So can we write a script that automatically searches a huge log by time period? Common commands such as cat, head, tail, grep, more, less, sed, and awk do not seem to offer an effective solution on their own. The method this post introduces is as follows.

Here is my script; let's call it the probe method:

#!/bin/bash
# Read in the file name
file_name=$1
# Read in the query start time
start=$2
# Read the file size in bytes
file_size=$(stat --format=%s $file_name)
# Set the probe step, typically around one percent of the file size
step=500000000
# If the file is smaller than one step, size starts at the file size; otherwise at one step
[ $file_size -lt $step ] && size=$file_size || size=$step
# Initialize the probe time
test_time="00:00:00"
# Keep probing until the probed time is no longer earlier than the query time
while [[ ${test_time} < ${start} ]]
do
    size=$(($size + $step))
    # Jump size bytes into the file, read one 10,000-byte block, take the first
    # complete line, and pull out its time field (adjust $2 to your log format)
    test_time=$(dd if=$file_name skip=$(($size / 10000)) ibs=10000 count=1 2>/dev/null | sed -n "2p" | awk '{print $2}')
done
# Back up one step from the matching offset and print the log from there onward
tail -c $(($file_size - $size + $step)) $file_name

Save it as find.sh and execute sh find.sh filename.log 09:00:00, then filter the output with commands such as grep.
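
For instance, to pull the server errors around that time (the file name and the pattern being grepped are assumptions):

sh find.sh access.log 09:00:00 | grep "status=500" | less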

To explain: the script relies mainly on the dd command, which copies data from a file, converting and formatting it according to its operands.

You can consult the relevant documentation for details, for example: http://blog.chinaunix.net/uid-24958038-id-3416169.html
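
As a quick illustration of the dd options the script depends on, the following copies a single 10,000-byte block starting 500 MB into a file (the file names are assumptions):

# skip counts input blocks of ibs bytes, so skip=50000 with ibs=10000 skips 500,000,000 bytes
dd if=app.log of=chunk.log skip=50000 ibs=10000 count=1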

Here, dd is used to copy a small chunk out of the large file, and the time extracted from that chunk becomes test_time. If test_time does not meet the condition, size is advanced by one step and another chunk is probed, until the condition is met. Because dd has the skip option, it jumps straight to the specified offset and reads only a small section, which makes it very efficient: 500 iterations of the loop take just over one second, i.e. roughly 500 probes per second.
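
In isolation, a single probe looks like this, assuming the time sits in the second whitespace-separated field of each log line:

test_time=$(dd if=app.log skip=50000 ibs=10000 count=1 2>/dev/null | sed -n "2p" | awk '{print $2}')
echo $test_time    # e.g. 08:59:47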

Within each probe, dd cuts a chunk out of the log, sed -n "2p" takes the second line, which is the first complete one (the chunk almost always begins in the middle of a record), and awk pulls the timestamp out of that line.
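
For example, if the 10,000-byte chunk returned by dd begins mid-record, it might look like this (the log format here is an assumption):

/index.html status=200 time=12ms
2015-06-29 08:59:47 INFO GET /login status=200 time=8ms
...

sed -n "2p" keeps the second line, the first complete one, and awk '{print $2}' yields 08:59:47.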

Note that with the start time set to 09:00:00, the results do not begin strictly at 09:00:00; they generally start a little earlier, and the step parameter controls by how much. The smaller the step, the more precise the starting point, but the more probe attempts are needed: with step = 500 MB the output can begin as much as 500 MB before the target time, while shrinking step to 50 MB tightens that bound tenfold at the cost of up to ten times as many probes.

Original post; please credit the source when reposting. by lzjing

