Some shell command tricks in log processing

Source: Internet
Author: User
Tags: diff, apache, access log

I have been doing log analysis for a good while, and there is no telling what comes next. The one skill I have really built up is handling large volumes of logs by hand, so here is a summary.

The input is log files: plain text, often several GB each. From N such files you need to pull out a list, a number, or a ratio. Doing this with shell commands is a way to verify the accuracy of the system's data, and it is also a good learning exercise.

Cutting log lines with the cut command

The following line is a typical Apache access log:

To get the IP addresses, use the cut command.

-d ' ' splits each line on spaces and -f1 takes the first field, which gives the list of IPs.

Sometimes the files you get are tab-delimited (\t). cut still works on them, but you have to write an extra $ so the shell passes a real tab character: -d $'\t'. Tab is also cut's default delimiter, so -d can be left out entirely.
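Below is a minimal sketch of both cases. The original sample line was not preserved, so the Apache lines here are hypothetical examples in the common log format. The tab-separated input is made up too. The script assumes bash because of the $'\t' syntax. With a real file you would write something like cut -d ' ' -f1 access.log, where the file name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical access-log lines in Apache common log format.
log_lines='192.168.1.10 - - [10/Oct/2015:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326
10.0.0.7 - - [10/Oct/2015:13:55:40 +0800] "GET /about.html HTTP/1.1" 404 512'

# -d ' ' splits on single spaces; -f1 keeps the first field (the client IP).
printf '%s\n' "$log_lines" | cut -d ' ' -f1

# Tab-delimited input: bash expands $'\t' to a literal tab for -d.
# (Tab is also cut's default delimiter, so -d could be dropped here.)
printf '192.168.1.10\tGET\t200\n10.0.0.7\tGET\t404\n' | cut -d $'\t' -f1
```

Both commands print 192.168.1.10 and 10.0.0.7, one per line.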

Using the tr command to delete and replace characters

-c: complement; replace the characters that are not in SET1 with SET2
-d: delete; delete all characters in SET1 instead of translating them
-s: squeeze-repeats; replace each run of a repeated character that is listed in the last given SET with a single occurrence
-t: truncate-set1; truncate SET1 to the length of SET2 before translating (without -t, GNU tr pads SET2 by repeating its last character)

For example (the input has two spaces between the d's and the s's):

$ echo "aaacccddd  ss" | tr -s 'a-c'        # -s: squeeze runs of a, b, c
acddd  ss

$ echo "aaacccddd  ss" | tr -s ' ' ','      # translate spaces to commas, then squeeze the repeats
aaacccddd,ss

$ echo "aaacccddd  ss" | tr -t ' ' ','      # -t: sets are the same length here, so each space becomes a comma
aaacccddd,,ss

$ echo "aaacccddd  ss" | tr -s 'a' 'b'      # translate a to b, then squeeze the repeated b's
bcccddd  ss
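The transcript covers -s and -t but has no example of -c. Here is a small sketch using made-up log fragments. Combining -c with -d deletes every character that is not in SET1. That is a quick way to keep only the digits and dots of an address. With two sets, -c replaces the non-matching characters instead of deleting them.

```shell
#!/bin/sh
# Made-up log fragments.
# -c complements SET1 and -d deletes: everything except digits, dots and
# newlines is removed, leaving just the address.
echo 'client=[10.0.0.7];' | tr -cd '0-9.\n'

# -c with two sets: every character NOT in SET1 is replaced by SET2.
echo 'a1b2' | tr -c '0-9\n' '_'
```

The first command prints 10.0.0.7 and the second prints _1_2. The newline is listed in SET1 in both cases so that tr leaves the line endings alone.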

To turn a space-separated file into CSV, replace the spaces with commas (tr -s ' ' ',').

To remove the spaces outright instead of replacing them, use tr -d ' '.
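Here is a minimal sketch of both commands. The space-separated records are hypothetical, and some of their fields are separated by several spaces, which is why -s is used when converting to CSV.

```shell
#!/bin/sh
# Hypothetical space-separated records (some fields separated by several spaces).
records='10.0.0.7  GET 200
192.168.1.10 POST   404'

# Space-separated -> CSV: translate spaces to commas, -s squeezes runs of commas.
printf '%s\n' "$records" | tr -s ' ' ','

# -d deletes the spaces outright instead of replacing them.
printf '%s\n' "$records" | tr -d ' '
```

The first command prints 10.0.0.7,GET,200 and 192.168.1.10,POST,404. The second glues the fields together. This only works when no field contains a space or comma of its own, because tr knows nothing about CSV quoting.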

Processed logs often contain blank lines. tr removes them by squeezing each run of consecutive newlines into a single newline: tr -s '\n'.
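A short sketch on made-up output. One edge case is worth knowing: if the input starts with a newline, tr -s '\n' keeps that one leading blank line. sed '/^$/d' deletes every empty line wherever it is.

```shell
#!/bin/sh
# Made-up processed output containing blank lines.
printf 'line1\n\n\nline2\n\nline3\n' | tr -s '\n'

# tr -s '\n' leaves one blank line at the very top if the input starts with
# a newline; sed '/^$/d' deletes every empty line, wherever it is.
printf '\nline1\n\nline2\n' | sed '/^$/d'
```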

Deduplicating with the uniq command

To deduplicate the IP list and get the unique visitor IPs, pipe it through uniq. uniq only merges adjacent duplicate lines, so sort the list first.

To count how many times each IP appears, add the -c option.

The resulting format is as follows:

The number at the start of each line is the number of occurrences.
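Here is a minimal sketch with a hypothetical IP list, such as one produced by the cut command earlier. The final sort -rn is an addition that is not in the text; it ranks the IPs by count so the busiest visitors come first.

```shell
#!/bin/sh
# Hypothetical IP list, one address per line.
ips='10.0.0.7
192.168.1.10
10.0.0.7
172.16.0.3
10.0.0.7
192.168.1.10'

# uniq only merges *adjacent* duplicates, so sort first
# (sort -u does both steps in one command).
printf '%s\n' "$ips" | sort | uniq

# -c prefixes each line with its count; sort -rn puts the busiest IPs first.
printf '%s\n' "$ips" | sort | uniq -c | sort -rn
```

The first pipeline prints three unique addresses. In the second, 10.0.0.7 comes out on top with a count of 3.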

Using awk/sed to process logs

awk and sed are the ultimate cure-all for processing logs. There is honestly nothing they can't do, and mastering them is a whole discipline in itself. Here is a log I came across, formatted like this:

Suppose I need the lines with isActive=1, and from each of them the segment before out=', like ABC above.

grep keeps only the lines that contain isActive=1. The quoted text after awk is the awk program itself. $0 always holds the whole current line (record). match and substr are built-in awk functions, and the code in the {} after match runs only for lines where the match succeeds. match does not change $0. Instead it sets the built-in variables RSTART and RLENGTH. RSTART is the index (counting from 1) where the match begins and RLENGTH is the length of the match, so substr($0, RSTART, RLENGTH) is exactly the matched part.

Inside the single-quoted awk program you cannot escape a single quote (') with a backslash. Write it as its hex code \x27 instead (the octal escape \047 also works). To look up the hex code you can use Python: "'".encode("hex") in Python 2, or "'".encode().hex() in Python 3.
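Below is a sketch of the grep + match + substr technique. The real log sample was not preserved, so the line format is hypothetical. The sketch is not the author's exact command: it extracts the quoted value after out=', which is enough to show RSTART, RLENGTH and the quote escape. It uses the octal \047 for portability across awk implementations; gawk also accepts \x27.

```shell
#!/bin/sh
# Hypothetical log lines; the real sample format was not preserved.
logs="ts=1 isActive=1 out='abc' size=10
ts=2 isActive=0 out='def' size=20
ts=3 isActive=1 out='ghi' size=30"

# grep keeps only the isActive=1 lines.
# match() sets RSTART (1-based start) and RLENGTH (match length); the {} action
# runs only when match() succeeds. \047 is the octal escape for a single quote,
# which cannot appear literally inside the single-quoted awk program.
# RSTART + 5 skips the five characters of out=' and RLENGTH - 6 also drops the
# closing quote, leaving just the value.
printf '%s\n' "$logs" | grep 'isActive=1' |
  awk 'match($0, /out=\047[^\047]*\047/) { print substr($0, RSTART + 5, RLENGTH - 6) }'
```

This prints abc and ghi. Printing substr($0, RSTART, RLENGTH) instead would give the whole match, out='abc'.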

awk turned out to be surprisingly simple to explain, but this doesn't even count as a primer.

Set operations

Taking the intersection, union, and difference of two lists comes up often in statistics. For example, to find which of yesterday's visitor IPs also visited today, take the intersection of today's IP list and yesterday's IP list.

Define two simple files first:

To get the intersection of a and b (4 5), use the following command:

To get the union (1 through 9):

To get the difference a minus b, that is, a with the intersection removed (1 2 3):

Likewise, the difference b minus a:

The above two commands are equivalent
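The original commands and sample files were not preserved. Here is a minimal sketch of the usual sort/uniq/comm idioms. The contents of a.txt and b.txt are reconstructed from the results quoted above, assuming one value per line with 1 to 5 in a.txt and 4 to 9 in b.txt.

```shell
#!/bin/sh
# Sample files reconstructed from the results quoted above (assumed layout:
# one value per line, a.txt = 1-5, b.txt = 4-9).
dir=$(mktemp -d)
seq 1 5 > "$dir/a.txt"
seq 4 9 > "$dir/b.txt"
cd "$dir" || exit 1

# Intersection (4 5): a line present in both files appears twice after sorting.
sort a.txt b.txt | uniq -d

# Union (1-9): every distinct line; equivalent to sort -u a.txt b.txt.
sort a.txt b.txt | uniq

# Difference a - b (1 2 3): listing b.txt twice means none of its lines can be unique.
sort a.txt b.txt b.txt | uniq -u

# Difference b - a (6 7 8 9), two equivalent ways; comm needs sorted input.
sort b.txt a.txt a.txt | uniq -u
comm -13 a.txt b.txt
```

The sort | uniq tricks assume that neither file contains duplicates of its own. If one might, pass it through sort -u first.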

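comm compares two sorted files line by line. What does it print with no options? Three columns: lines only in the first file, lines only in the second file, and lines common to both.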
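A quick sketch of comm's default three-column output. The files are assumed samples reconstructed from the results quoted above: a.txt holds 1 to 5 and b.txt holds 4 to 9, one value per line.

```shell
#!/bin/sh
# Assumed sample files: a.txt = 1-5, b.txt = 4-9, one value per line.
dir=$(mktemp -d)
seq 1 5 > "$dir/a.txt"
seq 4 9 > "$dir/b.txt"

# With no options comm prints three columns separated by tabs:
#   column 1: lines only in a.txt   (1 2 3, no indent)
#   column 2: lines only in b.txt   (6 7 8 9, one tab)
#   column 3: lines in both files   (4 5, two tabs)
# -1, -2 and -3 suppress the corresponding column, so comm -12 is the intersection.
comm "$dir/a.txt" "$dir/b.txt"
```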

There is also the diff command, which is used to see what changed in code:

diff a.txt b.txt
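A small sketch that reuses assumed a.txt/b.txt contents, reconstructed from the set-operation results quoted above (1 to 5 and 4 to 9). The default output marks lines from the first file with < and lines from the second with >. The -u option gives the unified format used in patches and code review.

```shell
#!/bin/sh
# Assumed sample files: a.txt = 1-5, b.txt = 4-9, one value per line.
dir=$(mktemp -d)
seq 1 5 > "$dir/a.txt"
seq 4 9 > "$dir/b.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps that
# from looking like an error in scripts that check exit codes.
diff "$dir/a.txt" "$dir/b.txt" || true

# Unified format: "-" lines come from a.txt, "+" lines from b.txt.
diff -u "$dir/a.txt" "$dir/b.txt" || true
```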

Summary && References

Once you can play around with these commands, handling logs is not much of a problem.

A blog post on set operations in the shell:

http://wordaligned.org/articles/shell-script-sets

Shell posts from my bookmarks folder:

Common techniques for Linux shells

Linux Shell Advanced Tips (its awk section is well written)
