Some shell command tricks in log processing
My day-to-day work involves a fair amount of log analysis, and one inescapable part of it is processing large numbers of logs by hand, so I'm writing down what I've picked up.
The input is typically a text log file of a few gigabytes, and the goal is to pull a list, a count, or a ratio out of n such files. Either way, shell commands are not only a way to verify the accuracy of system data, they are also a good learning exercise.
Cutting log lines with the cut command
A typical Apache access log line starts with the client IP address, with the remaining fields separated by spaces. If you need to get the IP addresses you can use the cut command: -d ' ' splits each line on spaces, and -f1 takes the first field, which gives you the IP list.
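For instance (the post's original sample line was lost, so this one is made up in Apache's usual log shape):

```shell
# Hypothetical access-log line; real Apache logs start with the client IP.
line='127.0.0.1 - - [11/Jun/2014:03:42:25 +0800] "GET /index.html HTTP/1.1" 200 2326'
echo "$line" | cut -d ' ' -f1
# prints: 127.0.0.1
```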
Sometimes the file you get is tab-delimited instead. Tab is actually cut's default delimiter, but if you want to name it explicitly you have to write the extra $ in -d$'\t' so the shell expands the escape.
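A quick sketch of both forms, using printf to produce a tab-delimited line:

```shell
# Tab is cut's default delimiter, so -f1 alone works:
printf '10.0.0.1\tGET\t200\n' | cut -f1          # prints: 10.0.0.1
# Naming it explicitly needs bash's $'\t' quoting (hence the extra $):
printf '10.0.0.1\tGET\t200\n' | cut -d$'\t' -f2  # prints: GET
```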
Use the tr command to remove or replace characters:
-c (complement): replace every character NOT in SET1 with SET2
-d (delete): delete all characters in SET1; no translation is done
-s (squeeze-repeats): compress each run of repeated characters from SET1 into a single one
-t (truncate-set1): truncate SET1 to the length of SET2 before translating (usually the default)
A few examples (note there are two spaces between the d's and the s's in the input):
$ echo "aaacccddd  ss" | tr -s "a-c"     # squeeze repeats of a, b and c
acddd  ss
$ echo "aaacccddd  ss" | tr -s " " ","   # translate, then squeeze the repeated commas
aaacccddd,ss
$ echo "aaacccddd  ss" | tr -t " " ","
aaacccddd,,ss
$ echo "aaacccddd  ss" | tr -s "a" "b"   # translate, then squeeze the repeated b's
bcccddd  ss
tr can also replace every space with a comma to turn a space-separated file into CSV, or with -d remove the spaces outright.
Blank lines often show up after a round of log processing. tr removes them too; the trick is to squeeze each run of consecutive newlines down to a single one.
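A minimal sketch of all three uses (the post's own command lines did not survive extraction, so these are reconstructed):

```shell
echo "a b c" | tr -s " " ","          # to CSV: a,b,c
echo "a b c" | tr -d " "              # delete the spaces: abc
printf 'x\n\n\ny\n' | tr -s '\n'      # squeeze newline runs: the blank lines disappear
```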
Deduplicating with the uniq command
Suppose you want the list of distinct IPs that accessed the server: sort the IP list and pipe it through uniq (uniq only collapses adjacent duplicates, which is why the sort comes first).
If you want to count each IP's accesses as well, add the -c flag.
The resulting format puts the number of occurrences in front of each line.
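A sketch with an inline IP list standing in for the real cut output:

```shell
# Count occurrences per IP, most frequent first:
printf '%s\n' 1.1.1.1 2.2.2.2 1.1.1.1 | sort | uniq -c | sort -rn
#       2 1.1.1.1
#       1 2.2.2.2
```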
Using awk/sed to process logs
awk and sed are the ultimate tools for processing logs. It is true that they can do just about anything, and each is a deep subject in its own right. Here's a log I came across: each line carries a field like out='abc' along with a flag like isActive=1. Suppose I need the lines where isActive=1, and from each of them the part between the quotes of out='...', i.e. the abc above.
grep filters the lines on isActive=1. What follows awk in quotes is the awk program. Inside it, $0 always holds the current input line; match and substr are built-in awk functions, and the code in {} runs when the match succeeds. After a successful match, RSTART and RLENGTH are built-in awk variables: RSTART is the index in $0 where the match begins, and RLENGTH is the length of the matched text.
One catch: a literal single quote can't appear inside the single-quoted awk program, so use its hex escape \x27 instead. To look up a character's hex code you can use Python 2: "'".encode("hex") gives 27.
Of course, explaining awk this briefly doesn't even amount to a primer.
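Putting the pieces together, a sketch with a made-up log line (the post's original sample did not survive translation):

```shell
# \047 is the octal escape for a single quote; the post uses gawk's hex
# form \x27, which works the same way in gawk.
echo "time=100 out='abc' isActive=1" | grep 'isActive=1' |
  awk '{ if (match($0, "out=\047[a-z]*\047"))
           print substr($0, RSTART + 5, RLENGTH - 6) }'
# prints: abc   (RSTART+5 skips the out=' prefix, RLENGTH-6 drops it plus the closing quote)
```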
Collection operations
Imagine I have two IP lists and want their intersection, union or difference; this comes up constantly in statistics work. For example, "which of yesterday's visiting IPs also visited today?" is just the intersection of today's IP list with yesterday's.
Define two simple files first: a.txt holding the numbers 1 through 5 and b.txt holding 4 through 9, one per line.
If you want the intersection of a and b (4 5), sort the two files together and keep the duplicated lines.
If you want the union (1 through 9), sort them together and deduplicate.
If you want the difference a - b, that is, a with the intersection removed (1 2 3), concatenate a with two copies of b, sort, and keep only the lines that appear exactly once. In the same vein, the difference b - a swaps the roles of the two files.
comm -23 computes the same a - b difference, so the two approaches are equivalent.
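A reconstructed sketch of all of the operations above (the post's command lines were lost):

```shell
printf '%s\n' 1 2 3 4 5 > a.txt
printf '%s\n' 4 5 6 7 8 9 > b.txt

sort a.txt b.txt | uniq -d           # intersection:   4 5
sort a.txt b.txt | uniq              # union:          1 2 3 4 5 6 7 8 9
sort a.txt b.txt b.txt | uniq -u     # difference a-b: 1 2 3
sort b.txt a.txt a.txt | uniq -u     # difference b-a: 6 7 8 9
comm -23 a.txt b.txt                 # a-b again; -13 gives b-a (inputs must be sorted)
```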
The comm command is a comparison tool. With no options at all, what does it print? Three tab-indented columns: lines only in the first file, lines only in the second file, and lines common to both. And there's always the diff command, usually used to see what changed in code:
diff a.txt b.txt
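A sketch of comm's default three-column output on the same two files:

```shell
printf '%s\n' 1 2 3 4 5 > a.txt
printf '%s\n' 4 5 6 7 8 9 > b.txt
comm a.txt b.txt   # col 1: only in a.txt; col 2: only in b.txt; col 3: in both
```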
Summary && References
With these commands reasonably under control, handling a log is not much of a problem.
A blog post describing set operations in the shell:
http://wordaligned.org/articles/shell-script-sets
A shell post that has been sitting in my bookmarks folder for a long time:
Common techniques for Linux shells
The awk section of Linux Shell Advanced Tips is particularly well written.