Shell programming notes (Advanced 2) text filtering

Shell programming notes (4): text filtering
Author: sunwill_chen

1. Regular expressions
(1) A regular expression is a pattern that describes text. It is built from ordinary characters (such as a-z) and special characters called metacharacters (such as \, *, ? and .).
(2) Basic metacharacters and their meanings
^: Matches only at the beginning of a line. For example, ^A matches lines that begin with A, such as ABC, A2e, A12, AAA, ...
$: Matches only at the end of a line. For example, A$ matches lines that end with A, such as BCA, 12A, AAA, ...
*: Matches zero or more occurrences of the preceding single character. For example, A* matches the empty string, A, AA, AAA, ...
[]: Matches any one of the characters inside the brackets. List the characters one after another with no separator (a comma inside the brackets is itself matched literally), or use - to give a range. For example, [1-5] means [12345].
\: Removes the special meaning of the metacharacter that follows it. For example, \*, \', \", \|, \+, \^, \.
. (dot): Matches any single character.
pattern\{n\}: Matches exactly n occurrences of the preceding pattern. For example, A\{2\} matches AA.
pattern\{n,\}: The same as above, but n is the minimum number of occurrences. For example, A\{2,\} matches AA, AAA, AAAA, ...
pattern\{n,m\}: The same as above, but the number of occurrences is between n and m. For example, A\{2,4\} matches AA, AAA, and AAAA.
(3) Examples:
^$: Matches empty lines
^.$: Matches lines that contain exactly one character
\*\.pas: Matches any string or file name ending with *.pas
[0123456789] or [0-9]: Matches any single digit
[a-z]: Any lowercase letter
[A-Za-z]: Any uppercase or lowercase letter
[Ss]: Matches s in either case
[0-9]\{3\}\.[0-9]\{3\}\.[0-9]\{3\}\.[0-9]\{3\}: Matches an IP address written as four groups of three digits. [0-9]\{3\} matches a string of three digits from 0 to 9, and \. matches a literal dot (the dot is a metacharacter, so \ is used to remove its special meaning).
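The patterns above can be tried out with grep, which uses basic regular expressions by default. The following is a minimal sketch: count_matches is a hypothetical helper, and the sample lines are made up for the demonstration.

```shell
# count_matches is a hypothetical helper (not a standard command): it counts
# how many of seven made-up sample lines match the pattern given as $1.
count_matches() {
    printf '%s\n' 'ABC' 'BCA' 'AAA' '' 'x' '192.168.001.010' 'unit.pas' |
        grep -c -- "$1"
}

count_matches '^A'        # lines beginning with A (ABC, AAA)
count_matches 'A$'        # lines ending with A (BCA, AAA)
count_matches '^$'        # empty lines
count_matches '^.$'       # lines holding exactly one character
count_matches 'A\{2,\}'   # two or more consecutive A
count_matches '\.pas$'    # a literal dot followed by pas at the end of the line
count_matches '^[0-9]\{3\}\.[0-9]\{3\}\.[0-9]\{3\}\.[0-9]\{3\}$'   # four 3-digit groups
```

Single quotes keep the shell from touching the backslashes and the $ before grep sees them.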
2. Introduction to find
(1) find searches for files that have certain characteristics. It can traverse the current directory or even the entire file system to locate files or directories. Because traversing a large file system takes time, such a search is generally run in the background.
(2) General form of the find command
find pathname -options [-print -exec -ok]
pathname: the directory that find searches. For example, . represents the current directory and / represents the root directory of the system.
-print: find writes the matching files to standard output.
-exec: find runs the shell command given by this parameter on each matching file. The command takes the form
command {} \; (note the space between {} and \;)
-ok: plays the same role as -exec, but runs the command in a safer mode. A prompt is given before each command is executed so that the user can decide whether to run it.
The options include the following:
-name: searches for files by file name
-perm: searches for files by file permissions
-user: searches for files by file owner
-group: searches for files by the group to which they belong
-mtime -n +n: searches for files by modification time. -n means the file was modified within the last n days, and +n means the file was modified more than n days ago. find also has the -atime and -ctime options, which work like -mtime but test the access time and the status-change time.
-size n[c]: searches for files that are n blocks long. With the c suffix, the length is measured in bytes.
-nogroup: finds files with no valid group, that is, files whose group does not exist in /etc/group.
-newer file1 ! -newer file2: searches for files that were modified more recently than file1 but not more recently than file2.
-depth: processes the contents of each directory before the directory itself.
-type: searches for files of a certain type, such as
b: block device file
d: directory
c: character device file
p: named pipe (FIFO)
l: symbolic link
f: regular file
(3) Examples of the find command
find . -name "*.txt" -print: finds files whose names end with .txt and prints them to the screen
find /cmd -name "*.sh" -print: finds all .sh files in the /cmd directory and prints them
find . -perm 755 -print: finds files with permission 755 under the current directory and prints them
find `pwd` -user root -print: finds files under the current directory whose owner is root and prints them
find ./ -group sunwill -print: finds files under the current directory whose group is sunwill
find /var -mtime -5 -print: finds all files under /var that were modified within the last 5 days
find /var -mtime +5 -print: finds all files under /var that were modified more than 5 days ago
find /var -newer "myfile1" ! -newer "myfile2" -print: finds all files under /var that are newer than myfile1 but not newer than myfile2
find /var -type d -print: finds all directories under /var
find /var -type l -print: finds all symbolic links under /var
find . -size +1000000c -print: finds files larger than 1000000 bytes under the current directory
find / -name "con.file" -depth -print: searches for con.file starting from the root directory, examining the contents of each directory before the directory itself
find . -type f -exec ls -l {} \;: finds regular files under the current directory and runs ls -l on each of them
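The examples above search real system directories, so their results differ from machine to machine. The following sketch builds a small throwaway tree instead, so that every search has a known result. All file and directory names in it are made up for the demonstration.

```shell
# Build a temporary tree with known contents (names are illustrative only).
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.txt" "$demo/sub/b.txt" "$demo/run.sh"
chmod 755 "$demo/run.sh"
ln -s a.txt "$demo/link"

# By name: both .txt files, including the one in the subdirectory.
txt_count=$(find "$demo" -name "*.txt" -print | wc -l)

# By type: the top directory itself plus sub.
dir_count=$(find "$demo" -type d -print | wc -l)

# By type: symbolic links only.
link_names=$(find "$demo" -type l -print)

# By exact permission, restricted to regular files.
exec_names=$(find "$demo" -type f -perm 755 -print)

# Run a command on every match; {} is replaced by each file name in turn.
find "$demo" -type f -exec ls -l {} \;

rm -rf "$demo"
```

Quoting "*.txt" matters: without the quotes the shell would expand the pattern before find ever sees it.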
(4) The xargs command
When the -exec option of find is used to process matched files, on some systems find passes all the matched files to the command together. Such systems limit the length of the command line that can be passed, so find may run for several minutes and then fail with an overflow error, usually reported as "argument list too long" or "argument list overflow". This is where xargs is useful, especially in combination with find: it reads the file names from standard input and passes them to the command in batches that fit within the limit. In addition, -exec with \; starts a separate process for every matched file, whereas xargs starts one process per batch, often only one in total.
find ./ -perm -7 -print | xargs chmod o-w: finds files on which others have read, write, and execute permission (mode bits 007) and passes them to chmod, which removes write permission for others.
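The pipeline above can be checked safely on a scratch directory. This is a minimal sketch that assumes file names without spaces or newlines, because plain xargs splits its input on whitespace. The file names are made up.

```shell
# Three scratch files: two fully open to others, one not.
work=$(mktemp -d)
touch "$work/one" "$work/two" "$work/three"
chmod 777 "$work/one" "$work/two"
chmod 644 "$work/three"

# -perm -7 selects files whose "others" bits include r, w and x (007).
# xargs collects the names and hands them to chmod as arguments.
find "$work" -type f -perm -7 -print | xargs chmod o-w

# Afterwards no regular file should still be writable by others (bit 002),
# and the file that was never selected keeps its original mode.
still_writable=$(find "$work" -type f -perm -2 -print | wc -l)
mode_three=$(find "$work" -name three -perm 644 -print | wc -l)

rm -rf "$work"
```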

3. Introduction to grep
(1) The general format of grep is grep [options] basic-regular-expression [file]
The pattern is best enclosed in quotes. This prevents the shell from interpreting it as part of the command line, and it allows searching for strings made up of several words.
-c: prints only the number of matching lines.
-i: ignores case when matching.
-h: does not display file names when multiple files are searched.
-H: displays the file name with each matching line.
-l: when multiple files are searched, prints only the names of the files that contain a match.
-n: displays matching lines together with their line numbers.
-s: suppresses error messages about files that do not exist or cannot be read.
-v: displays all lines that do not contain the matched text.
(2) Examples:
grep "^[^210]" myfile: matches lines in myfile that do not begin with 2, 1, or 0.
grep "[5-8][6-9][0-3]" myfile: matches lines in myfile that contain three consecutive characters of which the first is 5, 6, 7, or 8, the second is 6, 7, 8, or 9, and the third is 0, 1, 2, or 3.
grep "4\{2,4\}" myfile: matches lines in myfile that contain 44, 444, or 4444.
grep "\?" myfile: intended to match lines in myfile that contain any character. The meaning of \? in a basic regular expression is implementation-dependent; grep "." myfile expresses that search portably.
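The options and patterns above can be combined as in the following sketch. The sample file contents and the variable names are made up for the demonstration; they stand in for myfile.

```shell
# A four-line sample file standing in for myfile.
sample=$(mktemp)
printf '%s\n' '210 first' '48 Second' '4444 third' '592 second' > "$sample"

no_210=$(grep -c "^[^210]" "$sample")        # count of lines not starting with 2, 1 or 0
ranges=$(grep "[5-8][6-9][0-3]" "$sample")   # three characters drawn from the three ranges
fours=$(grep -n "4\{2,4\}" "$sample")        # line number, a colon, then the line
nocase=$(grep -ci "second" "$sample")        # counts both Second and second
inverse=$(grep -vc "second" "$sample")       # lines without lowercase "second"

rm -f "$sample"
```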
(3) grep character class names
[[:upper:]] means [A-Z]
[[:alnum:]] means [0-9a-zA-Z]
[[:lower:]] means [a-z]
[[:space:]] means whitespace such as a space or a tab
[[:digit:]] means [0-9]
[[:alpha:]] means [a-zA-Z]
For example, grep "5[[:digit:]][[:digit:]]" myfile matches lines in myfile that contain a 5 followed by two more digits.
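A short sketch of the character classes in use, again on made-up sample lines:

```shell
# Sample lines chosen so that each pattern selects exactly one of them.
classes=$(mktemp)
printf '%s\n' 'order 512' 'ORDER 51' 'no digits' '7up' > "$classes"

three_digit=$(grep "5[[:digit:]][[:digit:]]" "$classes")          # a 5 followed by two digits
all_upper=$(grep -c "^[[:upper:]]\{5\}[[:space:]]" "$classes")    # five capitals, then whitespace
starts_digit=$(grep "^[[:digit:]][[:alpha:]]" "$classes")         # a digit followed by a letter

rm -f "$classes"
```

Note the doubled brackets: the outer pair is the bracket expression and the inner [:name:] is the class inside it.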

4. Introduction to awk
awk browses files or strings and extracts information from them according to specified rules. It is an interpreted programming language in its own right.
(1) awk on the command line: awk [-F field-separator] 'commands' input-files
awk script: put all the awk commands in a file, make the file executable, and name the awk interpreter on the first line of the script so that the script can be run by typing its name. awk scripts are made up of patterns and actions.
The pattern part decides when the action statement is triggered and by which event (BEGIN, END).
The action part processes the data and is written inside {} (for example, print).
(2) Separators, fields, and records
As awk runs, the fields it reads are labeled $1, $2, ..., $n. These labels are called field identifiers. $0 refers to all the fields, that is, the whole record.
(3) Examples:
awk '{print $0}' test.txt | tee test.out: prints all lines of test.txt; $0 refers to all fields
awk -F: '{print $1}' test.txt | tee test.out: as above, but the field separator is ":" and only the first field is printed
awk 'BEGIN {print "IPDate\n"} {print $1 "\t" $4} END {print "end-of-report"}' test.txt
Prints "IPDate" as a header and "end-of-report" at the end. For example, if three lines are matched, the output is as follows:
IPDate
1 first
2 second
3 third
end-of-report
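The BEGIN, main, and END parts can be seen working together in the following sketch. The colon-separated sample records and the header text are made up; they are not the contents of the test.txt used above.

```shell
# Two colon-separated records as sample input.
log=$(mktemp)
printf '%s\n' '210.34.0.13:alice:login' '210.34.0.14:bob:logout' > "$log"

# -F: sets the field separator, so $1 is the text before the first colon.
first_fields=$(awk -F: '{print $1}' "$log")

# BEGIN runs once before any input, the middle action once per record,
# and END once after the last record.
report=$(awk -F: 'BEGIN {print "IP\tUser"}
                  {print $1 "\t" $2}
                  END {print "end-of-report"}' "$log")

printf '%s\n' "$report"
rm -f "$log"
```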
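(4) Matching operators: ~ matches, !~ does not match
cat test.txt | awk '$0 ~ /210\.34\.0\.13/': matches the lines in test.txt that contain 210.34.0.13 (each dot is escaped so that it matches only a literal dot)
awk '$0 !~ /210\.34\.0\.13/' test.txt: matches the lines in test.txt that do not contain 210.34.0.13
awk '{if ($1 == "210.34.0.13") print $0}' test.txt: matches the lines in test.txt whose first field is exactly 210.34.0.13 (== compares; a single = would assign)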
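The difference between a pattern match and an exact comparison shows up when one address is a prefix of another. A minimal sketch with made-up host lines:

```shell
# 210.34.0.130 contains 210.34.0.13 as a substring.
hosts=$(mktemp)
printf '%s\n' '210.34.0.13 up' '210.34.0.130 down' '10.0.0.1 up' > "$hosts"

# ~ : the record contains a match for the pattern, so both 210.34 lines count.
contains=$(awk '$0 ~ /210\.34\.0\.13/' "$hosts" | wc -l)

# !~ : the record contains no match for the pattern.
lacks=$(awk '$0 !~ /210\.34\.0\.13/' "$hosts")

# == : the first field must equal the string exactly.
exact=$(awk '{if ($1 == "210.34.0.13") print $0}' "$hosts")

rm -f "$hosts"
```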

5. Introduction to sed
sed does not touch the original file. It works on a copy, and unless the output is redirected to a file, all changes are written to the screen.
sed is a very important text filtering tool. It can be used as a one-line command or in a pipeline together with grep and awk. It is a non-interactive text stream editor.
(1) Three ways of calling sed
On the command line: sed [options] 'sed-commands' input-file
With a sed script file: sed [options] -f sed-script-file input-file
As an executable script whose first line names the sed interpreter: sed-script-file [options] input-file
Whether sed is run from the shell command line or from a script file, if no input file is specified, sed reads from standard input, generally the keyboard or a redirected result.
(2) The options of the sed command are as follows:
-n: suppresses the default printing of every line
-e: the next argument is an editing command
-f: the next argument is a sed script file
(3) How sed locates text in a file
By line number, either a single number or a range of line numbers.
By regular expression.
(4) Ways of addressing text
x: x is a line number.
x,y: the lines from number x to number y.
/pattern/: the lines that contain the pattern.
/pattern/pattern/: the lines that contain two patterns.
/pattern/,x: the lines from the one that matches the pattern through line number x.
x,/pattern/: the lines from line number x through the next line that matches the pattern.
x,y!: the lines outside the range x to y.
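The address forms can be compared side by side on a five-line sample. This is a sketch with made-up input; -n together with p prints only the addressed lines.

```shell
# Five sample lines, two of which contain "los".
lines=$(mktemp)
printf '%s\n' 'one' 'two los' 'three' 'four los' 'five' > "$lines"

second=$(sed -n '2p' "$lines")          # a single line number
first_two=$(sed -n '1,2p' "$lines")     # a range of line numbers
with_los=$(sed -n '/los/p' "$lines")    # every line containing the pattern
to_los=$(sed -n '3,/los/p' "$lines")    # line 3 through the next line matching los
outside=$(sed -n '2,4!p' "$lines")      # every line outside the range 2 to 4

rm -f "$lines"
```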
(5) Basic sed editing commands
p: prints the matching lines
d: deletes the matching lines
=: displays the line number
a\: appends new text after the addressed line
i\: inserts new text before the addressed line
c\: replaces the addressed text with new text
s: substitutes the replacement pattern for the matched pattern
r: reads text from another file
w: writes text to a file
q: quits immediately after the first pattern match is completed
l: displays control characters as their octal ASCII equivalents
{}: a group of commands executed on the addressed line
n: reads the next line of input and continues with the following command
g: replaces the pattern space with the contents of the hold space
y: transliterates characters
(6) Examples:
sed -n '2p' test.txt: prints the second line (note: -n suppresses the printing of unmatched lines; without -n, every line of the file is printed and the matched line appears twice)
sed -n '1,4p' test.txt: prints the first through the fourth lines
sed -n '/los/p' test.txt: prints the lines that match los
sed -n '2,/los/p' test.txt: prints from the second line up to the first following line that matches los
sed -n '/^$/p' test.txt: matches empty lines
sed -n -e '/^$/p' -e '/^$/=' test.txt: prints empty lines and their line numbers
sed '/good/a\morning' test.txt: appends a line containing morning after each line that matches good (the one-line form is accepted by GNU sed)
sed '/good/i\morning' test.txt: inserts a line containing morning before each line that matches good
sed '/good/c\morning' test.txt: replaces each line that matches good with morning
sed '1,2d' test.txt: deletes lines 1 and 2
sed 's/good/good morning/g' test.txt: matches good and replaces it with good morning
sed 's/good/& hello/p' test.txt: matches good and adds hello after it
sed 's/good/hello &/p' test.txt: matches good and adds hello before it
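The following sketch runs several of the editing commands on a three-line sample and also confirms that the input file itself is never changed. The sample text is made up, and the append command is written in the two-line form with a newline after a\, which is assumed here to be the most portable spelling.

```shell
# Three sample lines.
text=$(mktemp)
printf '%s\n' 'good day' 'bad day' 'good night' > "$text"

deleted=$(sed '1,2d' "$text")                # drop lines 1 and 2
replaced=$(sed 's/good/fine/g' "$text")      # substitute on every line
after=$(sed -n 's/good/& hello/p' "$text")   # & stands for the matched text
before=$(sed -n 's/good/hello &/p' "$text")
numbered=$(sed -n '/bad/=' "$text")          # line number of each match

# Append a new line after every line that matches bad.
appended=$(sed '/bad/a\
morning' "$text")

untouched=$(cat "$text")                     # sed only wrote to standard output
rm -f "$text"
```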

6. Merge and split (sort, uniq, join, cut, paste, split)
(1) sort command
sort [options] files: sorts the lines of files; different fields can be sorted in different column orders
-c: tests whether the file is already sorted
-m: merges two sorted files
-u: removes all duplicate lines
-o: names the output file that stores the sort result
-t: field separator; sorts using a separator other than space or tab
+n: n is a field number, counted from 0; sorting starts from this field. This form is obsolete; current versions of sort use -k, which counts fields from 1.
-n: treats the sort key in the field as a number
-r: reverses the comparison
sort -c test.txt: tests whether test.txt is already sorted
sort -u test.txt: sorts and merges identical lines
sort -r test.txt: sorts in reverse order
sort -t "/" +2 test.txt: uses "/" as the separator and starts sorting from field 2, counted from 0 (with -k the equivalent is sort -t "/" -k 3 test.txt)
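A sketch of the sort options on a small "/"-separated sample. It uses -k rather than the obsolete +n form, and the sample data is made up.

```shell
# Four records with three "/"-separated fields; one record is duplicated.
data=$(mktemp)
printf '%s\n' 'pear/3/c' 'apple/10/a' 'pear/3/c' 'fig/2/b' > "$data"

unique=$(sort -u "$data")                              # sorted, duplicates merged
reversed=$(sort -r "$data" | head -n 1)                # first line of the reverse order
by_number=$(sort -t "/" -k 2,2n "$data" | head -n 1)   # numeric sort on field 2 only
by_third=$(sort -t "/" -k 3 "$data" | head -n 1)       # sort from field 3 onward

# -c reports through its exit status whether the input is already sorted.
if sort -c "$data" 2>/dev/null; then sorted=yes; else sorted=no; fi

rm -f "$data"
```

Without the n modifier, field 2 would be compared as text and 10 would sort before 2.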
(2) uniq command
uniq [options] files: removes or suppresses duplicate lines in a text file. Only adjacent duplicates are detected, so the file is usually sorted first.
-u: displays only the lines that are not repeated
-d: displays only the repeated lines, one copy of each
-c: prints the number of times each line occurs
-f n: n is a number; the first n fields are ignored when comparing
uniq -f 2 test.txt: ignores the first two fields when comparing lines
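A sketch of the uniq options on made-up records in which only the third field repeats:

```shell
# Day and hour differ on every line; only the name in field 3 repeats.
visits=$(mktemp)
printf '%s\n' 'mon 09 alice' 'tue 10 alice' 'wed 11 bob' \
              'thu 12 carol' 'fri 13 carol' > "$visits"

collapsed=$(uniq -f 2 "$visits")       # first line of each run, comparing field 3 on
only_once=$(uniq -u -f 2 "$visits")    # lines whose name is not repeated
repeated=$(uniq -d -f 2 "$visits")     # one line per repeated name

# -c prefixes each line with its count; awk tidies the padding for display.
counts=$(printf '%s\n' a a b | uniq -c | awk '{print $1 ":" $2}')

rm -f "$visits"
```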
(3) join command
join [options] file1 file2: joins lines from two sorted text files
-a n: n is a file number; also displays the unmatched lines from file n
-o n.m: selects the fields to output; n is the file number and m is the field number
-j n m: n is the file number and m is the field number; joins on field m of file n instead of the first field. Current versions write this as -1 m or -2 m.
-t: field separator; used when the fields are separated by something other than space or tab
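A sketch of join on two small files that share a numeric key in the first field. Both files and their contents are made up, and both are already sorted on the key, as join requires.

```shell
# Two colon-separated files keyed on field 1; key 2 exists only in the first.
names=$(mktemp)
phones=$(mktemp)
printf '%s\n' '1:alice' '2:bob' '3:carol' > "$names"
printf '%s\n' '1:555-0100' '3:555-0199' > "$phones"

matched=$(join -t ":" "$names" "$phones")                # only keys present in both files
with_unmatched=$(join -t ":" -a 1 "$names" "$phones")    # also unpaired lines of file 1
selected=$(join -t ":" -o 1.2,2.2 "$names" "$phones")    # field 2 of each file, key dropped

rm -f "$names" "$phones"
```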
(4) split command
split -output_file_size input_filename output_filename
Splits a large file into smaller files.
-b n: the size of each output file is n bytes
-C n: each output file holds at most n bytes of whole lines
-l n: the number of lines in each output file
-n (a number, as in -10): the same as -l n
split -10 test.txt: splits test.txt into small files of 10 lines each
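A sketch of splitting a 25-line file into pieces of 10 lines. It uses the -l 10 spelling, and the output prefix part_ is a made-up name; split appends the suffixes aa, ab, ac and so on to it.

```shell
# Create a 25-line input file in a scratch directory.
work=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > "$work/test.txt"

# Pieces are written as part_aa, part_ab, part_ac next to the input.
split -l 10 "$work/test.txt" "$work/part_"

pieces=$(ls "$work" | grep -c '^part_')    # number of pieces created
first_len=$(wc -l < "$work/part_aa")       # a full piece
last_len=$(wc -l < "$work/part_ac")        # the remainder
rejoined=$(cat "$work"/part_* | wc -l)     # the pieces add up to the original

rm -rf "$work"
```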
(5) cut command
cut -c n1-n2 filename: displays characters n1 through n2 of each line, counting from the start of the line
cut -c 3-5 test.txt: displays the 3rd through 5th characters of each line in test.txt
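A one-line sketch of cut -c on made-up input, reading from a pipe instead of test.txt:

```shell
# Characters 3 through 5 of each input line.
chars=$(printf '%s\n' 'abcdefg' '1234567' | cut -c 3-5)
printf '%s\n' "$chars"
```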

------------------------------------------------------------------------- End

 
