Regular expressions
Symbol | Meaning
. | Matches any single character
^ | Matches the beginning of a line
$ | Matches the end of a line
* | Matches zero or more repetitions of the preceding character
\ | Escapes a special character so that it is matched literally; the special characters are $ . ' " * [ ] ^ \ ( ) | + ?
[...] [a-z] | Matches any one character from a set or range
\{\} | Matches the preceding character n times: \{n\}; at least n times: \{n,\}; between m and n times: \{m,n\}
+ | awk (and extended grep) only: matches one or more repetitions of the preceding character
? | awk (and extended grep) only: matches zero or one occurrence of the preceding character
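The metacharacters in the table can be tried out with a short sketch like the one below. The word list and variable names are hypothetical and exist only for illustration; each count is the number of lines that grep matches.

```shell
# Hypothetical sample words, one per line.
words=$(mktemp)
printf '%s\n' cat cot ct coot a.b axb > "$words"

any_char=$(grep -c 'c.t' "$words")          # .     any single character: cat, cot
zero_more=$(grep -c 'co*t' "$words")        # *     zero or more "o": cot, ct, coot
literal_dot=$(grep -c 'a\.b' "$words")      # \     escaped dot: only a.b
exactly_two=$(grep -c 'co\{2\}t' "$words")  # \{n\} exactly two "o": coot
line_start=$(grep -c '^a' "$words")         # ^     lines starting with a: a.b, axb
in_set=$(grep -c 'c[ao]t' "$words")         # []    one of a or o: cat, cot

echo "$any_char $zero_more $literal_dot $exactly_two $line_start $in_set"
rm -f "$words"
```

With this word list the script prints `2 3 1 1 2 2`.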
Grep
First, here is the content of the example file DATA.F:
-    Dec   3bc1977  LPSX  68.00  lvx2a  138
483  Sept  5ap1996  USP   65.00  lvx2c  189
-    OCT   3zl1998  LPSX  43.00  kvm9d  +
219  Dec   2cc1999  CAD   23.00  Plv2c  the
484  Nov   7pl1996  CAD   49.00  Plv2c  234
483  May   5pa1998  USP   37.00  kvm9d  644
216  Sept  3zl1998  USP   86.00  kvm9e  234
General grep matching
A general match usually requires the pattern to be enclosed in double quotation marks.
The general format of grep is: grep [options] basic-regular-expression [file]
Note: quote the pattern string so that the shell passes it to grep unchanged.
Options for grep:
-c outputs only the number of matching lines
-i ignores case
-h does not display file names when querying multiple files
-l outputs only the names of files that contain a match when querying multiple files
-n displays the line number of each matching line
-s suppresses error messages about nonexistent or unreadable files
-v displays all lines that do not contain a match
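As a rough illustration of these options, the sketch below runs them against a small, hypothetical stand-in for DATA.F. The file contents and variable names are assumptions made for this example only.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

count=$(grep -c "USP" "$data")        # -c: number of matching lines
numbered=$(grep -n "CAD" "$data")     # -n: prefix each match with its line number
inverted=$(grep -c -v "USP" "$data")  # -v: count the lines that do NOT match
nocase=$(grep -c -i "sept" "$data")   # -i: Sept and sept both match
listed=$(grep -l "LPSX" "$data")      # -l: print only the name of the matching file

echo "$count | $numbered | $inverted | $nocase"
rm -f "$data"
```

Here `count` is 2, `inverted` is 2, `nocase` is 2, and `numbered` holds the third line prefixed with `3:`.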
Querying multiple files: you can use a shell wildcard to match several files, or list the files explicitly, for example:
grep "pattern" DATA*.F
grep "pattern" data1.f data2.f
Whole-word match: add \> after the pattern to anchor it at the end of a word
grep "45\>" DATA.F
Using regular expressions with grep
To prevent the shell from interpreting special characters, regular expressions are usually enclosed in single quotes.
Pattern range: grep '48[43]' DATA.F matches 484 or 483
Lines that do not begin with 4 or 8: grep '^[^48]' DATA.F
Match the month first, then filter on a second pattern: grep '[Ss]ept' DATA.F | grep 443
Blank lines: grep '^$' DATA.F
Extended mode:
With the -E option, grep accepts extended regular expressions.
Match 219 or 216: grep -E '219|216' DATA.F
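A minimal sketch of extended mode, run against a hypothetical stand-in for DATA.F (the data and variable names are illustrative assumptions). It shows alternation, grouping, and the + quantifier, none of which need backslashes under -E.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

either=$(grep -E -c '219|216' "$data")                     # alternation: lines 3 and 4
grouped=$(grep -E -c '^(48|219) ' "$data")                 # grouping: first field is 48 or 219
lower_month=$(grep -E -c '^[0-9]+ [[:lower:]]+ ' "$data")  # +: all-lowercase month name

echo "$either $grouped $lower_month"
rm -f "$data"
```

Each of the three counts is 2 for this sample.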
Class name
Class name | Equivalent regular expression
[[:upper:]] | [A-Z]
[[:lower:]] | [a-z]
[[:digit:]] | [0-9]
[[:alnum:]] | [0-9a-zA-Z]
[[:space:]] | Space or tab
[[:alpha:]] | [a-zA-Z]
grep '[[:alpha:]]*' DATA.F
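The class names can be combined with the interval syntax from the first table. The following sketch uses a hypothetical stand-in for DATA.F; the data and variable names are assumptions for illustration.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# Lines whose first field is exactly three digits: 483, 219, 216.
three_digits=$(grep -c '^[[:digit:]]\{3\}[[:space:]]' "$data")
# Lines containing a field made only of lowercase letters: dec, sept.
lower_word=$(grep -c '[[:space:]][[:lower:]]\{1,\}[[:space:]]' "$data")
# Lines containing four consecutive uppercase letters: LPSX.
four_upper=$(grep -c '[[:upper:]]\{4\}' "$data")

echo "$three_digits $lower_word $four_upper"
rm -f "$data"
```

For this sample the script prints `3 2 1`.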
Using grep with system commands
Directories: ls -l | grep '^d'
passwd file: grep "angel" /etc/passwd
To view the DNS server process (typically called named): ps -ef | grep "named"
egrep
egrep stands for expression (or extended) grep; it accepts the full set of regular expressions. Its features are:
- A file can supply the patterns to search for: egrep -f grepstrings DATA.F
- The | symbol matches the pattern on either side of it
- The ^ symbol inside a bracket expression, as in [^...], matches any character not listed
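The pattern-file feature listed above can be sketched as follows. The example uses grep -E, which accepts the same expressions as egrep (recent GNU grep releases print an obsolescence warning for the egrep name). The data, the pattern file, and the variable names are hypothetical.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# One extended regular expression per line; a line matches if ANY pattern matches.
patterns=$(mktemp)
printf '%s\n' '^48 ' 'CAD|KVM' > "$patterns"

matched=$(grep -E -c -f "$patterns" "$data")  # lines 1, 3 and 4
echo "$matched"
rm -f "$data" "$patterns"
```

Three lines match here: the first through `^48 `, the third through `CAD`, and the fourth through `KVM`.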
awk for text filtering
Usage: awk [POSIX or GNU style options] -f progfile file ...
Usage: awk [POSIX or GNU style options] 'program' file ...
Here progfile or 'program' holds the actual awk commands, and the trailing file arguments are the input files.
A commonly used option is -F field-separator, which sets the field delimiter; the default is whitespace. For a colon-delimited file such as /etc/passwd, use -F: so that fields are split on colons.
You can also write the commands as a script, in the same way as an sh script, with #!/bin/awk -f on the first line, so that the script file is executed by awk.
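A small sketch of the -F option, using a hypothetical passwd-style file rather than the real /etc/passwd. The entries and variable names are invented for illustration.

```shell
# Hypothetical colon-delimited file in /etc/passwd layout.
users=$(mktemp)
cat > "$users" <<'EOF'
root:x:0:0:root:/root:/bin/sh
angel:x:1001:1001:Angel:/home/angel:/bin/bash
EOF

# -F: splits each line on colons; $1 is the login name, $7 the shell.
shells=$(awk -F: '{print $1, $7}' "$users")
# Setting FS in a BEGIN block has the same effect as -F.
same=$(awk 'BEGIN {FS = ":"} {print $1, $7}' "$users")

echo "$shells"
rm -f "$users"
```

This prints `root /bin/sh` and `angel /bin/bash`, one per line.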
awk Script
An awk script consists of patterns and actions.
awk reads one line at a time and splits each line into fields using the delimiter.
Each awk statement is made up of a pattern and an action. The pattern determines when the action is triggered; if the pattern is omitted, the action is executed for every line.
awk patterns include two special patterns, BEGIN and END. The BEGIN action runs before any input is read and is typically used to print a header. The END action runs after all input has been read and is typically used to print totals or a closing marker.
awk actions are written in curly braces {} and are mostly used for printing, as well as for if and loop statements. If no action is given, every matching record is printed.
Field: when awk processes a record, it labels the fields $1, $2, $3, ..., $n, and $0 refers to the whole record.
Record: each line is one record.
Extract columns from a file
awk '{print $0}' DATA.F
awk '{print $1}' DATA.F
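The pattern and action rules described above can be seen side by side in a sketch like this one, which assumes a hypothetical stand-in for DATA.F (data and variable names are illustrative).

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

only_pattern=$(awk '/CAD/' "$data")           # pattern only: the matching record is printed whole
only_action=$(awk '{print $2}' "$data")       # action only: runs on every record
both=$(awk '$5 > 60 {print $2, $5}' "$data")  # action runs only where the pattern holds

echo "$both"
rm -f "$data"
```

The last command prints the month and price of the three records whose price exceeds 60.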
Add a header and a trailer to the output
awk 'BEGIN {print "month\tprice\n------------------------"} {print $2 "\t" $5} END {print "------------------end of file"}' DATA.F
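A sketch of the same header and trailer idea, spread over several lines for readability and run against a hypothetical stand-in for DATA.F. The record-count trailer is an illustrative variation, not part of the original command.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

report=$(awk 'BEGIN {print "month\tprice\n-----------"}   # runs once, before any input
                    {print $2 "\t" $5}                    # runs for every record
              END   {print "-----------\n" NR " records"} # runs once, after all input
             ' "$data")

echo "$report"
rm -f "$data"
```

The report starts with the `month` and `price` header and ends with `4 records`.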
Regular Expressions in awk
In awk, a regular expression is enclosed in slashes, for example /wang*/
Conditional operators
Operator | Description
< | Less than
<= | Less than or equal to
>= | Greater than or equal to
== | Equal to
!= | Not equal to
~ | Matches the regular expression
!~ | Does not match the regular expression
&& | And
|| | Or
! | Not
Print selected columns of the rows that match: awk '{if ($4 ~ /LPSX/) print $1 "\t" $4 "\t" $7}' DATA.F
Print selected columns of the rows that do not match: awk '{if ($4 !~ /LPSX/) print $1 "\t" $4 "\t" $7}' DATA.F
Print rows whose first column is less than the seventh: awk '{if ($1 < $7) print $1 "\t" $4 "\t" $7}' DATA.F
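The conditional operators can be exercised with a sketch such as the following. It uses a hypothetical stand-in for DATA.F and writes each test as a pattern in front of the action, which behaves the same as an if inside the braces.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

matching=$(awk '$4 ~ /LPSX/ {print $1 "\t" $4}' "$data")             # field 4 matches the regex
others=$(awk '$4 !~ /LPSX/ {n++} END {print n}' "$data")              # count of rows that do not
smaller=$(awk '$1 < $7 {print $1}' "$data")                           # numeric comparison of fields
combined=$(awk '$4 == "USP" && $5 > 70 {print $2}' "$data")           # two conditions joined with &&

echo "$matching | $others | $combined"
rm -f "$data"
```

For this sample, only the first row matches LPSX, three rows do not, rows 48 and 216 have a first column smaller than the seventh, and only the `sept` row is a USP entry priced above 70.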
awk built-in variables
ARGC number of command-line arguments
ARGV array of command-line arguments
ENVIRON array of system environment variables
FILENAME name of the file awk is currently reading
FNR number of records read so far from the current file
FS input field separator, equivalent to the -F command-line option
NF number of fields in the current record
NR total number of records read so far
OFS output field separator
ORS output record separator
RS input record separator
Print the number of fields, the record number, and the record itself, then print the file name at the end
awk '{print NF "\t" NR "\t" $0} END {print FILENAME}' DATA.F
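To make the difference between NR and FNR concrete, the sketch below reads a hypothetical stand-in for DATA.F twice in one awk invocation. The data and variable names are illustrative.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

per_record=$(awk '{print NR, NF}' "$data")                       # record number and field count
restarts=$(awk 'FNR == 1 {print NR}' "$data" "$data")            # FNR resets per file, NR does not
name=$(awk 'END {print FILENAME}' "$data")                       # name of the last file read
joined=$(awk 'BEGIN {OFS = ":"} NR == 1 {print $1, $2}' "$data") # OFS separates print arguments

echo "$restarts"
rm -f "$data"
```

Because the file is passed twice, `restarts` holds 1 and 5: FNR returns to 1 at the start of the second copy while NR keeps counting.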
awk operators
Assignment: =, +=, -=, *=, /=, %=, ^=
Conditional expression: ?:
Logical or, and, not: ||, &&, !
Matching: ~, !~
Relational: <, <=, >, >=, !=, ==
Arithmetic: +, -, *, /, %, ^
Increment and decrement (prefix or postfix): ++, --
Using variables: awk '{name=$4; price=$5; print name "\t" price}' DATA.F
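A brief sketch of a few of these operators at work on a hypothetical stand-in for DATA.F. The variable names `total` and `n` are chosen for this example.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# += accumulates the price column; uninitialised variables start at 0.
total=$(awk '{total += $5} END {printf "%.2f\n", total}' "$data")
# ?: picks a label per record; the parentheses keep > from being read as a redirection.
labels=$(awk '{print $2, ($5 > 60 ? "high" : "low")}' "$data")
# ++ counts the records that satisfy the pattern.
usp=$(awk '$4 == "USP" {n++} END {print n}' "$data")

echo "$total $usp"
rm -f "$data"
```

The prices sum to 242.00 and two records are USP entries.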
Built-in String functions
gsub(r,s) | Replaces every match of r with s throughout $0
gsub(r,s,t) | Replaces every match of r with s throughout t
index(s,t) | Returns the position of the first occurrence of string t in s
length(s) | Returns the length of s
match(s,r) | Tests whether s contains a substring matching r
split(s,a,fs) | Splits s into array a on the separator fs
sprintf(fmt,exp) | Returns exp formatted according to fmt
sub(r,s) | Replaces the leftmost, longest match of r in $0 with s
substr(s,p) | Returns the suffix of s starting at position p
substr(s,p,n) | Returns the substring of s of length n starting at position p
Print each line followed by its length: awk '{print $0 "\t" length($0)}' DATA.F
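The string functions can be checked without any input file by calling them from a BEGIN block. The strings below are arbitrary examples chosen for illustration.

```shell
result=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                              # 11
    print index(s, "wor")                        # 7
    print substr(s, 7)                           # world
    print substr(s, 1, 5)                        # hello
    if (match(s, /o w/)) print RSTART, RLENGTH   # 5 3: start and length of the match
    t = "a-b-c"
    n = gsub(/-/, ":", t)                        # gsub returns the number of replacements
    print n, t                                   # 2 a:b:c
    u = "aaa-aaa"
    sub(/a+/, "X", u)                            # only the leftmost, longest match changes
    print u                                      # X-aaa
    print sprintf("%05d", 42)                    # 00042
}')

echo "$result"
```

Note that sub and gsub modify their target in place and return a count, so the changed string is read back from the variable itself.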
awk modifies the output format with printf
Modifier | Meaning
- | Left-align
width | Minimum field width; a leading 0 pads with zeros
.prec | Maximum string length, or number of digits to the right of the decimal point
%c | ASCII character
%d | Integer
%e | Floating-point number in scientific notation
%f | Floating-point number, for example 123.44
%g | awk chooses between the %e and %f styles
%o | Octal number
Note: printf does not output a newline automatically.
Convert 65 to the ASCII character A: awk 'BEGIN {printf "%c\n", 65}'
Fixed-width column output: awk '{printf "%-15s%s\n", $1, $3}' DATA.F
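The modifiers can be combined in a single format string, as in this small sketch. The values are arbitrary and the variable names are chosen for this example.

```shell
char=$(awk 'BEGIN {printf "%c\n", 65}')     # numeric argument 65 becomes the character A
octal=$(awk 'BEGIN {printf "%o\n", 8}')     # decimal 8 is octal 10
# %-10s  left-aligned string in a 10-character field
# %5.1f  float in a 5-character field with one decimal place
# %03d   integer padded with zeros to three digits
row=$(awk 'BEGIN {printf "%-10s|%5.1f|%03d\n", "abc", 3.14159, 7}')

echo "$char $octal"
echo "$row"
```

The last line prints `abc` padded to ten characters, then `  3.1`, then `007`, separated by the literal `|` characters from the format string.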
awk script File
An awk script file is laid out as follows.
The first line gives the command and arguments that execute the script: #!/bin/awk -f
To run it, type the script name followed by the input file.
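A minimal sketch of such a script file is shown below. The script body and the sample data are hypothetical, and the script is run with awk -f so that the example does not depend on where awk is installed.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# Hypothetical script file: header, one line per record, record count.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/awk -f
# Print the month and price columns, then the number of records.
BEGIN { print "month\tprice" }
      { print $2 "\t" $5 }
END   { print NR " records" }
EOF

output=$(awk -f "$script" "$data")
echo "$output"
rm -f "$data" "$script"
```

After `chmod +x`, the script can also be run directly by name with the input file as its argument, provided awk really lives at the path given on the first line.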
awk Arrays
Usage examples:
- Splitting text into an array: awk 'BEGIN {print split("123:456:789", array, ":")}'
This prints 3, the number of elements; the resulting array is array[1]="123", and so on.
- Loop: for (element in array) print array[element]
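Both array uses can be sketched together. The example assumes a hypothetical stand-in for DATA.F; because the order of a for (key in array) loop is unspecified, the first loop walks the indices numerically and the second pipes its output through sort.

```shell
# Hypothetical sample data, loosely modelled on DATA.F.
data=$(mktemp)
cat > "$data" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# split returns the element count; indices start at 1.
parts=$(awk 'BEGIN {
    n = split("123:456:789", array, ":")
    for (i = 1; i <= n; i++) print i, array[i]
}')

# Arrays are associative: count how often each value of field 4 occurs.
counts=$(awk '{seen[$4]++} END {for (key in seen) print key, seen[key]}' "$data" | sort)

echo "$parts"
echo "$counts"
rm -f "$data"
```

The first loop prints the three pieces in order, and the second reports one CAD, one LPSX, and two USP records.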
Shell text filtering