Shell text filtering

Source: Internet
Author: User

Regular expressions

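Symbol      Meaning
.           Matches any single character
^           Matches the beginning of a line
$           Matches the end of a line
*           Matches zero or more repetitions of the preceding character
\           Escape character: removes the special meaning of the next character, such as $ . ' " * [ ] ^ \ ( ) | + ?
[...]       Matches any one character in a set; a hyphen gives a range, e.g. [a-z]
\{\}        Matches the preceding item n times: \{n\}; at least n times: \{n,\}; m to n times: \{m,n\}
+           Extended regular expressions only (awk, egrep/grep -E): matches one or more of the preceding item
?           Extended regular expressions only (awk, egrep/grep -E): matches zero or one of the preceding item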

Grep

First, here is the content of the sample file data.f:

    48   Dec   3BC1977  LPSX  68.00  LVX2A  138
    483  Sept  5AP1996  USP   65.00  LVX2C  189
    47   Oct   3ZL1998  LPSX  43.00  KVM9D  512
    219  dec   2CC1999  CAD   23.00  PLV2C  68
    484  nov   7PL1996  CAD   49.00  PLV2C  234
    483  may   5PA1998  USP   37.00  KVM9D  644
    216  sept  3ZL1998  USP   86.00  KVM9E  234
General grep matching

The general format of grep is:

    grep [options] basic-regular-expression [file ...]

Quote the pattern, for example with double quotes, so that a string containing spaces or special characters is passed to grep as a single argument.

Options for grep:
-c  prints only the number of matching lines
-i  ignores case distinctions
-h  does not display file names when searching multiple files
-l  prints only the names of files that contain a match when searching multiple files
-n  prefixes each matching line with its line number
-s  suppresses error messages about nonexistent or unreadable files
-v  prints all lines that do not match
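The options above can be tried on a throwaway copy of the sample records. This is a minimal sketch assuming a POSIX shell with mktemp and GNU or BSD grep; the temporary file stands in for data.f, and the variable names are only for illustration.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# -c: print only the number of matching lines
usp_count=$(grep -c 'USP' "$data")
# -i: ignore case, so both "Dec" and "dec" match
dec_count=$(grep -ci 'dec' "$data")
# -v combined with -c: count the lines that do NOT match
non_usp_count=$(grep -vc 'USP' "$data")
# -n: prefix each match with its line number (keep only the first match here)
first_cad=$(grep -n 'CAD' "$data" | head -n 1)

echo "USP lines: $usp_count"
echo "dec/Dec lines: $dec_count"
echo "non-USP lines: $non_usp_count"
echo "first CAD line: $first_cad"
```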

Searching multiple files: use a shell wildcard, or list several files after the pattern (here "pattern" stands for the string you are searching for):
grep "pattern" data*.f
grep "pattern" data1.f data2.f
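A small sketch of searching several files at once, assuming two hypothetical files data1.f and data2.f that hold a few shortened sample records. It also shows how -h and -l change the output when more than one file is searched.

```shell
# Two small hypothetical files in a scratch directory (shortened sample records)
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '48 Dec LPSX\n483 Sept USP\n' > "$dir/data1.f"
printf '219 dec CAD\n216 sept USP\n' > "$dir/data2.f"
cd "$dir" || exit 1

# The shell expands data*.f before grep runs; with several files,
# grep prefixes every matching line with the file name
with_names=$(grep 'USP' data*.f)
# -h: drop the file-name prefix
without_names=$(grep -h 'USP' data*.f)
# -l: print only the names of files that contain a match
cad_files=$(grep -l 'CAD' data*.f)

printf '%s\n' "$with_names" "$without_names" "$cad_files"
```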

Matching at the end of a word: put \> after the pattern (a GNU grep extension; \< marks the start of a word):
grep "45\>" data.f
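The difference between a plain match and a word-boundary match is easiest to see by counting. This sketch assumes GNU grep (or another grep that accepts the non-POSIX \> and -w extensions) and a temporary copy of the sample records; the pattern 48 is chosen only because it occurs both on its own and inside 483 and 484.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# "48" anywhere in the line, including inside 483 and 484
anywhere=$(grep -c '48' "$data")
# \> requires the match to end at a word boundary
end_of_word=$(grep -c '48\>' "$data")
# -w requires the whole word to match (also an extension, but widely available)
whole_word=$(grep -cw '48' "$data")

echo "$anywhere $end_of_word $whole_word"
```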

grep with regular expressions

Regular expressions are usually enclosed in single quotes so that the shell does not perform substitution or wildcard expansion on them.

Pattern range: grep '48[43]' data.f matches 484 and 483
Lines that do not begin with 4 or 8: grep '^[^48]' data.f
Match the month first (Sept or sept), then filter the result by another pattern: grep '[Ss]ept' data.f | grep 443
Blank lines: grep '^$' data.f
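The four patterns above can be checked against a temporary copy of the sample records. A sketch assuming a POSIX shell and grep; the second filter in the pipeline uses 483 here (an illustrative choice) so that it has something to match.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# Character range: 48 followed by 3 or 4
range_count=$(grep -c '48[34]' "$data")
# Negated set at the start of the line: first character is neither 4 nor 8
not_4_or_8=$(grep '^[^48]' "$data" | cut -d' ' -f1)
# Either capitalization of "sept", then a second filter on the result
sept_483=$(grep '[Ss]ept' "$data" | grep -c '483')
# Empty lines; grep -c prints 0 (and exits with status 1) when nothing matches
blank_count=$(grep -c '^$' "$data")

echo "$range_count"
echo "$not_4_or_8"
echo "$sept_483"
echo "$blank_count"
```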

Extended mode:

The -E option enables extended regular expressions, which add operators such as |, + and ?.
Match 219 or 216: grep -E '219|216' data.f

Character classes

Class name     Equivalent regular expression
[[:upper:]]    [A-Z]
[[:alnum:]]    [0-9a-zA-Z]
[[:lower:]]    [a-z]
[[:space:]]    Whitespace, such as a space or a Tab
[[:digit:]]    [0-9]
[[:alpha:]]    [a-zA-Z]

grep '[[:alpha:]]*' data.f (because * also allows zero occurrences, this matches every line)
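A sketch of the character classes in use, assuming a POSIX grep and a temporary copy of the sample records. The middle pattern also uses the \{n\} interval, and the last one illustrates why [[:alpha:]]* on its own is not a useful filter.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# 5 followed by two uppercase letters (matches 5AP1996 and 5PA1998)
two_upper=$(grep -c '5[[:upper:]][[:upper:]]' "$data")
# First field of exactly three digits followed by whitespace
three_digits=$(grep -c '^[[:digit:]]\{3\}[[:space:]]' "$data")
# * allows zero occurrences, so this pattern matches every line
every_line=$(grep -c '[[:alpha:]]*' "$data")

echo "$two_upper $three_digits $every_line"
```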

grep with system commands and files

List only directories: ls -l | grep '^d'
Search the passwd file: grep "Angel" /etc/passwd
View the DNS service process (typically named): ps ax | grep "named"
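The ls -l | grep '^d' idiom can be checked in a scratch directory. This sketch assumes a POSIX shell with mktemp -d; the directory and file names are hypothetical.

```shell
# Scratch directory with two subdirectories and one regular file
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/logs" "$dir/src"
touch "$dir/readme.txt"

# In ls -l output, directory entries start with "d" and regular files with "-";
# the leading "total" line matches neither
dirs_count=$(ls -l "$dir" | grep -c '^d')
echo "directories: $dirs_count"
```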

Egrep

egrep (equivalent to grep -E) accepts extended regular expressions. Its features include:

    1. Patterns can be read from a file: egrep -f grepstrings data.f
    2. The | symbol separates alternatives; a line matches if either side matches.
    3. Inside brackets, a leading ^ negates the set: [^48] matches any character except 4 and 8.
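A sketch of these features with grep -E, which POSIX specifies as the equivalent of egrep. The pattern file (one pattern per line) and the copy of the sample records are temporary files created only for the example.

```shell
# Temporary copy of the sample data (assumed to match data.f) and a pattern file
data=$(mktemp)
pats=$(mktemp)
trap 'rm -f "$data" "$pats"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF
printf '219\n216\n' > "$pats"   # hypothetical pattern file, one pattern per line

# | separates alternatives in an extended regular expression
alt_count=$(grep -E -c '219|216' "$data")
# -f reads the patterns from a file; a line matches if any pattern matches
file_count=$(grep -E -c -f "$pats" "$data")
# + (one or more of the preceding item) is also an extended-regex operator
kvm_count=$(grep -E -c 'KVM9[DE]+' "$data")

echo "$alt_count $file_count $kvm_count"
```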

awk for text filtering

Usage: awk [POSIX or GNU style options] -f progfile file ...
Usage: awk [POSIX or GNU style options] 'program' file ...

Here progfile (or the quoted 'program') contains the actual awk commands, and the files that follow are the input files.

The most commonly used option is -F field-separator, which sets the field separator; the default is whitespace. For a colon-separated file such as /etc/passwd, use -F: so that the colon separates the columns.

You can also write an awk script the same way as a sh script: put #!/bin/awk -f on the first line (adjusting the path to where awk is installed) so that the script file is executed by awk.
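A sketch of -F: on passwd-style input. Rather than reading the real /etc/passwd, it uses two made-up entries in the same colon-separated format, and it also shows that setting FS in a BEGIN block has the same effect as -F.

```shell
# Two made-up lines in /etc/passwd format: name:password:UID:GID:comment:home:shell
pw=$(mktemp)
trap 'rm -f "$pw"' EXIT
cat > "$pw" <<'EOF'
root:x:0:0:root:/root:/bin/sh
angel:x:1000:1000:Angel:/home/angel:/bin/bash
EOF

# -F: makes the colon the field separator, so $1 is the login name and $7 the shell
shells=$(awk -F: '{print $1 " -> " $7}' "$pw")
# Assigning FS before the first record is read is equivalent to -F:
names=$(awk 'BEGIN {FS = ":"} {print $1}' "$pw")

printf '%s\n' "$shells" "$names"
```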

awk Script

An awk script consists of patterns and actions.

awk reads one line (record) at a time and splits it into fields using the field separator.

Each awk statement is a pattern followed by an action. The pattern determines when the action is triggered; if the pattern is omitted, the action runs for every record.

awk patterns include two special patterns, BEGIN and END. The BEGIN action runs before any input is read and is typically used to print headings or initialize variables. The END action runs after all input has been read and is typically used to print totals and a closing line.

awk actions are enclosed in curly braces {} and are mostly used for printing, but can also contain if statements and loops. If no action is given, every matching record is printed.

Fields: when awk executes an action, it refers to the fields as $1, $2, $3, ..., and $0 refers to the whole record.

Record: each line is a record.
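The pieces described above (BEGIN, a pattern with an action, END, $0 and NR) fit together as follows. This is a sketch assuming any POSIX awk and a temporary copy of the sample records; the counter name usp is arbitrary.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

report=$(awk '
BEGIN       { print "scanning..." }           # runs once, before the first record
$4 == "USP" { usp++; print NR ": " $0 }       # pattern plus action: USP records only
END         { print "USP records: " usp }     # runs once, after the last record
' "$data")

printf '%s\n' "$report"
```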

Extracting columns from a file

awk '{print $0}' data.f    (prints every record in full)
awk '{print $1}' data.f    (prints only the first field)

Adding a header and a trailer

awk 'BEGIN {print "Month\tPrice\n------------------------"} {print $2 "\t" $5} END {print "------------------ end of file"}' data.f

Regular expressions in awk

In awk, a regular expression is enclosed in slashes, for example /wang*/.

Conditional operators

Operator   Description
<          Less than
<=         Less than or equal to
>          Greater than
>=         Greater than or equal to
==         Equal to
!=         Not equal to
~          Matches a regular expression
!~         Does not match a regular expression
&&         And
||         Or
!          Not

Print selected columns of matching rows: awk '{if ($4 ~ /LPSX/) print $1 "\t" $4 "\t" $5}' data.f

Print selected columns of rows that do not match: awk '{if ($4 !~ /LPSX/) print $1 "\t" $4 "\t" $5}' data.f

Print rows whose first column is less than the seventh column: awk '{if ($1 < $7) print $1 "\t" $4 "\t" $7}' data.f
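The operators can also be used directly as patterns, without an explicit if. A sketch assuming POSIX awk and a temporary copy of the sample records; the threshold 50 is an arbitrary illustration.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# ~ and !~ test a field against a regular expression; n+0 prints 0 instead of an empty string
lpsx=$(awk '$4 ~ /LPSX/ {n++} END {print n+0}' "$data")
not_lpsx=$(awk '$4 !~ /LPSX/ {n++} END {print n+0}' "$data")
# Fields that look like numbers are compared numerically, so 48 < 138 is true
first_lt_seventh=$(awk '$1 < $7 {n++} END {print n+0}' "$data")
# && combines a string comparison with a numeric comparison
usp_over_50=$(awk '$4 == "USP" && $5 > 50 {print $1}' "$data")

echo "$lpsx $not_lpsx $first_lt_seventh"
printf '%s\n' "$usp_over_50"
```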

awk built-in variables

ARGC      number of command-line arguments
ARGV      array of command-line arguments
ENVIRON   array of the system environment variables
FILENAME  name of the file awk is currently reading
FNR       record number within the current file
FS        input field separator; equivalent to the command-line -F option
NF        number of fields in the current record
NR        number of records read so far
OFS       output field separator
ORS       output record separator
RS        input record separator

Print the number of fields, the record number and the record itself, then print the file name at the end:
awk '{print NF "\t" NR "\t" $0} END {print FILENAME}' data.f
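A sketch showing NR, FNR, NF and OFS on a temporary copy of the sample records, assuming POSIX awk; reading the same file twice is only a trick to make NR and FNR differ.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# FNR restarts at 1 in each input file, while NR keeps counting across files
files_and_records=$(awk 'FNR == 1 {files++} END {print files, NR}' "$data" "$data")
# NF is the number of fields in the current record
field_counts=$(awk '{print NF}' "$data" | sort -u)
# OFS is inserted between the comma-separated arguments of print
csv_first=$(awk 'BEGIN {OFS = ","} NR == 1 {print $1, $2, $5}' "$data")

echo "$files_and_records"
echo "$field_counts"
echo "$csv_first"
```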

awk operators

Assignment: =, +=, -=, *=, /=, %=, ^=
Conditional expression: ?:
Logical: ||, &&, !
Matching: ~, !~
Relational: <, <=, >, >=, !=, ==
Arithmetic: +, -, *, /, %, ^
Increment and decrement: ++, --

Using variables: awk '{name=$4; price=$5; print name "\t" price}' data.f
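A short sketch of variables with the assignment, increment and conditional operators, assuming POSIX awk. It totals and averages the fifth (price) field of a temporary copy of the sample records; the threshold 50 in the ?: example is arbitrary.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# ++ and += update variables; uninitialized awk variables act as 0, so no setup is needed
stats=$(awk '{count++; total += $5} END {printf "%d %.2f %.2f\n", count, total, total / count}' "$data")
# ?: picks one of two values; the parentheses keep > from being read as output redirection
first_label=$(awk 'NR == 1 {print $1, ($5 > 50 ? "high" : "low")}' "$data")

echo "$stats"
echo "$first_label"
```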

Built-in string functions

gsub(r,s)         Replaces every match of r with s throughout $0
gsub(r,s,t)       Replaces every match of r with s throughout t
index(s,t)        Returns the position of the first occurrence of t in s (0 if absent)
length(s)         Returns the length of s
match(s,r)        Tests whether s contains a substring matching r; returns its position (0 if none)
split(s,a,fs)     Splits s into the array a on the separator fs and returns the number of elements
sprintf(fmt,exp)  Returns exp formatted according to fmt
sub(r,s)          Replaces the leftmost longest substring of $0 that matches r with s
substr(s,p)       Returns the suffix of s starting at position p
substr(s,p,n)     Returns the substring of s of length n starting at position p

Print each line followed by its length: awk '{print $0 "\t" length($0)}' data.f
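The string functions can be tried in a BEGIN block, which needs no input file. A sketch assuming POSIX awk; the sample strings are taken from fields of the data file.

```shell
results=$(awk 'BEGIN {
    s = "KVM9D"
    n = gsub(/[0-9]/, "#", s)           # replaces every digit in s and returns the count
    print s, n
    print index("LVX2A", "2A")          # position of the first "2A"
    print length("3ZL1998")
    print substr("3ZL1998", 4), substr("3ZL1998", 2, 2)
    if (match("3ZL1998", /[0-9]+$/))    # on success sets RSTART and RLENGTH
        print RSTART, RLENGTH
    t = "68.00"
    sub(/0/, "o", t)                    # only the leftmost match is replaced
    print t
    print sprintf("%05.1f", 3.14159)    # width 5, one decimal place, zero-padded
}')

printf '%s\n' "$results"
```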

Formatting output with printf in awk

Modifier   Meaning
-          Left-align
width      Minimum field width; a leading 0 pads with zeros
.prec      Maximum string length, or number of digits to the right of the decimal point

Format     Meaning
%c         ASCII character
%d         Integer
%e         Floating-point number in scientific notation
%f         Floating-point number, for example 123.44
%g         awk chooses %e or %f, whichever is shorter
%o         Octal number
%s         String

Note: printf does not output a newline automatically; add \n where needed.

Print character code 65 as ASCII A: awk 'BEGIN {printf "%c\n", 65}'
Fixed-width columns: awk '{printf "%-15s%s\n", $1, $3}' data.f
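A sketch of the printf modifiers and conversions, assuming POSIX awk and a temporary copy of the sample records; the column widths 6 and 8 are arbitrary.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# %-6s left-aligns the month in 6 columns; %8.2f right-aligns the price in 8 columns
table=$(awk '{printf "%-6s|%8.2f\n", $2, $5}' "$data")
char=$(awk 'BEGIN {printf "%c\n", 65}')      # 65 is the ASCII code of "A"
octal=$(awk 'BEGIN {printf "%o\n", 8}')
sci=$(awk 'BEGIN {printf "%e\n", 1234.5}')

printf '%s\n' "$table"
echo "$char $octal $sci"
```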

awk script files

An awk program can also be stored in a script file. The first line names the interpreter and its arguments: #!/bin/awk -f

To run it, make the file executable and type the script name followed by the input file; the output is the same as running the program on the command line.
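A sketch of an awk script file. The script body is hypothetical (it prints the month and price with a header and a record count), and it is run with awk -f so that the example does not depend on where awk is installed, which the #! line would.

```shell
# Temporary copy of the sample data (assumed to match data.f) and a script file
data=$(mktemp)
script=$(mktemp)
trap 'rm -f "$data" "$script"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

cat > "$script" <<'EOF'
#!/usr/bin/awk -f
# Hypothetical report: month and price with a header and a record count
BEGIN { print "Month\tPrice" }
      { print $2 "\t" $5 }
END   { print "records: " NR }
EOF

# awk -f treats the #! line as a comment; with chmod +x and a correct
# interpreter path, running "$script" "$data" directly would behave the same way
report=$(awk -f "$script" "$data")
printf '%s\n' "$report"
```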

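awk arrays

Usage examples:

    1. Splitting text into an array: awk 'BEGIN {print split("123:456:789", array, ":")}'
      This prints 3, the number of elements; the resulting array has array[1]="123", array[2]="456" and array[3]="789".
    2. Looping over an array: for (element in array) print array[element]
      The order in which for (... in ...) visits the elements is unspecified.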
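A sketch of split() and of an associative array filled while reading records, assuming POSIX awk and a temporary copy of the sample records; the prices are summed per value of the fourth field. Because for (key in array) visits keys in an unspecified order, the output is piped through sort.

```shell
# Temporary copy of the sample data (assumed to match data.f)
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
48 Dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234
EOF

# split() returns the number of elements; the array it fills is indexed from 1
parts=$(awk 'BEGIN {n = split("123:456:789", array, ":"); print n, array[1], array[3]}')
# Associative array keyed by the fourth field; sum the fifth (price) field per key
totals=$(awk '{sum[$4] += $5} END {for (code in sum) print code, sum[code]}' "$data" | sort)

echo "$parts"
printf '%s\n' "$totals"
```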
