Text Processing Tools
The Three Musketeers of Linux Text Processing
grep: text filtering tool (filters lines by pattern)
sed: stream editor, a line-oriented text editor
awk: implemented on Linux as gawk; a text report generator (formatted text output)
Regular expressions are used in all three tools
Regular Expression
A pattern written with special characters and literal text characters, where some characters do not carry their literal meaning but instead act as control or wildcard characters.
Classification
Basic Regular Expressions
Extended Regular Expressions
The two classes differ in some of their metacharacters
grep: Global search REgular expression and Print out the line
Function: a text search tool that checks the target text line by line against a user-specified pattern (the filter condition) and prints the matching lines
Pattern: a filter condition written with regular expression metacharacters and literal text characters
grep
Common options:
--color=auto: highlight the matched text in color
-i: ignore character case (ignore-case)
-o: display only the matched string itself, not the whole line
-v: invert the match; display only the lines that do not match
-E: support extended regular expressions
-q: quiet mode; output nothing (only the exit status is set)
-A #: also display the # lines after each matching line
-B #: also display the # lines before each matching line
-C #: also display the # lines before and after each matching line
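The options above can be tried on a small throwaway file. This is only a sketch: the sample lines are made up for illustration, and GNU grep is assumed.

```shell
# Sample data in a temporary file (contents are invented for this example)
tmp=$(mktemp)
printf '%s\n' 'root:x:0:0' 'Root user' 'daemon:x:1:1' 'bin:x:2:2' 'nobody:x:99:99' > "$tmp"

# -i: case-insensitive match; finds both "root:x:0:0" and "Root user"
icase=$(grep -i 'root' "$tmp")

# -o: print only the matched text, one match per output line
only=$(grep -o 'x:[0-9]*' "$tmp")

# -v: print the lines that do NOT match
inverse=$(grep -v ':' "$tmp")

# -q: print nothing; only the exit status tells whether a match was found
if grep -q 'daemon' "$tmp"; then found=yes; else found=no; fi

# -A 1: each matching line plus the 1 line after it
after=$(grep -A 1 'daemon' "$tmp")

printf '%s\n' "$icase" "$only" "$inverse" "$found" "$after"
rm -f "$tmp"
```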
Metacharacters of basic regular expressions
Character matching
.: matches any single character
[]: matches any single character within the specified set or range
[^]: matches any single character outside the specified set or range
Character classes, used inside brackets (for example [[:digit:]]): [:digit:], [:lower:], [:upper:], [:alpha:], [:punct:], [:space:], [:alnum:]
Number of matches (quantifiers are greedy by default: they match as much as possible)
Used after a character to limit the number of times that preceding character may occur
*: matches the preceding character any number of times: 0, 1, or more
.*: matches any characters of any length
\?: matches the preceding character 0 or 1 times, that is, the preceding character is optional
\+: matches the preceding character 1 or more times, that is, the preceding character appears at least once
\{m\}: matches the preceding character exactly m times
\{m,n\}: matches the preceding character at least m times and at most n times
\{0,n\}: matches the preceding character at most n times
\{m,\}: matches the preceding character at least m times, with no upper limit
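A short sketch of the basic-regex quantifiers, counting matching lines with grep -c. The sample lines are invented, and GNU grep is assumed (\? and \+ are GNU extensions to basic regular expressions).

```shell
tmp=$(mktemp)
printf '%s\n' 'ab' 'aab' 'aaab' 'b' 'xyz' > "$tmp"

star=$(grep -c 'a*b' "$tmp")             # a zero or more times, then b: every line containing b
opt=$(grep -c '^a\?b$' "$tmp")           # optional a: "ab" and "b"
plus=$(grep -c 'a\+b' "$tmp")            # at least one a: "ab", "aab", "aaab"
exact=$(grep -c '^a\{2\}b$' "$tmp")      # exactly two a: "aab"
range=$(grep -c '^a\{2,3\}b$' "$tmp")    # two or three a: "aab", "aaab"
atleast=$(grep -c '^a\{2,\}b$' "$tmp")   # two or more a: "aab", "aaab"

# Greedy matching: a\+ takes as many a characters as it can
greedy=$(printf 'aaab\n' | grep -o 'a\+')

echo "$star $opt $plus $exact $range $atleast $greedy"
rm -f "$tmp"
```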
Position anchoring
^: start-of-line anchor; placed at the leftmost position of the pattern
$: end-of-line anchor; placed at the rightmost position of the pattern
^PATTERN$: the pattern must match the entire line
^$: matches empty lines
^[[:space:]]*$: matches empty lines and lines that contain only whitespace characters
Word: a continuous string of non-special characters (letters, digits, and underscores) is called a word
\<: start-of-word anchor; placed on the left side of the word pattern
\>: end-of-word anchor; placed on the right side of the word pattern
\<PATTERN\>: matches a whole word
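The anchors can be compared side by side on a few sample lines. This is an illustrative sketch with made-up data; \< and \> are assumed to be available as in GNU grep.

```shell
tmp=$(mktemp)
# Six lines: the last two are an empty line and a line of three spaces
printf '%s\n' 'root' 'root:x:0:0' 'chroot jail' 'rooter' '' '   ' > "$tmp"

start=$(grep -c '^root' "$tmp")            # lines beginning with "root"
whole=$(grep -c '^root$' "$tmp")           # lines that are exactly "root"
empty=$(grep -c '^$' "$tmp")               # empty lines only
blank=$(grep -c '^[[:space:]]*$' "$tmp")   # empty or whitespace-only lines
word=$(grep -c '\<root\>' "$tmp")          # "root" as a whole word; not "chroot" or "rooter"

echo "$start $whole $empty $blank $word"
rm -f "$tmp"
```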
Grouping and referencing
\(\): binds one or more characters together so that they are treated as a single unit
\(xy\)*ab
xy as a whole appears 0, 1, or more times, followed by ab
Note: the text matched by the pattern inside each pair of grouping parentheses is automatically recorded by the regular expression engine in internal variables, which are named:
\1
\2
\3
......
Back reference:
grep "\(l..e\).*\1"
\1 refers to the text that the pattern inside the first pair of grouping parentheses actually matched, not to the pattern itself
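A small sketch of a back reference in action. The four sample sentences are invented; the point is that \1 must repeat the exact text the group matched, so "love ... liker" does not qualify even though both words fit l..e.

```shell
tmp=$(mktemp)
printf '%s\n' \
  'he loves his lover' \
  'he likes his lover' \
  'she likes her liker' \
  'she loves her liker' > "$tmp"

# \(l..e\) records what it matched ("love" or "like"); \1 requires that same text again
matched=$(grep '\(l..e\).*\1' "$tmp")

printf '%s\n' "$matched"
rm -f "$tmp"
```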
grep
Supports basic regular expressions by default
-E: use extended regular expressions
-F: do not use regular expressions (fixed strings)
egrep
Supports extended regular expressions by default
-G: use basic regular expressions
-F: do not use regular expressions (fixed strings)
fgrep: use fgrep for better performance when the pattern needs no metacharacters
Does not support regular expressions by default
-E: use extended regular expressions
-G: use basic regular expressions
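The difference between the three modes shows up when the pattern contains a metacharacter. The sketch below uses grep with -E and -F rather than the egrep and fgrep names, since recent GNU grep releases mark those two commands as obsolescent; the sample lines are made up.

```shell
tmp=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'a+c' 'aac' > "$tmp"

bre=$(grep -c 'a.c' "$tmp")            # basic regex: "." matches any character, so all 4 lines
fixed=$(grep -c -F 'a.c' "$tmp")       # fixed string: only the literal "a.c"
ere=$(grep -c -E '^a+c$' "$tmp")       # extended regex: "+" is a quantifier, matches "aac"
bre_plus=$(grep -c '^a+c$' "$tmp")     # basic regex: "+" is literal, matches "a+c"

echo "$bre $fixed $ere $bre_plus"
rm -f "$tmp"
```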
--------------------------------------------------------------------------------
Metacharacters of extended regular expressions
Character matching
.: any single character
[]: any single character within the specified set or range
[^]: any single character outside the specified set or range
Number of matches
*: the preceding character any number of times: 0, 1, or more
?: 0 or 1 times
+: 1 or more times
{m}: matches the preceding character exactly m times
{m,n}: matches the preceding character at least m times and at most n times
{0,n}: matches the preceding character at most n times
{m,}: matches the preceding character at least m times
Position anchoring
^: start-of-line anchor
$: end-of-line anchor
\<: start-of-word anchor
\>: end-of-word anchor
Or: a|b matches everything on the left of | or everything on the right
C|cat: C or cat
(C|c)at: Cat or cat
Grouping and referencing:
(): group
Back reference:
\1
\2
\3
...
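A brief sketch of alternation and grouping with grep -E. The sample words are invented, and the patterns are anchored with ^ and $ so that each count reflects whole-line matches only.

```shell
tmp=$(mktemp)
printf '%s\n' 'Cat' 'cat' 'C' 'bat' 'xyxyab' 'ab' > "$tmp"

alt=$(grep -c -E '^(C|cat)$' "$tmp")      # "C" or "cat": the | splits the whole alternative
grp=$(grep -c -E '^(C|c)at$' "$tmp")      # "Cat" or "cat": the group limits the | to one letter
rep=$(grep -c -E '^(xy)*ab$' "$tmp")      # the group repeated 0 or more times: "xyxyab", "ab"
twice=$(grep -c -E '^(xy){2}ab$' "$tmp")  # the group exactly twice: "xyxyab"

echo "$alt $grp $rep $twice"
rm -f "$tmp"
```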
---------------------------------------------------------------------------------------------------------
Linux File Lookup
File lookup: find files on the file system that meet given criteria
Tools: locate and find
locate
Relies on an index database that is built in advance
The database is built automatically by the system
It can also be updated manually with updatedb, which consumes a lot of system resources
Operating characteristics:
Fast search speed
Fuzzy matching
Not real-time: a listed file may have changed or may no longer exist
locate
locate [OPTION]... PATTERN...
-b: match only the base name of the path
-c: print only the number of matching files
-r: treat the pattern as a basic regular expression
find command
A real-time lookup tool that finds files by traversing the file system hierarchy under the specified starting path
Operating characteristics:
Slightly slower than locate
Exact matching
Real-time search
find
find [OPTIONS] [starting path] [search criteria] [action]
Starting path: specifies where the search starts; defaults to the current directory
Search criteria: specify what to look for; can be based on file name, size, type, ownership, permissions, and other attributes. By default, all files under the specified path are found (similar to listing them with ls)
Action: what to do with the files that match the criteria, such as deleting them; the default is to print them to standard output
Search criteria:
Expression: made up of options and tests
Test: the result is usually a Boolean (true or false)
Search by file name
-name "PATTERN"
-iname "PATTERN": ignores case
PATTERN: glob-style wildcards
* ? [] [^] ...
-regex PATTERN: find files with a regular expression
The pattern must match the entire path, not just the base name (not commonly used)
Combination tests:
And: -a; the default logic when combining (all conditions must be met at the same time)
Or: -o; satisfying any one of the conditions is enough
Not: -not or !; inverts the condition
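The name tests and the combination operators can be tried in a scratch directory. This is a sketch with invented file names; it also uses -type f (regular files only) so that the starting directory itself is not listed.

```shell
dir=$(mktemp -d)
touch "$dir/notes.txt" "$dir/TODO.TXT" "$dir/data.log" "$dir/readme"

# -name is case-sensitive, -iname is not
by_name=$(cd "$dir" && find . -name '*.txt')
by_iname=$(cd "$dir" && find . -iname '*.txt' | LC_ALL=C sort)

# -o: either condition is enough
either=$(cd "$dir" && find . -name '*.txt' -o -name '*.log' | LC_ALL=C sort)

# Implicit -a plus -not: regular files whose name contains no dot
no_ext=$(cd "$dir" && find . -type f -not -name '*.*')

# -regex must match the whole path ("./notes.txt"), not just the base name
by_regex=$(cd "$dir" && find . -regex '.*/[a-z]*\.txt')

printf '%s\n' "$by_name" "$by_iname" "$either" "$no_ext" "$by_regex"
rm -rf "$dir"
```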
Options
Search by file ownership
~]# find /tmp -user USERNAME
Finds the files under /tmp whose owner is USERNAME
-uid UID
-group GROUPNAME
-gid GID
-nouser
-nogroup
Note: after a user or group is deleted, the files it owned keep only the numeric UID or GID and no longer have a named owner or group; -nouser and -nogroup find such files
This can be checked with ls -l
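A minimal sketch of an ownership test. It searches by numeric UID rather than by user name so that it also works for accounts without a name entry; the file name is made up.

```shell
dir=$(mktemp -d)
touch "$dir/mine"

# A file just created by the current user is owned by that user's UID
by_uid=$(cd "$dir" && find . -type f -uid "$(id -u)")

# Files whose owner or group no longer exists; usually prints nothing here
orphans=$(cd "$dir" && find . -nouser -o -nogroup)

printf '%s\n' "$by_uid" "$orphans"
rm -rf "$dir"
```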
Search by file type
-type TYPE
f: regular file
d: directory
l: symbolic link
b: block device
c: character device
p: named pipe
s: socket
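The type letters can be checked against a scratch directory that holds one object of several kinds. This is a sketch with invented names; block devices, character devices, and sockets are left out because creating them needs extra privileges or a running program.

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/file"
ln -s file "$dir/link"    # symbolic link pointing at "file"
mkfifo "$dir/pipe"        # named pipe

files=$(cd "$dir" && find . -type f)
dirs=$(cd "$dir" && find . -type d | LC_ALL=C sort)   # includes the starting directory "."
links=$(cd "$dir" && find . -type l)
pipes=$(cd "$dir" && find . -type p)

printf '%s\n' "$files" "$dirs" "$links" "$pipes"
rm -rf "$dir"
```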
Search by file size
-size [+|-]#UNIT (# is a number specifying the size)
Common units: k, M, G
#UNIT: (#-1, #]
-#UNIT: [0, #-1]
+#UNIT: (#, infinity)
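The ranges above follow from the way sizes are rounded up to whole units before comparing. The sketch below assumes GNU find and uses files of invented sizes to show which side of each boundary they land on.

```shell
dir=$(mktemp -d)
: > "$dir/empty"                              # 0 bytes
head -c 100  /dev/zero > "$dir/small"         # 100 bytes, rounds up to 1k
head -c 1024 /dev/zero > "$dir/one_k"         # exactly 1k
head -c 1025 /dev/zero > "$dir/over_1k"       # rounds up to 2k
head -c 3000 /dev/zero > "$dir/three_k"       # rounds up to 3k

eq1=$(cd "$dir" && find . -type f -size 1k | LC_ALL=C sort)    # (0, 1k]
lt1=$(cd "$dir" && find . -type f -size -1k)                   # only the empty file
gt1=$(cd "$dir" && find . -type f -size +1k | LC_ALL=C sort)   # larger than 1k
eq3=$(cd "$dir" && find . -type f -size 3k)                    # (2k, 3k]

printf '%s\n' "$eq1" "$lt1" "$gt1" "$eq3"
rm -rf "$dir"
```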
Search by timestamp
In days (x is the file's age in days):
-atime [+|-]#: access time
+#: x >= #+1
-#: x < #
#: # <= x < #+1
-mtime: modification time
-ctime: status change time
In minutes:
-amin
-mmin
-cmin
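A sketch of the day and minute tests on two files of known age. It assumes GNU touch for the -d option with a relative date; the file names are invented. The old file is about 3.3 days old, so its age in whole days is 3.

```shell
dir=$(mktemp -d)
touch "$dir/fresh"                         # modified just now
touch -d '80 hours ago' "$dir/old"         # GNU touch: about 3.3 days old

d3=$(cd "$dir" && find . -type f -mtime 3)     # 3 <= age < 4 days
gt2=$(cd "$dir" && find . -type f -mtime +2)   # age >= 3 days
gt3=$(cd "$dir" && find . -type f -mtime +3)   # age >= 4 days: nothing
lt1=$(cd "$dir" && find . -type f -mtime -1)   # age < 1 day
min5=$(cd "$dir" && find . -type f -mmin -5)   # modified less than 5 minutes ago

printf '%s\n' "$d3" "$gt2" "$gt3" "$lt1" "$min5"
rm -rf "$dir"
```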
Search by file permissions
-perm [+|-]MODE
MODE: exact match
MODE is the three-digit octal representation of the permissions, such as 664, 755, 222 ...
+MODE: matches if any one of the given permission bits is set for any class of user (u, g, o). Newer versions of GNU find write this as /MODE; the +MODE form is obsolete
find ./ -perm +222: at least one class has write permission
find ./ -perm +621: the owner has read or write, or the group has write, or others have execute; any one of the three cases is enough
-MODE: matches only if every one of the given permission bits is set; the file may have additional bits set
find ./ -perm -222: every class has write permission
find ./ -perm -666: matches 666, 766, and 777, but not 650, 550, or 111
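The three forms of -perm can be compared on files with known modes. This sketch assumes a GNU find recent enough to accept /MODE for the "any bit" test; the file names simply encode each file's mode.

```shell
dir=$(mktemp -d)
for m in 666 650 766 444 111; do
  touch "$dir/f$m"
  chmod "$m" "$dir/f$m"
done

# MODE: the permission bits must be exactly 650
exact=$(cd "$dir" && find . -type f -perm 650)

# -MODE: all of rw-rw-rw- must be set; extra bits (such as owner execute) are allowed
all_rw=$(cd "$dir" && find . -type f -perm -666 | LC_ALL=C sort)

# /MODE: at least one write bit is set for owner, group, or others
any_w=$(cd "$dir" && find . -type f -perm /222 | LC_ALL=C sort)

printf '%s\n' "$exact" "$all_rw" "$any_w"
rm -rf "$dir"
```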
Actions:
-print: print to standard output; the default action
-ls: similar to running "ls -l" on each found file
-delete: delete each found file (use with caution)
-fls /path/to/somefile: save the long-format listing of the found files to the specified file
-ok COMMAND {} \;: run COMMAND on each found file, asking the user for confirmation each time
-exec COMMAND {} \;: run COMMAND on each found file without asking for confirmation
Note: some commands cannot accept an overly long argument list; when too many found paths are passed at once, the command fails.
Workaround: find | xargs COMMAND
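A sketch of -exec and of the xargs workaround in a scratch directory. The file names are invented. The -print0 and -0 options are an addition not mentioned in the notes: they are GNU (and BSD) extensions that keep paths containing spaces intact when piping to xargs.

```shell
dir=$(mktemp -d)
touch "$dir/a.tmp" "$dir/b.tmp" "$dir/keep.txt"

# -exec runs the command once for each found file; {} is replaced by the path
listed=$(cd "$dir" && find . -name '*.tmp' -exec basename {} \; | LC_ALL=C sort)

# xargs collects the paths and runs rm on them in batches
find "$dir" -name '*.tmp' -print0 | xargs -0 rm -f

# Only the file that did not match is left
left=$(cd "$dir" && find . -type f)

printf '%s\n' "$listed" "$left"
rm -rf "$dir"
```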
M20 preview notes: regular expressions and find