August 4. The main topics covered are:
I. Tools for extracting text: less, cat, head, tail, cut
II. Tools for analyzing text: wc, sort, diff, patch
III. grep and regular expressions
IV. egrep and extended regular expressions
I. Tools for extracting text
1) File viewing commands:
cat [OPTION]... [FILE]...
-E: display a $ at the end of each line
-n: number every output line
-A: show all control characters
-b: number only non-empty lines
-s: squeeze consecutive blank lines into one
tac
Works like cat, but prints the lines in reverse order (last line first)
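A minimal sketch of the cat options and tac described above, assuming GNU coreutils. The file name sample.txt and its contents are made up for illustration.

```shell
# Build a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'alpha\n\n\nbeta\n' > "$tmpdir/sample.txt"

# -n numbers every line, including the blank ones.
cat -n "$tmpdir/sample.txt"

# -b numbers only the non-empty lines.
cat -b "$tmpdir/sample.txt"

# -s squeezes the two consecutive blank lines into one, leaving 3 lines.
squeezed=$(cat -s "$tmpdir/sample.txt" | wc -l)

# -E marks the end of each line with $.
marked=$(cat -E "$tmpdir/sample.txt" | head -n 1)

# tac prints the last line first.
reversed_first=$(tac "$tmpdir/sample.txt" | head -n 1)

echo "$squeezed $marked $reversed_first"
rm -rf "$tmpdir"
```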
2) Paging tools
more: page through a file
more [OPTIONS...] FILE...
-d: display prompts for paging and quitting
less: view a file or stdin output one page at a time
Useful commands while viewing:
/text: search for text
n/N: jump to the next/previous match
less is the pager used by the man command
3) Displaying the beginning or end of a text
head
head [OPTION]... [FILE]...
-c #: output the first # bytes
-n #: output the first # lines
-#: shorthand for -n #
tail
tail [OPTION]... [FILE]...
-c #: output the last # bytes
-n #: output the last # lines
-#: shorthand for -n #
-f: follow the file and print new content as it is appended; commonly used to monitor logs
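The head and tail options above can be tried out with a short sketch like the following. The file numbers.txt is a made-up example, and paste -sd, is used only to join the output lines for display.

```shell
# Sample file with the numbers 1 to 10, one per line (hypothetical data).
tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/numbers.txt"

# First three lines and last two lines.
first_three=$(head -n 3 "$tmpdir/numbers.txt" | paste -sd, -)
last_two=$(tail -n 2 "$tmpdir/numbers.txt" | paste -sd, -)

# -c counts bytes instead of lines.
first_bytes=$(printf 'hello world\n' | head -c 5)

# Combining both tools extracts a range: lines 4 to 5.
middle=$(head -n 5 "$tmpdir/numbers.txt" | tail -n 2 | paste -sd, -)

echo "$first_three | $last_two | $first_bytes | $middle"

# tail -f keeps running until interrupted (Ctrl-C), so it is not executed here.
rm -rf "$tmpdir"
```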
4) Extracting columns with cut and merging files with paste
cut [OPTION]... [FILE]...
-d DELIMITER: field delimiter, tab by default (the space between -d and the delimiter is optional)
-f FIELDS:
#: field number #
#,#[,#]: several discrete fields, e.g. 1,3,6
#-#: a range of consecutive fields, e.g. 1-6
Mixed: 1-3,7
-c: cut by character position
--output-delimiter=STRING: set the output delimiter
cut -d: -f1 /etc/passwd
cat /etc/passwd | cut -d: -f7
cut -c2-5 /usr/share/dict/words
paste: merge the lines with the same line number from two files into one line
-d DELIMITER: delimiter, tab by default
-s: paste all lines of each file into a single line
paste f1 f2
paste -s f1 f2
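The following sketch exercises the cut and paste options listed above on made-up data rather than the real /etc/passwd. It assumes GNU coreutils, since --output-delimiter is a GNU option; the file names users.txt, f1, and f2 are illustrative.

```shell
tmpdir=$(mktemp -d)
# Hypothetical passwd-style records.
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/zsh\n' \
  > "$tmpdir/users.txt"

# Field 1 of every line, using ":" as the delimiter.
names=$(cut -d: -f1 "$tmpdir/users.txt" | paste -sd, -)

# Discrete fields 1, 3 and 7, re-joined with a space on output.
mixed=$(cut -d: -f1,3,7 --output-delimiter=' ' "$tmpdir/users.txt" | head -n 1)

# Characters 2 through 5 of each line.
chars=$(printf 'abcdefg\n' | cut -c2-5)

# paste joins line N of f1 with line N of f2.
printf 'a\nb\n' > "$tmpdir/f1"
printf '1\n2\n' > "$tmpdir/f2"
side_by_side=$(paste -d: "$tmpdir/f1" "$tmpdir/f2" | paste -sd, -)

# With -s, each file is collapsed into a single line of its own.
serial_first=$(paste -s -d, "$tmpdir/f1" "$tmpdir/f2" | head -n 1)

echo "$names | $mixed | $chars | $side_by_side | $serial_first"
rm -rf "$tmpdir"
```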
II. Text analysis tools
1) Text statistics
wc: counts lines, words, and bytes (or characters); works on files or on stdin
wc story.txt
237 1901 story.txt
By default the columns are, in order: line count, word count, byte count
-l: count lines only
-w: count words only
-c: count bytes only
-m: count characters only
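A small sketch of the wc options above. The file story.txt here is a two-line stand-in, not the file from the sample output. Reading from stdin with < keeps the file name out of the result.

```shell
tmpdir=$(mktemp -d)
# Two lines, three words, 14 bytes (hypothetical content).
printf 'one two\nthree\n' > "$tmpdir/story.txt"

lines=$(wc -l < "$tmpdir/story.txt")
words=$(wc -w < "$tmpdir/story.txt")
bytes=$(wc -c < "$tmpdir/story.txt")
# -m equals -c for plain ASCII; the two differ only when the text
# contains multi-byte characters and the locale is multi-byte aware.
chars=$(wc -m < "$tmpdir/story.txt")

echo "lines=$lines words=$words bytes=$bytes chars=$chars"
rm -rf "$tmpdir"
```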
2) Sorting text
sort: writes the sorted text to stdout (ordered by character value by default); the original file is not changed
sort [OPTIONS] FILE(s)
-r: reverse the sort order (descending)
-n: sort by numeric value
-f: fold case, i.e. ignore upper/lower case when comparing
-u: unique; drop duplicate lines from the output
-t c: use c as the field delimiter
-k X: sort by field X (fields are delimited by the -t character); may be given several times
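To see how -t, -k, -n, -r, and -u interact, here is a short sketch on made-up name:score records (scores.txt is an illustrative name). Note that -k2 uses everything from field 2 to the end of the line as the key, which is just the score in this two-field data.

```shell
tmpdir=$(mktemp -d)
printf 'carol:30\nalice:4\nbob:100\nalice:4\n' > "$tmpdir/scores.txt"

# Default: compare whole lines character by character.
by_name=$(sort "$tmpdir/scores.txt" | paste -sd, -)

# Numeric sort on field 2, ascending; without -n, "100" would sort before "30".
by_score=$(sort -t: -k2 -n "$tmpdir/scores.txt" | paste -sd, -)

# The same key, highest score first.
by_score_desc=$(sort -t: -k2 -n -r "$tmpdir/scores.txt" | paste -sd, -)

# -u drops the duplicate alice:4 line.
unique=$(sort -u "$tmpdir/scores.txt" | paste -sd, -)

printf '%s\n' "$by_name" "$by_score" "$by_score_desc" "$unique"
rm -rf "$tmpdir"
```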
3) Removing duplicates
uniq: removes adjacent duplicate lines from the input
uniq [OPTION]... [FILE]...
-c: prefix each line with its number of occurrences
-d: show only lines that are repeated
-u: show only lines that are not repeated
Note: lines count as duplicates only when they are consecutive and exactly identical
Usually combined with sort: sort userlist.txt | uniq -c
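Because uniq only compares neighbouring lines, the input usually has to be sorted first. The sketch below illustrates this with a made-up userlist.txt.

```shell
tmpdir=$(mktemp -d)
printf 'bob\nalice\nbob\ncarol\nbob\n' > "$tmpdir/userlist.txt"

# No two identical lines are adjacent, so uniq alone removes nothing.
unsorted_count=$(uniq "$tmpdir/userlist.txt" | wc -l)

# After sorting, the duplicates sit next to each other and collapse.
deduped=$(sort "$tmpdir/userlist.txt" | uniq | paste -sd, -)

# -d keeps only repeated lines, -u only the ones that occur once.
repeated=$(sort "$tmpdir/userlist.txt" | uniq -d | paste -sd, -)
singles=$(sort "$tmpdir/userlist.txt" | uniq -u | paste -sd, -)

# -c prefixes the count; a second numeric sort ranks by frequency.
most_common=$(sort "$tmpdir/userlist.txt" | uniq -c | sort -n -r | head -n 1)

printf '%s\n' "$unsorted_count" "$deduped" "$repeated" "$singles" "$most_common"
rm -rf "$tmpdir"
```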
4) Comparing files
diff: compares two files line by line
diff [OPTION]... OLDFILE NEWFILE shows the differences between OLDFILE and NEWFILE
diff foo.conf-broken foo.conf-works
5c5 (line 5 differs)
< use_widgets = no
---
> use_widgets = yes
-u: unified format, which shows the changed lines with context (3 lines by default); suited to patch files
diff /path/to/oldfile /path/to/newfile > /path/to/patch_file
diff can also compare two directories and report the differences between the files they contain
patch: applies the changes in a patch file to a file
patch -i /path/to/patch_file /path/to/oldfile
patch /path/to/oldfile < /path/to/patch_file
-b: automatically back up the file being changed
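The diff and patch workflow above can be sketched end to end as follows. The two-line config files are invented for the example, and the .orig backup suffix is the default of GNU patch; other implementations may name the backup differently.

```shell
tmpdir=$(mktemp -d)
printf 'name = foo\nuse_widgets = no\n'  > "$tmpdir/foo.conf-broken"
printf 'name = foo\nuse_widgets = yes\n' > "$tmpdir/foo.conf-works"

# diff exits with status 1 when the files differ; "|| true" keeps
# that from aborting a script that runs with "set -e".
diff -u "$tmpdir/foo.conf-broken" "$tmpdir/foo.conf-works" > "$tmpdir/foo.patch" || true

# Apply the patch to the old file; -b keeps a backup of the original.
patch -b "$tmpdir/foo.conf-broken" < "$tmpdir/foo.patch"

# After patching, the two files should be identical.
if diff -q "$tmpdir/foo.conf-broken" "$tmpdir/foo.conf-works" > /dev/null; then
  patched=yes
else
  patched=no
fi

# Check for the backup file (GNU patch default suffix, an assumption here).
if [ -f "$tmpdir/foo.conf-broken.orig" ]; then backup=yes; else backup=no; fi

echo "patched=$patched backup=$backup"
rm -rf "$tmpdir"
```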
III. grep and regular expressions
1) The "Three Musketeers" of text processing on Linux
grep: text filtering tool (filters by pattern)
grep, egrep (supports extended regular expressions), fgrep (no regular expression support)
sed: stream editor, a text editing tool
awk: text report generator; implemented on Linux as gawk
2) grep: Global search REgular expression and Print out the line
Function: a text search tool that checks the target text line by line against a user-specified pattern and prints the lines that match. Pattern: a filter condition written with regular expression metacharacters and literal text characters
grep [OPTIONS] PATTERN [FILE...]
grep root /etc/passwd
Command options:
--color=auto: highlight the matched text in color
-v: show the lines that do not match the pattern
-i: ignore case
-n: show the line number of each match
-c: print the count of matching lines
-o: show only the matched string
-q: quiet mode; print nothing and report the result through the exit status (echo $?), which is useful in scripts
-A #: after; also show the # lines following each match
-B #: before; also show the # lines preceding each match
-C #: context; also show # lines before and after each match
-e: specify a pattern; several -e options are combined with logical OR
grep -e 'cat' -e 'dog' file
-w: match whole words only
-E: use extended regular expressions (ERE)
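A compact sketch of the grep options listed above, run against a made-up five-line file (sample.txt and its contents are illustrative only).

```shell
tmpdir=$(mktemp -d)
printf 'root:x:0:0\nRoot backup\ncat food\ndog house\ncatalog\n' > "$tmpdir/sample.txt"
f="$tmpdir/sample.txt"

count=$(grep -c 'root' "$f")             # case-sensitive count of matching lines
count_i=$(grep -c -i 'root' "$f")        # -i also matches "Root"
numbered=$(grep -n 'dog' "$f")           # prefix with the line number
either=$(grep -c -e 'cat' -e 'dog' "$f") # lines matching cat OR dog
whole_word=$(grep -c -w 'cat' "$f")      # "catalog" does not count as the word cat
inverted=$(grep -c -v 'cat' "$f")        # lines that do NOT contain cat
only=$(grep -o 'cat' "$f" | wc -l)       # one output line per matched string
after=$(grep -A 1 'Root' "$f" | paste -sd, -)  # the match plus 1 following line

# -q prints nothing; only the exit status tells whether a match was found.
if grep -q 'root' "$f"; then found=yes; else found=no; fi

echo "$count $count_i $numbered $either $whole_word $inverted $only $after $found"
rm -rf "$tmpdir"
```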
3) Regular expressions
REGEXP: a pattern written with special characters and text characters, in which some characters (metacharacters) do not stand for themselves but express control or wildcard functions
Supported by: grep, vim, less, nginx, etc.
Two flavors: basic regular expressions (BRE) and extended regular expressions (ERE)
Metacharacter categories: character matching, repetition counts, position anchoring, grouping
4) Basic regular expressions
Character matching
.: matches any single character
[]: matches any single character in the specified set
[^]: matches any single character not in the specified set
[:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:] (character classes, used inside a bracket expression, e.g. [[:digit:]])
Repetition counts (greedy by default: match as much as possible)
Placed after the character to be repeated, to specify how many times that character may occur
*: matches the preceding character any number of times, including 0
.*: any characters of any length
\?: matches the preceding character 0 or 1 times
\+: matches the preceding character at least once
\{m\}: matches the preceding character exactly m times
\{m,n\}: matches the preceding character at least m and at most n times
\{0,n\}: matches the preceding character at most n times
\{m,\}: matches the preceding character at least m times
Position anchoring: pins down where the pattern may occur
^: start-of-line anchor, placed at the far left of the pattern (^root: line starts with root)
$: end-of-line anchor, placed at the far right of the pattern (root$: line ends with root)
^PATTERN$: the pattern must match the entire line
^$: empty line (no characters at all, not even whitespace)
^[[:space:]]*$: blank line (empty, or containing only whitespace)
Word: a continuous string of letters, digits, and underscores (no other special characters)
\< or \b: start-of-word anchor, placed at the left of the word pattern
\> or \b: end-of-word anchor, placed at the right of the word pattern
\<PATTERN\>: match a whole word
Grouping and references
Grouping: \(\) binds one or more characters together so they are treated as a unit, e.g. \(root\)\+
Note: the text matched by the pattern inside each group is saved by the regular expression engine in internal variables named \1, \2, \3, ...
\1: the text matched by the pattern between the first opening parenthesis (counting from the left) and its matching closing parenthesis
Example: \(string1\+\(string2\)*\)
\1: string1\+\(string2\)*
\2: string2
Back reference: refers to the text matched by the pattern in an earlier group, not to the pattern itself
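The basic regular expression features above (anchors, character classes, intervals, word anchors, and back references) can be checked with a sketch like this one. It assumes GNU grep, where \+, \< and \> are available in basic mode; the sample lines are invented.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/sample.txt"
# Line 2 is empty, line 3 holds only three spaces (hypothetical data).
printf 'root:x:0:0:root:/root:/bin/bash\n\n   \nrooter\nhe said hello hello again\ngooogle\n' > "$f"

starts_with_root=$(grep -c '^root' "$f")     # line anchor: "root..." and "rooter"
empty=$(grep -c '^$' "$f")                   # truly empty lines only
blank=$(grep -c '^[[:space:]]*$' "$f")       # empty or whitespace-only lines
whole_word=$(grep -c '\<root\>' "$f")        # "rooter" is not the word root
interval=$(grep -c 'go\{2,3\}gle' "$f")      # g, then 2 to 3 o's, then gle

# Back reference: \1 must repeat the exact text the group matched,
# so this finds a word that is immediately repeated.
repeated=$(grep -o '\<\([[:alpha:]]\+\) \1\>' "$f")

echo "$starts_with_root $empty $blank $whole_word $interval [$repeated]"
rm -rf "$tmpdir"
```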
IV. egrep and extended regular expressions
1) egrep
egrep = grep -E
egrep [OPTIONS] PATTERN [FILE...]
2) Extended regular expressions
Character matching (same as basic regular expressions)
Repetition counts
*: matches the preceding character any number of times
?: 0 or 1 times
+: 1 or more times
{m}: exactly m times
{m,n}: at least m and at most n times
{0,n}: at most n times; {m,}: at least m times
Position anchoring (same as basic regular expressions)
Grouping
()
Back references: \1, \2, ...
Alternation (or)
a|b
C|cat: C or cat
(C|c)at: Cat or cat
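A brief sketch of the extended syntax, using grep -E (the form egrep stands for) on a made-up word list. The ^ and $ anchors are added so each pattern has to match the whole line, which makes the difference between C|cat and (C|c)at visible.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/words.txt"
printf 'Cat\ncat\nC\ncoat\ncaaat\n' > "$f"

grouped=$(grep -E -c '^(C|c)at$' "$f")   # Cat or cat
bare=$(grep -E -c '^(C|cat)$' "$f")      # C or cat
optional=$(grep -E -c '^co?at$' "$f")    # ?: the o may be absent (cat, coat)
plus=$(grep -E -c '^ca+t$' "$f")         # +: one or more a (cat, caaat)
interval=$(grep -E -c '^ca{2,3}t$' "$f") # {m,n}: two to three a (caaat)
backref=$(grep -E -c '(a)\1' "$f")       # the same character twice in a row

# The same interval in basic syntax needs backslashes.
interval_bre=$(grep -c '^ca\{2,3\}t$' "$f")

echo "$grouped $bare $optional $plus $interval $backref $interval_bre"
rm -rf "$tmpdir"
```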
3) fgrep
Does not support regular expression metacharacters: fgrep is the better choice when the pattern needs no metacharacters
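To illustrate the difference, the sketch below uses grep -F, the option form of fgrep, on invented sample lines. With -F, characters such as . and + are matched literally.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/sample.txt"
printf 'a.c\nabc\n1+1=2\n' > "$f"

# As a regular expression, "." matches any character: both a.c and abc match.
regex_dot=$(grep -c 'a.c' "$f")

# With -F the dot is a literal dot: only a.c matches.
literal_dot=$(grep -F -c 'a.c' "$f")

# In extended syntax "1+1" means "one or more 1, then 1", which is not in the file.
# grep exits with status 1 when nothing matches, hence "|| true".
ere_plus=$(grep -E -c '1+1' "$f" || true)

# With -F the plus sign is literal, so the line 1+1=2 matches.
literal_plus=$(grep -F -c '1+1' "$f")

echo "$regex_dot $literal_dot $ere_plus $literal_plus"
rm -rf "$tmpdir"
```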
This article is from the "Laugh Monkey" blog, please be sure to keep this source http://xiaomonky.blog.51cto.com/11869371/1835347
DAY7: Text-processing tools and regular expressions