1 Tools for extracting text
File contents: less and cat
Part of a file: head and tail
Extract by column: cut
Extract by keyword: grep
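1.2 Viewing files
File viewing commands:
cat, tac, rev
cat [OPTION]... [FILE]...
-E: display $ at the end of each line
-n: number all output lines
-A: show all non-printing characters (equivalent to -vET)
-b: number non-empty output lines
-s: squeeze consecutive blank lines into one
tac: print lines in reverse order (last line first)
rev: reverse the characters within each line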
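The cat options and tac/rev are easiest to compare on a small piece of text. This is a sketch assuming GNU coreutils (cat, tac) and util-linux (rev); the sample text and the helper function sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: two consecutive blank lines between "alpha" and "beta"
sample() { printf 'alpha\n\n\nbeta\n'; }

sample | cat -n     # numbers every line, blank ones included
sample | cat -b     # numbers only the non-empty lines
sample | cat -s     # the two blank lines are squeezed into one
printf 'ends with a space \n' | cat -E   # the $ exposes the trailing space

printf 'first\nsecond\n' | tac   # second, then first
printf 'abc\n' | rev             # cba
```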
1.3 Viewing file contents page by page
more: page through a file
more [OPTIONS...] FILE...
-d: display prompts explaining how to page forward and quit
less: view a file or standard input one page at a time
Useful commands while viewing:
/text: search for text
n/N: jump to the next/previous match
less is the pager that the man command normally uses
1.4 Extracting text by column with cut, merging files by column with paste
cut [OPTION]... [FILE]...
-d DELIMITER: set the field delimiter (default: tab)
-f FIELDS: select fields:
#: field number #
#,#[,#]: several discrete fields, e.g. 1,3,6
#-#: a range of consecutive fields, e.g. 1-6
Mixed: e.g. 1-3,7
-c: select by character position
--output-delimiter=STRING: use STRING as the output delimiter
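A short sketch of the cut field and character selectors, plus paste joining two files line by line. It assumes GNU coreutils (--output-delimiter is a GNU extension); the two passwd-style lines are hypothetical data, not a real /etc/passwd.

```shell
#!/bin/sh
# Hypothetical passwd-style records, fields separated by ':'
users() {
  printf 'root:x:0:0:root:/root:/bin/bash\n'
  printf 'bin:x:1:1:bin:/bin:/sbin/nologin\n'
}

users | cut -d: -f1,7                              # user name and login shell
users | cut -d: -f1-3,7 --output-delimiter=' | '   # range + single field, new delimiter
users | cut -c1-4                                  # first 4 characters of each line

# paste merges files column by column: line 1 with line 1, line 2 with line 2
tmp=$(mktemp -d)
printf 'root\nbin\n' > "$tmp/names"
printf '/bin/bash\n/sbin/nologin\n' > "$tmp/shells"
paste -d: "$tmp/names" "$tmp/shells"               # root:/bin/bash, bin:/sbin/nologin
rm -r "$tmp"
```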
1.5 Tools for analyzing text
Text statistics: wc
Sorting text: sort
Comparing files: diff and patch
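The section only names these tools, so here is a minimal sketch of each one on made-up data: count lines with wc, sort on a numeric column, then record a change with diff -u and replay it with patch. It assumes GNU coreutils, diffutils and the patch utility are installed; the file names are hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical data: a name and a number per line
printf 'carol 30\nalice 25\nbob 7\n' > "$tmp/old.txt"
printf 'carol 30\nalice 25\nbob 8\n' > "$tmp/new.txt"

wc "$tmp/old.txt"            # lines, words, bytes
wc -l < "$tmp/old.txt"       # line count only: 3

sort -k2,2n "$tmp/old.txt"   # by the 2nd field, numerically: bob, alice, carol

# diff -u records the change as a unified diff; patch replays it onto old.txt
diff -u "$tmp/old.txt" "$tmp/new.txt" > "$tmp/change.patch"
patch "$tmp/old.txt" < "$tmp/change.patch"
cmp -s "$tmp/old.txt" "$tmp/new.txt" && echo "old.txt now matches new.txt"
rm -r "$tmp"
```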
1.6 uniq
uniq: remove adjacent duplicate lines from the input
uniq [OPTION]... [FILE]...
-c: prefix each line with its number of occurrences
-d: print only lines that are repeated
-u: print only lines that are not repeated
Note: lines count as duplicates only if they are consecutive and exactly identical
It is therefore commonly used together with sort:
sort userlist.txt | uniq -c
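The adjacency rule is why sort normally comes first. A sketch on a hypothetical login list (the helper function logins is made up for this example):

```shell
#!/bin/sh
# Hypothetical data: "alice" appears three times, but not all adjacent
logins() { printf 'alice\nbob\nalice\nalice\ncarol\n'; }

logins | uniq                    # alice, bob, alice, carol: only the adjacent pair collapses
logins | sort | uniq -c          # per-user counts once duplicates are adjacent
logins | sort | uniq -d          # alice: the only repeated line
logins | sort | uniq -u          # bob, carol: lines that occur once
logins | sort | uniq -c | sort -rn   # most frequent first
```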
2 The Linux text processing "Three Musketeers"
grep: a text filter that selects lines matching a pattern
grep, egrep, fgrep (fgrep does not support regular expressions; it searches for fixed strings)
sed: stream editor, a text editing tool
awk: text report generator; on Linux it is implemented by gawk
Regular expressions
REGEXP: a pattern built from special characters and literal text characters, in which some characters (metacharacters) do not stand for their literal meaning but serve a control or wildcard function
Programs that support them: grep, sed, awk, vim, less, nginx, varnish, etc.
They are divided into two categories:
Basic regular expressions: BRE
Extended regular expressions: ERE (grep -E, egrep)
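The practical difference between BRE, ERE and fixed-string search is mostly about which characters need a backslash. A sketch assuming GNU grep (where \? and \+ are available in BRE as extensions); the input lines are made up.

```shell
#!/bin/sh
text() { printf 'color\ncolour\na+b\n'; }

# BRE: ? is literal unless escaped as \?
text | grep 'colou\?r'      # color, colour
# ERE (grep -E, same as egrep): the same pattern without backslashes
text | grep -E 'colou?r'    # color, colour
# Fixed string (grep -F, same as fgrep): nothing is a metacharacter
text | grep -F 'a+b'        # a+b
```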
Regular expression engines:
software modules that process regular expressions, each using its own algorithm
PCRE (Perl Compatible Regular Expressions)
Metacharacters fall into four groups by function: character matching, match counts, position anchoring, grouping
man 7 regex
Character matching:
. matches any single character
[] matches any single character in the specified set
[^] matches any single character not in the specified set
[:alnum:] letters and digits
[:alpha:] any English letter, upper or lower case, i.e. A-Z, a-z
[:lower:] lowercase letters; [:upper:] uppercase letters
[:blank:] blank characters (space and tab)
[:space:] horizontal and vertical whitespace characters (a wider set than [:blank:])
[:cntrl:] non-printable control characters (backspace, delete, bell, ...)
[:digit:] decimal digits; [:xdigit:] hexadecimal digits
[:graph:] printable non-whitespace characters
[:print:] printable characters
[:punct:] punctuation
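The named classes are only valid inside a bracket expression, so they are written with double brackets such as [[:digit:]]. A sketch with made-up input lines, assuming GNU grep in the C or an English locale:

```shell
#!/bin/sh
lines() { printf 'abc\nABC\n123\nab 12\n!?\n'; }

lines | grep '[[:digit:]]'     # 123, ab 12
lines | grep '[[:upper:]]'     # ABC
lines | grep '[[:punct:]]'     # !?
lines | grep '[[:blank:]]'     # ab 12
# [^...] negates the set: lines containing at least one non-alphanumeric character
lines | grep '[^[:alnum:]]'    # ab 12, !?
```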
Match counts: placed after the character they apply to, to specify how many times that preceding character occurs
* matches the preceding character any number of times, including 0
Greedy mode: matches as much as possible
.* any string of any length
\? matches the preceding character 0 or 1 times
\+ matches the preceding character at least 1 time
\{n\} matches the preceding character exactly n times
\{m,n\} matches the preceding character at least m and at most n times
\{,n\} matches the preceding character at most n times
\{n,\} matches the preceding character at least n times
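Each count operator can be checked against a few spellings of "god" with a varying number of o's. A sketch assuming GNU grep (BRE forms with backslashes, the last line in ERE); the word list is made up.

```shell
#!/bin/sh
words() { printf 'gd\ngod\ngood\ngoood\n'; }

words | grep 'go*d'          # all four: * also allows zero o's
words | grep 'go\?d'         # gd, god
words | grep 'go\+d'         # god, good, goood
words | grep 'go\{2\}d'      # good
words | grep 'go\{1,2\}d'    # god, good
words | grep -E 'go{2,}d'    # good, goood (ERE needs no backslashes)

# Greedy: .* extends to the LAST possible t, not the first
printf 'r1t r2t\n' | grep -o 'r.*t'   # r1t r2t
```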
Position anchoring: fixing where a pattern appears
^ start-of-line anchor, used at the far left of a pattern
$ end-of-line anchor, used at the far right of a pattern
^pattern$ the pattern matches the entire line
^$ empty line
^[[:space:]]*$ blank line (empty or whitespace only)
\< or \b start-of-word anchor, used at the left of a word pattern
\> or \b end-of-word anchor, used at the right of a word pattern
\<pattern\> matches a whole word
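Anchors are what make a pattern such as "blank line" or "the word root" precise. A sketch on a hypothetical config snippet, assuming GNU grep for \< and \>:

```shell
#!/bin/sh
# Hypothetical config: a comment, an empty line, a whitespace-only line
cfg() { printf '# comment\nroot=1\n\n   \nchroot=2\n'; }

cfg | grep '^#'                  # lines that start with #
cfg | grep -v '^[[:space:]]*$'   # drop empty and whitespace-only lines
cfg | grep -c '^$'               # 1: only the truly empty line
cfg | grep 'root'                # root=1 and chroot=2
cfg | grep '\<root\>'            # root=1 only: in chroot, "root" does not start a word
```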
Grouping: \(\) binds one or more characters into a single unit, e.g. \(root\)\+
The text matched by the pattern inside each pair of grouping parentheses is recorded by the regex engine in internal variables named \1, \2, \3, ...
\1 refers to the text matched by the pattern between the first opening parenthesis from the left and its matching closing parenthesis
Example: \(string1\+\(string2\)\)
\1: string1\+\(string2\)
\2: string2
Back reference: refers to the text matched by the pattern in a preceding pair of grouping parentheses, not to the pattern itself
Or: \|
Example: a\|b matches a or b; C\|cat matches C or cat; \(C\|c\)at matches Cat or cat
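Because a back reference repeats the matched text rather than the pattern, it can find repeated substrings that no plain pattern can. A sketch assuming GNU grep (\| in BRE is a GNU extension); all input lines are made up.

```shell
#!/bin/sh
# \1 must repeat exactly what \(abc\) matched
printf 'abcabc\nabcabd\n' | grep '\(abc\)\1'                    # abcabc

# First and last colon-separated fields identical (hypothetical records)
printf 'bin:x:bin\nroot:x:sh\n' | grep '^\([^:]*\):.*:\1$'     # bin:x:bin

# Alternation binds loosely: C\|cat means "C" or "cat", not "Cat"
printf 'Cat\ncat\nC\nbat\n' | grep -x 'C\|cat'                  # cat, C
# Grouping limits its scope: (C|c)at means "Cat" or "cat"
printf 'Cat\ncat\nC\nbat\n' | grep -E '^(C|c)at$'               # Cat, cat
```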
2 vim
vim: a modal editor
What a keystroke does depends on vim's current mode
Three main modes:
Command (Normal) mode: the default mode; move the cursor, cut/paste text
Insert (edit) mode: modify text
Extended command mode: save, quit, etc.
ESC: exit the current mode
ESC ESC: always returns to command mode
3 sed: a text processing tool
Usage:
sed [OPTION]... 'script' inputfile...
Common options:
-n: do not automatically print the pattern space to the screen
-e: multi-point editing (several -e scripts in one command)
-f /path/script_file: read the editing script from the specified file
-r: support extended regular expressions
-i.bak: edit the file in place, keeping a backup with the .bak suffix
script:
'address command'
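A few one-liners exercising the options above, each in the 'address command' form. This is a sketch assuming GNU sed (-r is GNU; other seds spell it -E); the config text and the app.conf file name are hypothetical.

```shell
#!/bin/sh
conf() { printf '# sample config\nport=80\n\nhost=localhost\n'; }

conf | sed -n '2p'                 # port=80: -n disables auto-print, p prints line 2
conf | sed '/^#/d; /^$/d'          # address by regex, command d: drop comments and empty lines
conf | sed -e 's/80/8080/' -e 's/localhost/127.0.0.1/'   # two edits in one pass
conf | sed -r 's/^([a-z]+)=/\1 = /'   # ERE group reused as \1 in the replacement

# -i.bak rewrites the file in place and keeps the original as app.conf.bak
tmp=$(mktemp -d)
conf > "$tmp/app.conf"
sed -i.bak 's/80/8080/' "$tmp/app.conf"
grep port "$tmp/app.conf" "$tmp/app.conf.bak"   # port=8080 vs port=80
rm -r "$tmp"
```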
4 The awk language
Basic format: awk [options] 'program' file...
program: pattern{action statements;..}
Pattern and action:
pattern: determines when the action statements are triggered (the triggering condition)
Special patterns: BEGIN, END
action statements: process the data; they are enclosed in {}
print, printf
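The pattern/action split, including BEGIN and END, can be seen in one small program. A sketch on a made-up score list; the threshold 80 and the helper function scores are hypothetical.

```shell
#!/bin/sh
scores() { printf 'alice 90\nbob 72\ncarol 85\n'; }

# BEGIN runs before the first record and END after the last;
# the middle rule has the pattern $2 >= 80, so its action fires only for matching records
scores | awk 'BEGIN { print "name score" }
              $2 >= 80 { print $1, $2; n++ }
              END { print n, "passed" }'

# printf gives explicit formatting: left-aligned name, right-aligned number
scores | awk '{ printf "%-6s|%3d\n", $1, $2 }'
```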
Separators, fields, and records
When awk runs, it splits each record into fields using the field separator; the fields are labeled $1, $2 ... $n and are called field identifiers. $0 refers to all fields (the whole record). Note: $ here has a different meaning from the $ of shell variables.
Each line of the file is called a record
If the action is omitted, the default is print $0
Print format: print item1, item2, ...
Key points:
(1) items are separated by commas
(2) each output item can be a string or a number, or a field of the current record, a variable, or an awk expression
(3) if the items are omitted, it is equivalent to print $0
Examples:
awk '{print "hello,awk"}'
awk -F: '{print}' /etc/passwd
awk -F: '{print "wang"}' /etc/passwd
awk -F: '{print $1}' /etc/passwd
awk -F: '{print $0}' /etc/passwd
awk -F: '{print $1"\t"$3}' /etc/passwd
tail -n 3 /etc/fstab | awk '{print $2,$4}'
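To make the output predictable, the examples above can be replayed on a two-line hypothetical copy of /etc/passwd. The sketch also uses the standard awk built-in variables NR (record number), NF (number of fields) and OFS (output field separator), which the text does not cover.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical stand-in for /etc/passwd
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' > "$tmp/passwd"

awk -F: '{ print $1 }' "$tmp/passwd"                    # root, bin
awk -F: '{ print $1 "\t" $3 }' "$tmp/passwd"            # name, a tab, then the UID
awk -F: '{ print $1, $NF }' "$tmp/passwd"               # the comma inserts OFS (a space by default)
awk -F: -v OFS=' -> ' '{ print $1, $7 }' "$tmp/passwd"  # root -> /bin/bash
awk -F: '{ print NR ": " $0 }' "$tmp/passwd"            # whole record, prefixed by its line number
rm -r "$tmp"
```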
Linux Operations Fundamentals: Text Processing