Text processing tools: tools for extracting text
File content: less and cat
File excerpts: head and tail
Extract by column: cut
Extract by keyword: grep
Viewing files
File viewing commands: cat, tac, rev
cat [OPTION]... [FILE]...
-E: display $ at the end of each line
-n: number every displayed line
-A: show all control characters
-b: number non-empty lines only
-s: squeeze consecutive blank lines into one
tac: display lines in reverse order (last line first)
rev: reverse the characters within each line
nl: add line numbers; blank lines are not numbered.
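A small sketch of the viewing options listed above, run against a throwaway file. The file contents and variable names are made up for illustration, and GNU coreutils plus the util-linux rev command are assumed to be installed.

```shell
# Hypothetical sample file: two text lines separated by two blank lines.
demo=$(mktemp)
printf 'one\n\n\ntwo\n' > "$demo"

cat -n "$demo"    # number every line, blank ones included
cat -b "$demo"    # number only the non-empty lines
cat -s "$demo"    # squeeze the two blank lines into one
cat -E "$demo"    # mark each line end with $

tac "$demo"       # last line first
rev "$demo"       # characters of each line reversed

# Capture a few results for inspection.
squeezed_lines=$(cat -s "$demo" | wc -l | tr -d ' ')
first_reversed=$(tac "$demo" | head -n 1)
mirrored=$(printf 'abc\n' | rev)

rm -f "$demo"
```

Comparing the output of cat -n and cat -b on the same file is the quickest way to see how blank lines are treated.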
Viewing the contents of a file page by page
more: page through files
more [OPTIONS...] FILE...
-d: show hints for paging and quitting
less: view a file or stdin output one page at a time
Commands that are useful while viewing:
/text searches for text
n/N jumps to the next or previous match
less is the pager used by the man command
Showing the beginning or end of a text
head [OPTION]... [FILE]...
-c #: get the first # bytes
-n #: get the first # lines
-#: shorthand for -n #
tail [OPTION]... [FILE]...
-c #: get the last # bytes
-n #: get the last # lines
-#: shorthand for -n #
-f: follow the file descriptor and display newly appended content; commonly used for log monitoring
Equivalent to --follow=descriptor
-F: follow the file name, equivalent to --follow=name --retry (reports when the file is missing and keeps retrying; lowercase -f gives no such notice)
tailf: similar to tail -f, but does not access the file when it is not growing (saves resources)
# tail -f /app/a -n0 &
The trailing & runs the command in the background, so changes to the file can be observed in real time. With -n0, only newly appended lines are displayed; existing lines are not.
fg and bg: bring a job to the foreground or resume it in the background
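The head and tail options above can be tried on a generated file. This is only a sketch: the ten-line sample and the variable names are assumptions made for illustration.

```shell
# Hypothetical sample file with ten numbered lines.
log=$(mktemp)
seq 1 10 > "$log"

first_three=$(head -n 3 "$log")          # lines 1 to 3
last_two=$(tail -n 2 "$log")             # lines 9 and 10
first_bytes=$(head -c 3 "$log")          # first 3 bytes: "1", newline, "2"
middle=$(head -n 6 "$log" | tail -n 3)   # lines 4 to 6, by chaining both tools

printf '%s\n' "$first_three" "$last_two" "$middle"

rm -f "$log"
```

Chaining head into tail is a simple way to cut a block of lines out of the middle of a file using only the options described here.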
Extracting text by column with cut, merging files with paste
cut [OPTION]... [FILE]...
-d DELIMITER: specifies the delimiter; the default is tab
-f FIELDS:
#: field number #
#,#[,#]: multiple discrete fields, such as 1,3,6
#-#: a range of consecutive fields, such as 1-6
Mixed use: 1-3,7
-c: cut by character
--output-delimiter=STRING: specifies the output delimiter
# cut -d: -f1,3 /etc/passwd
root:0
bin:1
daemon:2
Using : as the delimiter, extract the user name and UID.
# cut -d: -f1,3 --output-delimiter=* /etc/passwd
root*0
bin*1
daemon*2
adm*3
lp*4
Sets the output delimiter to *.
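To experiment without touching /etc/passwd, the same cut invocations can be run on a few made-up records. The sample data and variable names below are assumptions; --output-delimiter is a GNU cut option.

```shell
# Hypothetical passwd-style records (name:password:UID:GID:comment:home:shell).
records=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' > "$records"

names_and_uids=$(cut -d: -f1,3 "$records")                  # fields 1 and 3
starred=$(cut -d: -f1,3 --output-delimiter='*' "$records")  # GNU-only option
first_three_fields=$(cut -d: -f1-3 "$records")              # a field range
leading_chars=$(cut -c1-3 "$records")                       # characters 1 to 3

printf '%s\n' "$names_and_uids" "$starred"

rm -f "$records"
```

Quoting the * keeps the shell from expanding it as a file name pattern before cut sees it.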
cut and paste
cut displays specified columns of a file or of stdin data
cut -d: -f1 /etc/passwd
cat /etc/passwd | cut -d: -f7
cut -c2-5 /usr/share/dict/words
paste merges the lines with the same line number from two files into one line
paste [OPTION]... [FILE]...
-d DELIMITER: specifies the delimiter; the default is tab
-s: join all lines of each file into a single line
paste f1 f2
paste -s f1 f2
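The difference between plain paste and paste -s is easiest to see on two tiny files. The files and variable names here are hypothetical stand-ins for f1 and f2.

```shell
# Two hypothetical three-line files.
f1=$(mktemp)
f2=$(mktemp)
printf 'a\nb\nc\n' > "$f1"
printf '1\n2\n3\n' > "$f2"

side_by_side=$(paste "$f1" "$f2")     # a<TAB>1, b<TAB>2, c<TAB>3
with_colon=$(paste -d: "$f1" "$f2")   # a:1, b:2, c:3
serial=$(paste -s "$f1" "$f2")        # a<TAB>b<TAB>c, then 1<TAB>2<TAB>3
serial_commas=$(paste -s -d, "$f1")   # a,b,c

printf '%s\n' "$side_by_side" "$serial"

rm -f "$f1" "$f2"
```

Without -s, line N of every file ends up on output line N; with -s, each file is flattened into its own single line.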
Tools for analyzing text
Text statistics: wc
Sorting text: sort
Compare Files: diff and patch
Collecting text statistics: wc
Counts the total number of lines, words, bytes, and characters (in UTF-8, a Chinese character typically occupies three bytes and an ASCII letter one byte)
Characters and bytes are different concepts; bytes measure the space occupied on disk
Can run on data in a file or from stdin
wc Story.txt
237 1901 Story.txt
Default output columns: number of lines, number of words, number of bytes, then the file name
Common options
-l: count lines only
-w: count words only
-c: count bytes only
-m: count characters only
-L: display the length of the longest line in the file
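A short sketch of the individual counters, using a made-up three-line sample rather than a real story file. The variable names are illustrative, and -L is assumed to be available (it is a GNU extension).

```shell
# Hypothetical three-line sample; contents are illustrative only.
story=$(mktemp)
printf 'the quick brown fox\njumps over\nthe lazy dog\n' > "$story"

# Reading from stdin keeps the file name out of the output;
# tr strips the padding that some wc implementations add.
lines=$(wc -l < "$story" | tr -d ' ')
words=$(wc -w < "$story" | tr -d ' ')
bytes=$(wc -c < "$story" | tr -d ' ')
longest=$(wc -L < "$story" | tr -d ' ')

# One Chinese character encoded in UTF-8 (U+4E2D written as octal escapes).
utf8_bytes=$(printf '\344\270\255' | wc -c | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes longest=$longest utf8_bytes=$utf8_bytes"

rm -f "$story"
```

Running wc -m on the same UTF-8 character gives a locale-dependent result, which is why bytes and characters need to be kept apart.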
Sorting text: sort
Writes the sorted text to stdout without changing the original file
sort [options] file(s)
Common options
-r: sort in reverse (descending) order
-n: sort by numeric value (ascending)
-f: fold case, ignoring the difference between upper and lower case
-u: (unique) remove duplicate lines from the output
-t c: use c as the field delimiter
-k X: sort on field X, as delimited by c; can be given multiple times
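The options above combine naturally. The following sketch sorts a few made-up name:UID records; the data and variable names are assumptions for illustration.

```shell
# Hypothetical colon-delimited records: name:UID.
users=$(mktemp)
printf '%s\n' 'carol:100' 'alice:9' 'bob:25' 'alice:9' > "$users"

by_uid=$(sort -t: -k2 -n "$users")              # numeric sort on field 2
by_uid_desc=$(sort -t: -k2 -n -r "$users")      # same key, largest first
unique_names=$(cut -d: -f1 "$users" | sort -u)  # sorted, duplicates removed
text_order=$(printf '9\n25\n100\n' | sort)      # without -n: compared as strings

printf '%s\n' "$by_uid" "$text_order"

rm -f "$users"
```

The last command shows why -n matters: compared as text, 100 sorts before 25 and 9.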
uniq
uniq command: removes adjacent duplicate lines from the input
uniq [OPTION]... [FILE]...
-c: show the number of occurrences of each line
-d: show only lines that are repeated
-u: show only lines that are not repeated
Only lines that are adjacent and exactly identical count as duplicates
Commonly used together with the sort command:
sort userlist.txt | uniq -c
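Because uniq only collapses adjacent lines, sorting first is what makes the counts meaningful. A sketch with a made-up user list (names and variables are illustrative):

```shell
# Hypothetical user list in which no duplicates are adjacent.
userlist=$(mktemp)
printf '%s\n' 'bob' 'alice' 'bob' 'carol' 'alice' 'bob' > "$userlist"

unsorted=$(uniq "$userlist" | wc -l | tr -d ' ')      # nothing adjacent, so nothing removed
counts=$(sort "$userlist" | uniq -c | sed 's/^ *//')  # leading padding trimmed
repeated=$(sort "$userlist" | uniq -d)                # names seen more than once
singles=$(sort "$userlist" | uniq -u)                 # names seen exactly once

printf '%s\n' "$counts"

rm -f "$userlist"
```

Appending sort -n -r to the uniq -c pipeline would rank the entries by frequency.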
Compare files
Compare the differences between two files
diff: compares the contents of two files; it does not compare their attributes
diff foo.conf foo2.conf
5c5
< use_widgets=no
---
> use_widgets=yes
Note: 5c5 means that line 5 differs between the two files (c stands for change)
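The default diff output format can be reproduced with two small files. The file contents below are invented for illustration and differ on line 2 rather than line 5.

```shell
# Two hypothetical config files that differ only on line 2.
old=$(mktemp)
new=$(mktemp)
printf 'name=demo\nuse_widgets=no\nport=80\n' > "$old"
printf 'name=demo\nuse_widgets=yes\nport=80\n' > "$new"

# diff exits with status 1 when the files differ and 0 when they match.
changes=$(diff "$old" "$new")
status=$?

printf '%s\n' "$changes"
echo "exit status: $status"

rm -f "$old" "$new"
```

Lines starting with < come from the first file and lines starting with > from the second, so the exit status alone is enough to test for differences in a script.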
Copying changes to other files with patch
The output of the diff command, saved in a file, is called a "patch"
Use the -u option to output the "unified" diff format, which is best suited for patch files
patch applies the changes to other files (use with caution)
The -b option automatically backs up the changed files
$ diff -u foo.conf foo2.conf > foo.patch
$ patch -b foo.conf foo.patch
# cp /etc/fstab f3
# cp /etc/fstab f4
# echo a >> f4
# diff f3 f4 -u > f3f4.diff
# ll
total …
-rw-r--r--. 1 root root 595 Nov … f3
-rw-r--r--. 1 root root 392 Nov … f3f4.diff
-rw-r--r--. 1 root root 597 Nov … f4
# rm -f f4
# patch -b f3 f3f4.diff
patching file f3
# ll
total …
-rw-r--r--. 1 root root 597 Nov … f3
-rw-r--r--. 1 root root 392 Nov … f3f4.diff
-rw-r--r--. 1 root root 595 Nov … f3.orig
Create two files, f3 and f4, make f4 differ, and save the diff output to f3f4.diff. Then delete f4 and use patch to recover its content.
Note that the recovered content is written into f3, which now holds what f4 contained; the original f3 is kept as the backup f3.orig. The file sizes confirm this.
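The same round trip can be sketched in a scratch directory so that nothing under /app is touched. The file contents are made up, and the patch utility is assumed to be installed.

```shell
# Scratch directory with hypothetical files named after the example above.
work=$(mktemp -d)
printf 'line 1\nline 2\n' > "$work/f3"
cp "$work/f3" "$work/f4"
echo a >> "$work/f4"

# Unified diff; an exit status of 1 here only means the files differ.
diff -u "$work/f3" "$work/f4" > "$work/f3f4.diff"
rm -f "$work/f4"

# patch rewrites f3 in place; -b keeps the pre-patch copy as f3.orig.
patch -b "$work/f3" "$work/f3f4.diff"

patched_last=$(tail -n 1 "$work/f3")       # the appended line now lives in f3
backup_last=$(tail -n 1 "$work/f3.orig")   # the backup still ends with the old last line

echo "f3 ends with: $patched_last; f3.orig ends with: $backup_last"

rm -rf "$work"
```

Naming the target file on the command line makes patch ignore the file names recorded in the diff header, which is why the changes land in f3 and not in a recreated f4.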
Chapter 7: Linux text processing tools