Chapter 7: Linux Text Processing Tools

Source: Internet
Author: User
Tags: control characters, diff, stdin

Tools for extracting text

View file contents: less and cat

Take part of a file: head and tail

Extract by column: cut

Extract by keyword: grep

Viewing files

File viewing commands: cat, tac, rev

cat [OPTION]... [FILE]...

-E: display $ at the end of each line

-n: number every output line

-A: show all control characters

-b: number non-empty lines only

-s: squeeze consecutive blank lines into one

tac: display lines in reverse order (last line first)

rev: reverse the characters within each line

nl: add line numbers; blank lines are not numbered
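The options above can be tried on a small file. The following is a minimal sketch, assuming GNU coreutils and util-linux (for rev) are installed; the file name sample.txt and its contents are made up for illustration.

```shell
# Create a small sample file (hypothetical content) in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\n\n\ndef\n' > "$tmp/sample.txt"

cat -n "$tmp/sample.txt"   # number every line, including the blank ones
cat -b "$tmp/sample.txt"   # number only the non-empty lines
cat -s "$tmp/sample.txt"   # squeeze the two blank lines into one
cat -E "$tmp/sample.txt"   # mark the end of each line with $
tac "$tmp/sample.txt"      # print the lines in reverse order, "def" first
rev "$tmp/sample.txt"      # reverse the characters of each line: cba, fed
```

Comparing the output of cat -n and cat -b on the same file makes the difference in how blank lines are numbered easy to see.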

Viewing a file one page at a time

more: page through files

more [OPTIONS...] FILE...

-d: display prompts explaining how to page and how to quit

less: view a file or stdin output one page at a time

Useful commands while viewing:

/text: search for text

n/N: jump to the next or previous match

less is the pager used by the man command

Showing the beginning or end of a file

head [OPTION]... [FILE]...

-c #: output the first # bytes

-n #: output the first # lines

-#: shorthand for -n #

tail [OPTION]... [FILE]...

-c #: output the last # bytes

-n #: output the last # lines

-#: shorthand for -n #

-f: follow the file descriptor and display newly appended content; commonly used for log monitoring

Equivalent to --follow=descriptor

-F: follow the file name; equivalent to --follow=name --retry (if the file is deleted or renamed and later re-created, tail reports this and keeps following; lowercase -f does not)

tailf: similar to tail -f, but does not access the file when it is not growing (saves resources)

# tail -f /app/a -n0 &

& runs the command in the background, so changes to the file can be observed in real time; with -n0 only newly appended lines are shown, not the existing ones.

fg and bg: move a job to the foreground or continue it in the background
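The head and tail options can be sketched on a generated file, as below. The file nums.txt is a made-up example, and tail -f is left out because it keeps running until interrupted.

```shell
# Generate a ten-line sample file (hypothetical data) in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
seq 1 10 > "$tmp/nums.txt"

head -n 3 "$tmp/nums.txt"               # first three lines: 1, 2, 3
head -c 4 "$tmp/nums.txt"               # first four bytes: "1", newline, "2", newline
tail -n 2 "$tmp/nums.txt"               # last two lines: 9, 10
tail -c 3 "$tmp/nums.txt"               # last three bytes: "10" and a newline
head -n 5 "$tmp/nums.txt" | tail -n 1   # combine both to print line 5 only
```

Piping head into tail, as in the last command, is a common way to pick out a single line or a range of lines from the middle of a file.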

Extracting text by column with cut and merging files with paste

cut [OPTION]... [FILE]...

-d DELIMITER: specify the field delimiter; the default is the tab character

-f FIELDS:

#: field number #

#,#[,#]: several discrete fields, such as 1,3,6

#-#: a range of consecutive fields, such as 1-6

Mixed use: 1-3,7

-c: cut by character position

--output-delimiter=STRING: specify the output delimiter

# cut -d: -f1,3 /etc/passwd
root:0
bin:1
daemon:2

This extracts the user name and UID, using : as the delimiter.

# cut -d: -f1,3 --output-delimiter='*' /etc/passwd
root*0
bin*1
daemon*2
adm*3
lp*4

The output delimiter is set to *.

cut and paste

cut displays the specified columns of a file or of stdin data

cut -d: -f1 /etc/passwd

cat /etc/passwd | cut -d: -f7

cut -c2-5 /usr/share/dict/words
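Because /etc/passwd differs from system to system, the same cut options can be tried on fixed input instead. This sketch assumes GNU cut (for --output-delimiter); the two passwd-style records are invented sample data.

```shell
# Two passwd-style records (hypothetical sample data, not read from /etc/passwd).
records='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# Fields 1 and 3 (user name and UID): root:0 and daemon:2
printf '%s\n' "$records" | cut -d: -f1,3

# Mixed field list (1 plus the range 6-7), joined with a space on output
printf '%s\n' "$records" | cut -d: -f1,6-7 --output-delimiter=' '

# Characters 1 to 4 of each line: root and daem
printf '%s\n' "$records" | cut -c1-4
```

Note that -f selects whole fields split on the delimiter, while -c ignores delimiters and counts character positions.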

paste merges the lines with the same line number from two files into a single line

paste [OPTION]... [FILE]...

-d DELIMITER: specify the delimiter; the default is tab

-s: join all lines of each file into a single line, one output line per file

paste f1 f2

paste -s f1 f2
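The difference between the default mode and -s is easiest to see on two tiny files. A minimal sketch follows; f1 and f2 are created here with made-up contents.

```shell
# Create two three-line files (hypothetical contents) in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'a\nb\nc\n' > "$tmp/f1"
printf '1\n2\n3\n' > "$tmp/f2"

paste "$tmp/f1" "$tmp/f2"       # line by line, tab separated: a 1, b 2, c 3
paste -d: "$tmp/f1" "$tmp/f2"   # same pairing with : as delimiter: a:1, b:2, c:3
paste -s "$tmp/f1" "$tmp/f2"    # one output line per file: a b c, then 1 2 3
```

In the default mode the files are combined side by side; with -s each file is flattened into its own line instead.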

Tools for analyzing text

Text statistics: wc

Sorting text: sort

Comparing files: diff and patch

Collecting text statistics with wc

wc counts the total number of words, lines, bytes, and characters (in UTF-8, a Chinese character typically occupies three bytes and an ASCII letter one byte)

Characters and bytes are different concepts; bytes are the amount of space occupied on disk

It can run on data in a file or from stdin

wc Story.txt

237 1901 Story.txt

The counts appear in the order lines, words, bytes, followed by the file name

Common options

-l: count lines only

-w: count words only

-c: count bytes only

-m: count characters only

-L: display the length of the longest line in the file
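The counting options, and the difference between bytes and characters, can be checked with a short experiment. This is a sketch only: the file story.txt is created here with invented contents, and the result of wc -m depends on the current locale.

```shell
# Create a two-line sample file (hypothetical content) in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'hello world\nfoo\n' > "$tmp/story.txt"

wc "$tmp/story.txt"        # lines, words, and bytes, then the file name
wc -l < "$tmp/story.txt"   # 2 lines
wc -w < "$tmp/story.txt"   # 3 words
wc -c < "$tmp/story.txt"   # 16 bytes (12 for the first line, 4 for the second)

# The three bytes E4 B8 AD (octal 344 270 255) are one Chinese character in UTF-8.
printf '\344\270\255' | wc -c   # byte count: 3
printf '\344\270\255' | wc -m   # character count: 1 in a UTF-8 locale
```

Reading from stdin with < makes wc print only the number, without a file name, which is convenient in scripts.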

Sorting text with sort

sort writes the sorted text to stdout without changing the original file

sort [options] file(s)

Common options

-r: sort in reverse (descending) order

-n: sort by numeric value (ascending)

-f: fold case, ignoring the difference between uppercase and lowercase

-u: (unique) remove duplicate lines from the output

-t c: use c as the field delimiter

-k X: sort by field X, with fields delimited by the character given to -t; can be used multiple times
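How these options combine is shown in the sketch below. The name:number records are invented for illustration and are fed to sort through stdin.

```shell
# Sample name:number records (hypothetical data), including one duplicate line.
records='carol:30
alice:4
bob:12
alice:4'

printf '%s\n' "$records" | sort                 # lexical order by whole line
printf '%s\n' "$records" | sort -u              # same order, duplicate line removed
printf '%s\n' "$records" | sort -t: -k2 -n      # numeric order on field 2: 4, 4, 12, 30
printf '%s\n' "$records" | sort -t: -k2 -n -r   # reversed: 30 first
```

Without -n, field 2 would be compared as text, so 12 would sort before 4; the numeric option avoids that.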

uniq

uniq command: removes adjacent duplicate lines from the input

uniq [OPTION]... [INPUT [OUTPUT]]

-c: show the number of occurrences of each line

-d: show only lines that are repeated

-u: show only lines that are not repeated

Only consecutive and exactly identical lines count as duplicates

Commonly used together with the sort command:

sort Userlist.txt | uniq -c
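The reason uniq is usually paired with sort can be seen on unsorted input. A minimal sketch follows, with an invented list of names standing in for a file such as Userlist.txt.

```shell
# An unsorted list of names (hypothetical data) with non-adjacent duplicates.
names='bob
alice
bob
bob
alice'

printf '%s\n' "$names" | uniq             # only the adjacent bob/bob pair collapses: 4 lines remain
printf '%s\n' "$names" | sort | uniq -c   # after sorting, counts are complete: 2 alice, 3 bob
printf '%s\n' "$names" | sort | uniq -d   # lines that occur more than once: alice, bob
printf '%s\n' "$names" | sort | uniq -u   # lines that occur exactly once: none in this sample
```

Sorting first brings identical lines next to each other, which is what uniq needs in order to count or remove them.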

Comparing files

Compare the differences between two files

diff: compares the contents of two files, not their attributes

diff foo.conf foo2.conf

5c5

< Use_widgets=no

---

> Use_widgets=yes

This indicates that line 5 differs between the two files (c stands for change)
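The same kind of change report can be reproduced with two small files, and the exit status of diff can be used in scripts. This is a sketch with made-up files old.conf and new.conf, not the foo.conf files from the example.

```shell
# Create two files (hypothetical contents) that differ only on line 2.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'x\ny\nz\n' > "$tmp/old.conf"
printf 'x\nY\nz\n' > "$tmp/new.conf"

# diff prints "2c2", then "< y", "---", "> Y" for the changed line.
# It exits with status 0 when the files match and 1 when they differ.
if diff "$tmp/old.conf" "$tmp/new.conf"; then
    echo "files are identical"
else
    echo "files differ"
fi
```

Lines starting with < come from the first file and lines starting with > come from the second.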

Applying changes to a file with patch

The output of the diff command, saved in a file, is called a "patch"

Use the -u option to produce the "unified" diff format, which is the best choice for patch files

patch applies the recorded changes to another file (use with caution)

The -b option automatically backs up the changed file

$ diff -u foo.conf foo2.conf > foo.patch

$ patch -b foo.conf foo.patch

# cp /etc/fstab f3
# cp /etc/fstab f4
# echo a >> f4
# diff f3 f4 -u > f3f4.diff
# ll
total [...]
-rw-r--r--. 1 root root 595 Nov [...] f3
-rw-r--r--. 1 root root 392 Nov [...] f3f4.diff
-rw-r--r--. 1 root root 597 Nov [...] f4
# rm -f f4
# patch -b f3 f3f4.diff
patching file f3
# ll
total [...]
-rw-r--r--. 1 root root 597 Nov [...] f3
-rw-r--r--. 1 root root 392 Nov [...] f3f4.diff
-rw-r--r--. 1 root root 595 Nov [...] f3.orig

This creates two files, f3 and f4, saves their differences to f3f4.diff, deletes f4, and then uses patch to recover the content of f4.

Note that the recovered content is written into f3, which now holds what used to be f4, while the original f3 is backed up as f3.orig; the file sizes confirm this.
