Text Processing Tools

Source: Internet
Author: User
Tags: control characters

File viewing commands: cat, tac, rev

cat [OPTION]... [FILE]...

-E: Display $ at the end of each line

-n: Number all output lines

-A: Show all control characters

-b: Number only non-empty lines

-s: Squeeze consecutive blank lines into one

Example: View the contents of the /etc/issue file with line numbers
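
The cat options above can be tried on a small throwaway file. The sketch below assumes GNU coreutils and uses a hypothetical temporary file in place of /etc/issue; on a real system the example corresponds to cat -n /etc/issue.

```shell
# Hypothetical sample file (stands in for /etc/issue): two text lines
# separated by two blank lines.
sample=$(mktemp)
printf 'first\n\n\nsecond\n' > "$sample"

cat -n "$sample"   # number every line, blank lines included
cat -b "$sample"   # number only the non-empty lines
cat -s "$sample"   # squeeze the two blank lines into one
cat -E "$sample"   # show $ at the end of each line

# Count how many output lines carry a line number in each mode.
n_count=$(cat -n "$sample" | grep -c '[0-9]')
b_count=$(cat -b "$sample" | grep -c '[0-9]')
s_lines=$(cat -s "$sample" | wc -l)

# tac prints the lines of a file in reverse order;
# rev reverses the characters within each line.
tac_first=$(tac "$sample" | head -n 1)
printf 'abc\n' | rev

rm -f "$sample"
```

Comparing n_count with b_count shows the difference between -n and -b: the blank lines are numbered only with -n.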

Viewing file contents page by page

more: Page through files

more [OPTIONS...] FILE...

-d: Display prompts for paging and quitting

less: View a file or stdin output page by page

The commands that are useful for viewing are:

/text: Search for text

n/N: Jump to the next or previous match

less is the pager used by the man command

Showing the beginning or end of a file

head

head [OPTION]... [FILE]...

-c #: Output the first # bytes

-n #: Output the first # lines

-#: Shorthand for -n #

tail

tail [OPTION]... [FILE]...

-c #: Output the last # bytes

-n #: Output the last # lines

-#: Shorthand for -n #

-f: Follow the file and display newly appended content; commonly used for log monitoring

Example: Show the first two lines and the last two lines of the /etc/passwd file
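
A minimal sketch of the head and tail options, using a hypothetical five-line temporary file so the result is predictable. On a real system the example corresponds to head -n 2 /etc/passwd followed by tail -n 2 /etc/passwd.

```shell
# Hypothetical sample file with five lines.
sample=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$sample"

first_two=$(head -n 2 "$sample")    # first two lines
last_two=$(tail -n 2 "$sample")     # last two lines
first_bytes=$(head -c 5 "$sample")  # first five bytes

printf '%s\n' "$first_two" "$last_two"

rm -f "$sample"
```

tail -f is left out of the sketch because it keeps running until interrupted; it is typically used interactively on a growing log file.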

Extracting text by column with cut and merging files with paste

cut [OPTION]... [FILE]...

-d DELIMITER: Specify the field delimiter; the default is tab

-f FIELDS: Select fields

#: Field number #

#,#[,#]: Multiple discrete fields, such as 1,3,6

#-#: Multiple consecutive fields, such as 1-6

Mixed use: 1-3,7

-c: Cut by character position

--output-delimiter=STRING: Specify the output delimiter

Display specified columns of a file or of stdin data

cut -d: -f1 /etc/passwd

cat /etc/passwd | cut -d: -f7

cut -c2-5 /usr/share/dict/words
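
The field-list forms can be compared side by side on one line of input. This is a small sketch with a hypothetical colon-separated record, not output from a real system file.

```shell
# Hypothetical record with seven colon-separated fields.
sample=$(mktemp)
printf 'a:b:c:d:e:f:g\n' > "$sample"

single=$(cut -d: -f2 "$sample")         # one field
discrete=$(cut -d: -f1,3,6 "$sample")   # discrete fields
range=$(cut -d: -f1-3 "$sample")        # consecutive fields
mixed=$(cut -d: -f1-3,7 "$sample")      # range plus a single field
chars=$(printf 'abcdefg\n' | cut -c2-5) # characters 2 through 5

printf '%s\n' "$single" "$discrete" "$range" "$mixed" "$chars"

rm -f "$sample"
```

Selected fields are written in input order and joined with the input delimiter unless --output-delimiter is given.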

paste: Merge the corresponding lines of two files into single lines

paste [OPTION]... [FILE]...

-d DELIMITER: Specify the delimiter; the default is tab

-s: Merge all lines of each file into a single line

paste f1 f2

paste -s f1 f2

Example: Extract the user name, UID, and default shell from the /etc/passwd file
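
One way to approach this example is cut -d: -f1,3,7 /etc/passwd. The sketch below runs the same command on a hypothetical two-line file in /etc/passwd format so the result is predictable, and then uses paste to recombine columns. It assumes GNU cut for --output-delimiter.

```shell
# Hypothetical sample in /etc/passwd format.
sample=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' > "$sample"

# User name (field 1), UID (field 3), default shell (field 7).
picked=$(cut -d: -f1,3,7 "$sample")
spaced=$(cut -d: -f1,3,7 --output-delimiter=' ' "$sample")

# Split two columns into separate files, then merge them line by line.
names=$(mktemp)
shells=$(mktemp)
cut -d: -f1 "$sample" > "$names"
cut -d: -f7 "$sample" > "$shells"
merged=$(paste -d: "$names" "$shells")  # line N of names joined to line N of shells
one_line=$(paste -s -d, "$names")       # all lines of one file on a single line

printf '%s\n' "$picked" "$spaced" "$merged" "$one_line"

rm -f "$sample" "$names" "$shells"
```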

Collecting text statistics with wc

Counts the total number of lines, words, bytes, and characters

Works on data in a file or from stdin

$ wc story.txt

39 237 1901 story.txt

The three numbers are the line count, the word count, and the byte count

Use -l to count only lines

Use -w to count only words

Use -c to count only bytes

Use -m to count only characters

Example: Display the number of lines, words, and characters in the /etc/passwd file
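
A short sketch of the wc options on a hypothetical two-line file. On a real system the example corresponds to wc -l -w -m /etc/passwd. Reading from stdin with < keeps the file name out of the output, which makes the numbers easy to capture.

```shell
# Hypothetical sample: 2 lines, 3 words, 14 bytes.
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

wc "$sample"                 # lines, words, bytes, then the file name

lines=$(wc -l < "$sample")   # lines only
words=$(wc -w < "$sample")   # words only
bytes=$(wc -c < "$sample")   # bytes only
chars=$(wc -m < "$sample")   # characters; equals bytes for ASCII text

rm -f "$sample"
```

For text containing multibyte characters, -m and -c can differ, and the -m result depends on the current locale.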

Sorting text with sort

Displays the sorted text on stdout without changing the original file

$ sort [options] file(s)

Common options

-r: Sort in reverse (descending) order

-n: Sort by numeric value

-f: Fold case, ignoring the case of characters in strings

-u: (unique) Remove duplicate lines from the output

-t c: Use c as the field delimiter

-k X: Sort by column X, as delimited by the character c; can be used multiple times

Example: Sort the /etc/passwd file by UID from largest to smallest
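
One way to approach this example is sort -t: -k3,3 -n -r /etc/passwd. The sketch below applies it to a hypothetical three-line file in /etc/passwd format. Writing the key as -k3,3 restricts the comparison to the third field alone; a bare -k3 would make the key run from field 3 to the end of the line.

```shell
# Hypothetical sample in /etc/passwd format with UIDs 0, 1000, and 1.
sample=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nuser:x:1000:1000::/home/user:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' > "$sample"

# -t: sets the delimiter, -k3,3 selects the UID field,
# -n compares numerically, -r reverses the order.
sorted=$(sort -t: -k3,3 -n -r "$sample")
order=$(printf '%s\n' "$sorted" | cut -d: -f1 | paste -s -d, -)

printf '%s\n' "$sorted"

rm -f "$sample"
```

Without -n the UIDs would be compared as strings, so 1000 would sort before 2.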

uniq

uniq command: Remove adjacent duplicate lines from the input

uniq [OPTION]... [FILE]...

-c: Show the number of times each line occurs

-d: Show only the lines that are repeated

-u: Show only the lines that are not repeated

Only lines that are adjacent and exactly identical count as duplicates

Commonly used with the sort command:

sort 1.txt | uniq -c
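
The sketch below shows why uniq is usually paired with sort. The input is a hypothetical file in which no two identical lines are adjacent, so uniq alone removes nothing.

```shell
# Hypothetical sample: a and b repeat, but never on adjacent lines.
sample=$(mktemp)
printf 'b\na\nb\nc\na\nb\n' > "$sample"

unsorted_lines=$(uniq "$sample" | wc -l)  # nothing is removed

# After sorting, identical lines are adjacent and uniq can group them.
# awk normalizes the leading padding that uniq -c adds to the counts.
counts=$(sort "$sample" | uniq -c | awk '{print $1, $2}')
repeated=$(sort "$sample" | uniq -d)
singles=$(sort "$sample" | uniq -u)

printf '%s\n' "$counts"

rm -f "$sample"
```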

grep, one of the Three Musketeers of text processing on Linux

grep: A text filtering (pattern matching) tool

grep, egrep, fgrep (fgrep does not support regular expression search)

grep stands for Global search Regular Expression and Print out the line.

Function: A text search tool that checks the target text line by line against a user-specified pattern and prints the matching lines

Pattern: A filter condition written with regular expression metacharacters and literal text characters

grep [OPTIONS] PATTERN [FILE...]

grep root /etc/passwd

grep "$USER" /etc/passwd

grep '$USER' /etc/passwd

grep `whoami` /etc/passwd
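
The four invocations above differ only in how the shell treats the pattern before grep sees it. The sketch below illustrates this with a hypothetical variable name and a hypothetical sample file, so the result does not depend on the current user; echo bob stands in for whoami.

```shell
# Hypothetical user name and sample data; the third line contains
# the literal text $name.
name=alice
sample=$(mktemp)
printf 'alice:x:1000\nbob:x:1001\n$name:x:1002\n' > "$sample"

# Double quotes: the shell expands $name, so grep searches for "alice".
double=$(grep "$name" "$sample")

# Single quotes: no expansion, so grep searches for the literal text $name.
single=$(grep '$name' "$sample")

# Command substitution: the command's output becomes the pattern.
subst=$(grep "$(echo bob)" "$sample")

printf '%s\n' "$double" "$single" "$subst"

rm -f "$sample"
```

The $( ) form is equivalent to backticks and is easier to nest.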

grep command options

--color=auto: Highlight the matched text in color

-v: Display the lines that do not match the pattern

-i: Ignore character case

-n: Show the line numbers of matching lines

-c: Count the number of matching lines

-o: Display only the matched string

-q: Quiet mode; do not output anything

-A #: after; also show the # lines after each match

-B #: before; also show the # lines before each match

-C #: context; also show the # lines before and after each match

-e: Specify a pattern; multiple -e options are combined with logical OR

grep -e 'cat' -e 'dog' file

-w: Match whole words only

-E: Use ERE (extended regular expressions)
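
A compact sketch of the most common grep options, run against a hypothetical five-line file. It assumes GNU grep.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf 'cat\nDog\ncatalog\nbird\ndog house\n' > "$sample"

count=$(grep -c 'cat' "$sample")              # number of matching lines
whole=$(grep -w 'cat' "$sample")              # whole-word matches only
nocase=$(grep -i 'dog' "$sample")             # ignore case
numbered=$(grep -n 'bird' "$sample")          # prefix the line number
inverted=$(grep -v 'o' "$sample")             # lines without an "o"
only=$(grep -o 'cat' "$sample")               # matched text only
either=$(grep -e 'cat' -e 'bird' "$sample")   # cat OR bird
after=$(grep -A 1 'catalog' "$sample")        # match plus one following line

# -q prints nothing; only the exit status reports whether a match exists.
if grep -q 'bird' "$sample"; then
    found=yes
else
    found=no
fi

rm -f "$sample"
```

Because -q communicates only through the exit status, it is mainly useful in if statements and && chains.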

Character matching:

. : Matches any single character

[]: Matches any single character within the specified range

[^]: Matches any single character outside the specified range

[:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]

Repetition: placed after a character to specify how many times that preceding character may occur

*: Matches the preceding character any number of times, including 0 times

Greedy mode: matches as much as possible

.*: Any characters of any length

\?: Matches the preceding character 0 or 1 times

\+: Matches the preceding character at least 1 time

\{m\}: Matches the preceding character exactly m times

\{m,n\}: Matches the preceding character at least m times and at most n times

\{,n\}: Matches the preceding character at most n times

\{m,\}: Matches the preceding character at least m times
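
The repetition operators can be compared by counting how many lines of a small hypothetical file each pattern matches. The sketch assumes GNU grep, where \?, \+, and \{,n\} are available in basic regular expressions. The ^ and $ anchors, which are not covered in the list above, tie each pattern to the whole line so that the counts reflect the repetition alone.

```shell
# Hypothetical sample: zero to three "a" characters followed by "b".
sample=$(mktemp)
printf 'b\nab\naab\naaab\n' > "$sample"

star=$(grep -c 'a*b' "$sample")             # any number of a, including none
optional=$(grep -c '^a\?b$' "$sample")      # zero or one a
plus=$(grep -c '^a\+b$' "$sample")          # at least one a
exact=$(grep -c '^a\{2\}b$' "$sample")      # exactly two a
range=$(grep -c '^a\{2,3\}b$' "$sample")    # two or three a
at_most=$(grep -c '^a\{,1\}b$' "$sample")   # at most one a (GNU extension)
at_least=$(grep -c '^a\{2,\}b$' "$sample")  # at least two a

# Greedy matching: a* consumes as many a characters as it can.
greedy=$(printf 'aaab\n' | grep -o 'a*')

rm -f "$sample"
```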

Example: Find the lines in the /etc/rc.d/init.d/functions file that contain a word (underscores included) followed by a pair of parentheses
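
One possible pattern for this example is [[:alpha:]_]\+(), which reads as one or more letters or underscores followed by a literal pair of parentheses. The sketch below is an assumption about how the example can be solved, not the command from the original screenshot. It runs on a hypothetical snippet that imitates function definitions, since /etc/rc.d/init.d/functions is not present on every system.

```shell
# Hypothetical snippet imitating shell function definitions.
sample=$(mktemp)
printf 'checkpid() {\n__pids_var_run() {\n  local i\necho_success() {\n# comment\n' > "$sample"

# Basic regular expression: ( and ) are literal, \+ means one or more.
bre=$(grep -o '[[:alpha:]_]\+()' "$sample")

# Extended regular expression: + needs no backslash,
# while literal parentheses must be escaped.
ere=$(grep -E -o '[[:alpha:]_]+\(\)' "$sample")

printf '%s\n' "$bre"

rm -f "$sample"
```

Dropping -o prints the whole matching lines instead of only the matched names.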

This article is from the "Progress a little every day" blog; reprinting is declined.
