Study notes: text processing tools

Source: Internet
Author: User
Tags: control characters, diff, printable characters, word wrap, expression engine

Linux provides many text processing tools. This article introduces several of the most commonly used ones: cat and less for viewing text, head and tail for extracting the beginning or end of a file, cut for extracting columns, and grep for searching by keyword. Each tool is described in detail below.

1. Text viewing command: cat

Usage: cat [OPTION]... [FILE]...

The cat command prints the full content of one or more files at once, without pagination. It is often combined with redirection for simple text editing, such as creating or concatenating files. Options change its behavior: for example, -n numbers the output lines and -s squeezes consecutive blank lines. The common options are:

-n, --number: Number all output lines

-b, --number-nonblank: Number only the non-blank output lines

-s, --squeeze-blank: Squeeze consecutive blank lines into a single blank line

-v, --show-nonprinting: Display non-printing characters in visible notation, such as the carriage return (^M) in Windows line endings

-E, --show-ends: Display a $ character at the end of each line

-T, --show-tabs: Display tab characters as ^I

-A, --show-all: Show everything; equivalent to -vET
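The options above can be tried on a small throwaway file. This is a minimal sketch that assumes GNU coreutils (the -A option is not available in every cat implementation); the file name sample.txt and its content are made up for illustration.

```shell
# Build a hypothetical sample file: two consecutive blank lines and a tab.
tmpdir=$(mktemp -d)
printf 'first\n\n\nsecond\tline\n' > "$tmpdir/sample.txt"

# -n numbers every line; -b numbers only the non-blank lines.
cat -n "$tmpdir/sample.txt"
cat -b "$tmpdir/sample.txt"

# -s squeezes the two blank lines into one, leaving three lines.
squeezed=$(cat -s "$tmpdir/sample.txt" | wc -l)
echo "lines after -s: $squeezed"

# -A (same as -vET) marks each line end with $ and each tab with ^I.
marked=$(cat -A "$tmpdir/sample.txt" | tail -n 1)
echo "$marked"

rm -r "$tmpdir"
```

Comparing the -n and -b output side by side shows that only the numbering of blank lines differs.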

The tac command is the reverse of cat: it prints the lines of a file in reverse order, from the last line to the first. The rev command reverses the characters within each line.

Application examples:

# cat userlist
user1:x:3000:3000::/home/user1:/bin/csh
user2:x:3001:3001::/home/user2:/bin/csh
user3:x:3002:3002::/home/user3:/bin/csh
# tac userlist
user3:x:3002:3002::/home/user3:/bin/csh
user2:x:3001:3001::/home/user2:/bin/csh
user1:x:3000:3000::/home/user1:/bin/csh
# rev userlist
hsc/nib/:1resu/emoh/::0003:0003:x:1resu
hsc/nib/:2resu/emoh/::1003:1003:x:2resu
hsc/nib/:3resu/emoh/::2003:2003:x:3resu

2. Paginated text display: more and less

The more command displays text one page at a time and shows at the bottom the percentage of the text already viewed. It is mainly designed for paging forward; paging backward is limited (the b key works only on files, not on piped input). With the -d option it shows a help hint at the bottom. The less command also displays text one page at a time, but it supports paging both up and down with the PgUp and PgDn keys, has more features than more, and offers more ways to search: it can search backward as well as forward.

Options of the more command:

-d: Show a friendly prompt at the bottom

-f: Count logical lines rather than screen lines, that is, long lines are not folded

-s: Squeeze multiple consecutive blank lines into one

-u: Do not display underlining

-p: Do not scroll; clear the screen and then display the text

-NUM: Set the number of lines displayed per screen to NUM

+NUM: Start displaying at line NUM of the file

+/STRING: Start displaying at the first location that matches the search string STRING

Common operations in more:

Enter: Scroll down n lines; n is typed before the key and defaults to 1

Ctrl+F or Space: Scroll down one screen

=: Display the current line number

:f: Display the file name and the current line number

!command: Run the command in a subshell

q: Quit

Options of the less command:

-b: Set the buffer size

-e: Exit automatically the second time the end of the file is reached

-f: Force special files, such as directories and device files, to be opened

-g: Highlight only the match found by the last search, instead of all matches

-i: Ignore case when searching (unless the pattern contains uppercase letters)

-m: Show the percentage of the file viewed in the prompt, as more does

-N: Display a line number at the start of each line

-s: Squeeze multiple consecutive blank lines into one

Common operations in less:

h: Show help

PgUp: Scroll up one screen

PgDn or Space: Scroll down one screen

d: Scroll down half a screen

u: Scroll up half a screen

g: Go to the beginning of the file

G: Go to the end of the file

j or Enter: Scroll down one line

y or k: Scroll up one line

Searching in less:

/pattern: Search forward

?pattern: Search backward

n: Find the next match in the search direction

N: Find the next match in the opposite direction

Browsing multiple files with less:

Method 1: Pass several file names to less: less file1 file2 ...

Method 2: While browsing a file, use :e to open another file, as follows:

less file1

:e file2

With more than one file open, you can switch between them:

:p: Go to the previous file

:n: Go to the next file


3. The head command

Usage: head [OPTION]... [FILE]...

The head command displays the beginning of a file, the first 10 lines by default. You can specify the number of lines or bytes to display.

Common options:

-c: Number of bytes to display

-n: Number of lines to display

-NUM: Display the first NUM lines
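A short sketch of the line and byte counts described above. The input is generated with seq and printf so that no real file is needed; the variable names are only for illustration.

```shell
# seq 20 prints the numbers 1..20, one per line, as sample input.
first_three=$(seq 20 | head -n 3)
echo "$first_three"

# With no option, head prints the first 10 lines.
default_count=$(seq 20 | head | wc -l)
echo "default line count: $default_count"

# -c counts bytes, not lines: keep the first 5 bytes of the input.
first_bytes=$(printf 'hello world\n' | head -c 5)
echo "$first_bytes"
```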


4. The tail command

Usage: tail [OPTION]... [FILE]...

The tail command displays the end of a file, the last 10 lines by default. You can specify the number of lines or bytes to display. It is often used to follow log files.

Common options:

-c: Number of bytes to display

-n: Number of lines to display

-NUM: Display the last NUM lines

-f: Keep the file open after reaching its end and print new lines as they are appended; often used for monitoring

Example: follow a log file, showing only new content and none of the existing content

# tail -n 0 -f log.file &    # "&" runs the command in the background
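The non-interactive uses of tail can be sketched as follows. The "-n +N" form, which starts output at line N, is not covered in the text above but is a standard tail feature; combining head and tail to pick a range from the middle of the input is a common idiom rather than something specific to this article.

```shell
# Last 3 lines of a 10-line input.
last_three=$(seq 10 | tail -n 3)

# "-n +N" starts at line N instead of counting from the end.
from_line_8=$(seq 10 | tail -n +8)

# head and tail combined: lines 4 to 6 of the input.
middle=$(seq 10 | head -n 6 | tail -n 3)

printf '%s\n' "$last_three" "$from_line_8" "$middle"
```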


5. The cut command

Usage: cut [OPTION]... [FILE]...

-c: Select by character position, for example only the characters in the specified columns

-d: Use the specified delimiter instead of tab as the field separator

-f: Select the specified fields

A range is specified in one of these ways:

N: The Nth byte, character, or field, counting from 1

N-: All bytes, characters, or fields from the Nth to the end of the line

N-M: All bytes, characters, or fields from the Nth to the Mth (inclusive)

-M: All bytes, characters, or fields from the 1st to the Mth (inclusive)
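The four range forms can be illustrated on a single passwd-style line. This is a small sketch; the sample line matches the userlist entries used elsewhere in this article.

```shell
line='user1:x:3000:3000::/home/user1:/bin/csh'

# N: a single field (field 1, colon-delimited).
name=$(echo "$line" | cut -d: -f1)

# N-: from field 6 to the end of the line.
tail_fields=$(echo "$line" | cut -d: -f6-)

# N-M: fields 3 through 4.
ids=$(echo "$line" | cut -d: -f3-4)

# -M: characters 1 through 5.
prefix=$(echo "$line" | cut -c-5)

printf '%s\n' "$name" "$tail_fields" "$ids" "$prefix"
```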

Application exercises:

(1) Extract the disk usage percentage from the output of the df command

# df | tr -s '[:space:]' | cut -d ' ' -f5

(2) Find the permissions of /tmp and display them in numeric form

# stat /tmp/ | head -4 | tail -1 | cut -d '(' -f 2 | cut -d '/' -f1


6. Merging and displaying the content of multiple files

Both cat and paste can merge several files for display. cat merges vertically: the files are printed one after another. paste merges horizontally: corresponding lines of the files are joined side by side, so the first column shows the first file and the second column shows the second file. The -d option of paste sets the delimiter placed between the files (a tab by default), and the -s option prints each file on a single output line.

Application examples:

# paste passwdlist userlist
user1:user1    user1:x:3000:3000::/home/user1:/bin/csh
user2:user2    user2:x:3001:3001::/home/user2:/bin/csh
user3:user3    user3:x:3002:3002::/home/user3:/bin/csh
user4:user4    user4:x:3003:3003::/home/user4:/bin/csh
user5:user5    user5:x:3004:3004::/home/user5:/bin/csh
user6:user6    user6:x:3005:3005::/home/user6:/bin/csh
user7:user7    user7:x:3006:3006::/home/user7:/bin/csh
user8:user8    user8:x:3007:3007::/home/user8:/bin/csh
user9:user9    user9:x:3008:3008::/home/user9:/bin/csh
user10:user10    user10:x:3009:3009::/home/user10:/bin/csh
# paste -d @ passwdlist userlist
user1:user1@user1:x:3000:3000::/home/user1:/bin/csh
user2:user2@user2:x:3001:3001::/home/user2:/bin/csh
user3:user3@user3:x:3002:3002::/home/user3:/bin/csh
user4:user4@user4:x:3003:3003::/home/user4:/bin/csh
user5:user5@user5:x:3004:3004::/home/user5:/bin/csh
user6:user6@user6:x:3005:3005::/home/user6:/bin/csh
user7:user7@user7:x:3006:3006::/home/user7:/bin/csh
user8:user8@user8:x:3007:3007::/home/user8:/bin/csh
user9:user9@user9:x:3008:3008::/home/user9:/bin/csh
user10:user10@user10:x:3009:3009::/home/user10:/bin/csh
# paste -s passwdlist userlist
user1:user1    user2:user2    user3:user3    user4:user4    user5:user5    user6:user6    user7:user7    user8:user8    user9:user9    user10:user10
user1:x:3000:3000::/home/user1:/bin/csh    user2:x:3001:3001::/home/user2:/bin/csh    user3:x:3002:3002::/home/user3:/bin/csh    user4:x:3003:3003::/home/user4:/bin/csh    user5:x:3004:3004::/home/user5:/bin/csh    user6:x:3005:3005::/home/user6:/bin/csh    user7:x:3006:3006::/home/user7:/bin/csh    user8:x:3007:3007::/home/user8:/bin/csh    user9:x:3008:3008::/home/user9:/bin/csh    user10:x:3009:3009::/home/user10:/bin/csh


7. Text statistics command: wc

The wc command counts the lines, words, and bytes in a file. Each count can also be requested separately with the following common options:

-c: Count bytes

-l: Count lines

-w: Count words

-m: Count characters
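A quick check of the four counters on a known input. The helper function sample is hypothetical and only generates the test text; for plain ASCII input the byte and character counts are the same.

```shell
# Hypothetical sample input: 2 lines, 3 words, 14 bytes.
sample() { printf 'one two\nthree\n'; }

lines=$(sample | wc -l)
words=$(sample | wc -w)
bytes=$(sample | wc -c)
chars=$(sample | wc -m)

echo "lines=$lines words=$words bytes=$bytes chars=$chars"
```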


8. Sorting by column: sort

The sort command sorts lines of text, optionally by a specified column. Lines are sorted alphabetically by default. Common options:

-f: Ignore case

-t: Use the specified field delimiter

-k: Sort by the specified column

-n: Sort numerically

-r: Sort in reverse order

-u: Remove duplicates, keeping one copy of lines that compare equal
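The options combine naturally. The sketch below sorts made-up name:uid records; LC_ALL=C is set only to make the ordering independent of the locale, and the records function is a hypothetical stand-in for a real file.

```shell
# Hypothetical records in the form name:uid.
records() { printf 'carol:3010\nalice:3002\nbob:3001\nalice:3002\n'; }

# Default: sort whole lines as text.
by_name=$(records | LC_ALL=C sort)

# -t sets the delimiter, -k picks the key field, -n compares
# numerically, and -r reverses the order.
by_uid_desc=$(records | LC_ALL=C sort -t: -k2,2 -n -r)

# -u keeps only one copy of lines that compare equal.
unique=$(records | LC_ALL=C sort -u)

printf '%s\n' "$by_name" "---" "$by_uid_desc" "---" "$unique"
```

Writing the key as -k2,2 limits the comparison to field 2; a bare -k2 would compare from field 2 to the end of the line.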


9. Merging adjacent duplicate lines: uniq

The uniq command merges adjacent repeated lines. Because only adjacent duplicates are detected, it is usually used after sort. Common options:

-c: Prefix each line with its number of occurrences

-d: Show only the lines that are duplicated

-u: Show only the lines that are not duplicated
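The sketch below shows why uniq is normally paired with sort. The words function is a hypothetical input generator; its duplicates are deliberately not adjacent.

```shell
# Hypothetical input: "b" three times, "a" twice, "c" once, never adjacent.
words() { printf 'b\na\nb\nc\na\nb\n'; }

# Without sort nothing is merged, because no duplicates are adjacent.
unsorted_count=$(words | uniq | wc -l)

# After sorting, -d lists the repeated lines and -u the unrepeated ones.
dups=$(words | LC_ALL=C sort | uniq -d)
singles=$(words | LC_ALL=C sort | uniq -u)

# -c adds a count; a numeric reverse sort then ranks lines by frequency.
top=$(words | LC_ALL=C sort | uniq -c | sort -rn | head -n 1)

echo "unsorted=$unsorted_count dups=$(echo $dups) singles=$singles top=$top"
```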


10. The diff and patch commands

The diff command compares two text files line by line, and its output can be saved to a file. If one of the two files is later lost, patch can rebuild it from the surviving file and the saved diff: applying the diff to the first file produces the second, and applying it in reverse (patch -R) to the second restores the first.

# diff -u passwdlist userlist > diff.txt
# cat diff.txt
--- passwdlist     2016-08-07 11:00:30.609998611 +0800
+++ userlist    2016-08-07 10:50:08.300009628 +0800
@@ -1,10 +1,10 @@
-user1:user1
-user2:user2
-user3:user3
-user4:user4
-user5:user5
-user6:user6
-user7:user7
-user8:user8
-user9:user9
-user10:user10
+user1:x:3000:3000::/home/user1:/bin/csh
+user2:x:3001:3001::/home/user2:/bin/csh
+user3:x:3002:3002::/home/user3:/bin/csh
+user4:x:3003:3003::/home/user4:/bin/csh
+user5:x:3004:3004::/home/user5:/bin/csh
+user6:x:3005:3005::/home/user6:/bin/csh
+user7:x:3006:3006::/home/user7:/bin/csh
+user8:x:3007:3007::/home/user8:/bin/csh
+user9:x:3008:3008::/home/user9:/bin/csh
+user10:x:3009:3009::/home/user10:/bin/csh
# patch -b userlist diff.txt
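A round trip in both directions can be sketched with two small throwaway files. This assumes the diff and patch utilities are installed; the file names old.txt, new.txt, and changes.diff are hypothetical and are not the files from the transcript above.

```shell
tmpdir=$(mktemp -d)
old="$tmpdir/old.txt"
new="$tmpdir/new.txt"
printf 'user1:user1\nuser2:user2\n' > "$old"
printf 'user1:user1\nuser2:changed\nuser3:user3\n' > "$new"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff -u "$old" "$new" > "$tmpdir/changes.diff" || true

# Forward: a copy of the old file plus the diff gives the new file.
cp "$old" "$tmpdir/rebuilt_new.txt"
patch -s "$tmpdir/rebuilt_new.txt" "$tmpdir/changes.diff"

# Reverse (-R): a copy of the new file plus the diff gives the old file.
cp "$new" "$tmpdir/rebuilt_old.txt"
patch -s -R "$tmpdir/rebuilt_old.txt" "$tmpdir/changes.diff"

forward_ok=no
reverse_ok=no
cmp -s "$new" "$tmpdir/rebuilt_new.txt" && forward_ok=yes
cmp -s "$old" "$tmpdir/rebuilt_old.txt" && reverse_ok=yes
echo "forward=$forward_ok reverse=$reverse_ok"

rm -r "$tmpdir"
```

Adding -b to either patch call would also keep a backup of the file before it is modified.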


11. The grep command

Usage: grep [OPTION]... PATTERN [FILE]...

grep is a text search tool that supports regular expressions. With the -E option it is equivalent to egrep and supports extended regular expressions.

Common options:

-v: Show the lines that do not match the pattern

-i: Ignore case

-n: Show the line number of each matching line

-c: Count the matching lines, that is, print how many lines matched

-o: Show only the matching part of each line

-q: Quiet mode; print nothing (the exit status shows whether a match was found)

-A NUM: Also display the NUM lines after each matching line

-B NUM: Also display the NUM lines before each matching line

-C NUM: Also display the NUM lines before and after each matching line

-w: Match whole words only

-E: Use extended regular expressions
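Most of these options can be tried on a small made-up log file. This is a sketch, not a complete tour; the file log.txt and its content are hypothetical, and -o and -A are assumed to be available as they are in GNU grep.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/log.txt"
printf 'start\nError: disk full\ninfo: ok\nerror: timeout\nerrors logged\nend\n' > "$f"

# -i ignores case, -c counts the matching lines.
count=$(grep -i -c 'error' "$f")
# -w matches whole words only, so "errors" no longer counts.
whole=$(grep -i -w -c 'error' "$f")
# -v inverts the match: lines without "error" in any case.
non_matching=$(grep -v -i -c 'error' "$f")
# -n prefixes each match with its line number.
numbered=$(grep -n 'timeout' "$f")
# -o prints only the matched text, not the whole line.
only=$(grep -o 'disk [a-z]*' "$f")
# -q prints nothing; the exit status says whether a match exists.
if grep -q 'end' "$f"; then found=yes; else found=no; fi
# -A 1 adds one line of context after each match.
context=$(grep -A 1 'disk' "$f")

echo "count=$count whole=$whole non_matching=$non_matching found=$found"
printf '%s\n' "$numbered" "$only" "$context"
rm -r "$tmpdir"
```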


12. Regular expressions

A regular expression is a pattern written with special characters and ordinary text characters. Some characters (metacharacters) do not stand for themselves but act as control characters or wildcards. Regular expressions are divided into basic regular expressions (BRE) and extended regular expressions (ERE).

Basic regular expression metacharacters:

(1) Character matching:

.: Matches any single character

[]: Matches any single character in the specified set; ranges are supported, for example [a-z] for all lowercase letters

[^]: Matches any single character outside the specified set

\: Escape character

The following character classes are used inside a bracket expression, for example [[:digit:]]:

[:digit:]: Matches all digits

[:lower:]: Matches all lowercase letters

[:upper:]: Matches all uppercase letters

[:alpha:]: Matches all letters

[:alnum:]: Matches all digits and letters

[:space:]: Matches all white space characters, horizontal or vertical

[:blank:]: Matches horizontal white space characters (space and tab)

[:punct:]: Matches all punctuation characters

[:cntrl:]: Matches all control characters

[:graph:]: Matches all printable characters, excluding the space

[:print:]: Matches all printable characters, including the space

[:xdigit:]: Matches all hexadecimal digits
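Character classes only work inside a bracket expression, so they are written with double brackets, as in [[:digit:]]. A brief sketch using grep -o to show what each class picks out of a made-up string:

```shell
# Runs of digits: the class sits inside an outer pair of brackets.
digits=$(echo 'abc 123 XYZ' | grep -o '[[:digit:]][[:digit:]]*')

# Runs of uppercase letters.
upper=$(echo 'abc 123 XYZ' | grep -o '[[:upper:]][[:upper:]]*')

# Negation: every single character that is not a letter or a digit.
non_alnum=$(echo 'a-b_c' | grep -o '[^[:alnum:]]')

printf '%s\n' "$digits" "$upper" "$non_alnum"
```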

(2) Repetition:

*: Matches the preceding character any number of times, including 0 times

.*: Matches any characters of any length

\?: Matches the preceding character 0 or 1 times

\+: Matches the preceding character 1 or more times

\{n\}: Matches the preceding character exactly n times

\{m,n\}: Matches the preceding character at least m times and at most n times

\{,n\}: Matches the preceding character at most n times

\{n,\}: Matches the preceding character at least n times
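The repetition operators can be compared by counting how many lines of a small word list each pattern matches. This sketch assumes GNU grep, where \? and \+ are accepted in basic regular expressions; the words function is a hypothetical input generator.

```shell
# Hypothetical word list: "g", zero to three "o", then "d".
words() { printf 'gd\ngod\ngood\ngoood\n'; }

star=$(words | grep -c 'go*d')          # zero or more "o": all four words
opt=$(words | grep -c 'go\?d')          # zero or one "o": gd, god
plus=$(words | grep -c 'go\+d')         # one or more "o": god, good, goood
exact=$(words | grep -c 'go\{2\}d')     # exactly two "o": good
range=$(words | grep -c 'go\{2,3\}d')   # two or three "o": good, goood
atleast=$(words | grep -c 'go\{1,\}d')  # at least one "o": god, good, goood

echo "star=$star opt=$opt plus=$plus exact=$exact range=$range atleast=$atleast"
```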

(3) Position anchoring, that is, fixing where the match must occur:

^: Start-of-line anchor; placed at the leftmost side of the pattern

$: End-of-line anchor; placed at the rightmost side of the pattern

^pattern$: Matches an entire line

^$: Matches an empty line

^[[:space:]]*$: Matches a blank line (empty or containing only white space)

\< or \b: Start-of-word anchor; placed on the left side of the word pattern

\> or \b: End-of-word anchor; placed on the right side of the word pattern

\<pattern\> or \bpattern\b: Matches a whole word
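The anchors can be compared on a few made-up lines, one of them empty and one containing only spaces. This is a sketch that assumes GNU grep for the \< and \> word anchors; the sample function is hypothetical.

```shell
# Hypothetical input: five lines, the third empty, the fourth only spaces.
sample() { printf 'cat\nconcat cat\n\n   \ncatalog\n'; }

starts=$(sample | grep -c '^cat')            # lines beginning with "cat"
ends=$(sample | grep -c 'cat$')              # lines ending with "cat"
whole_line=$(sample | grep -c '^cat$')       # lines that are exactly "cat"
empty=$(sample | grep -c '^$')               # truly empty lines
blank=$(sample | grep -c '^[[:space:]]*$')   # empty or white space only
word=$(sample | grep -c '\<cat\>')           # "cat" as a whole word

echo "starts=$starts ends=$ends whole=$whole_line empty=$empty blank=$blank word=$word"
```

Note that "catalog" and "concat" contain "cat" but do not count as the whole word.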

(4) Grouping:

\(\): Binds one or more characters together as a single unit. The text matched by the pattern inside each pair of grouping parentheses is recorded by the regular expression engine in internal variables named \1, \2, and so on. \1 refers to the text matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis.

For example: \(string1\+\(string2\)*\)

\1 refers to: string1\+\(string2\)*

\2 refers to: string2
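Back-references can be sketched in two ways: inside a grep pattern, where \1 must repeat exactly what the group matched, and in a substitution, which makes the numbering of nested groups visible. The sed command is used here only for illustration and is not covered elsewhere in this article; \+ in a basic regular expression assumes GNU grep.

```shell
# \1 must repeat the text the first group matched, so only the
# line with the same word twice is counted.
doubled=$(printf 'hello hello\nhello world\n' | grep -c '\([a-z]\+\) \1')

# Groups are numbered by their opening parenthesis, left to right:
# \1 is the outer group, \2 and \3 are the inner ones.
swapped=$(echo 'key=value' | sed 's/\(\([a-z]*\)=\([a-z]*\)\)/\3=\2 (was \1)/')

echo "$doubled"
echo "$swapped"
```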

Extended regular expressions:

Extended regular expressions are basically the same as basic regular expressions, except that the backslash is not needed before certain metacharacters. For repetition counts, for example, {m,n} in an extended regular expression is the same as \{m,n\} in a basic regular expression; likewise ?, +, and () are written without backslashes.

In addition, extended regular expressions support alternation (or).

For example: a|b: Matches a or b

con(C|c)at: Matches conCat or concat
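The difference between the two dialects can be seen by running the same match with and without -E. A minimal sketch with made-up input; the sample function is hypothetical.

```shell
# Hypothetical input for the alternation example.
sample() { printf 'concat\nconCat\nconkat\n'; }

# Alternation inside a group: matches "concat" and "conCat".
either=$(sample | grep -E -c 'con(C|c)at')

# The same interval in both dialects: BRE needs backslashes, ERE does not.
bre=$(printf 'aa\naaa\n' | grep -c '^a\{3\}$')
ere=$(printf 'aa\naaa\n' | grep -E -c '^a{3}$')

# Plain alternation: lines containing "a" or "b".
alt=$(printf 'a\nb\nc\n' | grep -E -c 'a|b')

echo "either=$either bre=$bre ere=$ere alt=$alt"
```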


This article is from the "Linux Operational Learning path" blog, please be sure to keep this source http://fengliang.blog.51cto.com/3453935/1835433
