Linux Basic Management: Text Processing (Combining Small Commands to Solve Big Problems)

Last Update:2017-12-28 Source: Internet

Author: User

Tags control characters diff file handling lowercase stdin expression engine egrep

Objective:

Processing logs on a server is very common work, so using a variety of text tools to view, analyze, and summarize text is a basic skill for operations and maintenance. It is necessary to learn regular expressions and the common file-handling commands such as grep, egrep, tr, sort, and uniq.

1. Tool classification

View file content: less and cat

Take part of a file: head and tail

Extract by column: cut
Extract by keyword: grep

2. Tool Collection

2.1. Viewing file content: cat, tac, rev

(1) cat

Syntax:

cat [OPTION]... [FILE]...

Options:

-E: display a $ at the end of each line
-n: number every displayed line
-A: show all control characters
-b: number only non-empty lines
-s: squeeze successive blank lines into one

(2) tac: view the lines in reverse order (last line first)

(3) rev: reverse the characters of each line
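The difference between the three viewing commands is easiest to see on a small file. The following is a minimal sketch, assuming GNU coreutils and util-linux (which provides rev); the sample file and variable names are made up for illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'alpha\n\n\nbeta\ngamma\n' > "$sample"

numbered=$(cat -n "$sample")    # number every line, blank ones included
squeezed=$(cat -s "$sample")    # collapse the two blank lines into one
bottom_up=$(tac "$sample")      # last line first
mirrored=$(rev "$sample")       # reverse the characters of each line

printf '%s\n' "$squeezed"
printf '%s\n' "$mirrored"

rm -f "$sample"
```

tac changes the order of the lines, while rev leaves the line order alone and only reverses each line's characters.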

2.2. Paging through file content: more, less

(1) more: view a file page by page

Syntax:

more [OPTIONS...] FILE...

Options:

-d: show a prompt explaining how to turn the page and how to quit

(2) less: view a file or stdin output page by page

Useful commands while viewing:

/text: search forward for text
?text: search backward for text
n/N: jump to the next or the previous match

Note: less is the pager used by the man command.

2.3. Displaying the beginning or end of a file: head, tail, tailf

(1) head

Syntax:

head [OPTION]... [FILE]...

Options:

-c #: output the first # bytes
-n #: output the first # lines
-#: shorthand for -n #

(2) tail

Syntax:

tail [OPTION]... [FILE]...

Options:

-c #: output the last # bytes
-n #: output the last # lines
-#: shorthand for -n #
-f: follow the file descriptor and show newly appended content, commonly used for log monitoring; equivalent to --follow=descriptor
-F: follow the file name; equivalent to --follow=name --retry

(3) tailf

Similar to tail -f, but does not access the file when it is not growing.
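head and tail can also be chained to pick out a range of lines from the middle of a file. The following is a minimal sketch, assuming GNU coreutils; the temporary file stands in for a log and the variable names are made up for illustration.

```shell
# 20 numbered lines standing in for log entries (hypothetical data).
log=$(mktemp)
seq 1 20 > "$log"

first_three=$(head -n 3 "$log")            # lines 1-3
last_two=$(tail -n 2 "$log")               # lines 19-20
middle=$(head -n 7 "$log" | tail -n 3)     # lines 5-7: first 7, then last 3 of those

printf '%s\n' "$middle"

# tail -f "$log" would keep running and print lines as they are appended;
# it is left out here because it does not return on its own.

rm -f "$log"
```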

2.4. Extracting text by column: cut

Syntax:

cut [OPTION]... [FILE]...

Options:

-d DELIMITER: specify the field delimiter (default: Tab)
-f FIELDS: select fields
    #: field number #
    #,#[,#]: discrete fields, such as 1,3,6
    #-#: a continuous range of fields, such as 1-6
    mixed: 1-3,7
-c: cut by character
--output-delimiter=STRING: specify the output delimiter

Example: display the specified columns of a file or of stdin data:

cut -d: -f1 /etc/passwd
cat /etc/passwd | cut -d: -f7
cut -c2-5 /usr/share/dict/words

# cut -d: -f1,3 /etc/passwd
root:0
bin:1
daemon:2
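To try the field and character selections without depending on the local /etc/passwd, the same commands can be run against a small file in the same format. This is a minimal sketch; the two account lines are made-up sample data, and --output-delimiter is assumed to be available (it is a GNU cut option).

```shell
# Two lines in /etc/passwd format (hypothetical sample data).
accounts=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' > "$accounts"

names_and_uids=$(cut -d: -f1,3 "$accounts")                      # fields 1 and 3
shells=$(cut -d: -f7 "$accounts")                                # field 7
spaced=$(cut -d: -f1,3 --output-delimiter=' ' "$accounts")       # GNU cut only
chars=$(printf 'abcdefgh\n' | cut -c2-5)                         # characters 2 to 5

printf '%s\n' "$names_and_uids"

rm -f "$accounts"
```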

2.5. Merging files: paste

paste: merges the lines with the same line number from two files into one line.

Syntax:

paste [OPTION]... [FILE]...

Options:

-d DELIMITER: specify the delimiter (default: Tab)
-s: join all the lines of each file into a single line

Example:

paste f1 f2
paste -s f1 f2

# paste f1 f2
1	a
2	s
3	a
4	a
5	s
6	b
c	D
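The effect of -d and -s is clearer when both inputs are known. The following is a minimal sketch; the files f1 and f2 are created in a temporary directory with made-up contents.

```shell
# Two small input files (hypothetical contents).
workdir=$(mktemp -d)
printf '1\n2\n3\n' > "$workdir/f1"
printf 'a\nb\nc\n' > "$workdir/f2"

side_by_side=$(paste "$workdir/f1" "$workdir/f2")      # line N of f1, Tab, line N of f2
with_colon=$(paste -d: "$workdir/f1" "$workdir/f2")    # same, joined with ':'
serial=$(paste -s "$workdir/f1" "$workdir/f2")         # one output line per input file

printf '%s\n' "$side_by_side"
printf '%s\n' "$serial"

rm -rf "$workdir"
```

With -s the output has as many lines as there are input files, rather than as many lines as the longest file.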

2.6. Text statistics: wc

Role:

Collects text statistics: the total number of lines, words, bytes, and characters. It can count the data in a file or from stdin.

Example:

# wc story.txt
237 1901 story.txt

By default wc prints the number of lines, the number of words, and the number of bytes, in that order, followed by the file name.

Common options:

-l: count only the lines
-w: count only the words
-c: count only the bytes
-m: count only the characters
-L: display the length of the longest line in the file
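Each counting option can be checked against a file whose contents are known in advance. The following is a minimal sketch; the two-line file is made-up sample data, and -L is assumed to be available (it is a GNU wc option). Reading from stdin with < keeps the file name out of the output.

```shell
# Two short lines of known size (hypothetical sample data).
story=$(mktemp)
printf 'one two three\nfour five\n' > "$story"

line_count=$(wc -l < "$story")    # 2 lines
word_count=$(wc -w < "$story")    # 5 words
byte_count=$(wc -c < "$story")    # 14 + 10 = 24 bytes, newlines included
longest=$(wc -L < "$story")       # 13, the length of "one two three" (GNU wc)

echo "lines=$line_count words=$word_count bytes=$byte_count longest=$longest"

rm -f "$story"
```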

2.7. Sorting text: sort

Sorts text and writes the sorted result to stdout without changing the original file.

Syntax:

sort [OPTIONS] FILE(s)

Common options:

-r: sort in reverse order
-n: sort by numeric value
-f: fold case, that is, ignore the difference between uppercase and lowercase characters
-u: unique; delete duplicate lines from the output
-t c: use c as the field delimiter
-k X: sort by column X, with columns delimited by the character c given to -t; can be used multiple times
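The -t and -k options matter most when the sort key is not at the start of the line. The following is a minimal sketch, assuming GNU sort; the name:score lines are made-up sample data.

```shell
# name:score pairs, one of them duplicated (hypothetical sample data).
data=$(mktemp)
printf 'carol:30\nalice:25\nbob:100\nalice:25\n' > "$data"

by_name=$(sort "$data")                         # plain text order
unique=$(sort -u "$data")                       # duplicates removed
by_score=$(sort -t: -k2,2n "$data")             # numeric sort on field 2
by_score_desc=$(sort -t: -k2,2nr "$data")       # same key, largest first

printf '%s\n' "$by_score"

rm -f "$data"
```

Without -n the scores would be compared as text, which would place 100 before 25.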

2.8. Removing duplicate lines: uniq

Removes adjacent duplicate lines from the input.

Syntax:

uniq [OPTION]... [FILE]...

Options:

-c: display the number of times each line is repeated
-d: display only the lines that are repeated
-u: display only the lines that are not repeated

Only consecutive, identical lines count as duplicates.

For this reason the uniq command is often used together with the sort command: sort userlist.txt | uniq -c
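Because uniq only compares neighbouring lines, the usual log-analysis pattern sorts first, counts, then sorts again by count. The following is a minimal sketch of that pipeline; the access-log lines and variable names are made up for illustration.

```shell
# A tiny access log: client address, method, path (hypothetical sample data).
access=$(mktemp)
printf '10.0.0.1 GET /\n10.0.0.2 GET /a\n10.0.0.1 GET /b\n10.0.0.1 GET /c\n10.0.0.3 GET /\n10.0.0.2 GET /\n' > "$access"

# Without sort, only the two adjacent 10.0.0.1 lines are merged: 5 lines remain.
unsorted_count=$(cut -d' ' -f1 "$access" | uniq | wc -l)

# With sort, every address appears once: 3 lines remain.
sorted_count=$(cut -d' ' -f1 "$access" | sort | uniq | wc -l)

# Top two clients by number of requests.
top=$(cut -d' ' -f1 "$access" | sort | uniq -c | sort -rn | head -n 2)

repeated=$(cut -d' ' -f1 "$access" | sort | uniq -d)   # addresses seen more than once
single=$(cut -d' ' -f1 "$access" | sort | uniq -u)     # addresses seen exactly once

printf '%s\n' "$top"

rm -f "$access"
```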

2.9. Comparing files: diff

Compare the differences between two files:

diff foo.conf foo2.conf
5c5
< use_widgets = no
---
> use_widgets = yes

Note: 5c5 means that line 5 differs between the two files (a change).

The output of diff can be saved in a file, which is called a "patch".
Use the -u option to output the "unified" diff format, which is the best format for patch files.

2.10. patch

patch: applies the changes recorded in a patch file to other files (use with caution).

Options:

-b: automatically back up the files that are changed

Example:

$ diff -u foo.conf foo2.conf > foo.patch
$ patch -b foo.conf foo.patch

Note: with -b, foo.conf is backed up before it is changed.
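The diff and patch steps above can be tried end to end on throwaway files. The following is a minimal sketch, assuming GNU diffutils; the file contents are made up, and the patch step is skipped when the patch program is not installed.

```shell
# Two versions of a configuration file (hypothetical contents).
work=$(mktemp -d)
printf 'name = demo\nuse_widgets = no\n' > "$work/foo.conf"
printf 'name = demo\nuse_widgets = yes\n' > "$work/foo2.conf"

# diff exits with 0 when the files are identical and 1 when they differ.
diff_status=0
diff -u "$work/foo.conf" "$work/foo2.conf" > "$work/foo.patch" || diff_status=$?

# In unified format, removed lines start with '-' and added lines with '+'.
changed_lines=$(grep '^[-+]use_widgets' "$work/foo.patch")
printf '%s\n' "$changed_lines"

if command -v patch >/dev/null 2>&1; then
    # -b keeps a backup of the original file next to the patched one.
    patch -b "$work/foo.conf" "$work/foo.patch" >/dev/null
    if cmp -s "$work/foo.conf" "$work/foo2.conf"; then
        patch_result="applied"
    else
        patch_result="failed"
    fi
else
    patch_result="skipped (patch not installed)"
fi
echo "patch: $patch_result"

rm -rf "$work"
```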

3. The three musketeers of text processing: the grep command

The three musketeers of Linux text processing are grep, sed, and awk.

grep: a text filtering tool that works by pattern; the variants are grep, egrep, and fgrep (fgrep does not support regular expression search).
sed: stream editor, a text editing tool.
awk: a text report generator; the implementation on Linux is gawk.

grep: Global search REgular expression and Print out the line.

Function: a text search tool that checks the target text line by line against the user-specified pattern and prints the matching lines.

Pattern: a filter condition written with regular expression metacharacters and text characters.

Syntax:

grep [OPTIONS] PATTERN [FILE ...]

Example:

grep root /etc/passwd
grep "$USER" /etc/passwd
grep '$USER' /etc/passwd
grep `whoami` /etc/passwd

Command options:

--color=auto: highlight the matched text in color
-v: display the lines that are not matched by the pattern
-i: ignore character case
-n: display the line number of each match
-c: count the number of matching lines
-o: display only the matched string
-q: quiet mode; do not output anything
-A #: after; also show the # lines after each match
-B #: before; also show the # lines before each match
-C #: context; also show the # lines before and after each match
-e PATTERN: specify a pattern; several -e options are combined with a logical OR, for example grep -e 'cat' -e 'dog' file
-w: match whole words only
-E: use ERE, that is, support extended regular expressions
-F: equivalent to fgrep; regular expressions are not supported
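The options are easier to remember after seeing what each one prints for the same input. The following is a minimal sketch, assuming GNU grep; the five-line file and the variable names are made up for illustration.

```shell
# Five short lines to search (hypothetical sample data).
notes=$(mktemp)
printf 'The cat sat\nA dog barked\nconcatenate files\nCAT again\nnothing here\n' > "$notes"

ignore_case=$(grep -ci 'cat' "$notes")                # 3: lines 1, 3 and 4
whole_word=$(grep -w 'cat' "$notes")                  # line 1 only, not "concatenate"
with_number=$(grep -n 'dog' "$notes")                 # line number prefix "2:"
either=$(grep -c -e 'cat' -e 'dog' "$notes")          # 3: lines 1, 2 and 3
inverted=$(grep -v -i -e 'cat' -e 'dog' "$notes")     # lines matching neither pattern
only_match=$(grep -o -i 'cat' "$notes")               # just the matched text, one per line
after=$(grep -A 1 'dog' "$notes")                     # the match plus the next line

# -q prints nothing; the exit status says whether there was a match.
if grep -q 'dog' "$notes"; then found="yes"; else found="no"; fi

printf '%s\n' "$with_number" "$inverted"

rm -f "$notes"
```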

4. Regular expressions (REGEXP)

REGEXP: a pattern written with special characters and text characters, in which some characters (metacharacters) do not stand for their literal meaning but act as control characters or wildcards.
Programs that support it: grep, sed, awk, vim, less, nginx, varnish, etc.

Regular expression classification:

Basic regular expressions: BRE; extended regular expressions: ERE.

Regular expression engine:

The software module that checks and processes regular expressions; different engines use different algorithms. One example is PCRE (Perl Compatible Regular Expressions).

Metacharacter classification: character matching, number of matches, position anchoring, grouping.

Help on regular expressions: man 7 regex

4.1. Basic regular expression metacharacters

4.1.1. Character matching

.: matches any single character
[]: matches any single character in the specified range
[^]: matches any single character outside the specified range
[:alnum:]: letters and digits
[:alpha:]: any English uppercase or lowercase letter, that is, A-Z and a-z
[:lower:]: lowercase letters
[:upper:]: uppercase letters
[:blank:]: blank characters (spaces and tabs)
[:space:]: horizontal and vertical whitespace characters (a wider set than [:blank:])
[:cntrl:]: non-printable control characters (backspace, delete, alarm, ...)
[:digit:]: decimal digits
[:xdigit:]: hexadecimal digits
[:graph:]: printable non-whitespace characters
[:print:]: printable characters
[:punct:]: punctuation

A character class is written inside a bracket expression, for example [[:digit:]].
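Character classes are used inside a bracket expression, so the pattern for "one digit" is written [[:digit:]]. The following is a minimal sketch, assuming GNU grep; the sample lines and variable names are made up for illustration.

```shell
# Six lines: letters, digits, a mix, spaces only, empty, and punctuation
# (hypothetical sample data).
sample=$(mktemp)
printf 'abc\n123\nabc123\n   \n\nA-Z!\n' > "$sample"

has_digit=$(grep -c '[[:digit:]]' "$sample")       # 2: "123" and "abc123"
has_upper=$(grep '[[:upper:]]' "$sample")          # only "A-Z!"
has_blank=$(grep -c '[[:blank:]]' "$sample")       # 1: the line of spaces
non_alnum=$(grep -c '[^[:alnum:]]' "$sample")      # 2: the spaces line and "A-Z!"
non_empty=$(grep -c '.' "$sample")                 # 5: every line with at least one character

echo "digit=$has_digit blank=$has_blank non_alnum=$non_alnum non_empty=$non_empty"

rm -f "$sample"
```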

4.1.2. Number of matches

Number of matches: placed after the character whose count is to be specified, to state how many times that preceding character may occur.

*: matches the preceding character any number of times, including 0 times. Matching is greedy: it matches as much as possible
.*: any characters, of any length
\?: matches the preceding character 0 or 1 times
\+: matches the preceding character at least 1 time
\{n\}: matches the preceding character exactly n times
\{m,n\}: matches the preceding character at least m times and at most n times
\{,n\}: matches the preceding character at most n times
\{n,\}: matches the preceding character at least n times
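Running each quantifier against the same list of words shows exactly which repetition counts it accepts. The following is a minimal sketch, assuming GNU grep (\? and \+ are GNU extensions in basic regular expressions); the word list is made up for illustration.

```shell
# Words with zero to four letters "o" between g and d (hypothetical sample data).
words=$(mktemp)
printf 'gd\ngod\ngood\ngoood\ngooood\n' > "$words"

star=$(grep -c 'go*d' "$words")              # 5: any number of o, including none
optional=$(grep 'go\?d' "$words")            # gd, god
one_or_more=$(grep -c 'go\+d' "$words")      # 4: everything except gd
exactly_two=$(grep 'go\{2\}d' "$words")      # good
two_to_three=$(grep 'go\{2,3\}d' "$words")   # good, goood
three_plus=$(grep 'go\{3,\}d' "$words")      # goood, gooood
anything=$(grep -c 'g.*d' "$words")          # 5: .* also matches nothing

printf '%s\n' "$two_to_three"

rm -f "$words"
```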

4.1.3. Position anchoring

Position anchoring: specifies where in the line or word the match must appear.

^: beginning-of-line anchor; placed at the leftmost side of the pattern
$: end-of-line anchor; placed at the rightmost side of the pattern
^pattern$: the pattern must match the entire line
^$: an empty line
^[[:space:]]*$: a blank line (empty or containing only whitespace)
\< or \b: beginning-of-word anchor; placed on the left side of the word pattern
\> or \b: end-of-word anchor; placed on the right side of the word pattern
\<pattern\>: match a whole word
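The anchors can be compared by counting how many lines each variant of the same word pattern selects. The following is a minimal sketch, assuming GNU grep (\< and \> are GNU extensions); the sample lines are made up for illustration.

```shell
# Six lines, including one empty line and one line of spaces
# (hypothetical sample data).
sample=$(mktemp)
printf 'root:x:0\n\n   \nchroot user\nroot\nnot root here\n' > "$sample"

anywhere=$(grep -c 'root' "$sample")              # 4: also matches inside "chroot"
at_start=$(grep -c '^root' "$sample")             # 2: "root:x:0" and "root"
at_end=$(grep -c 'root$' "$sample")               # 1: "root"
whole_line=$(grep -c '^root$' "$sample")          # 1: "root"
empty=$(grep -c '^$' "$sample")                   # 1: the empty line
blank=$(grep -c '^[[:space:]]*$' "$sample")       # 2: empty line and spaces-only line
whole_word=$(grep -c '\<root\>' "$sample")        # 3: "chroot" is excluded

echo "anywhere=$anywhere whole_word=$whole_word blank=$blank"

rm -f "$sample"
```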

4.1.4. Grouping and back references

Grouping: \(\) binds one or more characters together and treats them as a whole, for example: \(root\)\+

The text matched by the pattern inside each pair of grouping parentheses is recorded by the regular expression engine in internal variables, which are named \1, \2, \3, ...
\1 represents the characters matched by the pattern between the first opening parenthesis from the left and its matching closing parenthesis.

Example:

\(string1\+\(string2\)*\)
\1: string1\+\(string2\)*
\2: string2

Back reference: refers to the characters matched by the pattern in the preceding grouping parentheses, not to the pattern itself.

Or: \|

Example:

a\|b: a or b
C\|cat: C or cat
\(C\|c\)at: Cat or cat
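The point that a back reference repeats the matched text, not the pattern, shows up when the group could match different characters. The following is a minimal sketch, assuming GNU grep (\+ and \| are GNU extensions in basic regular expressions); the sample lines are made up for illustration.

```shell
# Lines for testing groups, back references and alternation
# (hypothetical sample data).
sample=$(mktemp)
printf 'abab\nabcd\nhello hello world\nCat\ncat\nCow\n' > "$sample"

repeated_group=$(grep '\(ab\)\1' "$sample")            # "ab" followed by the same "ab"
group_plus=$(grep -c '\(ab\)\+' "$sample")             # 2: abab and abcd
repeated_word=$(grep '\([a-z]\+\) \1' "$sample")       # a word followed by the same word

# \1 must be the same character the group matched, so "ab" does not match.
doubled=$(printf 'ab\naa\n' | grep '\([a-z]\)\1')

grouped_or=$(grep -c '\(C\|c\)at' "$sample")           # 2: Cat and cat
plain_or=$(grep -c 'C\|cat' "$sample")                 # 3: any line with C, or with cat

printf '%s\n' "$repeated_word"

rm -f "$sample"
```

The last two patterns differ because alternation has the lowest precedence: without the group, the choice is between the single character C and the whole word cat.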

5. egrep and extended regular expressions

egrep = grep -E

Syntax:

egrep [OPTIONS] PATTERN [FILE ...]

Metacharacters of extended regular expressions:

Character matching:

.: any single character
[]: any single character in the specified range
[^]: any single character not in the specified range

Number of matches:

*: matches the preceding character any number of times
?: 0 or 1 times
+: 1 or more times
{m}: exactly m times
{m,n}: at least m and at most n times

Position anchoring:

^: beginning of line
$: end of line
\<, \b: beginning of word
\>, \b: end of word

Grouping and back references:

(): grouping
Back reference: \1, \2, ...

Or: |

a|b: a or b
C|cat: C or cat
(C|c)at: Cat or cat
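In practice the main difference from basic regular expressions is that the quantifiers, grouping parentheses, and the alternation bar are written without backslashes. The following is a minimal sketch, assuming GNU grep; it uses grep -E, which is equivalent to egrep and is the spelling recent GNU grep releases recommend. The sample lines are made up for illustration.

```shell
# Words for comparing basic and extended syntax (hypothetical sample data).
sample=$(mktemp)
printf 'gd\ngod\ngood\ngoood\nCat\ncat\nCow\n' > "$sample"

optional=$(grep -E 'go?d' "$sample")          # gd, god
one_or_more=$(grep -E -c 'go+d' "$sample")    # 3: god, good, goood
ere_range=$(grep -E 'go{2,3}d' "$sample")     # good, goood
bre_range=$(grep 'go\{2,3\}d' "$sample")      # same result in basic syntax
grouped_or=$(grep -E -c '(C|c)at' "$sample")  # 2: Cat and cat
plain_or=$(grep -E -c 'C|cat' "$sample")      # 3: Cat, cat and Cow

printf '%s\n' "$ere_range"

rm -f "$sample"
```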

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
