Linux Basic Management: Text Processing (combining small commands to solve big problems)

Source: Internet
Author: User


Objective:

Processing server logs is routine work in operations and maintenance, so being able to view, analyze, and summarize text with a variety of tools is a basic skill. It is worth learning regular expressions and the common file handling commands such as grep, egrep, tr, sort, and uniq.


1. Tool classification

View file contents: less and cat
Extract the beginning or end of a file: head and tail
Extract by column: cut
Extract by keyword: grep


2. Tool Collection


2.1. Viewing file contents: cat, tac, rev

(1) cat

Syntax:

    cat [OPTION]... [FILE]...

Options:

    -E: display a $ at the end of each line
    -n: number every output line
    -A: show all control characters
    -b: number non-empty lines only
    -s: squeeze consecutive blank lines into one

(2) tac: print the lines in reverse order (last line first).

(3) rev: reverse the characters within each line.
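The three commands can be compared on one small file. This is a minimal sketch that assumes GNU coreutils and util-linux are installed; the temporary file and its contents are made up for illustration.

```shell
# Hypothetical sample file: two text lines separated by two blank lines.
demo=$(mktemp)
printf 'abc\n\n\ndef\n' > "$demo"

numbered=$(cat -n "$demo")      # number every line, blank ones included
squeezed=$(cat -s "$demo")      # collapse the run of blank lines into one
ends=$(cat -E "$demo")          # mark each line end with $
reversed_lines=$(tac "$demo")   # last line first
reversed_chars=$(rev "$demo")   # each line mirrored left to right

printf '%s\n' "$squeezed" '---' "$reversed_lines" '---' "$reversed_chars"
rm -f "$demo"
```

tac changes the order of the lines while rev leaves the order alone and mirrors each line, so the two can be combined when both are needed.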




2.2. Paging through file contents: more, less

(1) more: view a file one page at a time.

Syntax:

    more [OPTIONS...] FILE...

Options:

    -d: display a prompt explaining how to turn the page and how to quit

(2) less: view a file or stdin output one page at a time.

Useful commands while viewing:

    /text: search forward for text
    ?text: search backward for text
    n / N: jump to the next match / the previous match

Note: less is the pager used by the man command.



2.3. Displaying the beginning or end of a text: head, tail, tailf

(1) head

Syntax:

    head [OPTION]... [FILE]...

Options:

    -c #: print the first # bytes
    -n #: print the first # lines
    -#: shorthand for -n #

(2) tail

Syntax:

    tail [OPTION]... [FILE]...

Options:

    -c #: print the last # bytes
    -n #: print the last # lines
    -#: shorthand for -n #
    -f: follow the file descriptor and print newly appended content; commonly used for log monitoring; equivalent to --follow=descriptor
    -F: follow the file name; equivalent to --follow=name --retry

(3) tailf

Similar to tail -f, but does not access the file when it is not growing.
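The following sketch shows head and tail on a ten-line file and how piping one into the other extracts a range from the middle. The temporary file is hypothetical, and seq is assumed to be available (it ships with GNU coreutils).

```shell
# Hypothetical input: the numbers 1..10, one per line.
nums=$(mktemp)
seq 1 10 > "$nums"

first3=$(head -n 3 "$nums")               # lines 1 to 3
last2=$(tail -n 2 "$nums")                # lines 9 and 10
first4b=$(head -c 4 "$nums")              # first 4 bytes: "1", newline, "2", newline
middle=$(head -n 6 "$nums" | tail -n 2)   # lines 5 and 6: first six, then last two of those

printf '%s\n' "$middle"
rm -f "$nums"

# tail -f FILE is interactive: it keeps running and prints lines as they are
# appended, until interrupted with Ctrl-C.
```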




2.4. Extracting text by column: cut

Syntax:

    cut [OPTION]... [FILE]...

Options:

    -d DELIMITER: field delimiter; the default is Tab
    -f FIELDS: fields to select
        #: field number #
        #,#[,#]: discrete fields, such as 1,3,6
        #-#: a continuous range of fields, such as 1-6
        mixed: 1-3,7
    -c: cut by character position
    --output-delimiter=STRING: delimiter to use in the output

Example: display specified columns of a file or of stdin data.

    cut -d: -f1 /etc/passwd
    cat /etc/passwd | cut -d: -f7
    cut -c2-5 /usr/share/dict/words

    # cut -d: -f1,3 /etc/passwd
    root:0
    bin:1
    daemon:2
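Because the contents of /etc/passwd differ between machines, the sketch below runs the same kind of cut commands on a made-up three-line file in passwd format. The --output-delimiter option assumes GNU cut.

```shell
# Hypothetical passwd-style data; the real /etc/passwd will differ.
users=$(mktemp)
cat > "$users" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
EOF

names=$(cut -d: -f1 "$users")                    # one field
name_uid=$(cut -d: -f1,3 "$users")               # discrete fields 1 and 3
range=$(cut -d: -f1-3 "$users" | head -n 1)      # continuous fields 1 to 3
shells=$(cut -d: -f1,7 --output-delimiter=' -> ' "$users" | head -n 1)
chars=$(cut -c2-4 "$users" | head -n 1)          # characters 2 to 4 of the line

printf '%s\n' "$name_uid" "$shells"
rm -f "$users"
```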


2.5. Merging files: paste

paste: merge the lines with the same line number from two files into a single line.

Syntax:

    paste [OPTION]... [FILE]...

Options:

    -d DELIMITER: delimiter; the default is Tab
    -s: join all lines of each file into one line

Example:

    paste f1 f2
    paste -s f1 f2

    # paste f1 f2
    1       a
    2       s
    3       a
    4       a
    5       s
    6       b
            c
            d
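A small sketch of the difference between the default mode, -d, and -s. The two input files are hypothetical and deliberately have the same number of lines.

```shell
# Hypothetical inputs: three lines each.
f1=$(mktemp)
f2=$(mktemp)
printf '1\n2\n3\n' > "$f1"
printf 'a\nb\nc\n' > "$f2"

side_by_side=$(paste "$f1" "$f2")     # line N of f1, a Tab, line N of f2
with_colon=$(paste -d: "$f1" "$f2")   # same, but joined with a colon
serial=$(paste -s "$f1" "$f2")        # one output line per input file

printf '%s\n' "$side_by_side" '---' "$serial"
rm -f "$f1" "$f2"
```

With -s each file is flattened into a single line on its own, so two input files produce two output lines rather than one.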


2.6. Text statistics: wc

Purpose:

Collects text statistics: the total number of words, lines, bytes, and characters. It can count the data in a file or from stdin.

Example:

    # wc story.txt
    237 1901 story.txt

By default wc reports the number of lines, the number of words, and the number of bytes, in that order, followed by the file name.

Common options:

    -l: count lines only
    -w: count words only
    -c: count bytes only
    -m: count characters only
    -L: display the length of the longest line in the file
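The sketch below counts a two-line file whose contents are invented for the example. Reading from stdin with < keeps the file name out of the output; -L assumes GNU wc.

```shell
# Hypothetical input: two lines, five words, 24 bytes.
story=$(mktemp)
printf 'one two three\nfour five\n' > "$story"

lines=$(wc -l < "$story")     # newline count
words=$(wc -w < "$story")     # whitespace-separated words
bytes=$(wc -c < "$story")     # bytes, newlines included
longest=$(wc -L < "$story")   # length of the longest line (GNU extension)

echo "lines=$lines words=$words bytes=$bytes longest=$longest"
rm -f "$story"
```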



2.7. Sorting text: sort

Sorts text and writes the sorted result to stdout without changing the original file.

Syntax:

    sort [OPTIONS] FILE(s)

Common options:

    -r: sort in reverse (descending) order
    -n: sort by numeric value
    -f: fold case, that is, ignore character case when comparing
    -u: unique; delete duplicate lines from the output
    -t c: use the character c as the field delimiter
    -k X: sort by column X, with columns separated by the -t delimiter; can be given multiple times
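The options combine naturally. The sketch below sorts a made-up name:score file; the data is hypothetical and chosen so that sorting the second column as text and as a number give visibly different results.

```shell
# Hypothetical name:score data with one duplicate line.
scores=$(mktemp)
cat > "$scores" <<'EOF'
alice:9
bob:100
carol:10
alice:9
EOF

by_name=$(sort "$scores")                        # whole-line text sort
lexical=$(sort -t: -k2,2 "$scores")              # column 2 compared as text: 10, 100, 9
numeric_desc=$(sort -t: -k2,2 -n -r "$scores")   # column 2 compared as numbers, largest first
unique=$(sort -u "$scores")                      # sorted, duplicates dropped

printf '%s\n' "$numeric_desc"
rm -f "$scores"
```

Without -n the key "100" sorts before "9" because the comparison is character by character, which is why numeric columns usually need -n.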


2.8. Handling duplicates: uniq

Removes adjacent duplicate lines from the input.

Syntax:

    uniq [OPTION]... [FILE]...

Options:

    -c: prefix each line with the number of times it occurs
    -d: show only the lines that are repeated
    -u: show only the lines that are not repeated

Note: only consecutive, identical lines count as duplicates.

For that reason uniq is often used together with sort:

    sort userlist.txt | uniq -c
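The effect of sorting first can be seen on a small list. This is a sketch with an invented user list; the final sort -rn, which ranks the counts, is a common follow-up step and is not part of the command shown above.

```shell
# Hypothetical user list in which equal names are never adjacent.
userlist=$(mktemp)
printf 'bob\nalice\nbob\ncarol\nbob\nalice\n' > "$userlist"

unsorted=$(uniq "$userlist")                       # removes nothing: no adjacent duplicates
counts=$(sort "$userlist" | uniq -c | sort -rn)    # occurrences per name, most frequent first
dups=$(sort "$userlist" | uniq -d)                 # names that occur more than once
singles=$(sort "$userlist" | uniq -u)              # names that occur exactly once

printf '%s\n' "$counts"
rm -f "$userlist"
```

The same sort | uniq -c | sort -rn chain is a typical way to rank IP addresses or URLs in a log once the relevant column has been extracted.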



2.9. Comparing files: diff

Compare the differences between two files:

    diff foo.conf foo2.conf
    5c5
    < use_widgets = no
    ---
    > use_widgets = yes

Note: 5c5 indicates that line 5 differs (a change).

The output of diff can be saved in a file, called a "patch".
Use the -u option to produce the "unified" diff format, which is best suited for patch files.

2.10. patch

patch: applies the changes recorded in a patch file to other files (use with caution).

Options:

    -b: automatically back up the files that are changed

Example:

    $ diff -u foo.conf foo2.conf > foo.patch
    $ patch -b foo.conf foo.patch

The -b option backs up foo.conf before it is modified.
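The round trip can be tried safely in a scratch directory. The sketch below assumes GNU diff and GNU patch; the two config files are invented, and the patch step is skipped when patch is not installed, since some minimal systems ship without it.

```shell
# Hypothetical config files that differ in one line.
work=$(mktemp -d)
printf 'name = demo\nuse_widgets = no\n' > "$work/foo.conf"
printf 'name = demo\nuse_widgets = yes\n' > "$work/foo2.conf"

# diff exits with status 1 when the files differ, so that is not an error here.
diff -u "$work/foo.conf" "$work/foo2.conf" > "$work/foo.patch" || true
patch_text=$(cat "$work/foo.patch")

patched=""
backup=""
if command -v patch >/dev/null 2>&1; then
    patch -b "$work/foo.conf" "$work/foo.patch" >/dev/null
    patched=$(cat "$work/foo.conf")        # now has the content of foo2.conf
    backup=$(cat "$work/foo.conf.orig")    # GNU patch keeps the old version as .orig
fi

printf '%s\n' "$patch_text"
rm -rf "$work"
```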



3. The Three Musketeers of text processing: the grep command

The Three Musketeers of Linux text processing are grep, sed, and awk:

    grep: text filtering tool (filters by pattern); the variants are grep, egrep, and fgrep (fgrep does not support regular expression search)
    sed: stream editor, a text editing tool
    awk: text report generator; implemented on Linux as gawk

grep: Global search REgular expression and Print out the line.

Function: a text search tool that checks the target text line by line against a user-specified pattern and prints the matching lines.

Pattern: a filter condition written with regular expression metacharacters and literal text characters.

Syntax:

    grep [OPTIONS] PATTERN [FILE...]

Example:

    grep root /etc/passwd
    grep "$USER" /etc/passwd
    grep '$USER' /etc/passwd
    grep `whoami` /etc/passwd

In double quotes the shell expands $USER before grep runs; in single quotes grep receives the literal text $USER; backquotes substitute the output of the whoami command.

Command options:

    --color=auto: highlight the matched text in color
    -v: show the lines that do not match the pattern
    -i: ignore character case
    -n: show the line number of each matching line
    -c: count the matching lines
    -o: show only the matched string
    -q: quiet mode; print nothing
    -A #: after; also show the # lines after each match
    -B #: before; also show the # lines before each match
    -C #: context; also show # lines before and after each match
    -e: logical OR between multiple patterns, for example grep -e 'cat' -e 'dog' file
    -w: match whole words only
    -E: use ERE, that is, extended regular expressions
    -F: equivalent to fgrep; regular expressions are not supported
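Several of these options are easiest to understand side by side. The sketch below runs them against a four-line file invented for the purpose.

```shell
# Hypothetical input file.
pets=$(mktemp)
cat > "$pets" <<'EOF'
cat food
Dog bowl
catalog
bird seed
EOF

ci=$(grep -i 'dog' "$pets")                     # case-insensitive match
inverted=$(grep -v 'cat' "$pets")               # lines that do NOT contain "cat"
count=$(grep -c 'cat' "$pets")                  # number of matching lines
whole=$(grep -w 'cat' "$pets")                  # "cat" as a whole word, so not "catalog"
either=$(grep -n -e 'cat' -e 'bird' "$pets")    # either pattern, with line numbers
only=$(grep -o 'cat' "$pets")                   # just the matched text, one per line

# -q prints nothing; only the exit status says whether there was a match.
found=no
if grep -q 'bird' "$pets"; then
    found=yes
fi

printf '%s\n' "$either"
rm -f "$pets"
```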


4. Regular Expressions (REGEX)

REGEXP: a pattern written with special characters and text characters, in which some characters (metacharacters) do not stand for their literal meaning but act as control or wildcard functions.
Supported by programs such as grep, sed, awk, vim, less, nginx, and varnish.

Regular expression classification:

    Basic regular expressions: BRE
    Extended regular expressions: ERE

Regular expression engine:

The software module that checks and processes regular expressions, using one of several algorithms; for example PCRE (Perl Compatible Regular Expressions).

Metacharacter categories: character matching, number of matches, position anchoring, grouping.
Help on regular expressions: man 7 regex



4.1. Basic regular expression metacharacters

4.1.1. Character matching

    .: matches any single character
    []: matches any single character within the specified set
    [^]: matches any single character outside the specified set
    [:alnum:]: letters and digits
    [:alpha:]: any letter, uppercase or lowercase, i.e. A-Z, a-z
    [:lower:]: lowercase letters
    [:upper:]: uppercase letters
    [:blank:]: blank characters (space and tab)
    [:space:]: horizontal and vertical whitespace characters (a wider set than [:blank:])
    [:cntrl:]: non-printable control characters (backspace, delete, bell, ...)
    [:digit:]: decimal digits
    [:xdigit:]: hexadecimal digits
    [:graph:]: printable non-whitespace characters
    [:print:]: printable characters
    [:punct:]: punctuation characters
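One detail that is easy to miss: a class name such as [:digit:] is only valid inside a bracket expression, so in a pattern it is written with two pairs of brackets, [[:digit:]]. The sketch below counts matching lines in an invented four-line file whose last line holds only a tab and a space.

```shell
# Hypothetical input; the fourth line contains only whitespace.
sample=$(mktemp)
printf 'abc\n123\nA1 b2\n\t \n' > "$sample"

with_digit=$(grep -c '[[:digit:]]' "$sample")       # lines containing a digit
with_upper=$(grep -c '[[:upper:]]' "$sample")       # lines containing an uppercase letter
with_blank=$(grep -c '[[:blank:]]' "$sample")       # lines containing a space or tab
with_visible=$(grep -c '[^[:space:]]' "$sample")    # lines with at least one non-whitespace character

echo "digit=$with_digit upper=$with_upper blank=$with_blank visible=$with_visible"
rm -f "$sample"
```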


4.1.2. Number of matches

Number of matches: placed after the character whose repetition is to be specified; it sets how many times the preceding character may occur.

    *: matches the preceding character any number of times, including 0 times; greedy mode, matching as much as possible
    .*: any characters of any length
    \?: matches the preceding character 0 or 1 times
    \+: matches the preceding character at least 1 time
    \{n\}: matches the preceding character exactly n times
    \{m,n\}: matches the preceding character at least m times and at most n times
    \{,n\}: matches the preceding character at most n times
    \{n,\}: matches the preceding character at least n times
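The quantifiers can be checked against a short word list. This sketch assumes GNU grep, where \? and \+ are accepted in basic regular expressions as an extension; the word list is made up.

```shell
# Hypothetical input: "g", then zero to three "o", then "d".
words=$(mktemp)
printf 'gd\ngod\ngood\ngoood\n' > "$words"

star=$(grep -c 'go*d' "$words")           # zero or more o: all four lines
opt=$(grep -c 'go\?d' "$words")           # zero or one o: gd, god
plus=$(grep -c 'go\+d' "$words")          # one or more o: god, good, goood
exact=$(grep 'go\{2\}d' "$words")         # exactly two o: good
range=$(grep -c 'go\{2,3\}d' "$words")    # two or three o: good, goood

# Greedy matching: .* takes as much as it can, so the match runs to the LAST ">".
greedy=$(printf '<a><b>\n' | grep -o '<.*>')

echo "star=$star opt=$opt plus=$plus exact=$exact range=$range greedy=$greedy"
rm -f "$words"
```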


4.1.3. Position anchoring

Position anchoring: pins down the position at which the pattern must appear.

    ^: start-of-line anchor; used at the leftmost side of the pattern
    $: end-of-line anchor; used at the rightmost side of the pattern
    ^PATTERN$: the pattern must match the entire line
    ^$: empty line
    ^[[:space:]]*$: blank line (empty, or whitespace only)
    \< or \b: word-start anchor; used at the left side of a word pattern
    \> or \b: word-end anchor; used at the right side of a word pattern
    \<PATTERN\>: match a whole word
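The anchors are compared below on a six-line file made up for the example, which includes one empty line and one line of spaces. The word anchors \< and \> assume GNU grep.

```shell
# Hypothetical input; line 4 is empty and line 5 holds three spaces.
lines=$(mktemp)
printf 'root\nroot user\nchroot\n\n   \nuser root\n' > "$lines"

starts=$(grep -c '^root' "$lines")            # root, root user
ends=$(grep -c 'root$' "$lines")              # root, chroot, user root
whole_line=$(grep -c '^root$' "$lines")       # only the line that is exactly "root"
empty=$(grep -c '^$' "$lines")                # truly empty lines
blank=$(grep -c '^[[:space:]]*$' "$lines")    # empty or whitespace-only lines
word=$(grep -c '\<root\>' "$lines")           # root as a whole word, so not chroot

echo "starts=$starts ends=$ends whole=$whole_line empty=$empty blank=$blank word=$word"
rm -f "$lines"
```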


4.1.4. Grouping and back references

Grouping: \(\) binds one or more characters together so that they are treated as a unit, for example \(root\)\+

The text matched by the pattern inside each pair of grouping parentheses is recorded by the regular expression engine in internal variables named \1, \2, \3, ...
\1 is the text matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis.

Example:

    \(string1\+\(string2\)*\)
    \1: string1\+\(string2\)*
    \2: string2

Back reference: refers to the characters matched by the pattern in an earlier group, not to the pattern itself.

Or: \|

Example:

    a\|b: a or b
    C\|cat: C or cat
    \(C\|c\)at: Cat or cat
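The point that a back reference repeats the matched text, not the pattern, shows up clearly with a pattern like \(l..e\).*\1. The sentences below are invented test data, and \| in a basic regular expression assumes GNU grep.

```shell
# Hypothetical input lines.
text=$(mktemp)
cat > "$text" <<'EOF'
he loves his lover
he likes his lover
she likes her liker
abab
EOF

# \1 must repeat the same four characters that \(l..e\) matched earlier in the line.
# Line 2 has "like" and "love": both fit l..e, but they are different text, so no match.
same_text=$(grep '\(l..e\).*\1' "$text")

# A group can be quantified as a unit: "ab" twice in a row.
doubled=$(grep '\(ab\)\{2\}' "$text")

# Alternation inside a group: Cat or cat, but not bat.
alt=$(printf 'Cat\ncat\nbat\n' | grep -c '\(C\|c\)at')

printf '%s\n' "$same_text"
rm -f "$text"
```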




5. egrep and extended regular expressions

egrep = grep -E

Syntax:

    egrep [OPTIONS] PATTERN [FILE...]

Metacharacters of extended regular expressions:

Character matching:

    .: any single character
    []: any single character within the specified set
    [^]: any single character outside the specified set

Number of matches:

    *: matches the preceding character any number of times
    ?: 0 or 1 times
    +: 1 or more times
    {m}: exactly m times
    {m,n}: at least m times and at most n times

Position anchoring:

    ^: start of line
    $: end of line
    \<, \b: start of word
    \>, \b: end of word

Grouping and back references:

    (): grouping
    Back references: \1, \2, ...

Or: |

    a|b: a or b
    C|cat: C or cat
    (C|c)at: Cat or cat
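To tie this back to log processing, the sketch below filters a made-up access log with an extended regular expression and shows the equivalent basic form for comparison. It uses grep -E rather than egrep, since recent GNU grep releases print an obsolescence warning for the egrep command. The log format is hypothetical.

```shell
# Hypothetical access log: method, path, status code.
log=$(mktemp)
cat > "$log" <<'EOF'
GET /index.html 200
POST /login 302
GET /missing 404
GET /api 500
EOF

# ERE: (), | and {} need no backslashes. Status codes 4xx or 5xx at end of line.
errors=$(grep -E ' (4|5)[0-9]{2}$' "$log")

# The same pattern as a BRE needs backslashes for the same operators.
errors_bre=$(grep -c ' \(4\|5\)[0-9]\{2\}$' "$log")

# Extract the method with -o, then count occurrences with sort and uniq.
methods=$(grep -Eo '^(GET|POST)' "$log" | sort | uniq -c)

printf '%s\n' "$errors" "$methods"
rm -f "$log"
```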


