Linux Text Processing Tools

Last Update:2017-06-16 Source: Internet

Author: User


1. grep

grep (Global search Regular Expression and Print out the line) is a powerful text search tool that uses regular expressions to search text and prints the matching lines.

Syntax:

grep [options] 'pattern' input_file ...

Main [options]:
-c: Outputs only the count of matching lines.
-i: Ignores case when matching.
-h: Does not display file names when searching multiple files.
-l: Outputs only the names of files that contain a match when searching multiple files.
-n: Displays matching lines together with their line numbers.
-s: Suppresses error messages about nonexistent or unreadable files.
-v: Displays all lines that do not contain matching text.
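
The options above can be tried on a small sample file. The following is a minimal sketch, assuming GNU grep; the file names and contents are made up for illustration.

```shell
# Build two small sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'Apple pie\nbanana split\napple tart\ncherry cake\n' > "$tmpdir/menu.txt"
printf 'no fruit here\n' > "$tmpdir/other.txt"

# -c with -i: count the lines that contain "apple" in any letter case.
count=$(grep -ci 'apple' "$tmpdir/menu.txt")          # 2

# -n: prefix each matching line with its line number.
numbered=$(grep -n 'apple' "$tmpdir/menu.txt")        # 3:apple tart

# -v with -i: print the lines that do NOT contain "apple".
inverted=$(grep -vi 'apple' "$tmpdir/menu.txt")       # banana split, cherry cake

# -l: print only the names of the files that contain a match.
files_with=$(grep -l 'apple' "$tmpdir"/*.txt)

# -h: search several files but leave the file name off each output line.
bare=$(grep -h 'cherry' "$tmpdir"/*.txt)              # cherry cake

printf '%s\n' "$count" "$numbered" "$inverted" "$files_with" "$bare"
rm -rf "$tmpdir"
```

Without -h, the last command would print each match prefixed with the file name, because more than one file is searched.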

Main regular expression elements of pattern:
\: Escapes a special character so that it loses its special meaning.
^: Matches the start of a line.
$: Matches the end of a line.
\<: Matches the start of a word.
\>: Matches the end of a word.
[]: Matches a single character from the set; for example, [a] matches a.
[-]: Matches a range; for example, [A-Z] matches any of A, B, C through Z.
.: Matches any single character.
*: Matches the preceding character zero or more times.
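
Each of these elements can be checked against a short word list. This is a small sketch with made-up input; the \< and \> word anchors are assumed to be available, as they are in GNU grep.

```shell
# Sample input, one entry per line (hypothetical data).
words=$(printf 'cat\nconcat\ncatalog\nthe cat sat\ncut\nct\n')

# ^ anchors the match at the start of the line.
starts=$(printf '%s\n' "$words" | grep '^cat')        # cat, catalog

# $ anchors the match at the end of the line.
ends=$(printf '%s\n' "$words" | grep 'cat$')          # cat, concat

# \< and \> match the start and end of a word.
whole=$(printf '%s\n' "$words" | grep '\<cat\>')      # cat, the cat sat

# [] matches one character from a set.
from_set=$(printf '%s\n' "$words" | grep '^c[au]t$')  # cat, cut

# . matches exactly one arbitrary character.
any_one=$(printf '%s\n' "$words" | grep '^c.t$')      # cat, cut

# * repeats the preceding character zero or more times.
repeated=$(printf '%s\n' "$words" | grep '^ca*t$')    # cat, ct

# \ makes the dot literal, so "axb" no longer matches.
literal=$(printf 'a.b\naxb\n' | grep 'a\.b')          # a.b

printf '%s\n' "$whole"
```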

2. sed

sed is a stream editor, a very useful text processing tool that works well with regular expressions. During processing, the current line is stored in a temporary buffer called the pattern space; the sed commands are then applied to the contents of the buffer, and when processing finishes the contents of the buffer are sent to the screen. sed then moves on to the next line and repeats until the end of the file. The file itself does not change; to keep the result, redirect the output to a file or use the -i option. sed is mainly used to edit one or more files automatically, to simplify repeated operations on files, and to write conversion programs.

sed: Stream EDitor

sed is a line editor (vim, by contrast, is a full-screen editor).

sed: pattern space

By default, sed does not edit the source file; it only processes the data in the pattern space and prints the pattern space to the screen once processing finishes.

sed usage:

sed [options] 'AddressCommand' file ...

Common options:

-n: Silent mode; the contents of the pattern space are not printed by default

-i: Modifies the original file directly

-e script -e script: Multiple scripts can be executed at the same time

-f /path/to/sed_script: Reads the script from the specified file

Usage: sed -f /path/to/script file
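
The effect of these options is easiest to see on a throwaway file. The sketch below assumes GNU sed (the bare -i form is GNU syntax); the file names are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/nums.txt"

# Without -n, every line is printed, and "p" prints line 2 a second time.
doubled=$(sed '2p' "$tmpdir/nums.txt")                # one, two, two, three

# With -n, only the lines printed explicitly by "p" appear.
only=$(sed -n '2p' "$tmpdir/nums.txt")                # two

# -e: apply several scripts in one pass.
multi=$(sed -e 's/one/1/' -e 's/three/3/' "$tmpdir/nums.txt")   # 1, two, 3

# -f: read the script from a file.
printf 's/two/2/\n' > "$tmpdir/edit.sed"
from_file=$(sed -f "$tmpdir/edit.sed" "$tmpdir/nums.txt")       # one, 2, three

# None of the commands so far touched the file itself.
before=$(cat "$tmpdir/nums.txt")                      # one, two, three

# -i: edit the file in place (GNU sed syntax).
sed -i 's/one/1/' "$tmpdir/nums.txt"
after=$(cat "$tmpdir/nums.txt")                       # 1, two, three

printf '%s\n' "$after"
rm -rf "$tmpdir"
```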

Address:

1. StartLine,EndLine

For example 1,100.

$: the last line

2. /regexp/

For example /^root/

3. /pattern1/,/pattern2/

All lines from the first line matched by pattern1 through the first subsequent line matched by pattern2

4. LineNumber

The specified line

5. StartLine,+N

Starting from StartLine, the following N lines
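
The five address forms can be compared side by side with -n and p. This is a sketch on made-up input; the StartLine,+N form is a GNU sed extension, so GNU sed is assumed.

```shell
f=$(mktemp)
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > "$f"

by_range=$(sed -n '2,3p' "$f")                    # beta, gamma
last_line=$(sed -n '$p' "$f")                     # epsilon
by_regex=$(sed -n '/^g/p' "$f")                   # gamma
by_patterns=$(sed -n '/^beta/,/^delta/p' "$f")    # beta, gamma, delta
by_number=$(sed -n '4p' "$f")                     # delta
by_offset=$(sed -n '2,+2p' "$f")                  # beta, gamma, delta (GNU extension)

printf '%s\n' "$by_patterns"
rm -f "$f"
```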

Command:

d: Deletes the lines that match the address;

p: Prints the lines that match the address;

a \string: Appends a new line containing string after the specified line

\n: can be used to break the text into multiple lines

i \string: Inserts a new line containing string before the specified line

r FileName: Adds the contents of the specified file after the matching line

w FileName: Saves the lines in the range specified by the address to the specified file;

s/pattern/string/modifiers: Find and replace; by default only the first match of pattern in each line is replaced. pattern can use regular expression characters; string cannot.

Modifiers:

g: Global substitution

i: Ignore case

Delimiters: sed can use almost any character as the delimiter, for example s///, s###, s@@@, or s:::

\(..\), \1, \2: \(..\) captures the string it matches; the first captured string is referenced as \1, the second as \2, and so on
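
A few of these commands in action, as a sketch on made-up input. The a and i commands are written in the portable two-line form (a backslash followed by a newline); the i modifier of s is assumed to be available, as it is in GNU sed.

```shell
f=$(mktemp)
printf 'first line\nsecond line\nthird line\n' > "$f"

# d: delete line 2.
deleted=$(sed '2d' "$f")                  # first line, third line

# a: append a new line after line 1.
appended=$(sed '1a\
appended line' "$f")

# i: insert a new line before line 1.
inserted=$(sed '1i\
inserted line' "$f")

# s without g replaces only the first match on each line.
first_only=$(sed 's/i/I/' "$f")           # fIrst line, second lIne, thIrd line

# s with g replaces every match on each line.
global=$(sed 's/i/I/g' "$f")              # fIrst lIne, second lIne, thIrd lIne

# The i modifier ignores case (GNU sed).
nocase=$(echo 'Hello' | sed 's/hello/bye/i')              # bye

# Another delimiter avoids escaping the slashes in a path.
delim=$(echo '/usr/local/bin' | sed 's@/usr/local@/opt@') # /opt/bin

# \( \) capture groups, referenced as \1 and \2 in the replacement.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')   # world hello

printf '%s\n' "$swapped"
rm -f "$f"
```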

sed exercises:

1. Delete the leading whitespace of each line in the /etc/grub.conf file;

sed -r 's#^[[:space:]]+##g' /etc/grub.conf

2. Replace the number in the "id:3:initdefault:" line of the /etc/inittab file with 5;

sed 's@\(id:\)[0-9]\(:initdefault:\)@\15\2@g' /etc/inittab

3. Delete the blank lines in the /etc/inittab file;

sed '/^$/d' /etc/inittab

4. Delete the # at the beginning of lines in the /etc/inittab file;

sed 's/^#//g' /etc/inittab

5. Delete the # at the beginning of lines in a file together with the whitespace that follows it, but only where the # is followed by whitespace;

sed -r 's/^#[[:space:]]+//g' sed.txt

6. For lines in a file that begin with whitespace followed by #, delete the leading whitespace and the #;

sed -r 's/^[[:space:]]+#//g' sed.txt

7. Extract the directory name of a file path.

echo "/etc/rc.d/" | sed -r 's@^(/.*/)[^/]+/?@\1@g'
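
Since /etc/grub.conf and /etc/inittab do not exist on every system, the exercises can also be tried on a scratch file. The sketch below assumes GNU sed (for -r); the sample content is made up.

```shell
f=$(mktemp)
printf '# comment\n\n   indented\nid:3:initdefault:\n' > "$f"

# Exercise 1: strip leading whitespace.
stripped=$(sed -r 's#^[[:space:]]+##' "$f")

# Exercise 2: \1 is followed by a literal 5, then \2.
runlevel=$(sed -n 's@\(id:\)[0-9]\(:initdefault:\)@\15\2@p' "$f")   # id:5:initdefault:

# Exercise 3: delete blank lines.
noblank=$(sed '/^$/d' "$f")

# Exercise 5: delete a leading # only when whitespace follows it.
uncommented=$(sed -r 's/^#[[:space:]]+//' "$f")

# Exercise 7: keep the directory part of a path.
parent=$(echo "/etc/rc.d/" | sed -r 's@^(/.*/)[^/]+/?@\1@')          # /etc/

printf '%s\n' "$runlevel" "$parent"
rm -f "$f"
```

Exercises 5 and 6 need -r (or \+ in a basic regular expression), because a bare + is an ordinary character without it.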

3. awk

awk is a programming language for processing text and data under Linux/Unix. The data can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, and is a powerful programming tool under Linux/Unix. It can be used directly on the command line, but is more often used in scripts. awk has many built-in features, such as arrays and functions, that resemble those of the C language; flexibility is its biggest advantage.

Syntax: awk 'pattern {action}' file

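The most commonly used pieces are the print statement with field references such as $1, and the -F option, which sets the field separator.

# awk [options] 'script' file1 file2 ...

# awk [options] 'PATTERN {action}' file1 file2 ...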

awk output:

First, print

Usage format of print:

print item1, item2, ...

Points:

1. Items are separated by commas, and the output items are separated by the output field separator (a space by default);

2. An output item can be a string or numeric value, a field of the current record (such as $1), a variable, or an awk expression; values are converted to strings before output;

3. The items after print can be omitted, in which case it is equivalent to print $0; so to output a blank line, use print "";

Example:

# awk 'BEGIN { print "line one\nline two\nline three" }'

awk -F: '{ print $1, $3 }' /etc/passwd
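
The points about print can be checked without relying on the contents of the real /etc/passwd. This is a small sketch; the two passwd-style lines are sample data.

```shell
# Two passwd-style records (sample data).
sample=$(printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n')

# Items separated by commas are printed with a space between them.
with_comma=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $3 }')   # root 0, daemon 1

# Items written next to each other are concatenated instead.
no_comma=$(printf '%s\n' "$sample" | awk -F: '{ print $1 $3 }')      # root0, daemon1

# print with no items prints the whole record, like print $0.
bare=$(printf 'a b c\n' | awk '{ print }')                           # a b c

# print "" outputs an empty line.
blank=$(awk 'BEGIN { print "x"; print ""; print "y" }')

printf '%s\n' "$with_comma"
```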

Second, awk variables

2.1 awk built-in variables: record variables

FS: field separator, the field delimiter used when reading text (whitespace by default);

RS: record separator, the record delimiter used when reading text (a newline by default);

OFS: output field separator

ORS: output record separator
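
The four separators can be set in a BEGIN block before any input is read. A minimal sketch with made-up input:

```shell
# FS splits the input on commas; OFS is placed between the printed items.
csv=$(printf 'a,b,c\n1,2,3\n' | awk 'BEGIN { FS = ","; OFS = " | " } { print $1, $3 }')
# a | c
# 1 | 3

# RS: records are separated by ";" instead of newlines.
recs=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
# 1: x
# 2: y
# 3: z

# ORS: each printed record ends with "-" instead of a newline.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = "-" } { print }')   # a-b-c-

printf '%s\n' "$csv" "$recs" "$joined"
```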

2.2 awk built-in variables: data variables

NR: the number of input records processed so far by the awk command; if there are multiple files, the lines of all processed files are counted together;

NF: number of fields, the number of fields in the current record;

FNR: unlike NR, FNR counts the lines processed so far within the current file only;

ARGV: an array that holds the command line itself; for example, for the command awk '{print $0}' a.txt b.txt, ARGV[0] holds awk and ARGV[1] holds a.txt;

ARGC: the number of arguments to the awk command;

FILENAME: the name of the file currently being processed by the awk command;

ENVIRON: an associative array of the current shell environment variables and their values;

For example: awk 'BEGIN { print ENVIRON["PATH"] }'
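
The difference between NR and FNR only shows up with more than one input file. The sketch below uses two made-up files; GREETING is a hypothetical environment variable set just for this command.

```shell
tmpdir=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmpdir/a.txt"
printf 'f\n' > "$tmpdir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF }' a.txt b.txt)
# a.txt 1 1 2
# a.txt 2 2 3
# b.txt 3 1 1

# ARGC counts awk itself plus the two file arguments; the program text is not included.
args=$(cd "$tmpdir" && awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' a.txt b.txt)   # 3 a.txt b.txt

# ENVIRON exposes environment variables by name.
env_value=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')              # hello

printf '%s\n' "$counts" "$args" "$env_value"
rm -rf "$tmpdir"
```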

Third, printf

Usage format of the printf command:

printf format, item1, item2, ...

Points:

1. The biggest difference from the print command is that printf requires a format;

2. The format specifies the output format of each subsequent item;

3. The printf statement does not automatically print a line break; add \n explicitly

Each format specifier starts with % followed by a character, as follows:

%c: displays the character for an ASCII code;

%d, %i: decimal integer;

%e, %E: displays the value in scientific notation;

%f: displays a floating-point number;

%g, %G: displays the value in scientific notation or in floating-point format;

%s: displays a string;

%u: unsigned integer;

%%: displays % itself;

Modifiers:

N: display width;

-: left-aligned;

+: displays the sign of the number;

Example:

# awk -F: '{ printf "%-15s%i\n", $1, $3 }' /etc/passwd
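
The format specifiers and modifiers can be tried with literal values. A short sketch; the name:number records are sample data.

```shell
# Width and alignment: %-10s pads on the right, %5d pads on the left.
table=$(printf 'root:0\ndaemon:1\n' | awk -F: '{ printf "%-10s|%5d|\n", $1, $2 }')
# root      |    0|
# daemon    |    1|

# A selection of format specifiers applied to literal values.
kinds=$(awk 'BEGIN { printf "%c %d %.2f %e %s %%\n", 65, 42.9, 3.14159, 1234.5, "text" }')
# A 42 3.14 1.234500e+03 text %

# + always shows the sign of a number.
signs=$(awk 'BEGIN { printf "%+d %+d\n", 5, -5 }')                   # +5 -5

# printf adds no newline of its own, so consecutive calls share a line.
same_line=$(awk 'BEGIN { printf "a"; printf "b"; printf "\n" }')     # ab

printf '%s\n' "$table" "$kinds" "$signs" "$same_line"
```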

Fourth, output redirection

print items > output-file

print items >> output-file

print items | command

Special file descriptors:

/dev/stdin: standard input

/dev/stdout: standard output

/dev/stderr: error output

/dev/fd/N: a specific file descriptor; for example, /dev/stdin is equivalent to /dev/fd/0;

Example:

# awk -F: '{ printf "%-15s%i\n", $1, $3 > "/dev/stderr" }' /etc/passwd
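
The three redirection forms and /dev/stderr can be exercised as follows. This is a sketch assuming a Linux system; the output file name is hypothetical and is passed in with -v.

```shell
tmpdir=$(mktemp -d)
out="$tmpdir/out.txt"

# >: the file is truncated when first opened; later prints in the same run are added to it.
printf 'x\ny\n' | awk -v out="$out" '{ print $0 > out }'
first=$(cat "$out")                                  # x, y

# >>: append to the existing file.
printf 'z\n' | awk -v out="$out" '{ print $0 >> out }'
second=$(cat "$out")                                 # x, y, z

# |: send the output through another command.
sorted=$(printf 'b\na\nc\n' | awk '{ print $0 | "sort" }')           # a, b, c

# Writing to /dev/stderr: capture stderr only and discard stdout.
err_only=$(printf 'oops\n' | awk '{ print $0 > "/dev/stderr" }' 2>&1 >/dev/null)   # oops

printf '%s\n' "$second" "$sorted" "$err_only"
rm -rf "$tmpdir"
```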

Fifth, awk patterns:

awk 'program' input-file1 input-file2 ...

The program is:

pattern { action }

...

5.1 Common pattern types:

1. Regexp: regular expression, in the format /regular expression/

2. Expression: an expression; the condition is satisfied when its value is nonzero or a non-empty string, for example $1 ~ /foo/ or $1 == "magedu", using the operators ~ (match) and !~ (no match).

3. Ranges: specify a matching range, in the format pat1,pat2

4. BEGIN/END: special patterns that run only once, before awk starts processing input or after it finishes

5. Empty (no pattern): matches every input line;

Pattern-matching expressions:

pattern1, pattern2: specifies a range of lines. This syntax cannot include the BEGIN and END patterns.

BEGIN: lets the user specify actions that occur before the first input record is processed; this is usually where global variables are set.

END: lets the user specify actions that occur after the last input record has been read.
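
Each pattern type can be tried against the same small input. A minimal sketch; the names and numbers are made-up data.

```shell
# name and age, separated by a space (sample data).
data=$(printf 'alice 30\nbob 25\ncarol 35\ndave 40\n')

# Regexp pattern: lines that start with "c".
by_regex=$(printf '%s\n' "$data" | awk '/^c/ { print $1 }')                # carol

# Expression pattern: true when the comparison holds.
by_expr=$(printf '%s\n' "$data" | awk '$2 > 28 { print $1 }')              # alice, carol, dave

# ~ and !~ test one field against a regular expression.
by_match=$(printf '%s\n' "$data" | awk '$1 ~ /o/ { print $1 }')            # bob, carol
by_nomatch=$(printf '%s\n' "$data" | awk '$1 !~ /o/ { print $1 }')         # alice, dave

# Range pattern: from the first line matching pat1 through the line matching pat2.
by_range=$(printf '%s\n' "$data" | awk '/^bob/,/^carol/ { print $1 }')     # bob, carol

# BEGIN runs before the first record, END after the last one.
total=$(printf '%s\n' "$data" | awk 'BEGIN { sum = 0 } { sum += $2 } END { print "total:", sum }')   # total: 130

# Empty pattern: the action runs for every input line.
line_count=$(printf '%s\n' "$data" | awk '{ n++ } END { print n }')        # 4

printf '%s\n' "$total" "$line_count"
```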

This article is from the "small white School It" blog, please be sure to keep this source http://xiaojiejt.blog.51cto.com/12536455/1937412
