Text-processing tools in Linux

Source: Internet
Author: User

Text Processing Tools

Linux provides a large number of text-processing tools. This article describes several of them: tools for viewing files and extracting text, and the "Three Musketeers" of text processing.



Viewing file contents: less and cat

Extracting the beginning or end of a file: head and tail

Extracting by column: cut

Extracting by keyword: grep and egrep



First, cat and tac for viewing files:



cat [OPTION]... [FILE]...

-E: display $ at the end of each line

-n: number all output lines

-A: show all nonprinting (control) characters; equivalent to -vET

-b: number only the non-empty output lines

-s: squeeze multiple consecutive blank lines into one


tac works like cat, except that it prints the lines in reverse order (last line first).
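This is a minimal sketch for trying the cat options and tac on a throwaway file. It assumes a POSIX shell and GNU coreutils (tac is part of GNU coreutils); the file name sample.txt is hypothetical.

```shell
# Throwaway directory, removed when the script exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample file: three consecutive blank lines between "one" and "two"
printf 'one\n\n\n\ntwo\nthree\n' > "$tmp/sample.txt"

# -s squeezes the run of blank lines into a single blank line
squeezed=$(cat -s "$tmp/sample.txt")
printf '%s\n' "$squeezed"

# -b numbers only the non-empty lines (-n would number all six)
cat -b "$tmp/sample.txt"

# -E marks every line end with $, which makes trailing spaces visible
cat -E "$tmp/sample.txt"

# tac prints the same lines, last line first
reversed=$(tac "$tmp/sample.txt")
printf '%s\n' "$reversed"
```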


more: paging through files

more [OPTIONS...] FILE...

-d: display prompts explaining how to continue paging and how to quit


less: view a file or stdin one page at a time. Useful commands include /text to search for text, and n or N to jump to the next or previous match. less is also the pager used by the man command.


Showing the beginning or end of a file

head [OPTION]... [FILE]...

-c #: print the first # bytes

-n #: print the first # lines

-#: same as -n #


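tail [OPTION]... [FILE]...

-c #: print the last # bytes

-n #: print the last # lines (-# is a short form)

-f: follow the file and display new lines as they are appended; commonly used to monitor logs (-F also keeps retrying if the file is renamed or recreated, as happens with log rotation)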
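A short sketch of head and tail on a numbered file, assuming GNU coreutils (seq generates the numbers 1 to 10). It also shows a common idiom: piping head into tail to extract a range of lines from the middle of a file. The file name nums.txt is hypothetical.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A 10-line file containing the numbers 1..10
seq 10 > "$tmp/nums.txt"

first3=$(head -n 3 "$tmp/nums.txt")   # lines 1-3
last2=$(tail -n 2 "$tmp/nums.txt")    # lines 9-10
bytes=$(head -c 4 "$tmp/nums.txt")    # first 4 bytes: "1", newline, "2", newline

# head keeps lines 1-6, then tail keeps the last 3 of those: lines 4-6
middle=$(head -n 6 "$tmp/nums.txt" | tail -n 3)

printf '%s\n' "$first3" "$last2" "$middle"
# tail -f FILE would keep running until interrupted with Ctrl+C, so it is not used here
```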


Extracting columns with cut and merging files with paste


cut [OPTION]... [FILE]...

-d DELIMITER: set the field delimiter (default: tab)

-f FIELDS: select the fields to display:

#: a single field, e.g. 3

#,#[,#]: several discrete fields, e.g. 1,3,6

#-#: a continuous range of fields, e.g. 1-6

mixed: 1-3,7

-c: cut by character position instead of by field

--output-delimiter=STRING: use STRING as the output delimiter



cut displays the specified columns of a file or of stdin. Examples:

cut -d: -f1 /etc/passwd

cat /etc/passwd | cut -d: -f7

cut -c2-5 /usr/share/dict/words
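A small sketch of these cut invocations on hypothetical passwd-style lines, so it does not depend on the real /etc/passwd. It assumes GNU cut, because --output-delimiter is a GNU extension.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two passwd-style sample lines (the alice account is hypothetical)
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/zsh\n' > "$tmp/passwd"

users=$(cut -d: -f1 "$tmp/passwd")     # field 1: user name
shells=$(cut -d: -f7 "$tmp/passwd")    # field 7: login shell
# Mixed field list (1 and 3-4), joined with a space instead of ':'
mixed=$(cut -d: -f1,3-4 --output-delimiter=' ' "$tmp/passwd")
chars=$(echo 'abcdefgh' | cut -c2-5)   # characters 2 to 5

printf '%s\n' "$users" "$shells" "$mixed" "$chars"
```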

paste: merge files line by line, joining line N of each file into a single output line

paste [OPTION]... [FILE]...

-d DELIMITER: set the delimiter (default: tab)

-s: serial mode; join all the lines of each file into a single line

paste f1 f2

paste -s f1 f2
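A minimal sketch showing the difference between the default mode and -s. It uses two hypothetical three-line files named f1 and f2.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '1\n2\n3\n' > "$tmp/f1"
printf 'a\nb\nc\n' > "$tmp/f2"

side=$(paste "$tmp/f1" "$tmp/f2")        # line N of f1, a tab, line N of f2
colon=$(paste -d: "$tmp/f1" "$tmp/f2")   # same pairing, joined with ':'
serial=$(paste -s "$tmp/f1" "$tmp/f2")   # each whole file becomes one line

printf '%s\n' "$side" "$colon" "$serial"
```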


There are also tools for analyzing text: text statistics (wc), sorting text (sort), and comparing files (diff and patch).


wc: collecting text statistics


wc counts the lines, words, bytes, and characters in a file or in stdin. For example, wc story.txt prints the number of lines, the number of words (237 here), and the number of bytes (1901 here), followed by the file name story.txt.

-l: count lines only

-w: count words only

-c: count bytes only

-m: count characters only
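This sketch counts a small text fed on stdin. The helper function sample is hypothetical. tr strips the padding spaces that some wc implementations print before the number.

```shell
# Hypothetical sample text: 2 lines, 3 words, 14 bytes
sample() { printf 'one two\nthree\n'; }

lines=$(sample | wc -l | tr -d ' ')
words=$(sample | wc -w | tr -d ' ')
bytes=$(sample | wc -c | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
```

For plain ASCII text, -m and -c give the same number. They differ only for multibyte characters in a UTF-8 locale.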


sort: sorting text


sort writes the sorted text to stdout without changing the original file:

$ sort [options] file(s)

Common options

-r: sort in reverse order

-n: sort numerically

-f: (fold) ignore case when comparing

-u: (unique) remove duplicate lines from the output

-t C: use C as the field delimiter

-k X: sort by field X, where fields are separated by the delimiter given with -t; this option can be used multiple times
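A sketch of sorting hypothetical passwd-like records by name and by numeric UID, assuming GNU sort. The key -k3,3n limits the key to field 3 only. A bare -k3 would run from field 3 to the end of the line.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical records in name:x:uid form, with one exact duplicate
printf 'carol:x:1002\nalice:x:1000\nbob:x:99\nalice:x:1000\n' > "$tmp/users"

# Whole-line text sort; -u drops the duplicate alice line
by_name=$(sort -u "$tmp/users")
# ':' as delimiter, numeric sort on field 3 only, duplicates removed
by_uid=$(sort -u -t: -k3,3n "$tmp/users")
# r reverses the numeric order; head keeps the highest UID
highest=$(sort -t: -k3,3nr "$tmp/users" | head -n 1)

printf '%s\n' "$by_name" "$by_uid" "$highest"
```

When -u is combined with -k, lines whose keys compare equal count as duplicates even if the rest of the line differs.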



uniq: handling repeated lines


uniq removes adjacent duplicate lines from its input.

uniq [OPTION]... [FILE]...

-c: prefix each line with the number of times it occurs

-d: show only the lines that are repeated

-u: show only the lines that are not repeated

Because uniq only compares adjacent lines, it is usually used together with sort, for example: sort userlist.txt | uniq -c
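This sketch shows why uniq is paired with sort. It uses a hypothetical userlist.txt in which the duplicates are not adjacent. awk only normalizes the padded count that uniq -c prints.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'bob\nalice\nbob\ncarol\nbob\nalice\n' > "$tmp/userlist.txt"

# Without sort, no two equal lines are adjacent, so all 6 lines survive
unsorted=$(uniq "$tmp/userlist.txt" | wc -l | tr -d ' ')
# After sort, uniq -c counts each distinct name; printed as "name count"
counts=$(sort "$tmp/userlist.txt" | uniq -c | awk '{print $2, $1}')
repeated=$(sort "$tmp/userlist.txt" | uniq -d)   # names occurring more than once
single=$(sort "$tmp/userlist.txt" | uniq -u)     # names occurring exactly once

printf '%s\n' "$unsorted" "$counts" "$repeated" "$single"
```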


diff: comparing files


diff shows the differences between two files:

$ diff foo.conf-broken foo.conf-works

5c5

< use_widgets = no

---

> use_widgets = yes

The output means that line 5 differs between the two files (c stands for "change").


patch: applying patches


The output of diff can be saved in a file called a "patch".

Use the -u option to produce output in the "unified" diff format, which is best suited for patch files.

The patch command applies the changes recorded in a patch file to another file (use with caution!).

Use the -b option to back up the changed file automatically:

$ diff -u foo.conf-broken foo.conf-works > foo.patch

$ patch -b foo.conf-broken foo.patch
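A sketch of the whole round trip on two hypothetical five-line config files that differ only on line 5. diff exits with status 1 when the files differ, so the script treats only statuses above 1 as errors. patch is not installed on every system, so that step runs only if patch is available.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'a = 1\nb = 2\nc = 3\nd = 4\nuse_widgets = no\n' > foo.conf-broken
printf 'a = 1\nb = 2\nc = 3\nd = 4\nuse_widgets = yes\n' > foo.conf-works

# Normal format: "5c5" means line 5 changed; "<" is the old line, ">" the new one
normal=$(diff foo.conf-broken foo.conf-works) || [ $? -eq 1 ]
printf '%s\n' "$normal"

# Unified format, suitable for a patch file
diff -u foo.conf-broken foo.conf-works > foo.patch || [ $? -eq 1 ]

if command -v patch >/dev/null 2>&1; then
  # -b keeps the original as foo.conf-broken.orig before modifying it
  patch -b foo.conf-broken foo.patch
  cmp -s foo.conf-broken foo.conf-works && echo "patched: files are now identical"
fi
```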



The finale of the text-processing tools is the Three Musketeers. This article covers grep and egrep (grep with extended regular expressions); the other two musketeers are sed and awk.


grep: a text filter that selects lines matching a pattern; its variants are grep, egrep, and fgrep (fgrep searches for fixed strings and does not support regular expressions)

sed: stream editor, a text editing tool

awk: a text report generator; the implementation on Linux is gawk


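grep: Global search REgular expression and Print out the line.


What it does: grep is a text search tool. It checks the target text line by line against a user-specified pattern and prints the matching lines. The pattern is a filter condition written with regular expression metacharacters and literal text characters.

grep [OPTIONS] PATTERN [FILE...]

grep root /etc/passwd

grep "$USER" /etc/passwd

grep '$USER' /etc/passwd

grep `whoami` /etc/passwd

With double quotes, the shell expands $USER before grep runs. With single quotes, grep receives the literal text $USER. With backquotes, the shell runs whoami and grep searches for its output.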
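This sketch demonstrates the quoting difference on hypothetical passwd-style lines. A variable named name stands in for $USER, which may not be set in every environment, and $(...) is used as the modern form of backquotes.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'alice:x:1000:1000::/home/alice:/bin/bash\nbob:x:1001:1001::/home/bob:/bin/sh\n' > "$tmp/passwd"

name=alice   # hypothetical stand-in for $USER

# Double quotes: the shell expands $name, so grep searches for "alice"
dq=$(grep "$name" "$tmp/passwd")
# Single quotes: grep receives the literal text $name, which matches nothing here
sq=$(grep '$name' "$tmp/passwd") || true
# Command substitution: grep searches for whatever the inner command prints
cs=$(grep "$(echo bob)" "$tmp/passwd")

printf '%s\n' "$dq" "[$sq]" "$cs"
```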


Common grep options:


--color=auto: highlight the matched text

-v: display the lines that do not match the pattern

-i: ignore case

-n: show the line numbers of matching lines

-c: count the number of matching lines

-o: display only the matched string

-q: quiet mode; print nothing (only the exit status reports whether there was a match)

-A #: after; also show # lines after each match

-B #: before; also show # lines before each match

-C #: context; show # lines before and after each match

-e PATTERN: specify a pattern; multiple -e options are combined with a logical OR, e.g. grep -e 'cat' -e 'dog' file

-w: match only whole words

-E: use extended regular expressions (ERE)
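A compact sketch that exercises most of these options on a hypothetical five-line file named pets.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'cat\nDog\ncatalog\nbird\ndog food\n' > "$tmp/pets"

count=$(grep -c -i 'dog' "$tmp/pets")                  # -c counts lines; -i matches Dog and dog
numbered=$(grep -n 'bird' "$tmp/pets")                 # -n prefixes the line number
inverted=$(grep -v -i -e 'cat' -e 'dog' "$tmp/pets")   # cat OR dog, then -v inverts
word=$(grep -w 'cat' "$tmp/pets")                      # -w: "catalog" is not the word "cat"
only=$(grep -o 'cat' "$tmp/pets" | wc -l | tr -d ' ')  # -o prints each match on its own line
after=$(grep -A 1 'bird' "$tmp/pets")                  # the match plus 1 line after it

found=no
if grep -q 'bird' "$tmp/pets"; then found=yes; fi      # -q: only the exit status matters

printf '%s\n' "$count" "$numbered" "$inverted" "$word" "$only" "$after" "$found"
```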



Regular expressions in detail


REGEXP: a pattern written with a class of special characters and literal text characters. Some of these characters (metacharacters) do not stand for their literal meaning but serve a control or wildcard function. Programs that support regular expressions include grep, vim, less, and nginx.

There are two classes:

Basic regular expressions: BRE

Extended regular expressions: ERE (grep -E, egrep)

Regular expression engine: the software module that checks text against a regular expression, using one of several possible algorithms; for example, PCRE (Perl Compatible Regular Expressions).

Metacharacters are grouped by function: character matching, number of matches, position anchoring, and grouping.


Basic regular expression metacharacters


Character matching

.: matches any single character

[]: matches any single character within the specified range

[^]: matches any single character outside the specified range

[:digit:] all digits

[:lower:] all lowercase letters

[:upper:] all uppercase letters

[:alpha:] all letters

[:alnum:] all letters and digits

[:punct:] all punctuation characters

[:space:] whitespace characters (space, tab, newline, and so on); [:blank:] matches only space and tab

These classes are used inside a bracket expression, for example [[:digit:]].
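This sketch applies the character classes to one hypothetical string. It uses grep -o to print every single-character match on its own line, and tr to join the matches back together.

```shell
text='Host42: up, load=0.5!'   # hypothetical sample text

digits=$(printf '%s\n' "$text" | grep -o '[[:digit:]]' | tr -d '\n')
letters=$(printf '%s\n' "$text" | grep -o '[[:alpha:]]' | tr -d '\n')
punct=$(printf '%s\n' "$text" | grep -o '[[:punct:]]' | tr -d '\n')
# . matches any single character: "l", any two characters, then "d"
dotted=$(printf '%s\n' "$text" | grep -o 'l..d')
# [^...] negates the class: count every character that is not a letter or digit
nonalnum=$(printf '%s\n' "$text" | grep -o '[^[:alnum:]]' | wc -l | tr -d ' ')

printf '%s\n' "$digits" "$letters" "$punct" "$dotted" "$nonalnum"
```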



Number of matches: placed after the character it applies to, to specify how many times the preceding character may occur

*: matches the preceding character any number of times, including 0. Matching is greedy: it matches as much as possible

.*: any character, any number of times

\?: matches the preceding character 0 or 1 times

\+: matches the preceding character at least 1 time

\{m\}: matches the preceding character exactly m times

\{m,n\}: matches the preceding character at least m times and at most n times

\{,n\}: matches the preceding character at most n times

\{m,\}: matches the preceding character at least m times
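This sketch runs each repetition operator against a hypothetical word list. It assumes GNU grep, because \? and \+ in basic regular expressions are GNU extensions. Wrapping a pattern in ^...$ makes it match the whole line.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'gd\ngod\ngood\ngooood\n' > "$tmp/words"

star=$(grep -c 'go*d' "$tmp/words")          # o zero or more times: all 4 lines
opt=$(grep '^go\?d$' "$tmp/words")           # o 0 or 1 times: gd, god
plus=$(grep -c 'go\+d' "$tmp/words")         # o at least once: 3 lines
exact=$(grep '^go\{2\}d$' "$tmp/words")      # exactly 2 o's: good
range=$(grep '^go\{2,4\}d$' "$tmp/words")    # 2 to 4 o's: good, gooood
greedy=$(echo 'a1b2b' | grep -o 'a.*b')      # greedy: a1b2b rather than a1b

printf '%s\n' "$star" "$opt" "$plus" "$exact" "$range" "$greedy"
```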



Position anchoring: restricts where a match may appear

^: beginning-of-line anchor, placed at the left end of the pattern

$: end-of-line anchor, placed at the right end of the pattern

^pattern$: the pattern must match the entire line

^$: empty line; ^[[:space:]]*$: blank line (empty or containing only whitespace)

\< or \b: beginning-of-word anchor, placed at the left side of a word pattern

\> or \b: end-of-word anchor, placed at the right side of a word pattern

\<pattern\>: matches a whole word
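This sketch applies the anchors to a hypothetical five-line file. The file contains one empty line, one whitespace-only line, and the word root in different positions. \< and \> are assumed to be supported, as they are in GNU grep.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'root line\n\n   \nchroot here\nthe root\n' > "$tmp/t"

empty=$(grep -c '^$' "$tmp/t")               # only the truly empty line
blank=$(grep -c '^[[:space:]]*$' "$tmp/t")   # empty or whitespace-only lines
starts=$(grep '^root' "$tmp/t")              # "root" at the start of a line
ends=$(grep 'root$' "$tmp/t")                # "root" at the end of a line
whole=$(grep -c '\<root\>' "$tmp/t")         # the word root, but not chroot

printf '%s\n' "$empty" "$blank" "$starts" "$ends" "$whole"
```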



Grouping: \(\): binds one or more characters together so that they are treated as a whole


For example, \(root\)\+ matches root repeated one or more times. The text matched by the pattern inside each pair of grouping parentheses is recorded by the regular expression engine in internal variables named \1, \2, \3, ...

\1: the text matched by the pattern between the first opening parenthesis from the left and its matching closing parenthesis


Example: \(string1\+\(string2\)*\)

\1: string1\+\(string2\)*

\2: string2

Back reference: refers to the characters matched by the pattern in a preceding group (not to the pattern itself)
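This sketch illustrates that a back reference repeats the matched text rather than the pattern. \([0-9]\)\1 matches "11" but not "12", even though both consist of two digits. The sample lines are hypothetical, and \+ assumes GNU grep.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '11\n12\nabcabc\nabcxyz\n' > "$tmp/t"

# \1 must equal the digit the group actually matched
same_digit=$(grep '^\([0-9]\)\1$' "$tmp/t")
# The same three-letter block twice in a row
repeat=$(grep '^\([a-z]\{3\}\)\1$' "$tmp/t")
# Grouping lets \+ apply to the whole word root
grouped=$(echo 'rootroot' | grep -o '\(root\)\+')

printf '%s\n' "$same_digit" "$repeat" "$grouped"
```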


egrep and extended regular expressions: ERE is almost the same as BRE; the main difference is that several metacharacters are written without a backslash.


egrep = grep -E

egrep [OPTIONS] PATTERN [FILE...]

Extended regular expression metacharacters:

Character matching:

. any single character

[] any single character in the specified range

[^] any single character not in the specified range



Number of matches:

*: matches the preceding character any number of times, including 0

?: 0 or 1 times

+: 1 or more times

{m}: exactly m times

{m,n}: at least m times and at most n times


Position anchoring:

^: beginning of the line

$: end of the line

\<, \b: beginning of a word

\>, \b: end of a word

Grouping: ()

Back reference: \1, \2, ...

Or: a|b

C|cat: C or cat

(C|c)at: Cat or cat
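This sketch compares BRE and ERE forms of the same pattern and shows that | has the lowest precedence. It uses grep -E rather than egrep, because recent GNU grep versions print an "obsolescent" warning for egrep. Back references in ERE are a GNU extension, and the sample lines are hypothetical.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'cat\nCat\nC\nbat\n' > "$tmp/t"

# The same repetition written as BRE (with backslashes) and as ERE (without)
bre=$(printf 'abbc\nabc\n' | grep 'ab\{2\}c')
ere=$(printf 'abbc\nabc\n' | grep -E 'ab{2}c')

# C|cat means "C" or "cat": matches cat, Cat (it contains C) and C
alt1=$(grep -c -E 'C|cat' "$tmp/t")
# (C|c)at means "Cat" or "cat"
alt2=$(grep -c -E '(C|c)at' "$tmp/t")

# ERE grouping with a back reference
backref=$(printf 'abab\nabcd\n' | grep -E '^(ab)\1$')

printf '%s\n' "$bre" "$ere" "$alt1" "$alt2" "$backref"
```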



This chapter focuses on regular expressions, and what follows is my personal understanding. Regular expressions are flexible: different search requirements call for different expressions, and different people may write different expressions that produce the same result. That is precisely the essence of regular expressions: they can be combined freely, without a fixed recipe. The most important thing is to understand the problem.



Homework


Find all lines in /proc/meminfo that begin with an uppercase or lowercase s, using at least three methods:

grep -i "^s" /proc/meminfo

grep "^[sS]" /proc/meminfo

grep -E "^(s|S)" /proc/meminfo


Hands-on demo



# grep -i '^s' /proc/meminfo

SwapCached: 0 kB

SwapTotal: 2047996 kB

SwapFree: 2047996 kB

Shmem: 2512 kB

Slab: 87860 kB

SReclaimable: 21188 kB

SUnreclaim: 66672 kB

# grep -E '^(s|S)' /proc/meminfo

SwapCached: 0 kB

SwapTotal: 2047996 kB

SwapFree: 2047996 kB

Shmem: 2512 kB

Slab: 87876 kB

SReclaimable: 21196 kB

SUnreclaim: 66680 kB

# grep '^[sS]' /proc/meminfo

SwapCached: 0 kB

SwapTotal: 2047996 kB

SwapFree: 2047996 kB

Shmem: 2512 kB

Slab: 87860 kB

SReclaimable: 21188 kB

SUnreclaim: 66672 kB
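/proc/meminfo values change from run to run (compare the Slab values above). This sketch therefore runs the three methods, plus a fourth that uses two -e patterns, against a fixed, hypothetical excerpt. The excerpt includes one made-up lowercase line, slab_test, and the script checks that all four methods agree.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical excerpt in /proc/meminfo format; slab_test is made up
printf 'MemTotal: 1000 kB\nSwapTotal: 2047996 kB\nShmem: 2512 kB\nslab_test: 1 kB\nActive: 5 kB\n' > "$tmp/meminfo"

m1=$(grep -i '^s' "$tmp/meminfo")          # ignore case
m2=$(grep '^[sS]' "$tmp/meminfo")          # bracket expression
m3=$(grep -E '^(s|S)' "$tmp/meminfo")      # ERE alternation
m4=$(grep -e '^s' -e '^S' "$tmp/meminfo")  # two patterns combined with OR

printf '%s\n' "$m1"
```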





Display information about the users tian1, tian2, and tian3 on the current system

grep -E "^(tian1|tian2|tian3)\>" /etc/passwd


Hands-on demo


# cat /etc/passwd | grep -E "^(tian1|tian2|tian3)\>"

tian1:x:505:505::/home/tian1:/bin/bash

tian2:x:506:506::/home/tian2:/bin/bash

tian3:x:507:507::/home/tian3:/bin/bash
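This sketch shows why the trailing \> matters. It uses hypothetical passwd lines that include a tian10 account. Without the word anchor, tian10 is also reported, because it starts with tian1.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical accounts; tian10 and xtian3 are decoys
printf 'tian1:x:505:505::/home/tian1:/bin/bash\ntian10:x:510:510::/home/tian10:/bin/bash\ntian2:x:506:506::/home/tian2:/bin/bash\nxtian3:x:600:600::/home/xtian3:/bin/sh\n' > "$tmp/passwd"

# \> requires the name to end there, so tian10 is excluded; ^ excludes xtian3
with_anchor=$(grep -E '^(tian1|tian2|tian3)\>' "$tmp/passwd" | cut -d: -f1)
# Without \>, tian10 also matches
without_anchor=$(grep -E '^(tian1|tian2|tian3)' "$tmp/passwd" | cut -d: -f1)

printf '%s\n' "$with_anchor" --- "$without_anchor"
```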





Find the lines in /etc/rc.d/init.d/functions that begin with a word (which may contain underscores) followed by a pair of parentheses, and print the matching part

grep -E -o "[_[:alnum:]]+\(\)" /etc/rc.d/init.d/functions


Hands-on demo


# grep -E -o "[_[:alnum:]]+\(\)" /etc/rc.d/init.d/functions

fstab_decode_str()

checkpid()

__readlink()

__fgrep()

__kill_pids_term_kill_checkpids()

__kill_pids_term_kill()

__umount_loop()

__umount_loop_2()

__source_netdevs_fstab()

__source_netdevs_mtab()

__umount_loopback_loop()

__find_mounts()

__pids_var_run()

__pids_pidof()

daemon()

killproc()

pidfileofproc()

pidofproc()

status()

echo_success()

echo_failure()

echo_passed()

echo_warning()

update_boot_stage()

success()

failure()

passed()

warning()

action()

action_silent()

strstr()

confirm()

get_numeric_dev()

is_ignored_file()

is_true()

is_false()

apply_sysctl()

key_is_random()

find_crypto_mount_point()

init_crypto()
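The pattern above is not anchored, so it would also report a name() that appears in the middle of a line. This sketch runs the pattern on a small, hypothetical function-library excerpt. It then adds ^ so that only names at the start of a line are reported, which is what the exercise asks for.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical excerpt in the style of an init-script function library
cat > "$tmp/functions" <<'EOF'
checkpid() {
    local i
}
__umount_loop() {
    echo "call helper() here"
}
daemon() {
    :
}
EOF

# Unanchored: also picks up helper() from inside the echo line
any=$(grep -E -o '[_[:alnum:]]+\(\)' "$tmp/functions")
# Anchored with ^: only names that begin a line
anchored=$(grep -E -o '^[_[:alnum:]]+\(\)' "$tmp/functions")

printf '%s\n' "$any" --- "$anchored"
```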




Use echo to output an absolute path, then use egrep to extract its base name

echo /etc/sysconfig/ | grep -E -o "[^/]+/?$" | cut -d/ -f1


Hands-on demo


# echo "/etc/sysconfig/" | grep -oE "[^/]+/?$"

sysconfig/

# echo "/etc/sysconfig/" | grep -oE "[^/]+/?$" | cut -d/ -f1

sysconfig
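This sketch wraps the same pipeline in a hypothetical helper function, base_of, and tries it on paths with and without a trailing slash. It compares the result with the standard basename utility. [^/]+/?$ takes the last run of non-slash characters plus an optional trailing slash, and cut then drops that slash.

```shell
# Hypothetical helper: last path component via grep -oE, trailing slash removed by cut
base_of() {
  echo "$1" | grep -oE '[^/]+/?$' | cut -d/ -f1
}

a=$(base_of /etc/sysconfig/)
b=$(base_of /etc/sysconfig)
c=$(base_of /usr/share/dict/words)
# The basename utility gives the same result for these paths
d=$(basename /etc/sysconfig/)

printf '%s\n' "$a" "$b" "$c" "$d"
```

The pipeline prints nothing for the root path /, which has no last component.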







Find the numbers between 1 and 255 in the output of ifconfig

ifconfig | grep -E -o "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"


Hands-on demo


# ifconfig | grep -E -o "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"

29

67

46

172

18

16

92

172

18

16

255

255

255

255

64

1

62

120

8

5

1

127

1

255

1

128

1

12

12

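ifconfig output differs from machine to machine, and ifconfig is not installed by default on many newer systems. This sketch therefore checks the same 1-255 pattern against a fixed, made-up string. It assumes GNU grep for \< and \>. The word anchors reject 0, 256, 300, and 1000, because no alternative can consume the whole number.

```shell
# Made-up sample text standing in for ifconfig output
sample='a 0 b 7 c 42 d 199 e 255 f 256 g 300 h 1000'

# One alternative per range: 1-9, 10-99, 100-199, 200-249, 250-255
nums=$(echo "$sample" \
  | grep -E -o '\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>' \
  | paste -s -d ' ' -)

echo "$nums"
```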



Find all IPv4 addresses in the output of ifconfig

ifconfig | grep -E -o "(\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>\.)(\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>\.){2}\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"


Hands-on demo


# ifconfig | grep -E -o "(\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>\.)(\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>\.){2}\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"

172.18.16.92

172.18.16.255

255.255.255.0

127.0.0.1

255.0.0.0
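The long pattern is easier to read when the two octet expressions are stored in shell variables. This sketch builds the same expression that way, assuming GNU grep. It tests the expression against made-up lines in the style of the older net-tools ifconfig output, plus two invalid addresses that should be rejected.

```shell
# Made-up lines in the style of net-tools ifconfig output, plus invalid addresses
sample='inet addr:172.18.16.92  Bcast:172.18.16.255  Mask:255.255.255.0
inet addr:127.0.0.1  Mask:255.0.0.0
bogus 10.0.0.256 and 300.1.1.1'

# First octet 1-255, remaining octets 0-255; \< \> keep each octet a whole number
first='([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])'
octet='([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])'

ips=$(printf '%s\n' "$sample" | grep -E -o "\<$first\>\.(\<$octet\>\.){2}\<$octet\>")
printf '%s\n' "$ips"
```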



Find the lines in /etc/passwd where the user name is the same as the shell name

grep -E "^([^:]+\>).*\1$" /etc/passwd


Hands-on demo



# grep -E "^([^:]+\>).*\1$" /etc/passwd

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt
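The pattern above only requires the line to end with the user name. A user named sh whose shell is /bin/bash would therefore also match. This sketch compares that pattern with a stricter variant that requires the name to follow the last / of the shell path. It uses hypothetical passwd lines and assumes GNU grep, which supports back references with -E.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical accounts; "sh" with /bin/bash is the false-positive case
printf 'sync:x:5:0:sync:/sbin:/bin/sync\nhalt:x:7:0:halt:/sbin:/sbin/halt\nbash:x:1001:1001::/home/bash:/bin/bash\nsh:x:1002:1002::/home/sh:/bin/bash\nroot:x:0:0:root:/root:/bin/bash\n' > "$tmp/passwd"

# Pattern from the exercise: the line merely ends with the user name
loose=$(grep -E '^([^:]+\>).*\1$' "$tmp/passwd" | cut -d: -f1)
# Stricter variant: the user name must be the whole last component of the shell path
strict=$(grep -E '^([^:]+):.*/\1$' "$tmp/passwd" | cut -d: -f1)

printf '%s\n' "$loose" --- "$strict"
```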

















This article is from the "11892658" blog; reprinting is not permitted.
