Usage of several common text-processing tools: tr, wc, cut, sort, uniq

Source: Internet
Author: User

A few text-processing tools: tr, wc, cut, sort, uniq

1. The tr command replaces, squeezes, and deletes characters read from standard input. It maps one set of characters onto another and is often used to write compact, powerful one-line commands.

Syntax: tr [option] SET1 [SET2]

-c or --complement: operates on all characters that are not in the first character set (that is, its complement);

-d or --delete: deletes all characters that belong to the first character set;

-s or --squeeze-repeats: replaces each run of a repeated character with a single occurrence of that character;

-t or --truncate-set1: first truncates the first character set to the length of the second character set.

Parameters

Character set 1: specifies the original character set to be converted or deleted. A conversion also requires the parameter "character set 2"; a delete operation does not.

Character set 2: specifies the target character set to convert to.

Example:

Process the string "Xt.,l 1jr#! $mn 2 c*/fe3 uz4", keeping only the digits and whitespace. The string is single-quoted so that the shell does not expand $mn or the exclamation mark:

# echo 'Xt.,l 1jr#! $mn 2 c*/fe3 uz4' | tr -c -d '[:digit:][:space:]'
 1  2 3 4

After the command runs, only the whitespace and digits remain.

You can also use echo 'Xt.,l 1jr#! $mn 2 c*/fe3 uz4' | tr -d '[:alpha:][:punct:]', although this is less convenient than the previous method. Note: alpha is the set of letters, punct is the set of punctuation characters, and digit is the set of digits. You can also give characters or ranges directly in single quotation marks, such as: tr 'a-z' 'A-Z'
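The options above can be tried on a few throwaway strings. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```shell
# Squeeze each run of spaces into a single space (-s).
squeezed=$(printf 'a   b    c\n' | tr -s ' ')

# Delete every digit (-d).
no_digits=$(printf 'room 101, floor 3\n' | tr -d '[:digit:]')

# Translate lowercase letters to uppercase using two sets.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# Keep only digits and newlines by deleting the complement (-c -d).
digits_only=$(printf 'tel: 555-0100\n' | tr -cd '[:digit:]\n')

printf '%s\n' "$squeezed" "$no_digits" "$upper" "$digits_only"
```

Note that tr only reads standard input, so the text is always piped or redirected into it.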

2. wc: counts the bytes, words, and lines of its input

Syntax:

wc (options) (parameters)

Options

-c or --bytes: shows only the number of bytes;

-l or --lines: shows only the number of lines;

-w or --words: shows only the number of words.

Example:

# wc -c /etc/issue
23 /etc/issue
# wc -l /etc/issue
3 /etc/issue
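When wc reads standard input instead of a file, it prints the count without a file name, which is convenient in scripts. A small sketch with made-up sample text follows; the tr -d ' ' step is there on the assumption that some wc implementations pad the number with spaces.

```shell
# Sample text: 2 lines, 5 words, 24 bytes (including the two newlines).
lines=$(printf 'one two three\nfour five\n' | wc -l | tr -d ' ')
words=$(printf 'one two three\nfour five\n' | wc -w | tr -d ' ')
bytes=$(printf 'one two three\nfour five\n' | wc -c | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
```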


3. cut displays the specified part of each line. It can be used to select, or leave out, particular fields of a file; the file itself is not modified.

Syntax:

cut (options) (parameters)

Options

-b: displays only the bytes in the specified range of each line;

-c: displays only the characters in the specified range of each line;

-d: specifies the field delimiter; the default field delimiter is the tab character;

-f: displays the contents of the specified fields;

-n: used with the "-b" option; do not split multibyte characters;

--output-delimiter=<field delimiter>: specifies the field delimiter used in the output;

--complement: complements the selected bytes, characters, or fields (that is, selects everything except what -b, -c, or -f specifies).

Example:

# cat test.file
jack:x:1000:1000:jack_cui:/home/jack:/bin/bash
apache:x:48:48:apache:/usr/share/httpd:/sbin/nologin
gentoo:x:1001:1001:gentoodistribution:/home/gentoo:/bin/csh
natasha:x:1002:1003::/home/natasha:/bin/bash
# cut -d: -f2-3 test.file
x:1000
x:48
x:1001
x:1002


Display the third character of each line:

# cut -c3 test.file
c
a
n
t


Display everything except fields 3 to 7 (that is, show the first two fields). The long option --complement selects the complement of the specified set, and -d specifies the delimiter between fields; without it, the default delimiter is the tab character.

# cut --complement -d: -f3-7 test.file
jack:x
apache:x
gentoo:x
natasha:x
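The field and character options can be combined with an output delimiter. The sketch below reuses the first line of test.file as a shell variable; the variable names are illustrative, and --output-delimiter is assumed to be available, as it is in GNU cut.

```shell
record='jack:x:1000:1000:jack_cui:/home/jack:/bin/bash'

# Fields 1 and 7 (login name and shell), still separated by ":".
name_shell=$(printf '%s\n' "$record" | cut -d: -f1,7)

# The first four characters of the line.
first4=$(printf '%s\n' "$record" | cut -c1-4)

# Fields 1 and 3, written with a space between them (GNU cut).
name_uid=$(printf '%s\n' "$record" | cut -d: -f1,3 --output-delimiter=' ')

printf '%s\n' "$name_shell" "$first4" "$name_uid"
```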


4. sort: sorts lines of text and writes the result to standard output.

Syntax:

sort (options) (parameters)

Options

-b: ignores the whitespace at the beginning of each line;

-c: checks whether the file is already sorted;

-d: when sorting, considers only letters, digits, and blank characters and ignores all others;

-f: when sorting, treats lowercase letters as uppercase letters;

-i: when sorting, ignores all characters except the ASCII characters between octal 040 and 176;

-k<position>: sorts by a key that starts, and optionally ends, at the given field and character position;

-m: merges several files that are already sorted;

-M: sorts by the three-letter abbreviation of the month;

-n: sorts by numerical value;

-o<output file>: writes the sorted result to the specified file;

-r: sorts in reverse order;

-t<delimiter>: specifies the field separator character to use when sorting.


Example:

# cat b.test
jack:x:1000
apache:x:48
gentoo:x:1001
ben:x:599
rose:x:1
natasha:23:x
mike:x:1002
# sort -n -k3.1,3.2 -t: b.test
natasha:23:x
rose:x:1
gentoo:x:1001
jack:x:1000
mike:x:1002
apache:x:48
ben:x:599


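Note: in the previous example, -n means the key is compared numerically; otherwise the default is to compare by character (ASCII) order. -k3.1,3.2 sorts on the first and second characters of the third field; without an end position, the key runs from the specified position to the end of the line. -t: sets the field delimiter to a colon. The key of the natasha line is "x", which is not a number and therefore counts as 0, and lines whose keys are equal (gentoo, jack, and mike all have the key 10) are ordered by comparing the whole line.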
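The difference between the default comparison and -n is easy to see on a small data set. This sketch uses a few lines in the style of b.test, held in a shell variable; LC_ALL=C is set only so that the character ordering is predictable.

```shell
data='jack:x:1000
apache:x:48
rose:x:1
ben:x:599'

# Third field compared as text: "1000" sorts before "48".
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k3,3)

# Third field compared as a number (-n applied to the key).
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k3,3n)

# The same numeric key, largest first (-r applied to the key).
by_number_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k3,3nr)

printf '%s\n\n%s\n\n%s\n' "$by_text" "$by_number" "$by_number_desc"
```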

5. The uniq command reports or omits repeated lines in a file and is generally used together with the sort command.

Syntax:

uniq (options) (parameters)

Options

-c or --count: prefixes each line with the number of times it occurs;

-d or --repeated: displays only the lines that are repeated;

-f<fields> or --skip-fields=<fields>: skips the specified number of leading fields when comparing;

-s<characters> or --skip-chars=<characters>: skips the specified number of leading characters when comparing;

-u or --unique: displays only the lines that are not repeated;

-w<characters> or --check-chars=<characters>: compares no more than the specified number of characters in each line.

Parameters

Input file: specifies the file from which repeated lines are to be removed. If it is not specified, the data is read from standard input;

Output file: specifies the file to which the result is written once repeated lines have been removed. If it is not specified, the result is written to standard output (the terminal).
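The field-skipping option is useful when each line starts with a value that always differs, such as a timestamp. A minimal sketch with a made-up three-line log follows; the data and variable names are hypothetical.

```shell
log='10:01 disk full
10:02 disk full
10:03 cpu high'

# Skip the first blank-separated field (the time) when comparing,
# so the two adjacent "disk full" lines count as duplicates.
deduped=$(printf '%s\n' "$log" | uniq -f1)

printf '%s\n' "$deduped"
```

Only the first line of each group of duplicates is kept, so the 10:02 entry is dropped.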

Test text:

# cat uniq.test
this is uniq test line 1
this is uniq test line 2
this is uniq test line 3
this is uniq test line 3
this is uniq test line 2
this is uniq test line 1
this is uniq test line 4


1) Show only the lines that are repeated. Because the data has first been sorted by sort, identical lines are adjacent; without sorting, only line 3 would be displayed, since it is the only line repeated on consecutive rows.

# sort uniq.test | uniq -d
this is uniq test line 1
this is uniq test line 2
this is uniq test line 3


2) Show only the lines that occur exactly once. Sorting is needed here too: uniq compares only adjacent lines, not the whole text, so identical lines must first be brought together by sort before they are counted.

# sort uniq.test | uniq -u
this is uniq test line 4
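Combining sort with uniq -c gives a simple frequency count. The sketch below uses a made-up word list; the final tr and cut steps only strip the padding that uniq -c puts in front of each count.

```shell
words='apple
banana
apple
cherry
banana
apple'

# Without sorting, no two identical lines are adjacent, so -d finds nothing.
unsorted_dups=$(printf '%s\n' "$words" | uniq -d)

# Sort, count each distinct line, then order by count, largest first.
counts=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn \
  | tr -s ' ' | cut -d' ' -f2-)

printf '%s\n' "$counts"
```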



This article is from the "Jackcui" blog; please contact the author before reproducing it.
