Use the Linux command line to deduplicate text by line and sort it by repetition count

Source: Internet
Author: User
The Linux command line provides powerful text-processing tools, and combining commands lets you build many useful functions. This article gives an example: deduplicating text by line and sorting the result by the number of times each line is repeated.

The main commands used are sort, uniq, and cut. sort sorts lines, uniq removes adjacent duplicate lines, and cut extracts columns from each line (put simply, it operates on text lines by column).

The content of the test file used for the demonstration is as follows:

[plain]
Hello World.
Apple and Nokia.
Hello World.
I wanna buy an Apple device.
The Iphone of Apple company.
Hello World.
The Iphone of Apple company.
My name is Friendfish.
Hello World.
Apple and Nokia.

The commands and procedure are as follows:

[plain]
1. Deduplicate text lines

(1) Sort
Because uniq only removes duplicates that are adjacent, sort the lines first so that identical lines are grouped together.

$ sort test.txt
Apple and Nokia.
Apple and Nokia.
Hello World.
Hello World.
Hello World.
Hello World.
I wanna buy an Apple device.
My name is Friendfish.
The Iphone of Apple company.
The Iphone of Apple company.

(2) Remove the adjacent duplicate lines

$ sort test.txt | uniq
Apple and Nokia.
Hello World.
I wanna buy an Apple device.
My name is Friendfish.
The Iphone of Apple company.

2. Deduplicate text lines and sort by repetition count

(1) Deduplicate the lines and count the repetitions
The -c option of uniq prefixes each line with the number of times it occurs.

$ sort test.txt | uniq -c
      2 Apple and Nokia.
      4 Hello World.
      1 I wanna buy an Apple device.
      1 My name is Friendfish.
      2 The Iphone of Apple company.

(2) Sort the lines by repetition count
sort -n recognizes the number at the start of each line and sorts the lines by its numeric value. The default order is ascending; add the -r option (sort -rn) to sort in descending order.

$ sort test.txt | uniq -c | sort -rn
      4 Hello World.
      2 The Iphone of Apple company.
      2 Apple and Nokia.
      1 My name is Friendfish.
      1 I wanna buy an Apple device.

(3) Remove the repetition count from each line
The cut command operates on text lines by column. In the output above, the count occupies the first 8 characters of each line, so cut -c 9- keeps the 9th character and everything after it. (This width matches the output shown here; other uniq implementations may pad the count differently.)

$ sort test.txt | uniq -c | sort -rn | cut -c 9-
Hello World.
The Iphone of Apple company.
Apple and Nokia.
My name is Friendfish.
I wanna buy an Apple device.

The following describes how to use the cut command:

[plain]
cut -b list [-n] [file...]
cut -c list [file...]
cut -f list [-d delim] [-s] [file...]

The options -b, -c, and -f select by byte, character, and field respectively, and list gives the range that the selected option extracts. -n, used together with -b, tells cut not to split multibyte characters. file is the name of the text file to process. delim (short for delimiter) is the field separator, which is TAB by default. -s suppresses lines that contain no delimiter, which is useful for dropping comments and titles.
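The selection modes described above can be tried on a few sample lines. This is only a minimal sketch: the sample data and the variable names (sample, by_char, by_field, only_delimited) are made up for illustration, and the -c example assumes single-byte (ASCII) input.

```shell
# Hypothetical sample data: two colon-delimited records and one line
# that contains no delimiter at all.
sample='alice:30:admin
bob:25:user
# comment without a delimiter'

# -c selects by character position: the first three characters of each line.
by_char=$(printf '%s\n' "$sample" | cut -c 1-3)

# -f selects by field; -d replaces the default TAB delimiter with ':'.
# Lines that contain no delimiter are passed through unchanged.
by_field=$(printf '%s\n' "$sample" | cut -d ':' -f 1,3)

# -s additionally drops the lines that contain no delimiter.
only_delimited=$(printf '%s\n' "$sample" | cut -s -d ':' -f 1,3)

printf '%s\n---\n%s\n---\n%s\n' "$by_char" "$by_field" "$only_delimited"
```

Comparing the second and third results shows what -s is for: without it, the comment line survives the field extraction untouched.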
Representation of the range:

n     only the nth item
n-    from the nth item to the end of the line
n-m   from the nth item to the mth item (including m)
-m    from the beginning of the line to the mth item (including m)
-     all items from the beginning to the end of the line (GNU cut rejects a bare "-" as a range with no endpoint; 1- gives the same result there)

While writing this article I used vim's case-conversion commands: gu converts text to lowercase and gU converts it to uppercase. Combined with a Ctrl+v (visual block) selection, they change the case of a whole block of text, which is very handy.
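The range forms used by cut can be checked against a single line. The sketch below uses a made-up 10-character sample string and illustrative variable names, none of which come from the original article, and it assumes single-byte (ASCII) input.

```shell
# A hypothetical 10-character sample line, chosen so positions are easy to read.
line='abcdefghij'

# n: only the 3rd character
only_3=$(printf '%s\n' "$line" | cut -c 3)

# n-: from the 3rd character to the end of the line
from_3=$(printf '%s\n' "$line" | cut -c 3-)

# n-m: from the 3rd to the 5th character, both included
from_3_to_5=$(printf '%s\n' "$line" | cut -c 3-5)

# -m: from the beginning of the line to the 5th character, included
up_to_5=$(printf '%s\n' "$line" | cut -c -5)

printf '%s\n' "$only_3" "$from_3" "$from_3_to_5" "$up_to_5"
# prints:
# c
# cdefghij
# cde
# abcde
```

The n- form is the one the deduplication pipeline relies on: cut -c 9- keeps everything from the 9th character onward.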