Notes on a data-processing task

Source: Internet
Author: User

My company works mainly in the text-messaging (SMS) industry, so we deal with mobile phone numbers a lot and get all sorts of odd requests. Recently a director handed me one of them: find the mobile phone numbers that appear in both of two files. My programming and Excel skills are limited, so I had to find another way to solve it. Each file has several fields, but all of it is structured data in the following format:

15994710001,2016/11/3 0:24,53100010
15994710001,2016/11/3 0:24,53100010
15001313373,2016/11/3 3:39,53100010
13937713309,2016/11/3 6:16,53100010
13758943333,2016/11/3 7:19,53100010
13868044333,2016/11/3 8:33,53100010
13500732333,2016/11/3 10:29,53100010
13523072333,2016/11/3 10:30,53100010
15138132777,2016/11/3 10:31,53100010
13960985779,2016/11/3 10:45,53100010

This file has more than 4,000 lines. File 2 has more fields, and part of its content is garbled, which also happens to protect personal privacy:

"311-sd10658"2114781676479382330","13703774555","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","15920510111","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","18319609333","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","15221090555","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","13905879555","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","13818586777","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","13916387773","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","13882133333","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-sd10658"2114781676479382330","18200980999","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"

Approach:

Since I only need the numbers that appear in both files, I handled this with text-processing tools on Linux: first reduce each file to a file that contains only mobile phone numbers, then do the rest of the processing on those.

The relevant column can be extracted with cut or awk. I am not familiar with awk, so I used cut; pay attention to the delimiter and to which column you need.

The extracted files can then be compared with grep. I also tried diff, but the results were not what I wanted.
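
As a rough sketch of the extraction step: the file names (file1.csv, file2.csv, phones1.txt, phones2.txt) are made up for illustration, and the field positions are an assumption based only on the sample rows shown earlier (field 1 in the first file, comma-separated field 2 wrapped in double quotes in the second). Check them against the real files before relying on this.

```shell
#!/bin/sh
# Hypothetical file names; the rows are copied from the sample excerpts.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  '15994710001,2016/11/3 0:24,53100010' \
  '15994710001,2016/11/3 0:24,53100010' \
  '15001313373,2016/11/3 3:39,53100010' > file1.csv

printf '%s\n' \
  '"311-sd10658"2114781676479382330","13703774555","11λp50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"' \
  > file2.csv

# File 1: comma is the delimiter, the number is assumed to be field 1.
# sort -u sorts the list and drops repeated numbers.
cut -d, -f1 file1.csv | sort -u > phones1.txt

# File 2: the number is assumed to be comma-separated field 2;
# tr -d removes the surrounding double quotes.
cut -d, -f2 file2.csv | tr -d '"' | sort -u > phones2.txt

cat phones1.txt phones2.txt
```

After this, each phones file holds one bare number per line, which is the shape the later comparison needs.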

1. Find the lines that appear in both text files

grep -Ff file1 file2
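
To make the command above concrete, here is a small sketch of grep with the -F and -f options (written in lower case as grep -Ff). The two lists reuse numbers from the sample excerpts, but the overlap between them is invented for the demonstration. The -x and -c variants are my own additions, not part of the original steps.

```shell
#!/bin/sh
# Illustrative lists only: one number per line, overlap invented for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 15994710001 15001313373 13937713309 > file1
printf '%s\n' 13703774555 15001313373 15994710001 > file2

# -f file1 : read the search patterns from file1, one per line
# -F       : treat each pattern as a fixed string, not a regular expression
# Prints the lines of file2 that contain any line of file1.
common=$(grep -Ff file1 file2)
printf '%s\n' "$common"

# -x additionally requires the whole line to match, which avoids
# accidental substring hits if a line holds more than a bare number.
common_exact=$(grep -Fxf file1 file2)

# -c prints only the number of matching lines.
count=$(grep -cFf file1 file2)
printf 'matching lines: %s\n' "$count"
```

Note that the output follows the order of file2, the file being searched, not the order of the pattern file.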


2. Find the lines in file1 that do not appear in file2 (swap the two file names to get the lines in file2 that are not in file1)

grep -vFf file2 file1
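
The direction of the difference is easy to get backwards, so here is a short sketch that runs it both ways. As before, the lists reuse numbers from the sample excerpts and their overlap is invented for illustration. The key point is that grep always prints lines from the last file named, and -v keeps only those that match none of the patterns.

```shell
#!/bin/sh
# Illustrative lists only: one number per line, overlap invented for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 15994710001 15001313373 13937713309 > file1
printf '%s\n' 13703774555 15001313373 15994710001 > file2

# Patterns come from file2, lines are read from file1, and -v inverts
# the match: the result is the lines of file1 that are NOT in file2.
only_in_file1=$(grep -vFf file2 file1)
printf 'only in file1: %s\n' "$only_in_file1"

# Swapping the names gives the lines of file2 that are NOT in file1.
only_in_file2=$(grep -vFf file1 file2)
printf 'only in file2: %s\n' "$only_in_file2"
```

One thing to watch for: a blank line in the pattern file is an empty pattern, which matches every line, so with -v the output would come back empty. It is worth removing blank lines from the pattern file first.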


This article is from the "Keep Dreaming" blog. Please keep this source link when reposting: http://dreamlinux.blog.51cto.com/9079323/1869844
