How to use shell scripts to quickly sort and deduplicate file data

Source: Internet
Author: User



I previously wrote an article about deduplicating 10 GB of data with shell scripts (see "Using a few shell commands to quickly deduplicate 10G data"). Today I ran into another business requirement whose logic is considerably more complex than that simple case. I searched for a long time without finding a suitable method, so I handled it with a shell script. The business logic is as follows:



1. First, sort by the given fields.



2. After sorting, deduplicate on the given fields according to these rules:



A) If a run of adjacent rows sharing the same value in the given fields has no more than two rows, all of those rows are retained.



B) If a run of adjacent rows sharing the same value in the given fields has more than two rows, only the first and the last row of that run are retained.
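
Put differently, every run of adjacent rows that share a key keeps its first and its last row, so runs of one or two rows survive whole. A minimal sketch of that rule in awk, assuming the input is already sorted, is comma-separated, and uses fields 1, 3 and 4 as the key (the fields the script compares); the function name `keep_run_ends` is hypothetical:

```shell
#!/bin/sh
# keep_run_ends (hypothetical helper): reads comma-separated lines that are
# already sorted by key and prints the first and last row of each run of
# adjacent rows whose fields 1, 3 and 4 are equal.
keep_run_ends() {
  awk -F',' '
    {
      key = $1 SUBSEP $3 SUBSEP $4   # SUBSEP keeps "ab"+"c" distinct from "a"+"bc"
      if (NR > 1 && key == prev_key) {
        last = $0                    # newest row of the current run
        count++
        next
      }
      if (count > 1) print last      # previous run had 2+ rows: emit its trailing row
      print $0                       # the first row of a new run is always kept
      prev_key = key
      count = 1
    }
    END { if (count > 1) print last }  # trailing row of the final run
  '
}

# Usage: a three-row run collapses to first + last; a two-row run is kept whole.
printf 'k,1,a,b\nk,2,a,b\nk,3,a,b\nz,1,a,b\nz,2,a,b\n' | keep_run_ends
```

Only the previous key and the newest row of the current run are held in memory, so a filter of this shape reads a sorted stream of any size in a single pass.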



The business logic does not actually look difficult. The real question is how to process 10~20 GB of data quickly. I searched online for a long time without finding a suitable approach, so for now I implemented it in a relatively naive way.
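
On inputs of this size the sort step is usually the most expensive part. A sketch of the sort under a few assumptions (not the author's setup): `LC_ALL=C` makes sort compare raw bytes instead of applying locale collation, which is typically much faster, and `-T` points its temporary spill files at a directory with enough free disk space. The helper name `sort_by_key` and the `/tmp` fallback are hypothetical; GNU sort also offers `-S` (memory buffer size) and `--parallel`, left out here because they are GNU-specific.

```shell
#!/bin/sh
# sort_by_key (hypothetical helper): sort a comma-separated file on field 1,
# then on fields 3-4, so rows sharing fields 1, 3 and 4 become adjacent.
#   $1 = input file, $2 = output file
#   $3 = directory for temporary files (assumed default: $TMPDIR, else /tmp)
sort_by_key() {
  # LC_ALL=C: byte-wise comparison instead of locale-aware collation.
  # -T: where sort writes intermediate runs when the data exceeds memory.
  LC_ALL=C sort -t ',' -k 1,1 -k 3,4 -T "${3:-${TMPDIR:-/tmp}}" "$1" > "$2"
}

# Example call (paths are placeholders):
#   sort_by_key ./testfile.csv ./testfile.csv_sort.txt /data/tmp
```

Byte order can differ from locale order, so the output may list rows in a different order than a locale-aware sort, but rows with equal keys are still adjacent, which is all the deduplication step relies on.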



Test data:


F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss
A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss


Shell script:


#!/bin/sh
if [ "$#" -ne 2 ]; then
        echo "Usage: parameter 1: file path, parameter 2: file name."
        exit 1
fi
# Directory where the source file is located
filepath=$1
# Path of the source file
orgfile="$filepath/$2"
# Temporary file after merging fields (currently unused)
#mergerfile="${orgfile}_merge.txt"
# Sorted temporary file
sortfile="${orgfile}_sort.txt"
# Final result file, truncated to empty before processing
result_unique="${orgfile}_result_unique.txt"
: > "$result_unique"
#echo "File: $orgfile"
#echo "Begin merging fields ..."
#awk 'BEGIN {FS = ","} {print $1 "," $2 "," $3 "," $4 "," $5 "," $6 "," $7 "," $1 $3 $4}' "$orgfile" > "$mergerfile"
#echo "Field merge ends ..."

echo "File sorting start ..."
#sort -t ',' -k 1,1 -k 9,9 "$mergerfile" > "$sortfile"
# Sort on field 1, then on fields 3-4, so rows with the same key become adjacent
sort -t ',' -k 1,1 -k 3,4 "$orgfile" > "$sortfile"
echo "File sorting end ..."


printf "*********** File comparison start **************************\n"
echo "while read line <$sortfile"
# Number of rows in the current run of equal keys
cnt=0
# Newest row of the current run (its trailing row once the run ends)
lastline=""
# Key of the previous row
lastKey=""
# Number of lines in the sorted file
linecount=$(sed -n '$=' "$sortfile")
i=1
echo "linecount ========== >>>>>>> $linecount"
# '|| [ -n "$line" ]' makes sure the last line is read even without a trailing newline
while IFS= read -r line || [ -n "$line" ];
do
  printf '%s\n' "$line"
  # Merge the fields to be compared (1, 3 and 4) into one key
  compare=$(printf '%s\n' "$line" | awk -F',' '{print $1 $3 $4}')
  echo "compare ===== $compare"
  # Same key as the previous row and not the last line: extend the current run
  if [ "$i" -ne "$linecount" ] && [ "$lastKey" = "$compare" ]; then
    echo "[=]"
    cnt=$((cnt + 1))
    lastline="$line"
  else
    if [ "$lastKey" = "$compare" ]; then
      # Last line of the file and part of the current run: it is the trailing row
      echo "---- $i ---------------- >>>>>>>>>>> $cnt"
      printf '%s\n' "$line" >> "$result_unique"
    else
      # Key changed: a run of two or more rows also keeps its trailing row
      if [ "$cnt" -gt 1 ]; then
        echo "---- $i ---------------- >>>>>>>>>>> $cnt"
        printf '%s\n' "$lastline" >> "$result_unique"
      fi
      # The first row of every run is always kept
      printf '%s\n' "$line" >> "$result_unique"
      lastline=""
      cnt=1
    fi
    # Special processing for the last line
    if [ "$i" -eq "$linecount" ]; then
      echo "================= last line ==================="
    fi
  fi
  # Remember the key for the next comparison
  lastKey="$compare"
  i=$((i + 1))
done < "$sortfile"

echo "******************* File $orgfile processing finished ***************************"
echo "******************* Result file $result_unique ***************************"
exit 0


Add execute permission to the script:


chmod +x uniquefile.sh


Run the shell script:


sh ./uniquefile.sh <file path> <file name>


Results:


[[email protected] ~]# sh uniquefile.sh ./ testfile.csv
File sorting start ...
File sorting end ...
*********** File comparison start **************************
while read line <.//testfile.csv_sort.txt
linecount ========== >>>>>>> 6
A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss
compare ===== A0223EE1IDJDJ2938X39284BEOQQQQ54876F0
A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss
compare ===== A0223EE1IDJDJ2938X39284BEOQQQQ54876F0
[=]
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
compare ===== F250A4FFIDJDJ2938X39252E7OQQQQB88769E
---- 3 ---------------- >>>>>>>>>>> 2
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
compare ===== F250A4FFIDJDJ2938X39252E7OQQQQB88769E
[=]
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
compare ===== F250A4FFIDJDJ2938X39252E7OQQQQB88769E
[=]
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
compare ===== F250A4FFIDJDJ2938X39252E7OQQQQB88769E
---- 6 ---------------- >>>>>>>>>>> 3
================= last line ===================
******************* File .//testfile.csv processing finished ***************************
******************* Result file .//testfile.csv_result_unique.txt ***************************


Final result file:


[[email protected] ~]# more testfile.csv_result_unique.txt 

A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss
A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss
F250A4FFIDJDJ2938X39252E7,20080304041348 ,OQQQQB8,8769E,88888626,727218105,ss


Time was short, so I implemented it this way for now. If anyone has a better method, please let me know.
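
One likely hotspot in the script's loop is the key extraction: every line runs `echo ... | awk` inside a command substitution, which starts a subshell and an awk process per row, and on 10~20 GB of input that overhead adds up. A sketch of one way around it, assuming the same key (fields 1, 3 and 4 concatenated): let `read` split the line on commas itself, so no process is started per row. `print_keys` is a hypothetical name, and it only shows the key extraction, not the full deduplication.

```shell
#!/bin/sh
# print_keys (hypothetical helper): reads comma-separated lines from stdin and
# prints the comparison key (fields 1, 3 and 4 concatenated, as in the script)
# without starting an awk process per line.
print_keys() {
  # IFS=',' makes read split on commas; f2 takes field 2, rest takes fields 5+.
  # The '|| [ -n "$f1" ]' keeps a final line that has no trailing newline.
  while IFS=',' read -r f1 f2 f3 f4 rest || [ -n "$f1" ]; do
    printf '%s\n' "$f1$f3$f4"
  done
}

printf 'A0223EE1IDJDJ2938X39284BE,20080304041155 ,OQQQQ54,876F0,88888120,727271202,ss\n' | print_keys
```

For the sample row this prints `A0223EE1IDJDJ2938X39284BEOQQQQ54876F0`, the same key the script logs. The rest of the loop still runs line by line in the shell, so moving the whole comparison into a single awk pass over the sorted file is probably faster still.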



