Shell Script Data Cleansing

Source: Internet
Author: User

The task is to process a set of files that share a similar format, extracting the month and day from each line and appending them to the end of that line (data cleansing).

The idea is to first use the cut command to extract the month, day, and hour from each line into a variable, and then loop over the lines with sed to append those values to the end of each line.
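
A minimal sketch of that first step. The real file format is not shown here, so the sample line and its timestamp layout (`2015-06-17 08:23:45 ...`) are assumptions made only for illustration:

```shell
# Hypothetical sample line; the real input format is not shown in the article.
line='2015-06-17 08:23:45 GET /index.html'

# In this assumed layout, characters 6-7 are the month,
# 9-10 the day, and 12-13 the hour.
month=$(printf '%s\n' "$line" | cut -c6-7)
day=$(printf '%s\n' "$line" | cut -c9-10)
hour=$(printf '%s\n' "$line" | cut -c12-13)

# Keep the extracted values in one variable and show them after the line.
stamp="$month $day $hour"
printf '%s %s\n' "$line" "$stamp"
```

With a different layout, only the character ranges passed to `cut -c` would need to change.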

This looked hard to implement, because I am not yet familiar with sed and was not sure whether sed -i could do it.

A quick look at sed

sed is a capable file-processing tool that works as a filter in a pipeline. It processes its input line by line and can replace, delete, add, and select lines, among other things. First, a look at how sed is used.
The sed command-line format is:
sed [-nefri] 'command' input-file

Common options:
        -n: Quiet (silent) mode. By default sed prints every input line to the screen. With -n, only the lines explicitly selected by a sed command (such as p) are printed.
        -e: Give the sed commands directly on the command line.
        -f: Read the sed commands from a file; -f filename runs the commands stored in filename.
        -r: Use extended regular expressions. (The default is basic regular expressions.)
        -i: Edit the input file in place instead of printing to the screen.
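
The options above can be tried out on a scratch file. This is only a sketch: the file names `ab` and `cmds.sed` and their contents are made up for the illustration.

```shell
# Work in a scratch directory so nothing outside it is touched.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/ab"

# -n suppresses the default output, so only the line printed by p appears.
only_second=$(sed -n '2p' "$tmp/ab")

# -e gives the commands on the command line; here two of them in sequence.
edited=$(sed -e '1d' -e 's/three/3/' "$tmp/ab")

# -f reads the same commands from a script file instead.
printf '1d\ns/three/3/\n' > "$tmp/cmds.sed"
from_file=$(sed -f "$tmp/cmds.sed" "$tmp/ab")

printf '%s\n' "$only_second" "$edited" "$from_file"
rm -r "$tmp"
```

The `-e` and `-f` runs apply the same two commands, so they should produce identical output.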

Common commands:
        a  : Append. The text after a appears on a new line after the current line.
        c  : Change. The text after c replaces the lines in the range n1,n2.
        d  : Delete. d normally takes no argument.
        i  : Insert. The text after i appears on a new line before the current line.
        p  : Print the selected lines. p is normally used together with sed -n.
        s  : Substitute. Usually combined with regular expressions, for example 1,20s/old/new/g.
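
To make the `1,20s/old/new/g` form concrete, here is a small sketch on a made-up three-line file. It uses the range `1,2` instead of `1,20` and compares the command with and without the trailing `g`:

```shell
tmp=$(mktemp -d)
printf 'old old\nold old\nold old\n' > "$tmp/ab"

# Without g, only the first match on each addressed line is replaced.
first_only=$(sed '1,2s/old/new/' "$tmp/ab")

# With g, every match on each addressed line is replaced.
all_matches=$(sed '1,2s/old/new/g' "$tmp/ab")

printf 'without g:\n%s\nwith g:\n%s\n' "$first_only" "$all_matches"
rm -r "$tmp"
```

In both runs the third line stays unchanged, because it lies outside the address range.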

Examples (suppose we have a file named ab):
Delete lines
$ sed '1d' ab      # delete the first line
$ sed '$d' ab      # delete the last line
$ sed '1,2d' ab    # delete lines 1 through 2
$ sed '2,$d' ab    # delete from line 2 through the last line

Show lines
$ sed -n '1p' ab      # show the first line
$ sed -n '$p' ab      # show the last line
$ sed -n '1,2p' ab    # show lines 1 through 2
$ sed -n '2,$p' ab    # show from line 2 through the last line

Query using a pattern
$ sed -n '/ruby/p' ab    # show every line that contains the keyword ruby
$ sed -n '/\$/p' ab      # show every line that contains a literal $; the backslash removes its special meaning

Add one or more lines of text
$ cat ab
Hello!
Ruby is me,welcome to my blog.
End
$ sed '1a drink tea' ab    # append the string "drink tea" after the first line
Hello!
drink tea
Ruby is me,welcome to my blog.
End
$ sed '1,3a drink tea' ab    # append the string "drink tea" after each of lines 1 through 3
Hello!
drink tea
Ruby is me,welcome to my blog.
drink tea
End
drink tea
$ sed '1a drink tea\nor coffee' ab    # append several lines after the first line, separated by \n
Hello!
drink tea
or coffee
Ruby is me,welcome to my blog.
End

Replace one or more lines
$ sed '1c Hi' ab    # replace the first line with Hi
Hi
Ruby is me,welcome to my blog.
End
$ sed '1,2c Hi' ab    # replace lines 1 through 2 with Hi
Hi
End

Replace part of a line
Format: sed 's/string to replace/new string/g' (the string to replace can be a regular expression)
$ sed -n '/ruby/p' ab | sed 's/ruby/bird/g'    # replace ruby with bird
$ sed -n '/ruby/p' ab | sed 's/ruby//g'        # delete ruby
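
The select-then-substitute pipeline can probably be collapsed into a single sed call by combining -n with the p flag of the s command. A sketch, assuming a sample file whose second line starts with lowercase `ruby` (sed matching is case-sensitive, so `/ruby/` does not match `Ruby`):

```shell
tmp=$(mktemp -d)
# Assumed sample content, with "ruby" in lowercase so the pattern matches.
printf 'Hello!\nruby is me,welcome to my blog.\nEnd\n' > "$tmp/ab"

# Two-step version, as in the text: select matching lines, then substitute.
piped=$(sed -n '/ruby/p' "$tmp/ab" | sed 's/ruby/bird/g')

# One-step version: -n silences default output, and the p flag prints
# only the lines on which a substitution actually happened.
single=$(sed -n 's/ruby/bird/gp' "$tmp/ab")

printf '%s\n%s\n' "$piped" "$single"
rm -r "$tmp"
```

Both forms print only the changed line. The one-step form does not work for the delete-ruby example's sibling case where a selected line should be printed even if nothing was replaced.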

Insert
$ sed -i '$a bye' ab    # append "bye" after the last line, writing directly to the file ab
$ cat ab
Hello!
Ruby is me,welcome to my blog.
End
bye
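
Because -i overwrites the file, it can be worth keeping a copy of the original. A sketch using the backup-suffix form of -i; the suffix `.bak` and the sample content are arbitrary choices for this example:

```shell
tmp=$(mktemp -d)
printf 'Hello!\nEnd\n' > "$tmp/ab"

# -i.bak edits ab in place and saves the untouched original as ab.bak.
# The a command is written in its two-line form (a\ followed by the text).
sed -i.bak '$a\
bye' "$tmp/ab"

after=$(cat "$tmp/ab")
before=$(cat "$tmp/ab.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$after" "$before"
rm -r "$tmp"
```

If the edit turns out to be wrong, copying `ab.bak` back over `ab` restores the original.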

Delete matching lines

sed -i '/match string/d' filename (Note: if the match string comes from a shell variable, use double quotes "" instead of single quotes ''. That is how I remember it.)
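
The quoting note can be checked with a small sketch. The variable name `pattern` and the file content are hypothetical, and the example prints to standard output instead of using -i so the sample file stays intact:

```shell
tmp=$(mktemp -d)
printf 'keep me\ndrop me\nkeep too\n' > "$tmp/ab"

pattern='drop'   # hypothetical variable holding the text to match

# Single quotes: the shell does not expand $pattern, so sed searches for
# the literal text "$pattern" and deletes nothing.
single=$(sed '/$pattern/d' "$tmp/ab")

# Double quotes: the shell expands $pattern to "drop" before sed runs.
double=$(sed "/$pattern/d" "$tmp/ab")

printf 'single quotes:\n%s\ndouble quotes:\n%s\n' "$single" "$double"
rm -r "$tmp"
```

One caveat: if the variable contains a `/` or regular-expression metacharacters, they are interpreted by sed and would need escaping first.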

Replace a string in matching lines

sed -i '/match string/s/source string/target string/g' filename
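
A sketch of this address-plus-substitute form on made-up data. It omits -i and captures the output instead, so the effect is visible without modifying the file:

```shell
tmp=$(mktemp -d)
printf 'name=ruby color=red\nname=bird color=red\n' > "$tmp/ab"

# Only lines containing "ruby" are addressed; on those lines,
# every "red" becomes "blue". Other lines pass through unchanged.
result=$(sed '/ruby/s/red/blue/g' "$tmp/ab")

printf '%s\n' "$result"
rm -r "$tmp"
```

The second line also contains `red`, but it is left alone because it does not match the `/ruby/` address.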

It turns out that sed is better suited to replacing file content, so this approach does not work for my task. I used another approach instead:

Export the year, month, day, and hour to a file, use the paste command to stitch them onto the original lines, then append (>>) the result to a text file.

The script writes the month, day, and hour to their respective files.

After stitching with paste, append the result to the S1.ext file.

View the contents of S1.ext to confirm that the task has been completed.
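
The steps above can be sketched roughly as follows. The original script and input are not reproduced here, so the input lines, the character positions, and the intermediate file name `stamp.txt` are all assumptions. Only the output name S1.ext comes from the text.

```shell
tmp=$(mktemp -d)
# Hypothetical input; the real file format is not shown in the article.
printf '2015-06-17 08:23:45 GET /a\n2015-06-18 21:05:10 GET /b\n' > "$tmp/in.txt"

# Write the month-day and hour of every line (characters 6-13 in this
# assumed layout) to a side file, one entry per input line.
cut -c6-13 "$tmp/in.txt" > "$tmp/stamp.txt"

# Stitch each original line to its extracted field with a space,
# then append the result to S1.ext.
paste -d ' ' "$tmp/in.txt" "$tmp/stamp.txt" >> "$tmp/S1.ext"

result=$(cat "$tmp/S1.ext")
printf '%s\n' "$result"
rm -r "$tmp"
```

paste joins the files line by line, so this relies on the side file having exactly one line per input line, which cut guarantees here.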
