Python Practice: Data Processing (II), Changing a Specific Data Format

Source: Internet
Author: User

The current situation is:

1. One of my folders contains many data files with names like these:

part-m-0000

part-m-0001

part-m-0002

part-m-0003

...

2. The data in each of these files is in this format:

"460030730101160", "3", "0", "0", "0", "2013/8/31 0:21:42"
"460036745672363", "3", "0", "0", "0", "2013/8/31 0:21:31"
"460030250931114", "3", "1307", "1", "0", "2013/8/31 0:21:40"
"460030250942643", "3", "0", "0", "0", "2013/8/31 0:21:40"
"460036650411006", "3", "1021", "1", "0", "2013/8/31 0:21:39"
"000000000009674", "8", "0", "0", "0", "2013/8/31 0:12:28"
"000000000005661", "8", "0", "0", "0", "2013/8/31 0:12:29"
"460030731390121", "3", "0", "0", "0", "2013/8/31 21:54:00"
"460030256111396", "3", "0", "0", "0", "2013/8/31 21:54:00"
"460030207447762", "3", "0", "0", "0", "2013/8/31 21:53:58"
"460030250939916", "3", "0", "0", "0", "2013/8/31 21:53:58"
"460030957972011", "3", "1613", "0", "0", "2013/8/31 21:53:51"
"460030237206739", "3", "0", "0", "0", "2013/8/31 21:53:59"
...
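
To make the layout concrete, here is a minimal sketch that parses two of the sample rows with Python's standard csv module. It assumes the separator is a comma optionally followed by a space, as in the listing above; the names SAMPLE and rows are hypothetical and used only for illustration.

```python
import csv
import io

# Two rows copied from the sample above (hypothetical variable name).
SAMPLE = (
    '"460030730101160", "3", "0", "0", "0", "2013/8/31 0:21:42"\n'
    '"460030957972011", "3", "1613", "0", "0", "2013/8/31 21:53:51"\n'
)

# skipinitialspace=True ignores the blank after each comma, so the
# quotes are recognised and removed by the csv reader itself.
rows = list(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))

for row in rows:
    print(row)
```

Each row comes back as a list of six unquoted strings, with the timestamp as the last element.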

Now we need to remove the quotes around the values and extract the hour from the last column. Here is the process I follow in Python:

1. First, traverse all the files in the current folder whose names begin with 'part';

2. For each file, read each line and split it on ",";

3. From each part, keep the text between the quotation marks; for the last part, the timestamp, keep only the hour, bearing in mind that the hour may have 1 or 2 digits;

4. For each line read, write one converted line to the output file.
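
Steps 2 to 4 can be sketched for a single line as follows. This is a minimal illustration rather than the full script: convert_line is a hypothetical helper name, and it assumes every line has the six-column layout shown earlier, with -1 written when no usable time is present.

```python
def convert_line(line):
    """Strip the quotes from every field and reduce the timestamp to its hour."""
    # Step 2: split on "," and drop surrounding blanks and quotes.
    fields = [field.strip().strip('"') for field in line.split(",")]

    # Step 3: the last field looks like 2013/8/31 0:21:42.
    # The text between the space and the first ":" is the hour,
    # whether it has one digit or two.
    date_and_time = fields[-1].split(" ")
    if len(date_and_time) == 2 and ":" in date_and_time[1]:
        hour = date_and_time[1].split(":")[0]
    else:
        hour = "-1"  # assumption: mark rows without a usable time

    # Step 4: rebuild the line without quotes.
    return ",".join(fields[:-1] + [hour])


print(convert_line('"460030730101160", "3", "0", "0", "0", "2013/8/31 0:21:42"'))
print(convert_line('"460030957972011", "3", "1613", "0", "0", "2013/8/31 21:53:51"'))
```

Taking the text before the first colon avoids counting character positions, so the same code handles one-digit and two-digit hours.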

Here is the specific code:

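# coding: utf-8
import os

src_dir = "./"
dst_dir = "./data_handled/"

# The output folder must exist before any file is written into it
if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

# Only the files directly inside the current folder are processed
for name in os.listdir(src_dir):
    if not name.startswith("part"):
        continue
    filepath = os.path.join(src_dir, name)         # this is the current file path
    print(filepath)
    newfilepath = os.path.join(dst_dir, name[7:])  # this is the file to write into, e.g. part-m-0000 -> 0000
    with open(filepath) as infile, open(newfilepath, "w") as newfile:
        for line in infile:
            if not line.strip():
                continue                           # skip empty lines
            # split on "," and delete the blanks and the "" around each part
            fields = [f.strip().strip('"') for f in line.split(",")]
            # the last part looks like 2013/8/31 0:21:42; the hour has 1 or 2 digits
            parts = fields[-1].split(" ")
            if len(parts) == 2 and ":" in parts[1]:
                hour = parts[1].split(":")[0]
            else:
                hour = "-1"                        # no recognisable time in this line
            newfile.write(",".join(fields[:-1] + [hour]) + "\n")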
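
As an alternative to working out the hour by character position, the hour could be read with datetime.strptime from the standard library, which also rejects malformed timestamps. This is a sketch under the assumption that every timestamp follows the year/month/day hour:minute:second pattern seen in the sample; hour_of is a hypothetical helper name.

```python
from datetime import datetime


def hour_of(timestamp):
    """Return the hour of a timestamp such as 2013/8/31 0:21:42, or -1 if it cannot be parsed."""
    try:
        return datetime.strptime(timestamp, "%Y/%m/%d %H:%M:%S").hour
    except ValueError:
        # assumption: -1 marks a value that does not match the expected pattern
        return -1


print(hour_of("2013/8/31 0:21:42"))
print(hour_of("2013/8/31 21:53:51"))
print(hour_of("not a timestamp"))
```

The trade-off is speed: strptime does more work per line than plain string splitting, which may matter for very large files.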

