The current situation is:
1. One of my folders contains many data files with names like these:
part-m-0000
part-m-0001
part-m-0002
part-m-0003
...
2. The data in each of these files is in this format:
"460030730101160", "3", "0", "0", "0", "2013/8/31 0:21:42"
"460036745672363", "3", "0", "0", "0", "2013/8/31 0:21:31"
"460030250931114", "3", "1307", "1", "0", "2013/8/31 0:21:40"
"460030250942643", "3", "0", "0", "0", "2013/8/31 0:21:40"
"460036650411006", "3", "1021", "1", "0", "2013/8/31 0:21:39"
"000000000009674", "8", "0", "0", "0", "2013/8/31 0:12:28"
"000000000005661", "8", "0", "0", "0", "2013/8/31 0:12:29"
"460030731390121", "3", "0", "0", "0", "2013/8/31 21:54:00"
"460030256111396", "3", "0", "0", "0", "2013/8/31 21:54:00"
"460030207447762", "3", "0", "0", "0", "2013/8/31 21:53:58"
"460030250939916", "3", "0", "0", "0", "2013/8/31 21:53:58"
"460030957972011", "3", "1613", "0", "0", "2013/8/31 21:53:51"
"460030237206739", "3", "0", "0", "0", "2013/8/31 21:53:59"
...
Now we need to remove the quotes around each value and extract the hour from the last column. Here is the process I follow in Python:
1. First, traverse all the files in the current folder whose names begin with 'part';
2. For each file, read each line and split it on ",";
3. Then take the part of each field that sits between the quotation marks; for the last field (the time), keep only the hour, which means determining whether the hour has 1 or 2 digits;
4. For each line read, write one line to the output file.
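As a small illustration of step 3, the sketch below pulls the hour out of a time string such as "2013/8/31 0:21:42" by splitting on the space and then on the colon, so it does not matter whether the hour has 1 or 2 digits. The helper name extract_hour is hypothetical and not part of the original script; the "-1" fallback follows the script's own convention for lines without a usable time.

```python
def extract_hour(timestamp):
    """Return the hour of a 'Y/M/D H:MM:SS' string, or '-1' if it cannot be found."""
    # Drop surrounding whitespace and quotes, then separate date and time
    parts = timestamp.strip().strip('"').split(" ")
    if len(parts) != 2 or ":" not in parts[1]:
        return "-1"
    # Everything before the first colon is the hour, whether 1 or 2 digits
    return parts[1].split(":")[0]


print(extract_hour('"2013/8/31 0:21:42"'))   # 0
print(extract_hour('"2013/8/31 21:54:00"'))  # 21
print(extract_hour('"2013/8/31"'))           # -1
```

Splitting on the separators avoids relying on fixed character positions, which would shift if the month or day had two digits.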
Here is the specific code:
# coding: utf-8
import os

out_dir = "./data_handled"
os.makedirs(out_dir, exist_ok=True)  # the output folder must exist before writing

for name in os.listdir("./"):  # only the current folder
    if not (name.startswith("part") and os.path.isfile(name)):
        continue
    filepath = os.path.join("./", name)  # this is the current file path
    print(filepath)
    newfilepath = os.path.join(out_dir, name[7:])  # this is the file to write into
    with open(filepath) as infile, open(newfilepath, "w") as newfile:
        for line in infile:
            # split on "," and delete the whitespace and the "" around each field
            fields = [part.strip().strip('"') for part in line.strip().split(",")]
            # the last column looks like 2013/8/31 0:21:42; the hour has 1 or 2 digits
            time_part = fields[-1].split(" ")
            if len(time_part) == 2 and ":" in time_part[1]:
                k = time_part[1].split(":")[0]
            else:
                k = "-1"  # no usable time on this line
            newfile.write(",".join(fields[:-1] + [k]) + "\n")
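A possible alternative for the per-line work, sketched here and not the method used above, is to let Python's standard csv module take care of the quotes and of any space after a comma. The function name convert_line is hypothetical; skipinitialspace=True is set on the assumption that a space may follow each comma, as in the sample data shown earlier.

```python
import csv


def convert_line(line):
    """Unquote every field of one input line and replace the last field with its hour."""
    # csv.reader accepts any iterable of strings, so a one-element list works
    fields = next(csv.reader([line.strip()], skipinitialspace=True), [])
    if not fields:
        return "-1"  # empty line: same fallback value as for a missing time
    time_part = fields[-1].split(" ")
    if len(time_part) == 2 and ":" in time_part[1]:
        hour = time_part[1].split(":")[0]
    else:
        hour = "-1"
    return ",".join(fields[:-1] + [hour])


print(convert_line('"460030250931114", "3", "1307", "1", "0", "2013/8/31 0:21:40"'))
# 460030250931114,3,1307,1,0,0
```

Calling convert_line on each line read from a part file and writing the result followed by "\n" would give the same kind of output as the loop above.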
Python Practice-Data Processing (ii) data-specific format changes