A few days ago received a task, from Gerrit on the SSH command to get some commit-related data into a text document, and then put the data into Excel. The data format is as shown
It is observed that the existence of data in a text document conforms to a certain format, and that processing and writing to an Excel document through Python reads and regular expressions greatly reduces the amount of manual processing.
1. Obtain the original information from Gerrit and deposit it in a text document:
$ssh –p 29418 <your-account>@192.168.1.16 Gerrit query status:merged since:<date/7/days/ago> 2>&1 | Tee Merged_patch_this_week.txt
2. Read the data from the TXT document.
In Python's standard library, file objects provide three "read" methods:. Read (),. ReadLine (), and. ReadLines (). Each method can accept a variable to limit the amount of data that is read each time, but they typically do not use variables. Read () reads the entire file each time, and it is typically used to place the contents of the file into a string variable. however. Read () produces the most direct string representation of the file content, but it is not necessary for continuous row-oriented processing and is not possible if the file is larger than the available memory.
The difference between ReadLine () and ReadLines () is that the latter reads the entire file one at a time, like. Read (): ReadLines () automatically parses the contents of the file into a list of rows that can be used by Python for ... in ... Structure for processing. On the other hand,. ReadLine () reads only one line at a time, usually much slower than. ReadLines (). You should use. ReadLine () only if there is not enough memory to read the entire file at once.
patch_file_name = "merged_patch_this_week.txt"
patch_file = open (patch_file_name, ‘r’) #Open the document and read the data line by line
for line in open (patch_file_name):
line = patch_file.readline ()
print line
3. Writing to an Excel document
Python handles Excel's library of functions, XLRD, XLWT, xlutils are more commonly used, there are many online information about them. However, since they do not support Excel 2007 later versions (. xlsx), they can only reluctantly give up.
After a search, I found the library of OPENPYXL, which not only supports Excel 2007, but has been maintained (currently the latest version is released on the March 31 of the 2.2.1,2015 year). The official description is:
A Python Library to Read/write Excel xlsx/xlsm files, its documentation is legible, related websites: http://openpyxl.readthedocs.org/en/ Latest/index.html
openpyxl:https://bitbucket.org/openpyxl/openpyxl/get/2.2.1.tar.bz2
It relies on the Jdcal module: https://pypi.python.org/packages/source/j/jdcal/jdcal-1.0.tar.gz
Installation method (Windows 7): Install the Jdcal module first-unzip to a directory, CD to the directory, run "python setup.py install". Then install OPENPYXL in the same way.
The following steps are written:
1. Open the workbook:
Wb=load_workbook ('android_patch_review-y2015.xlsx')
2. Get a worksheet
sheetnames = wb.get_sheet_names()
ws = wb.get_sheet_by_name(sheetnames[2])
3. Write the data in the TXT document and format the cells
patch_file_name = "merged_patch_this_week.txt"
patch_file = open (patch_file_name, ‘r’) #Open the document and read the data line by line
ft = Font (name = ‘Neo Sans Intel’, size = 11)
for line in open (patch_file_name):
line = patch_file.readline ()
ws.cell (row = 1, column = 6) .value = re.sub ('project:', '', line) # matches the project line. If the match is successful, the string "project:" is deleted, and the rest Write to Excel row 1 column 6
ws.cell (row = rows + 1, column = 1) .font = ft
4. Save Workbook
Wb.save ('android_patch_review-y2015.xlsx')
The complete code is as follows:
from openpyxl.workbook import Workbook
from openpyxl.reader.excel import load_workbook
from openpyxl.styles import PatternFill, Border, Side, Alignment, Protection, Font
import re
#from openpyxl.writer.excel import ExcelWriter
#import xlrd
ft=Font(name=‘Neo Sans Intel‘,size=11) #define font style
bd=Border(left=Side(border_style=‘thin‘,color=‘00000000‘), right=Side(border_style=‘thin‘,color=‘00000000‘), top=Side(border_style=‘thin‘,color=‘00000000‘), bottom=Side(border_style=‘thin‘,color=‘00000000‘)) #define border style
alg_cc=Alignment(horizontal=‘center‘, vertical=‘center‘, text_rotation=0, wrap_text=True, shrink_to_fit=True, indent=0) #define alignment styles
alg_cb=Alignment(horizontal=‘center‘, vertical=‘bottom‘, text_rotation=0, wrap_text=True, shrink_to_fit=True, indent=0)
alg_lc=Alignment(horizontal=‘left‘, vertical=‘center‘, text_rotation=0, wrap_text=True, shrink_to_fit=True, indent=0)
patch_file_name="merged_patch_this_week.txt"
patch_file=open(patch_file_name,‘r‘) #get data patch text
wb=load_workbook(‘Android_Patch_Review-Y2015.xlsx‘) #open excel to write
sheetnames = wb.get_sheet_names()
ws = wb.get_sheet_by_name(sheetnames[2]) #get sheet
rows=len(ws.rows)
assert ws.cell(row=rows,column=1).value!=None, ‘New Document or empty row at the end of the document? Please input at least one row!‘
print "The original Excel document has %d rows totally." %(rows)
end_tag=‘type: stats‘
for line in open(patch_file_name):
line=patch_file.readline()
if re.match(end_tag,line) is not None: #end string
break
if len(line)==1: #go to next patch
rows=rows+1
continue
line = line.strip()
# print line
ws.cell(row=rows+1,column=1).value=ws.cell(row=rows,column=1).value+1 #Write No.
ws.cell(row=rows+1,column=1).font=ft
ws.cell(row=rows+1,column=1).border=bd
ws.cell(row=rows+1,column=1).alignment=alg_cb
ws.cell(row=rows+1,column=5).border=bd
ws.cell(row=rows+1,column=9).border=bd
if re.match(‘change‘,line) is not None:
ws.cell(row=rows+1,column=2).value=re.sub(‘change‘,‘‘,line) #Write Gerrit ID
ws.cell(row=rows+1,column=2).font=ft
ws.cell(row=rows+1,column=2).border=bd
ws.cell(row=rows+1,column=2).alignment=alg_cb
if re.match(‘url:‘,line) is not None:
ws.cell(row=rows+1,column=3).value=re.sub(‘url:‘,‘‘,line) #Write Gerrit url
ws.cell(row=rows+1,column=3).font=ft
ws.cell(row=rows+1,column=3).border=bd
ws.cell(row=rows+1,column=3).alignment=alg_cb
if re.match(‘project:‘,line) is not None:
ws.cell(row=rows+1,column=6).value=re.sub(‘project:‘,‘‘,line) #Write project
ws.cell(row=rows+1,column=6).font=ft
ws.cell(row=rows+1,column=6).border=bd
ws.cell(row=rows+1,column=6).alignment=alg_lc
if re.match(‘branch:‘,line) is not None:
ws.cell(row=rows+1,column=7).value=re.sub(‘branch:‘,‘‘,line) #Write branch
ws.cell(row=rows+1,column=7).font=ft
ws.cell(row=rows+1,column=7).border=bd
ws.cell(row=rows+1,column=7).alignment=alg_cc
if re.match(‘lastUpdated:‘,line) is not None:
ws.cell(row=rows+1,column=8).value=re.sub(‘lastUpdated:|CST‘,‘‘,line) #Write update time
ws.cell(row=rows+1,column=8).font=ft
ws.cell(row=rows+1,column=8).border=bd
ws.cell(row=rows+1,column=8).alignment=alg_cc
if re.match(‘commitMessage:‘,line) is not None:
description_str=re.sub(‘commitMessage:‘,‘‘,line)
if re.match(‘Product:|BugID:|Description:|Unit Test:|Change-Id:‘,line) is not None:
description_str=description_str+‘\n‘+line #
if re.match(‘Signed-off-by:‘,line) is not None:
description_str=description_str+‘\n‘+line
ws.cell(row=rows+1,column=4).value=description_str #Write patch description
ws.cell(row=rows+1,column=4).font=ft
ws.cell(row=rows+1,column=4).border=bd
ws.cell(row=rows+1,column=4).alignment=alg_lc
wb.save(‘Android_Patch_Review-Y2015.xlsx‘)
print ‘Android_Patch_Review-Y2015.xlsx saved!\nPatch Collection Done!‘
#patch_file.close()
So far, the basic functionality has been implemented, but there are two questions that are not clear:
The first one is the last line of comment in the complete code, and I've found several blogs about OPENPYXL that don't have close after opening the file, so I don't have close in the code either. In theory, it still needs to be felt. This problem will continue to be considered when the file object is understood more deeply.
The second is when you run the script with a warning, " userwarning:discarded range with reserved Name,warnings.warn (" discarded range with reserved Name"), is still in the search for reasons, if you have a clear, please do not hesitate to inform.
(Reference: http://blog.csdn.net/werm520/article/details/6898473)
Read data row by line from a TXT document that conforms to a certain format and write to Excel (OPENPYXL supports Excel. xlsx format) in Python