Read data row by line from a TXT document that conforms to a certain format and write to Excel (OPENPYXL supports Excel. xlsx format) in Python

Source: Internet
Author: User

A few days ago received a task, from Gerrit on the SSH command to get some commit-related data into a text document, and then put the data into Excel. The data format is as shown

It is observed that the existence of data in a text document conforms to a certain format, and that processing and writing to an Excel document through Python reads and regular expressions greatly reduces the amount of manual processing.

1. Obtain the original information from Gerrit and deposit it in a text document:

$ssh –p 29418 <your-account>@ Gerrit query status:merged since:<date/7/days/ago> 2>&1 | Tee Merged_patch_this_week.txt

2. Read the data from the TXT document.

In Python's standard library, file objects provide three "read" methods:. Read (),. ReadLine (), and. ReadLines (). Each method can accept a variable to limit the amount of data that is read each time, but they typically do not use variables. Read () reads the entire file each time, and it is typically used to place the contents of the file into a string variable. however. Read () produces the most direct string representation of the file content, but it is not necessary for continuous row-oriented processing and is not possible if the file is larger than the available memory.

The difference between ReadLine () and ReadLines () is that the latter reads the entire file one at a time, like. Read (): ReadLines () automatically parses the contents of the file into a list of rows that can be used by Python for ... in ... Structure for processing. On the other hand,. ReadLine () reads only one line at a time, usually much slower than. ReadLines (). You should use. ReadLine () only if there is not enough memory to read the entire file at once.

patch_file_name = "merged_patch_this_week.txt"
patch_file = open (patch_file_name, ‘r’) #Open the document and read the data line by line
for line in open (patch_file_name):
     line = patch_file.readline ()
     print line
3. Writing to an Excel document

Python handles Excel's library of functions, XLRD, XLWT, xlutils are more commonly used, there are many online information about them. However, since they do not support Excel 2007 later versions (. xlsx), they can only reluctantly give up.

After a search, I found the library of OPENPYXL, which not only supports Excel 2007, but has been maintained (currently the latest version is released on the March 31 of the 2.2.1,2015 year). The official description is:

A Python Library to Read/write Excel xlsx/xlsm files, its documentation is legible, related websites: Latest/index.html


It relies on the Jdcal module:

Installation method (Windows 7): Install the Jdcal module first-unzip to a directory, CD to the directory, run "python install". Then install OPENPYXL in the same way.

The following steps are written:

1. Open the workbook:

Wb=load_workbook (' android_patch_review-y2015.xlsx ')

2. Get a worksheet

Sheetnames = Wb.get_sheet_names ()                   ws = Wb.get_sheet_by_name (sheetnames[2])  

3. Write the data in the TXT document and format the cells

patch_file_name = "merged_patch_this_week.txt"
patch_file = open (patch_file_name, ‘r’) #Open the document and read the data line by line
ft = Font (name = ‘Neo Sans Intel’, size = 11)
for line in open (patch_file_name):
     line = patch_file.readline ()
     ws.cell (row = 1, column = 6) .value = re.sub ('project:', '', line) # matches the project line. If the match is successful, the string "project:" is deleted, and the rest Write to Excel row 1 column 6
     ws.cell (row = rows + 1, column = 1) .font = ft

4. Save Workbook (' android_patch_review-y2015.xlsx ')

The complete code is as follows:

from openpyxl.workbook import Workbook
from openpyxl.reader.excel import load_workbook
from openpyxl.styles import PatternFill, Border, Side, Alignment, Protection, Font
import re
#from openpyxl.writer.excel import ExcelWriter
#import xlrd

ft=Font(name=‘Neo Sans Intel‘,size=11)             #define font style
bd=Border(left=Side(border_style=‘thin‘,color=‘00000000‘),          right=Side(border_style=‘thin‘,color=‘00000000‘),          top=Side(border_style=‘thin‘,color=‘00000000‘),          bottom=Side(border_style=‘thin‘,color=‘00000000‘))            #define border style
alg_cc=Alignment(horizontal=‘center‘,                        vertical=‘center‘,                        text_rotation=0,                        wrap_text=True,                        shrink_to_fit=True,                        indent=0)                           #define alignment styles
alg_cb=Alignment(horizontal=‘center‘,                        vertical=‘bottom‘,                        text_rotation=0,                        wrap_text=True,                        shrink_to_fit=True,                        indent=0)
alg_lc=Alignment(horizontal=‘left‘,                        vertical=‘center‘,                        text_rotation=0,                        wrap_text=True,                        shrink_to_fit=True,                        indent=0)                           
patch_file=open(patch_file_name,‘r‘)        #get data patch text

wb=load_workbook(‘Android_Patch_Review-Y2015.xlsx‘)          #open excel to write
sheetnames = wb.get_sheet_names()                   
ws = wb.get_sheet_by_name(sheetnames[2])                    #get sheet

assert ws.cell(row=rows,column=1).value!=None, ‘New Document or empty row at the end of the document? Please input at least one row!‘
print "The original Excel document has %d rows totally." %(rows)
end_tag=‘type: stats‘

for line in open(patch_file_name):
    if re.match(end_tag,line) is not None:    #end string
    if  len(line)==1:                                       #go to next patch
    line = line.strip()    
#    print line
    ws.cell(row=rows+1,column=1).value=ws.cell(row=rows,column=1).value+1           #Write No.
    if re.match(‘change‘,line) is not None:
        ws.cell(row=rows+1,column=2).value=re.sub(‘change‘,‘‘,line)           #Write Gerrit ID

    if re.match(‘url:‘,line) is not None:
        ws.cell(row=rows+1,column=3).value=re.sub(‘url:‘,‘‘,line)             #Write Gerrit url

    if re.match(‘project:‘,line) is not None:
        ws.cell(row=rows+1,column=6).value=re.sub(‘project:‘,‘‘,line)             #Write project
    if re.match(‘branch:‘,line) is not None:
        ws.cell(row=rows+1,column=7).value=re.sub(‘branch:‘,‘‘,line)              #Write branch
    if re.match(‘lastUpdated:‘,line) is not None:
        ws.cell(row=rows+1,column=8).value=re.sub(‘lastUpdated:|CST‘,‘‘,line)             #Write update time
    if re.match(‘commitMessage:‘,line) is not None:
    if re.match(‘Product:|BugID:|Description:|Unit Test:|Change-Id:‘,line) is not None:
        description_str=description_str+‘\n‘+line                       #
    if re.match(‘Signed-off-by:‘,line) is not None:
        ws.cell(row=rows+1,column=4).value=description_str                #Write patch description
print ‘Android_Patch_Review-Y2015.xlsx saved!\nPatch Collection Done!‘


So far, the basic functionality has been implemented, but there are two questions that are not clear:

The first one is the last line of comment in the complete code, and I've found several blogs about OPENPYXL that don't have close after opening the file, so I don't have close in the code either. In theory, it still needs to be felt. This problem will continue to be considered when the file object is understood more deeply.

The second is when you run the script with a warning, " userwarning:discarded range with reserved Name,warnings.warn (" discarded range with reserved Name"), is still in the search for reasons, if you have a clear, please do not hesitate to inform.

Read data row by line from a TXT document that conforms to a certain format and write to Excel (OPENPYXL supports Excel. xlsx format) in Python

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.