This article shows, with examples, how to process MS Word documents in Python.
To read and write MS Word (.docx) files from Python we mainly use the python-docx package. This article covers some common operations and walks through a complete example to help you get started quickly.
Installation
Processing docx files in Python requires the python-docx package, which is easy to install with pip (on Windows, pip is in the Scripts folder under the Python installation directory):

```shell
pip install python-docx
```

Older guides also mention easy_install or a manual installation from source, but pip is the simplest route.
Write file contents
Here is a complete example; take whatever parts you need:
```python
from docx import Document
from docx.shared import Pt, Inches
from docx.oxml.ns import qn

# Create a new, empty document
document = Document()

# Add headings of different levels (level 0 is the document title)
document.add_heading('MS Word Write Test', 0)
document.add_heading('Level 1 heading', 1)
document.add_heading('Level 2 heading', 2)

# Add a paragraph of text
paragraph = document.add_paragraph('We are doing a text test! ')

# Set the font size (24 pt is an example value)
run = paragraph.add_run('Set font size, ')
run.font.size = Pt(24)

# Set the font for Western text
run = paragraph.add_run('Set font, ')
run.font.name = 'Consolas'

# Set a Chinese font: font.name only sets the Western (ascii/hAnsi) fonts,
# so the w:eastAsia attribute has to be set on the underlying XML too.
# SimSun is a common Chinese font; any installed East Asian font works.
run = paragraph.add_run('Set Chinese font, ')
run.font.name = 'SimSun'
r = run._element
r.rPr.rFonts.set(qn('w:eastAsia'), 'SimSun')

# Italic
run = paragraph.add_run('Italic, ')
run.italic = True

# Bold
run = paragraph.add_run('Bold')
run.bold = True

# Add a quote
document.add_paragraph('Intense quote', style='Intense Quote')

# Add an unordered (bulleted) list
document.add_paragraph('Unordered list element 1', style='List Bullet')
document.add_paragraph('Unordered list element 2', style='List Bullet')

# Add an ordered (numbered) list
document.add_paragraph('Ordered list element 1', style='List Number')
document.add_paragraph('Ordered list element 2', style='List Number')

# Add an image (put image.bmp in the same directory as the script)
document.add_picture('image.bmp', width=Inches(1.25))

# Add a table with a header row
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

# Add 3 more rows
for i in range(3):
    row_cells = table.add_row().cells
    row_cells[0].text = 'test' + str(i)
    row_cells[1].text = str(i)
    row_cells[2].text = 'desc' + str(i)

# Add a page break
document.add_page_break()

# Save the file
document.save('test.docx')
```
The document generated by this snippet looks like this:
Note: one problem I have not yet solved is how to set border lines for the table. If you know how, please let me know.
Read file contents
```python
from docx import Document

# Open the document generated above
document = Document('test.docx')

# Read the text of every paragraph
paragraphs = [paragraph.text for paragraph in document.paragraphs]

# Print the paragraphs (you can also process the text in other ways)
for text in paragraphs:
    print(text)

# Read the tables and print each row, with cells separated by tabs
for table in document.tables:
    for row in table.rows:
        print('\t'.join(cell.text for cell in row.cells))
    print()  # blank line after each table
```

Using the file we generated above, the output is as follows:
Note: paragraph.text and cell.text return Unicode strings, so in Python 3 they can be printed directly. Under Python 2 the text had to be encoded before printing (for example with gb2312 on a Chinese Windows console) so that Chinese characters displayed correctly; in general, UTF-8 is the encoding to use. Also note that python-docx only handles .docx files and cannot open the older binary .doc format. If you have a large number of .doc files, it is recommended to batch-convert them to .docx first, for example with a tool such as Doc2Doc.
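Before a bulk conversion it can help to find out which files are really in the legacy binary format, since file extensions are not always reliable. A .docx file is a ZIP package whose main document part is conventionally stored at word/document.xml, whereas a legacy .doc file is an OLE2 compound file whose first 8 bytes are D0 CF 11 E0 A1 B1 1A E1. The sketch below uses only the standard library to classify files by these signatures; detect_word_format is a hypothetical helper, not part of python-docx.

```python
import glob
import zipfile

# First 8 bytes of an OLE2 compound file, the container used by legacy .doc
# (note that .xls and .ppt files use the same container)
OLE2_SIGNATURE = bytes.fromhex('D0CF11E0A1B11AE1')


def detect_word_format(path):
    """Hypothetical helper: return 'docx', 'doc', or 'unknown' for a file."""
    with open(path, 'rb') as f:
        header = f.read(8)
    if header == OLE2_SIGNATURE:
        return 'doc'
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # Word and python-docx store the main document part here
            if 'word/document.xml' in zf.namelist():
                return 'docx'
    return 'unknown'


if __name__ == '__main__':
    # List every .doc/.docx file in the current directory with its real format
    for path in glob.glob('*.doc*'):
        print(path, detect_word_format(path))
```

Files reported as 'doc' are the ones to convert. For the conversion itself, LibreOffice's command line (`soffice --headless --convert-to docx file.doc`) is one alternative to a dedicated tool.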