Examples of processing MS Word files with Python

This article walks through examples of processing MS Word files with Python. It may be a useful reference for anyone who needs to do the same.

You can read and write MS Word files from Python mainly by using the python-docx package. The package works with .docx files; .doc files are discussed at the end. This article covers some common operations and walks through a complete sample to help you get started quickly.

Installation

Processing .docx files in Python requires the python-docx package. You can install it easily with pip, which is located in the Scripts folder under the Python installation directory:

```shell
pip install python-docx
```

Alternatively, you can install it with easy_install or manually.

Write file contents

Here is a complete sample. Take whichever parts you need.

```python
from docx import Document
from docx.shared import Pt, Inches
from docx.oxml.ns import qn

# Create a new document
document = Document()

# Add headings of different levels (level 0 is the document title)
document.add_heading('MS Word write test', 0)
document.add_heading('Level 1 heading', 1)
document.add_heading('Level 2 heading', 2)

# Add a paragraph of text
paragraph = document.add_paragraph("We're doing text testing! ")

# Set the font size (in points)
run = paragraph.add_run('Set font size, ')
run.font.size = Pt(24)

# Set the font
run = paragraph.add_run('Set font, ')
run.font.name = 'Consolas'

# Set a Chinese font: run.font.name does not cover East Asian text,
# so the w:eastAsia attribute is also set on the run's XML
run = paragraph.add_run('Set Chinese font, ')
run.font.name = 'SimSun'
r = run._element
r.rPr.rFonts.set(qn('w:eastAsia'), 'SimSun')

# Set italic
run = paragraph.add_run('Italic, ')
run.italic = True

# Set bold
paragraph.add_run('Bold').bold = True

# Add a quote
document.add_paragraph('Intense quote', style='Intense Quote')

# Add an unordered (bulleted) list
document.add_paragraph('Unordered list element 1', style='List Bullet')
document.add_paragraph('Unordered list element 2', style='List Bullet')

# Add an ordered (numbered) list
document.add_paragraph('Ordered list element 1', style='List Number')
document.add_paragraph('Ordered list element 2', style='List Number')

# Add an image (put your own image.bmp in the same directory as the script)
document.add_picture('image.bmp', width=Inches(1.25))

# Add a table with a header row
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

# Add three more rows of table data
for i in range(3):
    row_cells = table.add_row().cells
    row_cells[0].text = 'test' + str(i)
    row_cells[1].text = str(i)
    row_cells[2].text = 'desc' + str(i)

# Add a page break
document.add_page_break()

# Save the file
document.save('test.docx')
```

Running the snippet produces a test.docx file that contains the headings, formatted text, quote, lists, image, and table added above.

Note: One problem I have not been able to solve is how to set borders on the table. If you know how, please let me know.

Read file contents

```python
from docx import Document

# Open the document
document = Document('test.docx')

# Read the text of each paragraph
paragraphs = [paragraph.text for paragraph in document.paragraphs]

# Print the results; you can also process the text in other ways
for text in paragraphs:
    print(text)

# Read the tables and print their contents, one row per line
for table in document.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text, end='\t')
        print()
    print('\n')
```

This reuses the file generated earlier. The output shows the text of each paragraph, followed by the table contents, one row per line.

Note: The Python 2 version of this code encoded the text as gb2312, mainly to make sure Chinese text was read and printed correctly. In general, UTF-8 is the usual choice, and in Python 3 the text can be printed directly without any explicit encoding. Also, python-docx handles .docx files only and cannot load the older .doc format. If you have a large number of .doc files, it is best to batch-convert them to .docx first, for example with a tool such as Doc2Doc.
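Another option for batch conversion is LibreOffice's command-line mode, which can convert .doc files to .docx. This sketch is not the Doc2Doc tool mentioned above. It assumes LibreOffice is installed and that its `soffice` executable is on the PATH; on some systems the executable is named `libreoffice` instead. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_files, out_dir, soffice='soffice'):
    """Build a LibreOffice command that converts the given .doc files to .docx in out_dir."""
    return [soffice, '--headless', '--convert-to', 'docx',
            '--outdir', str(out_dir), *map(str, doc_files)]


def convert_folder(folder, out_dir, soffice='soffice'):
    """Convert every .doc file directly inside `folder` to .docx.

    Returns the list of .doc paths that were passed to LibreOffice.
    """
    # '*.doc' does not match '.docx' files, so already-converted files are skipped
    doc_files = sorted(Path(folder).glob('*.doc'))
    if doc_files:
        # check=True raises CalledProcessError if LibreOffice exits with an error
        subprocess.run(build_convert_command(doc_files, out_dir, soffice), check=True)
    return doc_files
```

Passing all the files to a single `soffice` call avoids starting LibreOffice once per file. The resulting .docx files can then be opened with `Document(...)` as shown above.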
