Ways to generate Word and PDF documents using Python programs

Last Update:2017-02-20 Source: Internet

Author: User

Tags wkhtmltopdf


This article introduces ways to generate Word and PDF documents from Python programs, with explanations and sample code that should serve as a useful reference.

I. How to export Word documents programmatically

Java offers many ways to export web/HTML content as a Word document, such as Jacob, Apache POI, Java2word, and iText, often combined with a template engine like FreeMarker. PHP has some options as well, but Python has few ways to generate a Word document from web/HTML content. The hardest part is getting data that JS code loads asynchronously, and the images it generates, into the exported Word document.

1. Unoconv

Function:

1. Converts local HTML documents to DOCX, so the HTML of the web page must be saved locally before calling unoconv. The conversion quality is good and usage is simple.

    # install
    sudo apt-get install unoconv

    # usage
    unoconv -f pdf *.odt
    unoconv -f doc *.odt
    unoconv -f html *.odt
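The save-then-convert workflow described above can be scripted from Python. The following is a minimal sketch, not the original author's code: the helper names `build_unoconv_command`, `save_page`, and `convert` are hypothetical, and it assumes the `unoconv` executable is installed and on the PATH. Only unoconv's `-f` (format) and `-o` (output) options are used.

```python
import subprocess
import urllib.request
from pathlib import Path


def build_unoconv_command(src_path, fmt="docx", out_path=None):
    """Build the unoconv argument list for converting one file."""
    cmd = ["unoconv", "-f", fmt]
    if out_path is not None:
        cmd += ["-o", str(out_path)]
    cmd.append(str(src_path))
    return cmd


def save_page(url, html_path):
    """Save the static HTML of a page locally (no JS is executed)."""
    with urllib.request.urlopen(url) as resp:
        Path(html_path).write_bytes(resp.read())


def convert(src_path, fmt="docx", out_path=None):
    """Run unoconv; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_unoconv_command(src_path, fmt, out_path), check=True)
```

A typical call sequence would be `save_page(url, "report.html")` followed by `convert("report.html")`, where the URL and file name are placeholders. Because the page is fetched without running its scripts, the limitations on asynchronous data still apply.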

Disadvantages:

1. Only static HTML can be converted. Content that the page fetches asynchronously via AJAX is not handled, so you must make sure the data is already present in the HTML file saved from the web page.

2. Only HTML is converted. If the page uses JS libraries such as ECharts or Highcharts to generate charts, those charts cannot be carried into the Word document.

3. The formatting of the resulting Word document is hard to control.

2. python-docx

Function:

1. python-docx is a Python library that can read and write Word documents.

How to use:

1. Fetch the data from the web page and add it to the Word document, laying it out manually in Python.

    from collections import namedtuple

    from docx import Document
    from docx.shared import Inches

    # Placeholder rows for the table; replace with your own data.
    Record = namedtuple('Record', 'qty id desc')
    recordset = [Record(1, 101, 'Sample item'), Record(2, 102, 'Another item')]

    document = Document()
    document.add_heading('Document Title', 0)

    p = document.add_paragraph('A plain paragraph having some ')
    p.add_run('bold').bold = True
    p.add_run(' and some ')
    p.add_run('italic.').italic = True

    document.add_heading('Heading, level 1', level=1)
    # Current python-docx releases look styles up by their display name.
    document.add_paragraph('Intense quote', style='Intense Quote')
    document.add_paragraph('first item in unordered list', style='List Bullet')
    document.add_paragraph('first item in ordered list', style='List Number')

    # monty-truth.png must exist in the working directory.
    document.add_picture('monty-truth.png', width=Inches(1.25))

    table = document.add_table(rows=1, cols=3)
    hdr_cells = table.rows[0].cells
    hdr_cells[0].text = 'Qty'
    hdr_cells[1].text = 'Id'
    hdr_cells[2].text = 'Desc'
    for item in recordset:
        row_cells = table.add_row().cells
        row_cells[0].text = str(item.qty)
        row_cells[1].text = str(item.id)
        row_cells[2].text = item.desc

    document.add_page_break()
    document.save('demo.docx')

    from docx import Document
    from docx.shared import Inches

    document = Document()
    # Create nine single-cell tables whose column width grows in 0.5-inch steps.
    for row in range(9):
        t = document.add_table(rows=1, cols=1, style='Table Grid')
        t.autofit = False  # important!
        w = float(row) / 2.0
        t.columns[0].width = Inches(w)
    document.save('table-step.docx')
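To make the "fetch the data, then lay it out manually" step more concrete, here is a hedged sketch that pulls headings and paragraphs out of an HTML string with the standard-library `html.parser` and writes them with python-docx. The names `BlockExtractor`, `extract_blocks`, and `blocks_to_docx` are hypothetical, only `h1` to `h3` and `p` tags are handled, and python-docx is imported lazily so the extraction part works without it.

```python
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Collect (tag, text) pairs for headings and paragraphs."""

    BLOCK_TAGS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None   # block tag currently being read, if any
        self._buf = []     # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = "".join(self._buf).strip()
            if text:
                self.blocks.append((tag, text))
            self._tag = None


def extract_blocks(html):
    """Return the headings and paragraphs of an HTML string, in order."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def blocks_to_docx(blocks, out_path):
    """Write extracted blocks to a .docx file (requires python-docx)."""
    from docx import Document  # third-party: pip install python-docx

    document = Document()
    for tag, text in blocks:
        if tag in ("h1", "h2", "h3"):
            document.add_heading(text, level=int(tag[1]))
        else:
            document.add_paragraph(text)
    document.save(out_path)
```

Inline formatting such as bold text is flattened to plain text here; preserving it would mean mapping inline tags to `add_run` calls, which this sketch leaves out.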

Disadvantages:

The functionality is quite limited. There are many restrictions, such as no template support, so only Word documents with simple formatting can be generated.

II. How to export PDF documents programmatically

1. pdfkit

Function:

1. wkhtmltopdf is mainly used to generate PDFs from HTML.

2. pdfkit is a Python wrapper around wkhtmltopdf. It supports converting URLs, local files, and text content to PDF, and ultimately calls the wkhtmltopdf command. Of the Python PDF tools tried so far, it gives the best results.

Advantages:

1. wkhtmltopdf converts HTML to PDF using the WebKit engine.

WebKit is an efficient, open-source browser engine used by Safari and, until its move to the Blink fork, by Chrome. Chrome's print dialog for the current page includes a "Save as PDF" option.

2. wkhtmltopdf uses the WebKit engine's PDF rendering to convert HTML pages to PDFs. Fidelity is high, conversion quality is very good, and usage is simple.

How to use:

    # install
    pip install pdfkit

    # usage
    import pdfkit

    pdfkit.from_url('http://google.com', 'out.pdf')
    pdfkit.from_file('test.html', 'out.pdf')
    pdfkit.from_string('Hello!', 'out.pdf')
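Because pdfkit ultimately shells out to the wkhtmltopdf command, it can help to check that the binary is available and to pass page options explicitly. The sketch below is an illustration under assumptions rather than part of the original article: the helper names `find_wkhtmltopdf`, `build_options`, and `html_string_to_pdf` are hypothetical, while `pdfkit.configuration`, the `options` and `configuration` keyword arguments, and the wkhtmltopdf option names are existing features.

```python
import shutil


def find_wkhtmltopdf():
    """Return the path of the wkhtmltopdf executable, or None if not on PATH."""
    return shutil.which("wkhtmltopdf")


def build_options(page_size="A4", margin="10mm"):
    """Build a pdfkit options dict; keys are wkhtmltopdf flags without '--'."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "margin-right": margin,
        "encoding": "UTF-8",
    }


def html_string_to_pdf(html, out_path, page_size="A4"):
    """Render an HTML string to a PDF file using an explicit binary path."""
    import pdfkit  # third-party: pip install pdfkit

    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf is not installed or not on PATH")
    config = pdfkit.configuration(wkhtmltopdf=binary)
    return pdfkit.from_string(
        html, out_path, options=build_options(page_size), configuration=config
    )
```

Installing the `pdfkit` package with pip does not install wkhtmltopdf itself, so the explicit check gives a clearer error than a failed subprocess call.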

Disadvantages:

1. Charts generated by JS code such as ECharts or Highcharts cannot be converted to PDF, because the tool's main job is to convert HTML to PDF, not to render JS output. Conversion of purely static pages works well.
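One thing that may be worth trying for JS-rendered charts, although the original article does not test it, is wkhtmltopdf's `--javascript-delay` option, which waits a given number of milliseconds for scripts to finish before rendering. The sketch below passes it through pdfkit's `options` dict; the helper names `js_render_options` and `url_to_pdf_with_js` and the 3000 ms default are assumptions, and there is no guarantee that a given chart library renders correctly in wkhtmltopdf's older WebKit engine.

```python
def js_render_options(delay_ms=3000):
    """Options asking wkhtmltopdf to wait for page scripts before rendering."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return {
        # Wait this many milliseconds for JavaScript to finish.
        "javascript-delay": str(delay_ms),
        # Flag without a value: do not abort scripts that run slowly.
        "no-stop-slow-scripts": None,
    }


def url_to_pdf_with_js(url, out_path, delay_ms=3000):
    """Convert a URL to PDF, giving its scripts time to draw charts."""
    import pdfkit  # third-party: pip install pdfkit

    return pdfkit.from_url(url, out_path, options=js_render_options(delay_ms))
```

If the charts still do not appear, the limitation described above stands, and rendering the chart to a static image on the server side before conversion is the more predictable route.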

2. Other

Other libraries that generate PDFs include WeasyPrint, ReportLab, and PyPDF2. In simple tests their results were not as good as pdfkit's, and some are more complex to use.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
