This article introduces how to generate Word and PDF documents from a Python program. It gives a detailed explanation along with sample code, which should serve as a useful reference for readers who need it.
I. Exporting Word documents from a program
Java offers many ways to export web/HTML content as a Word document, such as Jacob, Apache POI, Java2word, and iText, as well as template engines like FreeMarker. PHP has some options too, but Python has few ways to turn web/HTML content into a Word document. The hardest part is getting data that the page fills in asynchronously with JS, and images such as JS-rendered charts, into the Word document.
1. Unoconv
Function:
1. Converts a local HTML document to DOCX format, so you must first save the web page as a local HTML file and then call unoconv to convert it. The conversion quality is good, and usage is very simple.
```bash
# install
sudo apt-get install unoconv
# usage
unoconv -f pdf *.odt
unoconv -f doc *.odt
unoconv -f html *.odt
```
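Because unoconv is a command-line tool, a Python program can drive it with the standard `subprocess` module. The following is a minimal sketch. It assumes unoconv is installed and on `PATH`, and that the page has already been saved as a local HTML file. The helper names and the file name `page.html` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_unoconv_cmd(src: str, fmt: str = "docx") -> List[str]:
    """Build the unoconv command line that converts `src` to format `fmt`."""
    # By default unoconv writes the result next to the input file,
    # with the extension of the target format.
    return ["unoconv", "-f", fmt, src]


def convert_with_unoconv(src: str, fmt: str = "docx") -> None:
    """Run unoconv; raises if it is not installed or the conversion fails."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv not found; install it with apt-get first")
    # check=True turns a non-zero exit status into CalledProcessError
    subprocess.run(build_unoconv_cmd(src, fmt), check=True)
```

For example, `convert_with_unoconv("page.html")` should produce `page.docx` alongside the saved page. Keeping the command construction separate from the call makes it easy to log or inspect before running.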
Disadvantages:
1. Only static HTML can be converted. Data that the page loads asynchronously via AJAX is not captured, so you must make sure the HTML file saved from the web page already contains the data.
2. Only the HTML itself is converted. If the page uses JS libraries such as ECharts or Highcharts to render charts, those charts cannot be carried over into the Word document.
3. The formatting of the resulting Word document is hard to control.
2. python-docx
Function:
1. python-docx is a Python library for reading and writing Word (.docx) documents.
How to use:
1. Fetch the data from the web page and lay it out in the Word document manually with Python code.
```python
from collections import namedtuple

from docx import Document
from docx.shared import Inches

# Sample rows standing in for the data fetched from the web page
Record = namedtuple('Record', 'qty id desc')
recordset = [Record(3, '101', 'Spam'), Record(7, '422', 'Eggs')]

document = Document()

document.add_heading('Document Title', 0)

# A paragraph built from runs with different character formatting
p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('first item in ordered list', style='List Number')

# The image file must exist in the working directory
document.add_picture('monty-truth.png', width=Inches(1.25))

# A table with a header row, then one row per record
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for item in recordset:
    row_cells = table.add_row().cells
    row_cells[0].text = str(item.qty)
    row_cells[1].text = str(item.id)
    row_cells[2].text = item.desc

document.add_page_break()

document.save('demo.docx')
```
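The next snippet creates nine one-column tables of increasing width. Turning off autofit is important, otherwise Word may resize the columns.

```python
from docx import Document
from docx.shared import Inches

document = Document()
for row in range(9):
    t = document.add_table(rows=1, cols=1, style='Table Grid')
    t.autofit = False  # important!
    w = float(row) / 2.0
    t.columns[0].width = Inches(w)
document.save('table-step.docx')
```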
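Following the "fetch the data, then lay it out manually" approach, one option is to request the data the page itself loads (typically a JSON response) and write it straight into a table. This is a sketch under stated assumptions. The response is assumed to be a JSON array of objects. The column names and both helper names are hypothetical. python-docx is imported only inside the writer, so the parsing step works without it.

```python
import json
from typing import Iterable, List, Sequence


def rows_from_json(payload: str, columns: Sequence[str]) -> List[List[str]]:
    """Turn a JSON array of objects (e.g. an AJAX response body) into table rows.

    Missing keys become empty strings, so every row has len(columns) cells.
    """
    records = json.loads(payload)
    return [[str(rec.get(col, "")) for col in columns] for rec in records]


def write_table_docx(path: str, title: str, columns: Sequence[str],
                     rows: Iterable[Sequence[str]]) -> None:
    """Write a titled, grid-styled table to a .docx file with python-docx."""
    from docx import Document  # imported lazily; requires python-docx

    document = Document()
    document.add_heading(title, 0)
    table = document.add_table(rows=1, cols=len(columns), style="Table Grid")
    # Header row
    for cell, name in zip(table.rows[0].cells, columns):
        cell.text = name
    # One table row per data row
    for row in rows:
        for cell, value in zip(table.add_row().cells, row):
            cell.text = value
    document.save(path)
```

For example: `cols = ["qty", "id", "desc"]`, then `write_table_docx("report.docx", "Report", cols, rows_from_json('[{"qty": 3, "id": "101", "desc": "Spam"}]', cols))`.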
Disadvantages:
Its functionality is quite limited. There are many restrictions, such as no template support, so it can only produce Word documents with simple formatting.
II. Exporting PDF documents from a program
1. pdfkit
Function:
1. wkhtmltopdf is a command-line tool used mainly to generate PDFs from HTML.
2. pdfkit is a Python wrapper around wkhtmltopdf. It converts URLs, local files, and text strings to PDF, and under the hood it calls the wkhtmltopdf command. Of the Python PDF-generation approaches I have tried so far, it gives the best results.
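Since pdfkit ultimately shells out to wkhtmltopdf, it helps to see roughly what such a call looks like. The sketch below is not pdfkit's source. It is a simplified, hypothetical re-creation that turns an options dict (in the `{'page-size': 'A4'}` style pdfkit accepts) into `--page-size A4` style flags and runs the binary. It assumes wkhtmltopdf is installed and on `PATH`.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def build_wkhtmltopdf_cmd(source: str, output: str,
                          options: Optional[Dict[str, Optional[str]]] = None) -> List[str]:
    """Build a wkhtmltopdf command line.

    `source` may be a URL or a local HTML file. `options` maps long option
    names (without the leading dashes) to a value, or to None for a bare flag.
    """
    cmd = ["wkhtmltopdf", "--quiet"]
    for name, value in (options or {}).items():
        cmd.append("--" + name)
        if value is not None:
            cmd.append(str(value))
    cmd += [source, output]
    return cmd


def html_to_pdf(source: str, output: str,
                options: Optional[Dict[str, Optional[str]]] = None) -> None:
    """Run wkhtmltopdf; raises if it is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    subprocess.run(build_wkhtmltopdf_cmd(source, output, options), check=True)
```

For example, `html_to_pdf("test.html", "out.pdf", {"page-size": "A4", "encoding": "UTF-8"})`. In practice pdfkit handles these details (plus string input via stdin) for you.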
Advantages:
1. wkhtmltopdf uses the WebKit engine to convert HTML to PDF.
WebKit is an efficient, open-source browser engine used by browsers such as Safari, and Chrome's Blink engine is a fork of it. Chrome's print dialog for the current page includes a "Save as PDF" option.
2. wkhtmltopdf renders the HTML page with WebKit and outputs it as a PDF. The output is high-fidelity, the conversion quality is very good, and it is very simple to use.
How to use:
```bash
# install (pdfkit also needs the wkhtmltopdf binary, e.g. sudo apt-get install wkhtmltopdf)
pip install pdfkit
```

```python
import pdfkit

pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')
```
Disadvantages:
1. Charts generated by JS code such as ECharts or Highcharts cannot be converted to PDF, because its main job is converting HTML to PDF, not JS to PDF. For purely static pages the conversion is still good.
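Since static pages convert well, one workaround is to fill the data into the HTML on the Python side before conversion. That way the page does not rely on JS to populate it. The following is a minimal standard-library sketch. The function name and page layout are hypothetical. It does not address JS-drawn charts, which remain the limitation described above.

```python
from html import escape
from typing import Sequence


def build_static_report(title: str, columns: Sequence[str],
                        rows: Sequence[Sequence[object]]) -> str:
    """Render a self-contained HTML page with the data already filled in."""
    # escape() keeps characters like & and < from breaking the markup
    head = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\">"
        f"<title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1>"
        f"<table border=\"1\"><tr>{head}</tr>{body}</table>"
        "</body></html>"
    )
```

The result can then be passed to pdfkit, e.g. `pdfkit.from_string(build_static_report("Report", ["Qty", "Desc"], [[3, "Spam"]]), "out.pdf")`.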
2. Other
Other libraries for generating PDFs include WeasyPrint, ReportLab, and PyPDF2. In a quick test, their output was not as good as pdfkit's, and some of them are complicated to use.