This article introduces how to generate Word and PDF documents from Python programs. It walks through the available tools with sample code and notes the strengths and weaknesses of each.
I. Exporting Word documents
There are many ways to export web/HTML content as Word documents in Java, such as Jacob, Apache POI, Java2Word, and iText, often combined with a template engine like FreeMarker. PHP has comparable options, but Python has few ways to generate Word documents from web/HTML content. The hardest part is handling data that the page fills in asynchronously with JavaScript, and exporting images into the Word document.
1. unoconv
Function:
1. unoconv can convert a local HTML document to docx format, so you need to save the web page's HTML to your local machine before calling unoconv. The conversion quality is good and usage is very simple.
    # Install
    sudo apt-get install unoconv

    # Use
    unoconv -f pdf *.odt
    unoconv -f doc *.odt
    unoconv -f html *.odt
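When the conversion has to run from a Python program rather than a shell, one option is to save the page's HTML to a local file and call unoconv through `subprocess`. The following is only a minimal sketch under that assumption: the helper names `build_unoconv_command` and `html_to_docx` and the file name `page.html` are made up for illustration, and only the `-f` flag from the commands above is used.

```python
import subprocess
from pathlib import Path


def build_unoconv_command(input_path, target_format="docx"):
    """Return the unoconv command line as a list of arguments."""
    return ["unoconv", "-f", target_format, str(input_path)]


def html_to_docx(html_text, html_path="page.html"):
    """Save HTML to a local file, then ask unoconv to convert it.

    Assumes unoconv (and the LibreOffice install it drives) is on PATH.
    """
    path = Path(html_path)
    path.write_text(html_text, encoding="utf-8")
    # check=True raises CalledProcessError if unoconv exits with an error.
    subprocess.run(build_unoconv_command(path), check=True)
    # With no output option, unoconv is expected to write the result
    # next to the input file, with the new extension.
    return path.with_suffix(".docx")
```

Keeping the command construction in its own function makes it easy to inspect or log the exact command before anything is executed.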
Disadvantages:
1. Only static HTML can be converted. Data that the page loads asynchronously with Ajax is not captured, so you must make sure the HTML file saved from the web page already contains the data.
2. Only HTML can be converted. Charts that JavaScript libraries such as ECharts and Highcharts draw on the page are not carried over into the Word document.
3. The format of the generated Word document is hard to control.
2. python-docx
Function:
1. python-docx is a Python library that can read and write Word documents.
Usage:
1. Fetch the data from the web page, then lay out the Word document by hand in Python code.
    from docx import Document
    from docx.shared import Inches

    document = Document()

    document.add_heading('Document Title', 0)

    p = document.add_paragraph('A plain paragraph having some ')
    p.add_run('bold').bold = True
    p.add_run(' and some ')
    p.add_run('italic.').italic = True

    document.add_heading('Heading, level 1', level=1)
    # Current python-docx releases look styles up by their display name.
    document.add_paragraph('Intense quote', style='Intense Quote')
    document.add_paragraph('first item in unordered list', style='List Bullet')
    document.add_paragraph('first item in ordered list', style='List Number')

    # monty-truth.png must exist in the working directory.
    document.add_picture('monty-truth.png', width=Inches(1.25))

    table = document.add_table(rows=1, cols=3)
    hdr_cells = table.rows[0].cells
    hdr_cells[0].text = 'Qty'
    hdr_cells[1].text = 'Id'
    hdr_cells[2].text = 'Desc'
    # recordset is defined elsewhere: an iterable of records
    # with qty, id, and desc attributes.
    for item in recordset:
        row_cells = table.add_row().cells
        row_cells[0].text = str(item.qty)
        row_cells[1].text = str(item.id)
        row_cells[2].text = item.desc

    document.add_page_break()
    document.save('demo.docx')
    from docx import Document
    from docx.shared import Inches

    document = Document()
    # Create nine one-cell tables, each half an inch wider than the last.
    for row in range(9):
        t = document.add_table(rows=1, cols=1, style='Table Grid')
        t.autofit = False  # very important!
        w = float(row) / 2.0
        t.columns[0].width = Inches(w)
    document.save('table-step.docx')
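The usage step above (get the data from the web page, then lay it out by hand) can be sketched end to end for the simple case of an HTML table. This is an illustration rather than a complete solution: the names `TableParser`, `parse_table`, and `rows_to_docx` are hypothetical, the parsing uses only the standard library's `html.parser`, and it assumes the table is already present in the static HTML.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def rows_to_docx(rows, path):
    """Write rows (first row is the header) into a one-table Word document."""
    from docx import Document  # requires: pip install python-docx

    document = Document()
    table = document.add_table(rows=1, cols=len(rows[0]))
    for cell, text in zip(table.rows[0].cells, rows[0]):
        cell.text = text
    for row in rows[1:]:
        for cell, text in zip(table.add_row().cells, row):
            cell.text = text
    document.save(path)
```

A call such as `rows_to_docx(parse_table(html_text), 'table.docx')` would then produce the document, where `html_text` is the saved page content.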
Disadvantages:
Limited functionality with many restrictions. For example, templates are not supported, so you can only generate Word documents with simple formatting.
II. Exporting PDF documents
1. pdfkit
Function:
1. wkhtmltopdf is mainly used to generate PDF from HTML.
2. pdfkit is a Python package that wraps wkhtmltopdf. It supports converting a URL, a local file, or text content to PDF, and in the end it still calls the wkhtmltopdf command. Of the Python PDF generation options I have come across so far, it gives the better results.
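Since pdfkit ultimately hands the work to the wkhtmltopdf command, it can help to see what a direct call to that command looks like from Python. The sketch below is not pdfkit's actual implementation, only the basic `wkhtmltopdf <input> <output>` invocation wrapped in `subprocess`; the function names are hypothetical and wkhtmltopdf is assumed to be installed and on PATH.

```python
import subprocess


def build_wkhtmltopdf_command(source, output_path):
    """Return the argument list for: wkhtmltopdf <source> <output_path>."""
    return ["wkhtmltopdf", source, output_path]


def render_pdf(source, output_path):
    """Render a URL or local HTML file to PDF with the wkhtmltopdf binary."""
    # check=True raises CalledProcessError if wkhtmltopdf reports a failure.
    subprocess.run(build_wkhtmltopdf_command(source, output_path), check=True)
```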
Advantages:
1. wkhtmltopdf converts HTML to PDF using the WebKit engine.
WebKit is an efficient, open-source browser engine used by Safari and, historically, Chrome. When you print the current page in Chrome, one of the options is simply "Save as PDF".
2. wkhtmltopdf converts HTML pages to PDF with WebKit's PDF rendering engine. The output is high fidelity, the conversion quality is good, and it is very easy to use.
Usage:
    # Install
    pip install pdfkit

    # Use
    import pdfkit

    pdfkit.from_url('http://google.com', 'out.pdf')
    pdfkit.from_file('test.html', 'out.pdf')
    pdfkit.from_string('Hello!', 'out.pdf')
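The pdfkit functions also accept an `options` dictionary whose keys are wkhtmltopdf command-line options without the leading dashes. A minimal sketch follows, assuming an A4 page with uniform margins; the helper names `default_options` and `html_string_to_pdf` and the chosen values are illustrative, not defaults of the library.

```python
def default_options(page_size="A4", margin="0.75in"):
    """Build a wkhtmltopdf options dict in the form pdfkit accepts."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


def html_string_to_pdf(html_text, output_path, options=None):
    """Render an HTML string to a PDF file with pdfkit."""
    # Requires: pip install pdfkit, plus the wkhtmltopdf binary on PATH.
    import pdfkit

    pdfkit.from_string(html_text, output_path, options=options or default_options())
```

For example, `html_string_to_pdf('<h1>Hello!</h1>', 'out.pdf')` would write a one-page A4 document under these assumptions.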
Disadvantages:
1. Charts generated by JavaScript libraries such as ECharts and Highcharts cannot be converted to PDF, because the tool is mainly meant to convert HTML to PDF rather than JavaScript output to PDF. Purely static pages convert well.
2. Miscellaneous
Other packages that generate PDF files include weasyprint, reportlab, and PyPDF2. In a brief experiment, none of them produced results as good as pdfkit, and they were more complicated to use.