Python 3.x: pdf2htmlEX (PDF parsing): introduction to installation and use
pdf2htmlEX is a command-line tool for converting PDF files into HTML.
Download
Windows: http://soft.rubypdf.com/software/pdf2htmlex-windows-version
Installation
Download pdf2htmlEX-win32-0.14.6-with-poppler-data.zip and unzip it. No further installation is needed; the tool can be used directly from the unzipped directory.
Test
In a Command Prompt window, switch to the directory where the archive was unzipped:
cd /d D:\pdf2htmlEX-win32-0.14.6
Enter the test command:
pdf2htmlEX -v
If the version information is printed, the installation was successful.
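The same check can be scripted from Python. The following is a minimal sketch, not part of the original instructions: the helper name get_version is hypothetical, and it assumes the executable is either on PATH or passed in as a full path. Both output streams are combined because the tool may write its version text to either one.

```python
import shutil
import subprocess


def get_version(exe="pdf2htmlEX"):
    """Return the text printed by `exe -v`, or None if exe cannot be found."""
    # shutil.which searches PATH (and accepts a full path to the executable).
    path = shutil.which(exe)
    if path is None:
        return None
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    # Combine stdout and stderr, since the version text may go to either.
    return (result.stdout + result.stderr).strip()
```

A return value of None means the executable was not found; in that case, pass the full path, for example the pdf2htmlEX.exe inside the unzipped directory.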
pdf2htmlEX command usage
Usage: pdf2htmlEX [options] <input.pdf> [<output.html>]
  -f,--first-page <int>          first page to convert (default: 1)
  -l,--last-page <int>           last page to convert (default: 2147483647)
  --zoom <fp>                    zoom ratio
  --fit-width <fp>               fit width to <fp> pixels
  --fit-height <fp>              fit height to <fp> pixels
  --use-cropbox <int>            use CropBox instead of MediaBox (default: 1)
  --hdpi <fp>                    horizontal resolution for graphics, in DPI (default: 144)
  --vdpi <fp>                    vertical resolution for graphics, in DPI (default: 144)
  --embed <string>               specify which elements should be embedded in the output
  --embed-css <int>              embed CSS files in the output (default: 1)
  --embed-font <int>             embed font files in the output (default: 1)
  --embed-image <int>            embed image files in the output (default: 1)
  --embed-javascript <int>       embed JavaScript files in the output (default: 1)
  --embed-outline <int>          embed outlines in the output (default: 1)
  --split-pages <int>            split pages into separate files (default: 0)
  --dest-dir <string>            specify the destination directory (default: ".")
  --css-filename <string>        file name of the generated CSS file (default: "")
  --page-filename <string>       file name template for split pages (default: "")
  --outline-filename <string>    file name of the generated outline file (default: "")
  --process-nontext <int>        render graphics in addition to text (default: 1)
  --process-outline <int>        show the outline in the HTML (default: 1)
  --printing <int>               support printing (default: 1)
  --fallback <int>               output in fallback mode (default: 0)
  --embed-external-font <int>    embed local matches for external fonts (default: 1)
  --font-format <string>         suffix for embedded font files (ttf, otf, woff, svg) (default: "woff")
  --decompose-ligature <int>     decompose ligatures, such as ﬁ -> fi (default: 0)
  --auto-hint <int>              use FontForge autohint on fonts without hints (default: 0)
  --external-hint-tool <string>  external tool for hinting fonts (overrides --auto-hint) (default: "")
  --stretch-narrow-glyph <int>   stretch narrow glyphs instead of padding them (default: 0)
  --squeeze-wide-glyph <int>     shrink wide glyphs instead of truncating them (default: 1)
  --override-fstype <int>        clear the fstype bits in TTF/OTF fonts (default: 0)
  --process-type3 <int>          convert Type 3 fonts for the web (experimental) (default: 0)
  --heps <fp>                    horizontal threshold for merging text, in pixels (default: 1)
  --veps <fp>                    vertical threshold for merging text, in pixels (default: 1)
  --space-threshold <fp>         word break threshold (threshold * em) (default: 0.125)
  --font-size-multiplier <fp>    a value greater than 1 increases rendering accuracy (default: 4)
  --space-as-offset <int>        treat space characters as offsets (default: 0)
  --tounicode <int>              how to handle ToUnicode CMaps (0=auto, 1=force, -1=ignore) (default: 0)
  --optimize-text <int>          try to reduce the number of HTML elements used for text (default: 0)
  --bg-format <string>           specify the background image format (default: "png")
  -o,--owner-password <string>   owner password (for encrypted files)
  -u,--user-password <string>    user password (for encrypted files)
  --no-drm <int>                 override the document's DRM settings (default: 0)
  --clean-tmp <int>              remove temporary files after conversion (default: 1)
  --data-dir <string>            specify the data directory (default: ".\share\pdf2htmlEX")
  --debug <int>                  print debugging information (default: 0)
  -v,--version                   print copyright and version information
  -h,--help                      print usage information
Example of calling pdf2htmlEX in Python 3
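A minimal sketch of one way to do this, using only the standard library's subprocess module and a few options from the usage listing above. The function names build_command and convert are illustrative, not part of pdf2htmlEX, and the sketch assumes pdf2htmlEX is on PATH (otherwise pass its full path as exe). It also assumes the output file is written inside the directory given by --dest-dir.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, dest_dir=".", first_page=None, last_page=None,
                  zoom=None, exe="pdf2htmlEX"):
    """Build the argument list for one pdf2htmlEX call (does not run it)."""
    pdf_path = Path(pdf_path)
    cmd = [exe]
    if first_page is not None:
        cmd += ["--first-page", str(first_page)]
    if last_page is not None:
        cmd += ["--last-page", str(last_page)]
    if zoom is not None:
        cmd += ["--zoom", str(zoom)]
    cmd += ["--dest-dir", str(dest_dir)]
    # Positional arguments: the input PDF, then the output file name.
    cmd += [str(pdf_path), pdf_path.stem + ".html"]
    return cmd


def convert(pdf_path, dest_dir=".", **options):
    """Run pdf2htmlEX and return the expected path of the generated HTML."""
    cmd = build_command(pdf_path, dest_dir=dest_dir, **options)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)
    # Assumption: the output file lands inside dest_dir.
    return Path(dest_dir) / (Path(pdf_path).stem + ".html")
```

For example, convert("sample.pdf", dest_dir="html", first_page=1, last_page=3) would convert the first three pages of a hypothetical sample.pdf. Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces.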