PDF manipulation and conversion on Linux
If PDF is electronic paper, then pdftk is an electronic staple remover, hole punch, binder, secret decoder ring, and X-ray glasses. pdftk is a simple tool for everyday operations on PDF documents. It does not need Acrobat and runs on Linux, Windows, Mac OS X, FreeBSD, and Solaris. On Debian/Ubuntu, you can install pdftk through apt:
$ sudo aptitude install pdftk
Merge two or more PDF files into a new document
$ pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
Or (using handles):
$ pdftk A=1.pdf B=2.pdf cat A B output 12.pdf
Or (using wildcards):
$ pdftk *.pdf cat output combined.pdf
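One caveat with the wildcard form: the shell expands *.pdf in lexical order, so 10.pdf is passed to pdftk before 2.pdf. The following is a minimal sketch, not part of pdftk itself, of a wrapper that merges in natural numeric order instead. The function names list_pdfs_sorted and merge_sorted are made up for illustration, and the sketch assumes GNU sort (for -V) and file names that contain no newlines.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU sort (-V) and no newlines in file names.

# Print the PDF files in a directory, one per line, in natural order
# (1.pdf, 2.pdf, 10.pdf rather than 1.pdf, 10.pdf, 2.pdf).
list_pdfs_sorted() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | sort -V
}

# Merge every PDF found in a directory (default: current) into one file.
# Write the output outside that directory so a rerun does not merge it in.
merge_sorted() {
    out=$1
    dir=${2:-.}
    old_ifs=$IFS
    # Split the sorted list on newlines only, so names with spaces survive.
    IFS='
'
    set -f    # no glob expansion while splitting
    set -- $(list_pdfs_sorted "$dir")
    set +f
    IFS=$old_ifs
    [ "$#" -gt 0 ] || { echo "no PDF files in $dir" >&2; return 1; }
    pdftk "$@" cat output "$out"
}

# Usage, assuming pdftk is installed:
#   merge_sorted ../combined.pdf .
```

Keeping the sorting in its own function makes it easy to inspect the order before anything is merged.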
Extract selected pages from multiple PDF files into a new document
$ pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf
Rotate the first page of the PDF 90 degrees clockwise
$ pdftk in.pdf cat 1E 2-end output out.pdf
Rotate every page of a PDF document 180 degrees
$ pdftk in.pdf cat 1-endS output out.pdf
Recent pdftk versions write the rotation as a full word, for example 1east and 1-endsouth.
Encrypt a PDF with 128-bit strength (the default) and withhold all permissions (the default)
$ pdftk mydoc.pdf output mydoc.128.pdf owner_pw foopass
Same as above, except that the password baz is also required to open the output PDF
$ pdftk mydoc.pdf output mydoc.128.pdf owner_pw foo user_pw baz
Same as above, except that printing is allowed (once the PDF is open)
$ pdftk mydoc.pdf output mydoc.128.pdf owner_pw foo user_pw baz allow printing
Decrypt a PDF file
$ pdftk secured.pdf input_pw foopass output unsecured.pdf
Merge two files, one of which is encrypted (the output is not encrypted)
$ pdftk A=secured.pdf mydoc.pdf input_pw A=foopass cat output combined.pdf
Uncompress the PDF page streams so the PDF code can be edited in a text editor
$ pdftk mydoc.pdf output mydoc.clear.pdf uncompress
Repair a PDF's corrupted XREF table and stream lengths (if possible)
$ pdftk broken.pdf output fixed.pdf
Burst a single PDF document into single pages and report its data to doc_data.txt
$ pdftk mydoc.pdf burst
Report the metadata, bookmarks, and page labels of a PDF document
$ pdftk mydoc.pdf dump_data output report.txt
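The report is plain text made of "Key: Value" lines, which makes it easy to script against. As a small sketch (the helper name page_count is made up here, and it assumes the NumberOfPages line that pdftk writes in its dump_data report), the page count can be pulled out with awk:

```shell
#!/bin/sh
# Hypothetical helper: read a pdftk dump_data report on stdin and
# print the value of its NumberOfPages line.
page_count() {
    awk -F': ' '$1 == "NumberOfPages" { print $2; exit }'
}

# Usage, assuming pdftk is installed and mydoc.pdf exists
# (without an output file, dump_data writes the report to stdout):
#   pdftk mydoc.pdf dump_data | page_count
```

Reading from stdin keeps the helper independent of where the report comes from, so it also works on a saved report.txt.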
Poppler is a PDF rendering library based on the xpdf-3.0 code. The poppler-utils package includes pdftops (PDF to PostScript converter), pdfinfo (PDF document information extractor), pdfimages (PDF image extractor), pdftohtml (PDF to HTML converter), pdftotext (PDF to text converter), and pdffonts (PDF font analyzer). Debian/Ubuntu users can install poppler-utils through apt:
$ sudo aptitude install poppler-utils
Convert PDF to text
pdftotext converts Portable Document Format (PDF) files to plain text.
$ pdftotext example.pdf example.txt
If the text file is not specified, pdftotext converts file.pdf to file.txt. If the text file is '-', the text is sent to standard output.
To convert pages 3 to 7 (inclusive), use:
$ pdftotext -f 3 -l 7 example.pdf example.txt
Extract only page 3
$ pdftotext -f 3 -l 3 example.pdf example.txt
$ pdftotext -layout example.pdf example.txt
The preceding command keeps the original physical layout of the text as far as possible; without -layout, pdftotext undoes the physical layout and outputs the text in reading order. If you do not want page breaks inserted between pages, set the -nopgbrk option. If the PDF file is password protected, set the -opw (owner password) or -upw (user password) option.
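To convert a whole directory, the same command can be wrapped in a loop. This is only a sketch: the helper names txt_name and pdfs_to_text are invented for this example, and it relies on nothing beyond the -layout option described above.

```shell
#!/bin/sh
# Hypothetical helpers for batch conversion with pdftotext.

# Derive the output name: strip a trailing .pdf and append .txt.
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert every PDF in a directory (default: current), keeping the layout.
pdfs_to_text() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue    # no matches: the glob stays literal
        pdftotext -layout "$f" "$(txt_name "$f")"
    done
}

# Usage, assuming poppler-utils is installed:
#   pdfs_to_text ~/documents
```

Passing the output name explicitly is not strictly needed, since pdftotext picks file.txt on its own, but it makes the destination visible in the script.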
Extract images from PDF
pdfimages extracts images from PDF files and saves them as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files. It reads the PDF file, scans one or more pages, and writes each image to a PPM, PBM, or JPEG file named image-root-nnn.xxx, where nnn is the image number and xxx is the image type (.ppm, .pbm, .jpg). pdfimages extracts the raw image data from the PDF file without any additional transformation: any rotation, clipping, or color inversion performed by the PDF content stream is ignored.
$ pdfimages example.pdf exampleimage
This command extracts all images from example.pdf. The images are saved in PPM format (PBM for monochrome images).
Use the -j option to save JPEG-encoded (DCT) images as JPEG files:
$ pdfimages -j example.pdf exampleimage
Use the -f and -l options to set the first and last pages. To scan pages 3 to 7 (inclusive), use:
$ pdfimages -f 3 -l 7 example.pdf exampleimage
Scan a single page only:
$ pdfimages -f 3 -l 3 example.pdf exampleimage
If the PDF file is password protected, use the -opw and -upw options:
-opw  owner password
-upw  user password
Convert PDF to HTML
pdftohtml is a program that converts PDF files to HTML. It writes its output to the current working directory.
Usage:
$ pdftohtml file.pdf file.html
If you want to see the images, use the -c (that is, "complex") option:
$ pdftohtml -c file.pdf file.html
Convert PDF to image
First, make sure ImageMagick is installed on your machine. To install ImageMagick on Debian/Ubuntu, run the following command:
$ sudo aptitude install imagemagick
To convert a PDF file to an image, run the convert command:
$ convert doc.pdf doc.jpeg
Convert to TIFF
$ convert doc.pdf doc.tiff
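Two ImageMagick features are often useful here: the -density option sets the resolution (in dots per inch) at which the PDF is rasterized, and an index in square brackets after the file name selects a single page, counting from 0. A minimal sketch, with the helper names (page_spec, pdf_page_to_image) and the 150 dpi value chosen purely for illustration:

```shell
#!/bin/sh
# Hypothetical helpers for converting one page of a PDF with ImageMagick.

# Build the ImageMagick page selector: page N of the PDF is index N-1.
page_spec() {
    printf '%s[%d]\n' "$1" "$(( $2 - 1 ))"
}

# Render one page (1-based) of a PDF to an image at 150 dpi.
# -density goes before the input file so it applies while the PDF is read.
pdf_page_to_image() {
    convert -density 150 "$(page_spec "$1" "$2")" "$3"
}

# Usage, assuming ImageMagick is installed and doc.pdf exists:
#   pdf_page_to_image doc.pdf 1 page1.jpeg
```

The output format still follows the extension of the target file, exactly as in the plain convert commands.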