PDF Manipulation and Conversion on Linux

Source: Internet
Author: User


If PDF is electronic paper, then pdftk is an electronic stapler, hole punch, binder, secret decoder ring and X-ray glasses. pdftk is a simple tool for everyday operations on PDF documents, and it lets you manipulate PDF files easily and freely. It does not require Adobe Acrobat and runs on Linux, Windows, Mac OS X, FreeBSD and Solaris. On Debian/Ubuntu, you can install pdftk with aptitude:

$ sudo aptitude install pdftk

Merge two or more PDF files into a new document

$ pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf

Or, using handles:

$ pdftk A=1.pdf B=2.pdf cat A B output 12.pdf

Or, using wildcards:

$ pdftk *.pdf cat output combined.pdf
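The shell expands `*.pdf` before pdftk runs, so the files are merged in the shell's (locale-dependent) sort order. Running the command a second time in the same directory would also pull the previous combined.pdf back into the merge. The following is a minimal POSIX sh sketch that keeps the output outside the source directory and refuses to run when the glob matches nothing. The function name `merge_dir` is hypothetical.

```shell
# merge_dir DIR OUT (hypothetical helper): merge every *.pdf in DIR, in the
# shell's glob order, into OUT. Keep OUT outside DIR so that a second run
# does not merge the previous output back in.
merge_dir() {
    md_dir=$1
    md_out=$2
    # Replace the function's positional parameters with the glob expansion.
    set -- "$md_dir"/*.pdf
    # With no matches, the pattern stays unexpanded and names no real file.
    if [ ! -e "$1" ]; then
        echo "merge_dir: no PDF files in $md_dir" >&2
        return 1
    fi
    pdftk "$@" cat output "$md_out"
}
```

For example, `merge_dir ./scans ./combined.pdf` would merge every PDF in ./scans into combined.pdf in the current directory.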

Select pages from multiple PDF files to form a new document

$ pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf
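In the `cat` syntax, each handle is followed by a page range, and the ranges are written out in the order given. The following is a minimal sketch that uses only that syntax to copy an inclusive page range from one file into a new document. `extract_range` is a hypothetical name, and LAST may be the pdftk keyword `end`.

```shell
# extract_range IN FIRST LAST OUT (hypothetical helper): copy pages
# FIRST..LAST (inclusive) of IN into OUT, using the single handle A.
extract_range() {
    # FIRST must be a plain page number; LAST may also be "end".
    case $2 in
        ''|*[!0-9]*) echo "extract_range: FIRST must be a page number" >&2; return 1 ;;
    esac
    pdftk A="$1" cat "A$2-$3" output "$4"
}
```

For example, `extract_range report.pdf 3 end tail.pdf` would keep everything from page 3 onward.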

Rotate the first page of a PDF 90 degrees clockwise

$ pdftk in.pdf cat 1east 2-end output out.pdf

Rotate every page of a PDF document 180 degrees

$ pdftk in.pdf cat 1-endsouth output out.pdf
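In pdftk 2.x, rotation is written as a keyword appended to a page range:

- north, east, south and west set an absolute rotation of 0, 90, 180 and 270 degrees.
- left, right and down turn a page relative to its current rotation.

The following is a minimal sketch that rotates every page and rejects anything that is not one of those keywords. `rotate_all` is a hypothetical name.

```shell
# rotate_all IN DIRECTION OUT (hypothetical helper): apply one pdftk 2.x
# rotation keyword to every page of IN and write the result to OUT.
rotate_all() {
    case $2 in
        north|east|south|west|left|right|down) ;;
        *) echo "rotate_all: unknown direction '$2'" >&2; return 1 ;;
    esac
    pdftk "$1" cat "1-end$2" output "$3"
}
```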

Encrypt a PDF using 128-bit strength (the default) and withhold all permissions (the default)

$ pdftk mydoc.pdf output mydoc.128.pdf owner_pw foopass

Same as above, except that a password is required to open the PDF file

$ pdftk mydoc.pdf output mydoc.128.pdf owner_pw foo user_pw baz

Same as above, except that printing is allowed (once the PDF file is open)

$ pdftk mydoc.pdf output mydoc.128.pdf owner_pw foo user_pw baz allow printing
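Passwords typed on the command line can end up in the shell history. While pdftk runs, they are also visible to other local users through ps. pdftk accepts the keyword PROMPT in place of a password and then asks for it interactively. The following is a minimal sketch of the previous command using that keyword. `encrypt_printable` is a hypothetical name.

```shell
# encrypt_printable IN OUT (hypothetical helper): 128-bit encryption that
# requires a password to open and allows printing. With PROMPT, pdftk asks
# for each password instead of reading it from the command line.
encrypt_printable() {
    pdftk "$1" output "$2" owner_pw PROMPT user_pw PROMPT allow printing
}
```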

Decrypt a PDF file

$ pdftk secured.pdf input_pw foopass output unsecured.pdf
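When several files share one password, the command above can be run in a loop. The following is a minimal sketch, assuming that all inputs use the same password and that the destination directory differs from the source. `decrypt_all` is a hypothetical name. Note that the password is still passed on the command line here.

```shell
# decrypt_all SRC DEST PASSWORD (hypothetical helper): write an unencrypted
# copy of every SRC/*.pdf into DEST, keeping the original file names.
decrypt_all() {
    mkdir -p "$2" || return 1
    for da_f in "$1"/*.pdf; do
        [ -e "$da_f" ] || continue    # the glob matched nothing
        pdftk "$da_f" input_pw "$3" output "$2/${da_f##*/}" ||
            echo "decrypt_all: failed on $da_f" >&2
    done
}
```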

Merge two files, one of which is encrypted (the output is not encrypted)

$ pdftk A=secured.pdf mydoc.pdf input_pw A=foopass cat output combined.pdf

Uncompress the PDF page streams so that the PDF code can be edited in a text editor

$ pdftk mydoc.pdf output mydoc.clear.pdf uncompress
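After editing, pdftk's compress output option can restore the stream compression. The following is a minimal sketch of the full uncompress, edit and compress cycle, assuming `$EDITOR` names a text editor (vi is used otherwise). The helper name `edit_pdf_source` and the `.clear.pdf` intermediate file name are hypothetical.

```shell
# edit_pdf_source IN OUT (hypothetical helper): uncompress IN, open the
# result in $EDITOR (default vi), then recompress the edited file into OUT.
edit_pdf_source() {
    eps_clear=${2%.pdf}.clear.pdf
    pdftk "$1" output "$eps_clear" uncompress &&
        "${EDITOR:-vi}" "$eps_clear" &&
        pdftk "$eps_clear" output "$2" compress
}
```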

Repair a PDF's corrupted XREF table and stream lengths (if possible)

$ pdftk broken.pdf output fixed.pdf

Burst a single document into individual pages and write a report of its data to doc_data.txt

$ pdftk mydoc.pdf burst
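By default, burst names the pages pg_0001.pdf, pg_0002.pdf and so on. An output name containing a printf-style format field selects a different pattern. The following is a minimal sketch that bursts a document into a separate directory. `burst_to_dir` and the page_%03d.pdf pattern are hypothetical choices.

```shell
# burst_to_dir IN DIR (hypothetical helper): split IN into one file per page,
# named DIR/page_001.pdf, DIR/page_002.pdf, ...
burst_to_dir() {
    mkdir -p "$2" || return 1
    pdftk "$1" burst output "$2/page_%03d.pdf"
}
```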

Report the metadata, bookmarks and page labels of a PDF document

$ pdftk mydoc.pdf dump_data output report.txt
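Without an output clause, dump_data writes its report to standard output as `Key: value` lines, so single fields can be extracted with standard text tools. The following is a minimal sketch that reads the NumberOfPages field. `pdf_page_count` is a hypothetical name.

```shell
# pdf_page_count FILE (hypothetical helper): print the page count from the
# "NumberOfPages: N" line of pdftk's dump_data report.
pdf_page_count() {
    pdftk "$1" dump_data | awk -F': ' '$1 == "NumberOfPages" { print $2 }'
}
```

For example, `pdf_page_count mydoc.pdf` prints just the number of pages.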

Poppler is a PDF rendering library based on the xpdf-3.0 code base. The poppler-utils package includes the following tools:

- pdftops (PDF to PostScript converter)
- pdfinfo (PDF document information extractor)
- pdfimages (PDF image extractor)
- pdftohtml (PDF to HTML converter)
- pdftotext (PDF to text converter)
- pdffonts (PDF font analyzer)

Debian/Ubuntu users can install poppler-utils with aptitude:

$ sudo aptitude install poppler-utils

Convert PDF to text

pdftotext converts a PDF file into plain text:

$ pdftotext example.pdf example.txt

If no text file is specified, pdftotext converts file.pdf to file.txt. If the text file is '-', the text is sent to standard output.
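Because `-` sends the text to standard output, pdftotext can feed other tools directly. The following is a minimal sketch that prints the names of PDFs whose extracted text matches a pattern. `pdf_grep` is a hypothetical name. Scanned PDFs without a text layer will never match.

```shell
# pdf_grep PATTERN FILE... (hypothetical helper): print each FILE whose
# extracted text has a line matching the basic regular expression PATTERN.
pdf_grep() {
    pg_pattern=$1
    shift
    for pg_file in "$@"; do
        if pdftotext "$pg_file" - 2>/dev/null | grep -q -- "$pg_pattern"; then
            printf '%s\n' "$pg_file"
        fi
    done
}
```

For example, `pdf_grep invoice *.pdf` lists the PDFs in the current directory that mention "invoice".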

To convert pages 3 through 7 (inclusive), use:

$ pdftotext -f 3 -l 7 example.pdf example.txt

To extract only page 3:

$ pdftotext -f 3 -l 3 example.pdf example.txt

To preserve the page layout:

$ pdftotext -layout example.pdf example.txt

The -layout option keeps the original physical layout of the text as closely as possible. By default, pdftotext undoes the physical layout (columns, hyphenation and so on) and outputs the text in reading order. To avoid inserting page breaks between pages, add the -nopgbrk option. If the PDF file is password protected, supply the password with the -opw (owner password) or -upw (user password) option.
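When many files need converting, the same options can be applied in a loop. The following is a minimal sketch that writes a layout-preserving .txt file next to each PDF, without page breaks. `pdfs_to_text` is a hypothetical name.

```shell
# pdfs_to_text FILE... (hypothetical helper): convert each NAME.pdf to
# NAME.txt, keeping the physical layout and omitting page breaks.
pdfs_to_text() {
    for pt_file in "$@"; do
        pdftotext -layout -nopgbrk "$pt_file" "${pt_file%.pdf}.txt" ||
            echo "pdfs_to_text: failed on $pt_file" >&2
    done
}
```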

Extract images from PDF

pdfimages extracts images from PDF files and saves them as Portable Pixmap (PPM), Portable Bitmap (PBM) or JPEG files. It reads the PDF file, scans one or more pages and writes each image to a file named image-root-nnn.xxx. Here nnn is the image number and xxx is the image type (.ppm, .pbm or .jpg). pdfimages extracts the raw image data without any additional transformation, so any rotation, clipping, color inversion and so on applied by the PDF content stream is ignored.

$ pdfimages example.pdf exampleimage

This command extracts all images from example.pdf and saves them in PPM format (PBM for monochrome images).

Use the -j option to save JPEG (DCT-encoded) images as .jpg files; other images are still written as PPM or PBM:

$ pdfimages -j example.pdf exampleimage

Use the -f and -l options to set the first and last pages to scan. To scan pages 3 through 7 (inclusive), use:

$ pdfimages -f 3 -l 7 example.pdf exampleimage

To scan a single page only:

$ pdfimages -f 3 -l 3 example.pdf exampleimage

If the PDF file is password protected, use the -opw and -upw options:

-opw: owner password

-upw: user password
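The image root can include a directory, which keeps the extracted files together. The following is a minimal sketch that extracts the images of a single page into their own directory. It uses -j so that JPEG-encoded images stay JPEG. `page_images` is a hypothetical name.

```shell
# page_images IN PAGE DIR (hypothetical helper): extract the images on page
# PAGE of IN into DIR, using DIR/img as the image root.
page_images() {
    mkdir -p "$3" || return 1
    pdfimages -j -f "$2" -l "$2" "$1" "$3/img"
}
```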

Convert PDF to HTML

pdftohtml converts PDF files into HTML files. It writes its output to the current working directory.

Usage:

$ pdftohtml file.pdf file.html

To generate a "complex" document that more closely reproduces the original page layout, including images, use the -c option:

$ pdftohtml -c file.pdf file.html
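pdftohtml can write several files per document, so it can help to run it from a dedicated directory. The following is a minimal sketch that runs the conversion from inside an output directory. It uses a subshell so that the caller's working directory is unchanged. `pdf_to_html_dir` is a hypothetical name.

```shell
# pdf_to_html_dir IN OUTDIR (hypothetical helper): run pdftohtml -c on IN from
# inside OUTDIR, so the generated files land there.
pdf_to_html_dir() {
    ph_in=$1
    # Make IN absolute before changing directory.
    case $ph_in in
        /*) ;;
        *) ph_in=$PWD/$ph_in ;;
    esac
    mkdir -p "$2" || return 1
    ph_base=${ph_in##*/}
    (cd "$2" && pdftohtml -c "$ph_in" "${ph_base%.pdf}.html")
}
```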

Convert PDF to image

First, ImageMagick must be installed on your machine. To install ImageMagick on Debian/Ubuntu, run:

$ sudo aptitude install imagemagick

To convert a PDF file to an image, use the convert command:

$ convert doc.pdf doc.jpeg

Convert to TIFF:

$ convert doc.pdf doc.tiff
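ImageMagick reads PDF input through Ghostscript and rasterizes it at 72 dpi unless told otherwise. A -density option placed before the input file is the usual way to get sharper images. A multi-page PDF yields one image per page (doc-0.jpeg, doc-1.jpeg and so on), and a zero-based index in square brackets selects a single page. On some Debian/Ubuntu releases, the packaged security policy refuses PDF input until /etc/ImageMagick-6/policy.xml is relaxed.

The following is a minimal sketch. `pdf_page_to_image` is a hypothetical name, and 150 dpi is an arbitrary default.

```shell
# pdf_page_to_image IN PAGE OUT [DPI] (hypothetical helper): rasterize page
# PAGE (1-based) of IN into OUT; the output format follows OUT's extension.
pdf_page_to_image() {
    # ImageMagick's [n] page index is zero-based, so subtract one.
    pi_index=$(( $2 - 1 ))
    convert -density "${4:-150}" "$1[$pi_index]" "$3"
}
```

For example, `pdf_page_to_image doc.pdf 1 cover.png 300` renders the first page at 300 dpi.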
