Reading PDF content in PHP with xpdf: a PHP example

Source: Internet
Author: User
Tags: mkdir

1. Download
First, download the packages.
If you don't need Chinese text support, just download xpdfbin-linux-3.03.tar.gz. If you do need Chinese support, also download xpdf-chinese-simplified.tar.gz.

2. Installation
Once the download is complete, install the packages.
[root@localhost ~]# mkdir -p /lcf/upan
[root@localhost ~]# mkdir -p /lcf/cdrom
[root@localhost ~]# mkdir -p /lcf/xpdf
[root@localhost ~]# cd /lcf/upan/
[root@localhost upan]# cp xpdf/* ../xpdf/    (copy the downloaded files into the /lcf/xpdf directory)
[root@localhost upan]# cd ../xpdf/
[root@localhost xpdf]# tar -zxvf xpdfbin-linux-3.03.tar.gz
[root@localhost xpdf]# cd xpdfbin-linux-3.03
[root@localhost xpdfbin-linux-3.03]# cat INSTALL
[root@localhost xpdfbin-linux-3.03]# cd bin32/
[root@localhost bin32]# cp ./* /usr/local/bin/
[root@localhost bin32]# cd ../doc/
[root@localhost doc]# mkdir -p /usr/local/man/man1
[root@localhost doc]# mkdir -p /usr/local/man/man5
[root@localhost doc]# cp *.1 /usr/local/man/man1
[root@localhost doc]# cp *.5 /usr/local/man/man5
If you don't need to read Chinese text, you can stop here. If you do, continue with the steps below.
[root@localhost doc]# cp sample-xpdfrc /usr/local/etc/xpdfrc
[root@localhost doc]# cd /lcf/xpdf
[root@localhost xpdf]# tar -zxvf xpdf-chinese-simplified.tar.gz
[root@localhost xpdf]# mkdir -p /usr/local/share/xpdf/chinese-simplified
[root@localhost xpdf]# cd xpdf-chinese-simplified/
[root@localhost xpdf-chinese-simplified]# cp -r Adobe-GB1.cidToUnicode ISO-2022-CN.unicodeMap EUC-CN.unicodeMap GBK.unicodeMap CMap /usr/local/share/xpdf/chinese-simplified/
Copy the contents of the add-to-xpdfrc file from the chinese-simplified package into the /usr/local/etc/xpdfrc file, and make sure the paths inside it are correct. (Note that the Simplified Chinese package provides three encodings: ISO-2022-CN, EUC-CN, and GBK. It does not include a UTF-8 map, so this article outputs GBK first and converts the result afterwards.)
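Before calling the tools from PHP, it can help to confirm that the files ended up where the steps above put them. The following is a minimal sketch and not part of the original procedure: the function name findMissingXpdfPaths is hypothetical, and the paths listed are simply the ones used in this walkthrough.

```php
<?php
// Hypothetical helper: returns the subset of $paths that do not exist.
function findMissingXpdfPaths(array $paths): array
{
    $missing = [];
    foreach ($paths as $path) {
        if (!file_exists($path)) {
            $missing[] = $path;
        }
    }
    return $missing;
}

// Paths taken from the installation steps in this article.
$expected = [
    '/usr/local/bin/pdftotext',
    '/usr/local/etc/xpdfrc',
    '/usr/local/share/xpdf/chinese-simplified/GBK.unicodeMap',
];

// Print one line per path that is missing; print nothing if all are present.
foreach (findMissingXpdfPaths($expected) as $path) {
    echo "Missing: $path\n";
}
```

If the script prints nothing, all three files are in place. It does not check that the paths written inside xpdfrc are correct.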

3. Implementation
Now that the configuration is complete, we can start using it.
For a simple PDF read, the following statement is enough. The trailing "-" makes pdftotext write the text to standard output instead of a file.
$content = shell_exec('/usr/local/bin/pdftotext ' . escapeshellarg($filename) . ' -');
If you need Chinese output or other features, add parameters.
$content = shell_exec('/usr/local/bin/pdftotext -layout -enc GBK ' . escapeshellarg($filename) . ' -');
Adding these parameters does not affect the conversion of English text, so they are safe to use. Note that the output here is GBK-encoded. Many sites now use UTF-8, so to avoid garbled characters the text has to be converted once more.
$content = mb_convert_encoding($content, 'UTF-8', 'GBK');
That's it. Once the content has been read out, write whatever code you need to process it.
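Putting the three statements above together, one possible wrapper looks like the sketch below. The function names buildPdftotextCommand and pdfToUtf8Text are made up for illustration. The binary path and the GBK-then-UTF-8 route are the ones used in this article. escapeshellarg is used so that a file name containing spaces or shell metacharacters cannot alter the command.

```php
<?php
// Hypothetical helper: builds the command line used in this article.
// The trailing "-" tells pdftotext to write the text to standard output.
function buildPdftotextCommand(string $filename, string $encoding = 'GBK'): string
{
    return '/usr/local/bin/pdftotext -layout -enc '
        . escapeshellarg($encoding) . ' '
        . escapeshellarg($filename) . ' -';
}

// Hypothetical wrapper: returns the PDF text as UTF-8, or null on failure.
function pdfToUtf8Text(string $filename): ?string
{
    if (!is_readable($filename)) {
        return null;
    }
    $raw = shell_exec(buildPdftotextCommand($filename, 'GBK'));
    if (!is_string($raw)) {
        // shell_exec returns no string when the command fails or prints nothing.
        return null;
    }
    return mb_convert_encoding($raw, 'UTF-8', 'GBK');
}

// Show the command that would be run for a file name containing a space.
echo buildPdftotextCommand('/tmp/my report.pdf'), "\n";
```

Because shell_exec does not distinguish a failed command from empty output, a caller that needs to tell the two apart would have to use a function that exposes the exit status instead.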
Finally, here are the parameters of pdftotext for reference.

The main parameters are as follows:
OPTIONS
Many of the following options can be set with configuration file commands. These are listed in square brackets with the description of the corresponding command line option.
-f number
    Specifies the first page to convert.
-l number
    Specifies the last page to convert.
-layout
    Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.
-fixed number
    Assume fixed-pitch (or tabular) text, with the specified character width (in points). This forces physical layout mode.
-raw
    Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended.
-htmlmeta
    Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta headers.
-enc encoding-name
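
To illustrate how the options listed above combine on a command line, here is a small sketch that turns a PHP array into pdftotext arguments. The function name pdftotextOptions and the array keys are hypothetical. Only flags that appear in the list above (-f, -l, -layout, -raw, -enc) are emitted.

```php
<?php
// Hypothetical helper: maps an options array to pdftotext flags.
function pdftotextOptions(array $opts): string
{
    $args = [];
    if (isset($opts['first'])) {
        $args[] = '-f ' . (int) $opts['first'];  // first page to convert
    }
    if (isset($opts['last'])) {
        $args[] = '-l ' . (int) $opts['last'];   // last page to convert
    }
    if (!empty($opts['layout'])) {
        $args[] = '-layout';                     // keep the physical layout
    }
    if (!empty($opts['raw'])) {
        $args[] = '-raw';                        // content stream order
    }
    if (isset($opts['enc'])) {
        $args[] = '-enc ' . escapeshellarg($opts['enc']);
    }
    return implode(' ', $args);
}

// Pages 2 to 5, layout preserved, GBK output.
echo pdftotextOptions(['first' => 2, 'last' => 5, 'layout' => true, 'enc' => 'GBK']), "\n";
```

The resulting string would go between the binary path and the escaped file name in the shell_exec call. Casting the page numbers to int keeps arbitrary input out of the command line.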
