1. Download
First, download the packages.
If you don't need Chinese support, just download xpdfbin-linux-3.03.tar.gz. If you do need Chinese support, also download xpdf-chinese-simplified.tar.gz.
2. Installation
With the downloads complete, we can install.
[root@localhost ~]# mkdir -p /lcf/upan
[root@localhost ~]# mkdir -p /lcf/cdrom
[root@localhost ~]# mkdir -p /lcf/xpdf
[root@localhost ~]# cd /lcf/upan/
[root@localhost upan]# cp xpdf/* ../xpdf/ (copy the downloaded files into the /lcf/xpdf directory)
[root@localhost upan]# cd ../xpdf/
[root@localhost xpdf]# tar -zxvf xpdfbin-linux-3.03.tar.gz
[root@localhost xpdf]# cd xpdfbin-linux-3.03
[root@localhost xpdfbin-linux-3.03]# cat INSTALL
[root@localhost xpdfbin-linux-3.03]# cd bin32/
[root@localhost bin32]# cp ./* /usr/local/bin/
[root@localhost bin32]# cd ../doc/
[root@localhost doc]# mkdir -p /usr/local/man/man1
[root@localhost doc]# mkdir -p /usr/local/man/man5
[root@localhost doc]# cp *.1 /usr/local/man/man1
[root@localhost doc]# cp *.5 /usr/local/man/man5
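The steps above install the 32-bit binaries from bin32. Assuming the unpacked archive also ships a bin64 directory for 64-bit systems (the INSTALL file should confirm this), a small helper can choose the directory from the machine architecture. This is only a sketch: the function name pick_bin_dir and the list of architecture strings are illustrative assumptions, not part of xpdf.

```shell
#!/bin/sh
# Hypothetical helper: map an architecture string (as printed by
# `uname -m`) to a binary directory inside the unpacked archive.
# Assumption: the archive contains both bin32/ and bin64/.
pick_bin_dir() {
    case "$1" in
        x86_64|amd64)        echo "bin64" ;;
        i386|i486|i586|i686) echo "bin32" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

# Report which directory would be used on this machine.
if bin_dir=$(pick_bin_dir "$(uname -m)" 2>/dev/null); then
    echo "Install from: $bin_dir"
else
    echo "No matching prebuilt directory for this architecture"
fi
```

With the result in hand, the copy step becomes `cp "$bin_dir"/* /usr/local/bin/`, run as root from the xpdfbin-linux-3.03 directory.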
If you don't need to read Chinese text, you can stop here. If you do, continue with the steps below.
[root@localhost doc]# cp sample-xpdfrc /usr/local/etc/xpdfrc
[root@localhost doc]# cd /lcf/xpdf
[root@localhost xpdf]# tar -zxvf xpdf-chinese-simplified.tar.gz
[root@localhost xpdf]# mkdir -p /usr/local/share/xpdf/chinese-simplified
[root@localhost xpdf]# cd xpdf-chinese-simplified/
[root@localhost xpdf-chinese-simplified]# cp -r Adobe-GB1.cidToUnicode ISO-2022-CN.unicodeMap EUC-CN.unicodeMap GBK.unicodeMap CMap /usr/local/share/xpdf/chinese-simplified/
Copy the contents of the add-to-xpdfrc file from the Chinese-simplified package into /usr/local/etc/xpdfrc, and make sure the paths inside it are correct. (Note that the Simplified Chinese package provides only three encodings: ISO-2022-CN, EUC-CN, and GBK. As you can see, UTF-8 is not among them, so output GBK first and then convert it.)
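Instead of pasting by hand, the file can be appended from the shell, with a guard so that running the step twice does not duplicate the entries. A minimal sketch follows. The function name append_once, the marker string, and the single unicodeMap line in the demo are assumptions for illustration, and the demo works on temporary stand-in files rather than the real /usr/local/etc/xpdfrc.

```shell
#!/bin/sh
# Hypothetical helper: append a snippet file to a config file, but
# only if a marker string is not already present in the config.
append_once() {
    snippet=$1   # e.g. add-to-xpdfrc
    config=$2    # e.g. /usr/local/etc/xpdfrc
    marker=$3    # fixed string that identifies the snippet
    if grep -qF -- "$marker" "$config" 2>/dev/null; then
        echo "already present, nothing to do"
    else
        cat "$snippet" >> "$config"
        echo "appended $snippet to $config"
    fi
}

# Demo on temporary stand-in files.
demo_dir=$(mktemp -d)
printf '%s\n' '# sample xpdfrc' > "$demo_dir/xpdfrc"
printf '%s\n' \
    'unicodeMap GBK /usr/local/share/xpdf/chinese-simplified/GBK.unicodeMap' \
    > "$demo_dir/add-to-xpdfrc"

# The second call detects the marker and leaves the file unchanged.
append_once "$demo_dir/add-to-xpdfrc" "$demo_dir/xpdfrc" 'GBK.unicodeMap'
append_once "$demo_dir/add-to-xpdfrc" "$demo_dir/xpdfrc" 'GBK.unicodeMap'

rm -rf "$demo_dir"
```

After appending to the real file, it is still worth opening /usr/local/etc/xpdfrc and checking that every path points at /usr/local/share/xpdf/chinese-simplified/.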
3. Implementation
With all the configuration complete, we can start using it.
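To simply read the text of a PDF, the following statement is enough. The trailing "-" makes pdftotext write to standard output so that shell_exec can return the text, and escapeshellarg quotes the file name so that spaces or shell metacharacters in it cannot break the command.
$content = shell_exec('/usr/local/bin/pdftotext ' . escapeshellarg($filename) . ' -');
If you need Chinese text, add the layout and encoding options.
$content = shell_exec('/usr/local/bin/pdftotext -layout -enc GBK ' . escapeshellarg($filename) . ' -');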
Adding these options does not affect the conversion of English text, so they are safe to use in all cases. Note that the output is GBK-encoded. Many sites now use UTF-8, so to avoid garbled characters the text has to be converted once more.
$content = mb_convert_encoding($content, 'UTF-8', 'GBK');
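The same two steps (extract as GBK, then convert to UTF-8) can be tried from the command line before wiring them into PHP. This is a sketch under assumptions: it relies on iconv being installed with a GBK converter, and the function names gbk_to_utf8 and pdf_to_utf8 and the file name sample.pdf are hypothetical.

```shell
#!/bin/sh
# Convert GBK bytes on stdin to UTF-8 on stdout.
gbk_to_utf8() {
    iconv -f GBK -t UTF-8
}

# Hypothetical wrapper: extract text as GBK, then convert to UTF-8.
# The trailing "-" tells pdftotext to write to standard output.
pdf_to_utf8() {
    /usr/local/bin/pdftotext -layout -enc GBK "$1" - | gbk_to_utf8
}

# Self-check of the conversion step alone: the four bytes below
# (octal escapes) are the GBK encoding of the two characters "中文".
printf '\326\320\316\304\n' | gbk_to_utf8

# Usage, once pdftotext is installed (sample.pdf is a placeholder):
# pdf_to_utf8 sample.pdf > sample.txt
```

If the self-check prints readable Chinese characters in a UTF-8 terminal, the conversion step works, and any remaining garbling is more likely to come from the xpdfrc paths than from the encoding conversion.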
That's it. Once the content has been read out, write whatever code you need to process it.
Finally, here are the pdftotext options for reference.
The main options are as follows:
OPTIONS
Many of the following options can be set with configuration file commands.
These are listed in square brackets with the description of the
corresponding command line option.
-f number
Specifies the first page to convert.
-l number
Specifies the last page to convert.
-layout
Maintain (as best as possible) the original physical layout of
the text. The default is to 'undo' physical layout (columns,
hyphenation, etc.) and output the text in reading order.
-fixed number
Assume fixed-pitch (or tabular) text, with the specified character
width (in points). This forces physical layout mode.
-raw
Keep the text in content stream order. This is a hack which
often "undoes" column formatting, etc. Use of raw mode is no
longer recommended.
-htmlmeta
Generate a simple HTML file, including the meta information.
This simply wraps the text in <pre> and </pre> and prepends the
meta headers.
-enc encoding-name