- [root@localhost ~]# mkdir -p /lcf/upan
- [root@localhost ~]# mkdir -p /lcf/cdrom
- [root@localhost ~]# mkdir -p /lcf/xpdf
- [root@localhost ~]# cd /lcf/upan/
- [root@localhost upan]# cp xpdf/* ../xpdf/ (copy the downloaded files into the /lcf/xpdf directory)
- [root@localhost upan]# cd ../xpdf/
- [root@localhost xpdf]# tar -zxvf xpdfbin-linux-3.03.tar.gz
- [root@localhost xpdf]# cd xpdfbin-linux-3.03
- [root@localhost xpdfbin-linux-3.03]# cat INSTALL
- [root@localhost xpdfbin-linux-3.03]# cd bin32/
- [root@localhost bin32]# cp ./* /usr/local/bin/
- [root@localhost bin32]# cd ../doc/
- [root@localhost doc]# mkdir -p /usr/local/man/man1
- [root@localhost doc]# mkdir -p /usr/local/man/man5
- [root@localhost doc]# cp *.1 /usr/local/man/man1
- [root@localhost doc]# cp *.5 /usr/local/man/man5
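
Before moving on, it can help to confirm that the copied binaries are actually reachable. The following is a minimal sketch, not part of the xpdf package: `missing_tools` is a hypothetical helper name, and it only assumes that the binaries were copied to a directory on `PATH` such as /usr/local/bin.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with xpdf): print the name of every
# command passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check two of the tools copied from bin32/; no output means both were found.
missing_tools pdftotext pdfinfo
```

If a name is printed, the corresponding binary is either missing from /usr/local/bin or that directory is not on `PATH`.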
If you don't need to read Chinese PDFs, you can stop here. If you do, continue with the steps below.
- [root@localhost doc]# cp sample-xpdfrc /usr/local/etc/xpdfrc
- [root@localhost doc]# cd /lcf/xpdf
- [root@localhost xpdf]# tar -zxvf xpdf-chinese-simplified.tar.gz
- [root@localhost xpdf]# mkdir -p /usr/local/share/xpdf/chinese-simplified
- [root@localhost xpdf]# cd xpdf-chinese-simplified/
- [root@localhost xpdf-chinese-simplified]# cp -r Adobe-GB1.cidToUnicode ISO-2022-CN.unicodeMap EUC-CN.unicodeMap GBK.unicodeMap CMap /usr/local/share/xpdf/chinese-simplified/
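
The package's add-to-xpdfrc file still has to be merged into /usr/local/etc/xpdfrc. As a minimal sketch, assuming a hypothetical helper named `append_snippet` (it is not part of xpdf), the append can be written so that running it twice does not duplicate the entries. It uses the first non-empty line of the snippet as a marker and skips the append when that line is already present in the target file.

```shell
#!/bin/sh
# Hypothetical helper (not part of xpdf): append SNIPPET to CONFIG unless
# CONFIG already contains the snippet's first non-empty line.
append_snippet() {
    snippet=$1
    config=$2
    # First non-empty line of the snippet, used as a "already merged" marker
    marker=$(sed -n '/./{p;q;}' "$snippet")
    # -x: match whole lines, -F: treat the marker as a fixed string
    if ! grep -qxF -- "$marker" "$config" 2>/dev/null; then
        cat "$snippet" >> "$config"
    fi
}
```

Run from the xpdf-chinese-simplified directory, the call would be `append_snippet add-to-xpdfrc /usr/local/etc/xpdfrc`. The paths inside the appended lines still need to be checked by hand against the directory the files were copied to.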
Copy the contents of the add-to-xpdfrc file from the chinese-simplified package into /usr/local/etc/xpdfrc, and make sure the paths in it are correct. (Note that the Simplified Chinese package provides only three encodings: ISO-2022-CN, EUC-CN, and GBK. UTF-8 is not among them, so extract the text as GBK and convert it afterwards.)

Third, implementing the function

At this point all the configuration is complete and we can start using it. For a simple PDF read, the following statement is enough:

$content = shell_exec('/usr/local/bin/pdftotext ' . $filename . ' -');

To read Chinese text, add the parameters:

$content = shell_exec('/usr/local/bin/pdftotext -layout -enc GBK ' . $filename . ' -');

Adding these parameters does not affect the conversion of English text, so they are safe to use in every case. Note that the output is GBK-encoded. Many sites now use UTF-8, so to avoid garbled characters the text has to be converted once more:

$content = mb_convert_encoding($content, 'UTF-8', 'GBK');

Once the content has been read, you can write your own code to process it.

Main parameters of pdftotext:

OPTIONS

Many of the following options can be set with configuration file commands. These are listed in square brackets with the description of the corresponding command line option.

-f number
Specifies the first page to convert.

-l number
Specifies the last page to convert.

-layout
Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.

-fixed number
Assume fixed-pitch (or tabular) text, with the specified character width (in points). This forces physical layout mode.

-raw
Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended.

-htmlmeta
Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta headers.

-enc encoding-name |