Reading the contents of a PDF in PHP with Xpdf

Source: Internet
Author: User
    [root@localhost ~]# mkdir -p /lcf/upan
    [root@localhost ~]# mkdir -p /lcf/cdrom
    [root@localhost ~]# mkdir -p /lcf/xpdf
    [root@localhost ~]# cd /lcf/upan/
    [root@localhost upan]# cp xpdf/* ../xpdf/    # copy the downloaded files into the /lcf/xpdf directory
    [root@localhost upan]# cd ../xpdf/
    [root@localhost xpdf]# tar -zxvf xpdfbin-linux-3.03.tar.gz
    [root@localhost xpdf]# cd xpdfbin-linux-3.03
    [root@localhost xpdfbin-linux-3.03]# cat INSTALL
    [root@localhost xpdfbin-linux-3.03]# cd bin32/
    [root@localhost bin32]# cp ./* /usr/local/bin/
    [root@localhost bin32]# cd ../doc/
    [root@localhost doc]# mkdir -p /usr/local/man/man1
    [root@localhost doc]# mkdir -p /usr/local/man/man5
    [root@localhost doc]# cp *.1 /usr/local/man/man1
    [root@localhost doc]# cp *.5 /usr/local/man/man5
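Before calling Xpdf from PHP, it can help to confirm that the copy into /usr/local/bin actually worked. The following is a minimal sketch and not part of the original article: `find_pdftotext` is a hypothetical helper name, and its default path simply assumes the install location used in the commands above.

```php
<?php
// Hypothetical helper: return the first candidate path that is an
// executable file, or null if none of the candidates qualifies.
function find_pdftotext(array $candidates = ['/usr/local/bin/pdftotext']): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

$binary = find_pdftotext();
echo $binary === null
    ? "pdftotext not found; re-check the copy step\n"
    : "pdftotext found at $binary\n";
```

If the binary was installed somewhere else, pass that location in the candidate list instead of relying on the default.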

If you do not need to read Chinese text, you can stop here. If you do, continue with the following steps.

    [root@localhost doc]# cp sample-xpdfrc /usr/local/etc/xpdfrc
    [root@localhost doc]# cd /lcf/xpdf
    [root@localhost xpdf]# tar -zxvf xpdf-chinese-simplified.tar.gz
    [root@localhost xpdf]# mkdir -p /usr/local/share/xpdf/chinese-simplified
    [root@localhost xpdf]# cd xpdf-chinese-simplified/
    [root@localhost xpdf-chinese-simplified]# cp -r Adobe-GB1.cidToUnicode ISO-2022-CN.unicodeMap EUC-CN.unicodeMap GBK.unicodeMap CMap /usr/local/share/xpdf/chinese-simplified/

Append the contents of the add-to-xpdfrc file from the xpdf-chinese-simplified package to /usr/local/etc/xpdfrc, and make sure the paths inside it are correct, that is, that they point to /usr/local/share/xpdf/chinese-simplified. (Note that the Simplified Chinese package covers only three encodings: ISO-2022-CN, EUC-CN, and GBK. It does not include UTF-8, so extract the text as GBK and convert it afterwards.)
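The advice to verify the paths in xpdfrc can be automated. The sketch below is an illustration, not part of the original article: `missing_xpdfrc_paths` is a hypothetical helper, and it assumes the file uses directives of the kind found in the language package's add-to-xpdfrc file (`cidToUnicode`, `unicodeMap`, `cMapDir`, `toUnicodeDir`), each ending in a path that contains no spaces or quotes.

```php
<?php
// Hypothetical helper: return every path referenced by a map-related
// xpdfrc directive that does not exist on disk.
// Assumption: the path is the last whitespace-separated field on the line.
function missing_xpdfrc_paths(string $xpdfrcText): array
{
    $directives = ['cidToUnicode', 'unicodeMap', 'cMapDir', 'toUnicodeDir'];
    $missing = [];
    foreach (preg_split('/\R/', $xpdfrcText) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $fields = preg_split('/\s+/', $line);
        if (count($fields) >= 2 && in_array($fields[0], $directives, true)) {
            $path = end($fields);
            if (!file_exists($path)) {
                $missing[] = $path;
            }
        }
    }
    return $missing;
}

// Check the configuration file used in this article, if it is readable.
$rc = '/usr/local/etc/xpdfrc';
if (is_readable($rc)) {
    foreach (missing_xpdfrc_paths(file_get_contents($rc)) as $path) {
        echo "missing: $path\n";
    }
}
```

An empty result means every referenced map file or directory was found. It does not prove that the mappings themselves are correct.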

Third, the implementation. At this point all of the configuration is complete and we can start using it. For a simple PDF read, the following statement is enough:

    $content = shell_exec('/usr/local/bin/pdftotext ' . $filename . ' -');

To read Chinese text, add the corresponding parameters:

    $content = shell_exec('/usr/local/bin/pdftotext -layout -enc GBK ' . $filename . ' -');

The added parameters do not affect the conversion of English text, so they are safe to use in every case. Note that the output here is GBK-encoded. Many sites now use UTF-8, so to avoid garbled characters the text has to be converted once more:

    $content = mb_convert_encoding($content, 'UTF-8', 'GBK');

Once the content has been read, you can process it with your own code.

Main parameters of pdftotext:

OPTIONS

Many of the following options can be set with configuration file commands. These are listed in square brackets with the description of the corresponding command line option.

-f number: Specifies the first page to convert.
-l number: Specifies the last page to convert.
-layout: Maintain (as best as possible) the original physical layout of the text. The default is to 'undo' physical layout (columns, hyphenation, etc.) and output the text in reading order.
-fixed number: Assume fixed-pitch (or tabular) text, with the specified character width (in points). This forces physical layout mode.
-raw: Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended.
-htmlmeta: Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta headers.
-enc encoding-name: Sets the encoding to use for text output, as with GBK in the example above.
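The shell_exec calls above concatenate the file name directly into the command line, which breaks on names containing spaces or quotes and is unsafe with untrusted input. The following is a minimal sketch of a safer wrapper, assuming a Unix-like shell. `build_pdftotext_command` and `pdf_to_utf8_text` are hypothetical names introduced here for illustration. The flags, the binary path, and the GBK to UTF-8 conversion are the ones used in this article.

```php
<?php
// Hypothetical helper: build the pdftotext command line with every
// user-controlled value passed through escapeshellarg().
function build_pdftotext_command(
    string $pdfPath,
    ?string $encoding = null,
    bool $layout = false,
    string $binary = '/usr/local/bin/pdftotext'
): string {
    $parts = [escapeshellarg($binary)];
    if ($layout) {
        $parts[] = '-layout';
    }
    if ($encoding !== null) {
        $parts[] = '-enc';
        $parts[] = escapeshellarg($encoding);
    }
    $parts[] = escapeshellarg($pdfPath);
    $parts[] = '-'; // "-" makes pdftotext write the text to stdout
    return implode(' ', $parts);
}

// Hypothetical helper: extract text and convert it to UTF-8.
// Returns null if the command fails or produces no output.
function pdf_to_utf8_text(string $pdfPath, ?string $encoding = 'GBK'): ?string
{
    $raw = shell_exec(build_pdftotext_command($pdfPath, $encoding, true));
    if (!is_string($raw)) {
        return null;
    }
    // Requires the mbstring extension, as in the article's own example.
    return $encoding === null
        ? $raw
        : mb_convert_encoding($raw, 'UTF-8', $encoding);
}
```

A call such as `pdf_to_utf8_text('/path/to/file.pdf')` would then replace the two-step shell_exec and mb_convert_encoding sequence. The path here is only a placeholder.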
