The text in a PDF file can be exported and modified.
Likewise, illustrations in PDF files can be extracted.
Convert PDF to plain text:
pdftotext -enc GBK godson2e-data.Sheet.pdf text.gbk.txt
-enc (encoding)
See the encodings listed in /etc/xpdf/xpdfrc. For Chinese, GBK works.
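As a rough sketch, the text-extraction step can be wrapped in a small shell function so that the encoding becomes a parameter. The function name extract_text and the output file naming are assumptions made for illustration; only pdftotext and its -enc option come from the text above.

```shell
# Hypothetical helper: extract the text of a PDF in a chosen encoding.
# Usage: extract_text FILE.pdf [ENCODING]
extract_text() {
    pdf="$1"
    enc="${2:-GBK}"                # default to GBK, as used above for Chinese
    out="${pdf%.pdf}.$enc.txt"     # foo.pdf -> foo.GBK.txt (naming is an assumption)
    # Print the output file name only if pdftotext succeeded.
    pdftotext -enc "$enc" "$pdf" "$out" && printf '%s\n' "$out"
}
```

For example, extract_text godson2e-data.Sheet.pdf GBK would write godson2e-data.Sheet.GBK.txt, provided the GBK encoding is defined in xpdfrc.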
Illustration extraction:
pdfimages godson2e-data.Sheet.pdf img
This command generates n PPM files (a bitmap format).
The files are named img-00?.ppm,
where ? = 1 to n.
You can use the convert tool from ImageMagick to convert them to the desired format:
convert img-001.ppm img-001.jpg
or
convert img-001.ppm img-001.eps
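When there are many extracted images, converting them one at a time gets tedious. A minimal sketch of a batch loop follows; it assumes the files sit in the current directory under the img prefix used above, and the function name convert_all_ppm is made up for this example.

```shell
# Hypothetical helper: convert every img-*.ppm in the current directory.
# Usage: convert_all_ppm [EXTENSION]   (defaults to jpg)
convert_all_ppm() {
    target_ext="${1:-jpg}"
    for f in img-*.ppm; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        convert "$f" "${f%.ppm}.$target_ext"     # img-001.ppm -> img-001.jpg
    done
}
```

Calling convert_all_ppm eps would produce EPS files instead of JPEGs. The original PPM files are left in place.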
The method above was tested successfully on Ubuntu.
pdfimages and pdftotext come from the xpdf-utils package, and Chinese encoding support for xpdf
comes from the xpdf-chinese-simplified package. In other words, to use the features above,
install:
xpdf-utils
xpdf-chinese-simplified
Installation method:
aptitude install xpdf-utils xpdf-chinese-simplified
If you do not have aptitude, you can use apt-get:
apt-get install xpdf-utils xpdf-chinese-simplified
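Before or after installing, it can help to check which of the commands are actually on the PATH. The following is only a sketch; the function name missing_tools is an assumption, and it relies on nothing beyond the standard command -v builtin.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
# Usage: missing_tools pdftotext pdfimages convert
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running missing_tools pdftotext pdfimages convert prints nothing when all three are available; any name it does print points to a package that still needs to be installed.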