Reading a PDF document with Python

Source: Internet
Author: User
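# -*- coding: utf-8 -*-
# Read a PDF document.
# This script follows the older pdfminer API (for example pdfminer3k),
# in which PDFDocument is imported from pdfminer.pdfparser.

from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams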


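# Open the PDF file in binary mode
fp = open("Naacl06-shinyama.pdf", "rb")
# Create a parser associated with the file
parser = PDFParser(fp)
# Create the PDF document object
doc = PDFDocument()
# Link the parser and the document object to each other
parser.set_document(doc)
doc.set_parser(parser)

# Initialize the document (empty password)
doc.initialize("")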

# Create the PDF resource manager
resource = PDFResourceManager()

# Layout analysis parameters
laparam = LAParams()

# Create an aggregator device
device = PDFPageAggregator(resource, laparams=laparam)

# Create a PDF page interpreter (takes the resource manager and the device)
interpreter = PDFPageInterpreter(resource, device)

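# Use the document object to get the collection of pages
for page in doc.get_pages():
    # Use the page interpreter to process the page
    interpreter.process_page(page)

    # Use the aggregator to get the page layout
    layout = device.get_result()

    for out in layout:
        # Only text objects have get_text(); skip figures, lines, etc.
        if hasattr(out, "get_text"):
            print(out.get_text())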
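
The loop in the script prints each piece of text as soon as it is found. When the text is needed later (for searching or saving to a file), it can be collected into a string instead. The following is a minimal sketch, assuming only that text-bearing layout objects expose get_text() as in the script above and that other layout objects do not. The helper name page_text and the stand-in classes FakeTextBox and FakeRect are hypothetical; they are not part of pdfminer and exist only so the example runs without a PDF file.

```python
def page_text(layout):
    """Join the text of every object in `layout` that offers get_text()."""
    parts = []
    for obj in layout:
        # Objects without a callable get_text (assumed: figures, lines,
        # rectangles) carry no text, so they are skipped.
        get_text = getattr(obj, "get_text", None)
        if callable(get_text):
            parts.append(get_text())
    return "".join(parts)


# Hypothetical stand-ins so the sketch runs without pdfminer or a PDF file.
class FakeTextBox:
    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class FakeRect:
    """A layout object with no text, like a drawn rectangle."""


if __name__ == "__main__":
    demo_layout = [FakeTextBox("Hello\n"), FakeRect(), FakeTextBox("world\n")]
    print(page_text(demo_layout))
```

In the script, the inner loop over layout could then be replaced by a single call such as print(page_text(layout)).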