Python script for batch-extracting text from PDF files

Source: Internet
Author: User

This article gives an example of how to extract text from PDF files in batches with Python, for your reference. The details are as follows:

First, run pip install pdfminer3k to install the extension library that processes PDF files. The package also installs the command-line tool pdf2txt.py, which the script below calls.

    import os
    import sys
    import time

    # Every PDF file in the current directory
    pdfs = (fn for fn in os.listdir('.') if fn.endswith('.pdf'))

    for pdf1 in pdfs:
        # Replace characters that would break the command line
        pdf = pdf1.replace(' ', '_').replace('-', '_').replace('&', '_')
        os.rename(pdf1, pdf)
        print('=' * 30)
        print(pdf)
        txt = pdf[:-4] + '.txt'
        # Quoted path of the interpreter, then of pdf2txt.py in its scripts folder (Windows layout)
        exe = '"' + sys.executable + '" "'
        pdf2txt = os.path.dirname(sys.executable)
        pdf2txt = pdf2txt + '\\scripts\\pdf2txt.py" -o '
        try:
            # Call the command-line tool pdf2txt.py to do the conversion.
            # If the PDF is encrypted, change the code below and
            # use -P before -o to specify the password.
            cmd = exe + pdf2txt + txt + ' ' + pdf
            os.popen(cmd)
            # The conversion takes some time; 2 seconds is usually enough for a small file.
            time.sleep(2)
            # Print the first 200 characters of the converted text
            with open(txt, encoding='utf8') as fp:
                print(fp.read(200))
        except Exception as e:
            # Report the failure and continue with the next file
            print('Conversion failed:', e)
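The script above starts the conversion with os.popen and then waits a fixed 2 seconds, so a large file may not be finished when the text file is read. A minimal sketch of one alternative is shown below. It is not the original author's method: it uses subprocess.run, which blocks until pdf2txt.py exits, and passes the arguments as a list so no manual quoting is needed. The helper names safe_name, build_command and convert_all, the timeout value, and the assumed location of pdf2txt.py (a Scripts folder next to the interpreter, as in the script above) are illustrative assumptions.

```python
import subprocess
import sys
from pathlib import Path


def safe_name(filename):
    """Replace spaces, hyphens and ampersands with underscores."""
    for ch in (" ", "-", "&"):
        filename = filename.replace(ch, "_")
    return filename


def build_command(script, pdf, txt, password=None):
    """Build the argument list for pdf2txt.py (hypothetical helper)."""
    cmd = [sys.executable, str(script)]
    if password is not None:
        # -P is placed before -o, as described in the comments above
        cmd += ["-P", password]
    cmd += ["-o", str(txt), str(pdf)]
    return cmd


def convert_all(folder=".", script=None, timeout=60):
    """Convert every .pdf in folder and return the .txt paths that were written."""
    folder = Path(folder)
    if script is None:
        # Assumption: pdf2txt.py sits in the Scripts folder next to the interpreter
        script = Path(sys.executable).parent / "Scripts" / "pdf2txt.py"
    written = []
    # sorted() materializes the listing before any file is renamed
    for pdf in sorted(folder.glob("*.pdf")):
        target = pdf.with_name(safe_name(pdf.name))
        if target != pdf:
            pdf.rename(target)
        txt = target.with_suffix(".txt")
        # run() waits for the tool to finish, so no fixed sleep is needed;
        # it raises subprocess.TimeoutExpired if the timeout is exceeded
        result = subprocess.run(build_command(script, target, txt), timeout=timeout)
        if result.returncode == 0 and txt.exists():
            written.append(txt)
    return written
```

Calling convert_all('.') in the folder that holds the PDF files returns the list of text files that were produced; each one can then be opened with encoding='utf8' and previewed with read(200), as in the original script. For an encrypted file, build_command accepts a password argument that is passed to the tool with -P.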

Source: python hut

That is all for this article. I hope it is helpful for your learning.
