Python converts Word files to PDF files in batches,

Source: Internet
Author: User
Tags glob

Python converts Word files to PDF files in batches,

This article shares with you how to convert Word files into PDF files in batches in python for your reference. The details are as follows:

1. Purpose

Use the omnipotent Python tool to convert all Word files in a directory into PDF files.

2. traverse directories

The author summarizes three methods for directory traversal, as shown below.

2. 1. Call glob

Traverse all files and folders in a specified directory without recursion. You need to manually perform recursive traversal.

import glob as gbpath = gb.glob('d:\\2\\*')for path in path: print path

2. Call OS. walk

Traverses all files and folders in a specified directory, recursively traversing, and providing powerful functions. We recommend that you use them.

import osfor dirpath, dirnames, filenames in os.walk('d:\\2\\'): for file in filenames:  fullpath = os.path.join(dirpath, file)  print fullpath, file

2. DIY

Traverse all the files and folders in a specified directory, recursively traverse, write independently, and have high scalability.

import os; files = list(); def DirAll(pathName):  if os.path.exists(pathName):   fileList = os.listdir(pathName);   for f in fileList:    if f=="$RECYCLE.BIN" or f=="System Volume Information":     continue;    f=os.path.join(pathName,f);    if os.path.isdir(f):      DirAll(f);        else:     dirName=os.path.dirname(f);     baseName=os.path.basename(f);     if dirName.endswith(os.sep):      files.append(dirName+baseName);     else:      files.append(dirName+os.sep+baseName); DirAll("D:\\2\\"); for f in files:  print f # print f.decode('gbk').encode('utf-8'); 

2. 4. Remarks

Note: If the file name or file path is garbled during traversal, you can refer to the references in this article to solve the problem.

3. Convert the Word file to PDF

The Windows Com component (win32com) is used to call the Word Service (Word. Application) to convert Word to PDF files. Therefore, it is required that the Python program be run on a Windows machine with the Word service (or at least version 2007 may be required.

# Coding: utf8import OS, sysreload (sys) sys. setdefaultencoding ('utf8') from win32com. client import Dispatch, constants, gencacheinput = 'd: \ 2 \ test \ 11.docx 'output = 'd: \ 2 \ test \ 22.pdf 'print 'input file', inputprint 'output file', output # enable python COM support for Word 2007 # this is generated by: makepy. py-I "Microsoft Word 12.0 Object Library" gencache. ensureModule ('{00020905-0000-0000-C000-000000000046}', 0, 8, 4) # Start to convert w = Dispatch ("Word. application ") try: doc = w. documents. open (input, ReadOnly = 1) doc. exportAsFixedFormat (output, constants. wdExportFormatPDF, \ Item = constants. wdExportDocumentWithMarkup, CreateBookmarks = constants. wdExportCreateHeadingBookmarks) failed T: print 'exception' finally: w. quit (constants. wdDoNotSaveChanges) if OS. path. isfile (output): print 'translate success 'else: print 'translate fail'

4. batch conversion

To implement batch quasi-change, combine the functions of step 1 and Step 2 to directly add the code.

#-*-Coding: UTF-8-*-# doc2pdf. py: python script to convert doc to pdf with bookmarks! # Requires Office 2007 SP2 # Requires python for win32 extensionimport glob as gbimport sysreload (sys) sys. setdefaultencoding ('utf8') ''' reference: http://blog.csdn.net/rumswell/article/details/7434302'''import sys, osfrom win32com. client import Dispatch, constants, gencache # from config import REPORT_DOC_PATH, report_cmd_pathreport_doc_path = 'd:/2/doc/'report _ cmd_path = 'd: /2/doc/'# convert Word to javasdef word2pdf (fi Lename): input = filename + '.docx' output = filename + 'delimiter' using _name = output # judge whether the file exists OS. chdir (REPORT_DOC_PATH) if not OS. path. isfile (input): print u '% s not exist' % input return False # The document path must be an absolute path, because the current path after Word is started is not the current path when the script is called. If (not OS. path. isabs (input): # determine whether the path is an absolute path # OS. chdir (REPORT_DOC_PATH) input = OS. path. abspath (input) # returns the absolute path else: print u '% s not absolute path' % input return False if (not OS. path. isabs (output): OS. chdir (report_cmd_path) output = OS. path. abspath (output) else: print u '% s not absolute path' % output return False try: print input, output # enable python COM support for Word 2007 # this is generated: makepy. py-I "Microsoft Word 12.0 Object Library" gencache. ensureModule ('{00020905-0000-0000-C000-000000000046}', 0, 8, 4) # Start to convert w = Dispatch ("Word. application ") try: doc = w. documents. open (input, ReadOnly = 1) doc. exportAsFixedFormat (output, constants. wdExportFormatPDF, \ Item = constants. wdExportDocumentWithMarkup, CreateBookmarks = constants. wdExportCreateHeadingBookmarks) failed T: print 'exception' finally: w. quit (constants. wdDoNotSaveChanges) if OS. path. isfile (partition _name): print 'translate success 'return True else: print 'translate fail' return False else t: print 'exception' return-1if _ name _ = '_ main _': # img_path = gb. glob (REPORT_DOC_PATH + "*") # for path in img_path: # print path # rc = word2pdf (path) # rc = word2pdf ('1') # print rc, # if rc: # sys. exit (rc) # sys. exit (0) import OS for dirpath, dirnames, filenames in OS. walk (REPORT_DOC_PATH): for file in filenames: fullpath = OS. path. join (dirpath, file) print fullpath, file rc = word2((file.rstrip('.docx '))

5. References

Convert word 2007 documents into PDF files using Python

Traverse all folders and file paths in a directory and output Chinese garbled characters

The above is all the content of this article. I hope it will be helpful for your learning and support for helping customers.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.