Knowledge Management System Data Solution Development Diary 14: How to Process PDF Documents Programmatically

This series of articles describes how to process local documents and Internet data. Today's topic is how to convert a PDF document into the editable RTF format. PDF itself is effectively read-only: to change a PDF you must edit it in software such as Acrobat Professional, or edit the original in Word and convert it to PDF again. Making PDF content editable is what I want to achieve.

If the PDF type is selected, all PDF documents under the given path are processed and then imported into the document database.
To this end, I tried the following approaches:
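
The "process every PDF under the path" step can be sketched in C#. This is a minimal sketch, not Data Loader's actual code: `PdfBatch`, `ConvertFolder` and the `convert` delegate are hypothetical names, the delegate stands in for a per-file converter such as the ConvertPdfToDoc method discussed below, importing into the document database is left out, and including subfolders is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: converts every PDF under a folder with a caller-supplied converter.
public static class PdfBatch
{
    // Returns the output path the converter produced for each PDF that was found.
    public static List<string> ConvertFolder(string folder, Func<string, string> convert)
    {
        var outputs = new List<string>();
        // Assumption: subfolders are included (SearchOption.AllDirectories).
        foreach (string pdf in Directory.GetFiles(folder, "*.pdf", SearchOption.AllDirectories))
        {
            outputs.Add(convert(pdf));
        }
        return outputs;
    }
}
```

Passing the converter as a delegate keeps the folder walk independent of whichever conversion tool is installed.
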

    1. Adobe Acrobat SDK is the official development kit for processing PDF documents and should be the most authoritative tool. However, licensing appears to be expensive, and I could not find a free version.
    2. The PDF.NET toolkit is quite good. It covers every aspect of processing PDF and ships complete managed-code examples. However, the trial version processes only the first few pages of a PDF document and adds a trial watermark.
    3. Aspose.Pdf for .NET and Aspose's recognition component are also powerful and easy to use, but they have some restrictions when converting PDF to DOC; after all, they are not a professional OCR toolkit. They cannot process PDFs that contain Chinese characters. As the official list of known limitations states, predefined composite font encodings used for Chinese, Japanese and Korean languages are not implemented. (These language packs are also installed separately when the operating system is installed.)
    4. Solid Converter PDF retains 95% of the content and converts it correctly from PDF to DOC. Unfortunately, it has no development kit, so it cannot be called programmatically.
    5. Convert the PDF file to a TIFF file, then convert the TIFF file to DOC with OCR. The toolkits I found were OCX controls: mostly unstable, and without a pure .NET wrapper. In theory, .NET can call components packaged as COM, but in practice some COM details are not handled well, and calling them from .NET can cause problems.
    6. Solid PDF Tools is also a good tool for converting PDF to DOC. Most importantly, it accepts command-line parameters, so my program can call it.

After narrowing these options down, I chose to call a third-party tool through command-line parameters, since its converted output is good: item 6, Solid PDF Tools. Therefore, for the DOC handler in Data Loader to process PDF files properly, install this tool first; Data Loader uses it to convert PDF files.

The interface is shown above. The software is small, but for editing PDFs it is not inferior to Adobe Acrobat Professional. What I like most is that it supports command-line invocation, so it can be called from .NET or any other programming language.

Let's look at how the .NET calling code is written:

    using System.IO;

    public static string ConvertPdfToDoc(string pdf)
    {
        string outputFolder = Path.GetDirectoryName(pdf);
        string outputFile = Path.Combine(outputFolder,
            Path.GetFileNameWithoutExtension(pdf) + ".doc");

        // Double every backslash before embedding the paths in the script text.
        string pathWithFile = pdf.Replace(@"\", @"\\");
        outputFolder = outputFolder.Replace(@"\", @"\\");

        string sampleScript = @"<</FileName (" + pathWithFile + @")>> FileOpen "
            + @"<</WordDocumentType/DOC/OutputFolder (" + outputFolder + @")"
            + @"/ReconstructionMode/Flowing/LaunchViewer false>> ConvertToWord Exit";

        // SolidScript is not part of .NET; it starts a Solid PDF Tools process and passes it the script.
        // Replace "Solid PDF Tools" with the product you're scripting.
        SolidScript solidScript = new SolidScript("Solid PDF Tools");
        solidScript.RunScript(sampleScript);
        return outputFile;
    }
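
To check the path-escaping step in isolation, the script text can be built by a separate pure function. This is a hedged sketch: `SolidScriptBuilder`, `Escape` and `Build` are hypothetical names, the keyword spelling (`FileOpen`, `ConvertToWord` and so on) follows the script used in this article rather than Solid's documentation, and the `<<` / `>>` dictionary delimiters are an assumption based on the script's PostScript-like syntax.

```csharp
// Hypothetical helper: builds the same kind of script text as ConvertPdfToDoc, without running it.
public static class SolidScriptBuilder
{
    // Doubles every backslash so Windows paths survive inside the script's (...) strings.
    public static string Escape(string path) => path.Replace(@"\", @"\\");

    // Assumption: keyword names and << >> delimiters mirror the article's script.
    public static string Build(string pdfPath, string outputFolder)
    {
        return "<</FileName (" + Escape(pdfPath) + ")>> FileOpen "
             + "<</WordDocumentType/DOC/OutputFolder (" + Escape(outputFolder) + ")"
             + "/ReconstructionMode/Flowing/LaunchViewer false>> ConvertToWord Exit";
    }
}
```

Keeping string building separate from `RunScript` means the escaping can be verified without Solid PDF Tools installed.
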

That is all there is to Data Loader's internal handling of the PDF format: SolidScript starts a process and passes the parameters to Solid PDF Tools. If the .NET code above is unclear, you can walk through the same process by hand.

Open Notepad and enter the following script:

    <</FileName (G:\\tddownload\\solidwatcher\\bin\\debug\\fileloud)>> FileOpen
    <</WordDocumentType/DOC/OutputFolder (G:\\eclipse\\doc)
    /ReconstructionMode/Flowing
    /LaunchViewer false>> ConvertToWord
    Exit

Save the script, then open CMD and enter the following command:

This shows how Solid PDF Tools processes a PDF file passed in as a parameter. Wrapping this process in .NET code gives the ConvertPdfToDoc method shown above.
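
Because the exact command line is not reproduced here, the following is only a generic sketch of launching a command-line converter from .NET with `System.Diagnostics.Process`. `ToolRunner`, `CreateStartInfo` and `Run` are hypothetical names; the executable path and argument string are placeholders supplied by the caller and do not represent Solid PDF Tools' real command-line syntax, which should be taken from its own documentation.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs a command-line tool and waits for it to finish.
public static class ToolRunner
{
    // Builds the start info separately so it can be inspected without launching anything.
    public static ProcessStartInfo CreateStartInfo(string exePath, string arguments) =>
        new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false, // start the executable directly, not through the shell
            CreateNoWindow = true    // do not open a console window
        };

    // Returns the tool's exit code; kills it and throws if it does not finish within timeoutMs.
    public static int Run(string exePath, string arguments, int timeoutMs = 60000)
    {
        using (Process process = Process.Start(CreateStartInfo(exePath, arguments)))
        {
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("Converter did not exit in time: " + exePath);
            }
            return process.ExitCode;
        }
    }
}
```

A timeout matters here because a batch run over many PDFs should not hang forever on one file the converter cannot handle.
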

If you do not need to convert PDF to DOC in code, you can try the following tools. They are all small, portable ("green") programs that convert PDF to DOC.

    1. 4Media PDF to Word Converter
    2. AnyBizSoft PDF Converter Portable
    3. Nitro PDF
    4. PDF to Word RTF Converter
    5. Simpow.converterportable

All five tools are very good: portable and small. If you need command-line invocation, I recommend Solid PDF Tools.

Converting PDF to DOC this way relies on optical character recognition (OCR): the PDF file is converted to TIFF, and the TIFF file is then converted to DOC. For OCR, the ABBYY SDK is the best known in the industry and one of the finest OCR toolkits available.

Let's take a look at its product ABBYY FineReader 9. The interface is as follows:

It can connect directly to hardware devices and convert scanned content into editable DOC format. If you have the time, try the ABBYY SDK, which provides development kits for C++, .NET and other platforms and languages. This is where I started to appreciate C++ code. In the managed-language world, there is the powerful Visual Studio, the purpose-built C# language and its compiler, and a large number of third-party component vendors. In unmanaged code, however, there is no dominant Microsoft, and software companies survive on their own technical expertise. So if you find your development tools hard to use and the SDKs you have bought not friendly enough, I think that is your opportunity to improve on them and enter the market.

Go to epn.codeplex.com to download the latest Data Loader program and related development documentation.
