How to Convert PDF Files into Editable Text and Word Documents

Source: Internet
Author: User

-- Experience with converting PDF files
Nothing here is absolute. First, for removing PDF passwords I tried several programs. The best is Passware Acrobat Key, followed by Advanced PDF Password Recovery v2.2.0 and PDF Password Remover v2.2. Converting scanned images to text is more troublesome for Chinese: save the images as uncompressed TIF files and recognize them with Tsinghua TH-OCR 9.0 or Hanwang OCR. If you only need part of an image recognized, you can capture that area on screen and recognize it with the Huishi Xiaoling Mouse. All three OCR programs can be downloaded from VeryCD.com. If the PDF contains real text, you can use Solid Converter PDF to convert it to Word for editing and translation. Solid Converter PDF supports many languages and should handle both English and Traditional Chinese. For English, text-based PDFs can also be converted with ABBYY PDF Transformer 1.0; its output is RTF, which can be edited in Word. The OCR program IRIS Readiris Pro v10.0 has just been released, and both its speed and its results are good. Finally, the choice of translation software comes down to personal preference. These are my personal tips, for your reference only!
Recosoft's recently updated PDF2Office Personal v2.0 can also convert PDF files to DOC format and supports Chinese characters; the Professional edition is even better. IRIS Readiris Pro v10.0 also has an Asian-language OCR support pack, so once the latest version can be downloaded it should be able to convert Chinese. For now, we can only wait!
Some of this software can be found on ISO and 0day download sites; some cannot be found anywhere!

If the text is English, it is all too easy:
ABBYY FineReader 7.0 Professional can convert an image PDF directly into a DOC file
with the text and chart formatting unchanged.
Unfortunately, although FineReader supports even Zulu,
it does not support Chinese characters.



Http:// Id = 296


Chinese is therefore a bit more complicated:
use Adobe Acrobat 5.0/6.0 to save the PDF as JPG images,
then run OCR on them however you like.
The Hanwang 6.0 that Ken recommended is good,
the best OCR software I have ever used.
Note that this requires Adobe Acrobat,
not Acrobat Reader!

OCR is short for optical character recognition. It works by capturing the text and images on paper with an optical input device such as a scanner or digital camera, analyzing the shapes of the characters with pattern-recognition algorithms to determine each character's standard encoding, and storing the result in a text file in a common format. In short, OCR lets a computer recognize printed characters and enter text automatically, which makes it a fast, labor-saving, and efficient input method.
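To make the "pattern recognition" step concrete, here is a deliberately tiny Python sketch of template matching, one of the oldest OCR techniques. It is not how TH-OCR or any other product named here works: the 3 x 3 glyph bitmaps and the two-character template set are made up for illustration, and real engines use far richer features and thousands of character classes. Storing the result under GBK is likewise just an assumed choice of "standard encoding".

```python
from typing import Dict, List

# A glyph is a tiny binary bitmap: one string per row, '#' = ink, '.' = paper.
Bitmap = List[str]

# Hypothetical 3x3 templates; real OCR engines store thousands of classes
# and compare far richer features than raw pixels.
TEMPLATES: Dict[str, Bitmap] = {
    "一": ["...", "###", "..."],
    "十": [".#.", "###", ".#."],
}


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph: Bitmap, templates: Dict[str, Bitmap] = TEMPLATES) -> str:
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(templates, key=lambda ch: hamming(glyph, templates[ch]))


if __name__ == "__main__":
    scanned = ["##.", "###", ".#."]  # a noisy scan of "十" with one stray pixel
    ch = recognize(scanned)
    # Store the recognized character under a standard encoding (GBK here).
    print(ch, ch.encode("gbk"))
```

Matching on raw pixels breaks down quickly with different fonts and sizes, which is why the commercial systems described below advertise multi-font recognition and self-learning.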

1. Tsinghua Wentong TH-OCR 9.0
TH-OCR has been developed at Tsinghua University since 1985; TH is the abbreviation of Tsinghua. TH-OCR is the OCR software of Beijing Tsinghua Ziguang Wentong Information Technology Co., Ltd. Supported by the national "863" program, it embodies more than 10 years of continuous research and has been upgraded from version 1.0 to version 9.0. It was the first OCR product in the world to recognize mixed Chinese and English text in a single pass, breaking the limitation of earlier products that could handle only Chinese or only English, and it has added recognition of Oriental scripts (Simplified Chinese, Japanese, and Korean). Its recognition of mixed Japanese-English and Korean-English documents is said to surpass that of Japanese and Korean OCR products, which drew wide attention both in China and abroad. It was rated an excellent software product by the China Software Industry Association for three consecutive years and has become a milestone in Chinese character input technology. TH-OCR 9.0 has been applied in many fields, including e-government, electronic publishing, newspapers, banking, postal services, taxation, and libraries, and has become a pioneer in the domestic OCR market.
All proposals from NPC and CPPCC representatives were processed with Tsinghua Ziguang Wentong's TH-OCR 9.0, a world-leading product with independent Chinese intellectual property rights, and staff at the two sessions praised its accurate recognition and excellent speed. The successful use of Tsinghua's TH-OCR technology at the NPC and CPPCC shows that China has excellent technology of its own.

TH-OCR highlights:
◇ Recognizes mixed Chinese and English text with the highest recognition rate, ranked first in the world.
◇ Recognizes black-and-white, grayscale, and color images and reads multiple image formats.
◇ Restores the original page layout in the electronic document built from the recognition results: what you see is what you get.
◇ First to offer recognition of Japanese, Korean, mixed Japanese-English, and mixed Korean-English text, with a recognition rate above 98%.
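Recognizing gray and color images usually starts by reducing them to black and white. The source does not say how TH-OCR does this; the Python sketch below uses Otsu's method, a standard global-threshold technique, and assumes the image arrives as a flat list of 8-bit gray levels (0 = black, 255 = white).

```python
from typing import List


def otsu_threshold(gray: List[int]) -> int:
    """Gray level t (0-255) that best splits pixels into ink (<= t) and paper (> t).

    Otsu's method: pick the t that maximizes the between-class variance.
    Every value in `gray` must be an int in 0..255.
    """
    hist = [0] * 256
    for g in gray:
        hist[g] += 1
    total = len(gray)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    count_dark = 0   # pixels with level <= t
    sum_dark = 0     # sum of their levels
    for t in range(256):
        count_dark += hist[t]
        sum_dark += t * hist[t]
        count_light = total - count_dark
        if count_dark == 0 or count_light == 0:
            continue  # one class is empty, so this t separates nothing
        mean_dark = sum_dark / count_dark
        mean_light = (sum_all - sum_dark) / count_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = count_dark * count_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray: List[int], threshold: int) -> List[int]:
    """Map each pixel to 1 (black ink) or 0 (white paper)."""
    return [1 if g <= threshold else 0 for g in gray]


if __name__ == "__main__":
    row = [12, 10, 200, 15, 210, 205]  # dark strokes on light paper
    t = otsu_threshold(row)
    print(t, binarize(row, t))
```

A single global threshold works for clean printed pages; unevenly lit photos from a digital camera often need a local (adaptive) threshold instead.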

Six major advantages of TH-OCR:

1. It is the only multi-font character recognition system that can recognize more than 20,000 Chinese characters, and its Chinese character recognition is the best in China.

2. It recognizes mixed Chinese-English, Japanese-English, and Korean-English text in a single pass.

3. It has the highest Chinese character recognition rate. Tsinghua Ziguang Wentong's TH-OCR was evaluated by the "863" Intelligent Expert Group on a 100,000-character benchmark and strictly tested by the China Software Evaluation Center; its recognition accuracy exceeds 99.5%, representing the current highest level of printed-text recognition.

4. Multiple environments are supported. Tsinghua Ziguang Wentong's TH-OCR runs on Windows NT and Windows 98/2000/XP and supports a variety of internal codes such as GB, Big5, GBK, JIS, Shift-JIS, and KSC, making it suitable for all regions of the world. TH-OCR also has a self-learning function: any uncommon character can be taught to it through keyboard input, which greatly extends the character set the OCR system can recognize.

5. It has passed all domestic certifications and was rated "world-leading" by an appraisal committee of experts from the Chinese Academy of Sciences and the Chinese Academy of Engineering.
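The 99.5% figure in item 3 means that even on a 100,000-character test set, about 500 characters may still be wrong, which is why proofreading is always needed. The source does not say how accuracy was measured. A common definition, assumed here, is one minus the edit (Levenshtein) distance between the OCR output and the correct text, divided by the length of the correct text:

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """1 - edit distance / reference length; can drop below 0 if the output is mostly noise."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1 - levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    print(char_accuracy("清华大学", "清华大字"))  # one wrong character out of four → 0.75
```

Counting insertions and deletions, not just substitutions, matters because OCR engines also drop characters or invent spurious ones from specks on the page.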
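Item 4 lists the internal codes TH-OCR can output. The same recognized character is stored as different bytes in each of them, and a text file must be read back with the encoding it was written in, or it turns into garbage. A small Python sketch using the standard codecs; mapping "KSC" to EUC-KR and "JIS" to ISO-2022-JP is my assumption about what those labels mean:

```python
from typing import Dict, Optional

# Python codec names for the internal codes listed in item 4. Treating "KSC"
# as EUC-KR and "JIS" as ISO-2022-JP is an assumption about those labels.
CODECS: Dict[str, str] = {
    "GB2312": "gb2312",
    "GBK": "gbk",
    "Big5": "big5",
    "JIS": "iso2022_jp",
    "Shift-JIS": "shift_jis",
    "KSC": "euc_kr",
}


def byte_forms(text: str) -> Dict[str, Optional[str]]:
    """Hex bytes of text in each encoding, or None if the encoding lacks a character."""
    forms: Dict[str, Optional[str]] = {}
    for label, codec in CODECS.items():
        try:
            forms[label] = text.encode(codec).hex(" ")
        except UnicodeEncodeError:
            forms[label] = None
    return forms


def gbk_to_utf8(data: bytes) -> bytes:
    """Re-encode the contents of a GBK text file as UTF-8 (raises on invalid GBK)."""
    return data.decode("gbk").encode("utf-8")


if __name__ == "__main__":
    for label, hex_bytes in byte_forms("中").items():
        print(f"{label:10} {hex_bytes}")
```

`bytes.hex` with a separator argument needs Python 3.8 or later.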

Crack download: select the third link to download
Note: There are no font restrictions, no scan restrictions, no common error warnings, and perfect cracking.
This is the latest cracked version. It is said that there are no restrictions. After I try it, errors will still occur when exporting it to the RTF format in batches.
Hanwang and Ziguang each have their own merits. If you prefer Ziguang, download it.

2. Hanwang Wenhao 5800
In addition to the features of the classic Wenhao edition, such as simple, fast operation and one-click scanning and recognition with output to Word documents, the Wenhao 5800 can accurately recognize a variety of forms and images. It also adds a series of user-friendly functions, such as batch project processing, table stitching, ACDSee-style image and text indexing, and an excerpt expert. It is bundled with an ultra-thin high-speed USB scanner with 1200 x 2400 dpi optical resolution and 48-bit color enhancement. The recognition rate for printed documents can exceed 99.5%, and it easily recognizes many types of printed fonts as well as various mixed text and image layouts.

For users who need batch entry, the "project files" of the Wenhao 5800 solve many problems: the software saves work progress automatically, and when the user reopens a project it jumps straight to where work stopped, eliminating repeated searching, recognition, and proofreading.
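The "project file" idea amounts to checkpointing. The Wenhao 5800's actual file format is not described here, so the following is a hypothetical Python sketch: progress is written to a JSON file after every page, and a rerun resumes from the first unfinished page. `recognize` stands in for any OCR call, and the page list is assumed to be identical on every run.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def run_batch(pages: List[str], recognize: Callable[[str], str],
              project_file: Path) -> List[str]:
    """Recognize pages in order, saving progress after each page so a rerun resumes."""
    if project_file.exists():
        state = json.loads(project_file.read_text(encoding="utf-8"))
    else:
        state = {"done": 0, "results": []}

    for i in range(state["done"], len(pages)):
        state["results"].append(recognize(pages[i]))
        state["done"] = i + 1
        # Write a temporary file, then rename it over the project file, so an
        # interruption mid-write cannot leave a half-written checkpoint.
        tmp = project_file.with_name(project_file.name + ".tmp")
        tmp.write_text(json.dumps(state, ensure_ascii=False), encoding="utf-8")
        os.replace(tmp, project_file)
    return state["results"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        project = Path(d) / "book.json"
        pages = ["p001.tif", "p002.tif", "p003.tif"]
        print(run_batch(pages, lambda p: "text of " + p, project))
        # A second run finds every page done and never calls the recognizer.
        print(run_batch(pages, lambda p: 1 / 0, project))
```

If the recognizer crashes on page 2, page 1's result is already on disk, and the next run starts at page 2.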

Note: if you want to convert scanned text or tables directly to DOC, RTF, TXT, or other formats, pay attention to how the printed material is placed on the scanner. If it is upside down, all the recognized text will be garbled. Although the scanner shows a placement direction indicator, without actually trying it users find it hard to tell how to place the page to get correct output. To some extent this reflects weak attention to detail in the product design.
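An upside-down page is simply the correct page rotated by 180 degrees, so software could compensate instead of relying on the user. A minimal Python sketch, assuming the page is held as a list of pixel rows and that the OCR engine reports some confidence score; the `score` callback is hypothetical:

```python
from typing import Callable, List

Image = List[List[int]]  # rows of pixels, top row first


def rotate_180(img: Image) -> Image:
    """Rotate by 180 degrees: reverse the order of the rows and of each row."""
    return [row[::-1] for row in img[::-1]]


def upright(img: Image, score: Callable[[Image], float]) -> Image:
    """Return img or its 180-degree rotation, whichever the scorer prefers.

    `score` is hypothetical: for example, the average recognition confidence
    an OCR engine reports for the page. Ties keep the page as scanned.
    """
    flipped = rotate_180(img)
    return flipped if score(flipped) > score(img) else img
```

Running recognition twice doubles the cost per page, so a real product would more likely test a small strip of the page before committing to an orientation.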

Overall, the Hanwang Wenhao 5800 has a high recognition rate and fast recognition speed on printed materials. For users who need to enter large amounts of text and convert traditional printed materials into electronic files, the Wenhao 5800 is a very good choice. On the whole, however, its bundled software is not well designed: the interface is unattractive, the style is inconsistent, and many details need improvement. Compared with the management software bundled with the Hanwang Mingqi, the software that comes with the Wenhao 5800 is clearly much inferior.

The Hanwang Wenhao 5800 is relatively simple to use. Its printed user manual is illustrated and covers hardware installation and software use in detail, while the electronic help documentation is rather basic.

In brief: fast entry of documents and forms
One-click scanning
Output to Word
Skips the grayscale adjustment,
skew correction, and many other steps of traditional scan input
Batch recognition and entry of 1,000-page manuscripts
Up to 6,000 characters per minute
Copies text, tables, and graphics into Word with one click!

Tom ocr2.5: ftp: // software@

Installation instructions for Hanwang ocr2.5:
After the downloaded package is decompressed, there should be three files: The hwdocsetup folder and the hwdoc upgrade 2.5.exe、 2.5scan tool cracking program _crk.exe.
Installation sequence:
1. First install the main program of Hanwang 2.3 In the hwdocsetup folder
2. Run "hwdoc upgrade 2.5.exe" to upgrade
3. Run " 2.5 _crk.exe to crack
Everything is OK !!!

Although Han Wang has made 5300, 5800, and 6800, only he ocr2.5 is actually completely cracked. I like to use the OCR core of Hanwang, 5300, and 5800. The same is true. It is said that the OCR core is completely cracked, but errors will still occur during batch processing, so I still use 2.5.

3. ABBYY FineReader OCR Professional 7.0
ABBYY FineReader 7.0 Professional Edition is the latest and most accurate version of ABBYY's OCR software. It offers users the highest level of recognition accuracy and is a great time-saving solution. FineReader lets you convert, edit, and reuse all kinds of paper and electronic documents, including magazines, newspapers, faxes, photocopies, and PDF files.

Huishi Xiaoling Mouse
Its on-screen text recognition system can recognize text in pictures taken with digital cameras and other devices, and its online handwriting recognition system lets you write any text without a graphics tablet. The Huishi Xiaoling Mouse includes new technologies such as on-screen text reading and mouse handwriting input.
Download disk1.rar through disk6.rar, extract them into the same folder, and then run setup.exe in the disk1 directory.
Http:// Channelid = 12 & catid = 21 & id = 341

I tried it for a while and it is very useful: it makes it easy to extract text that cannot be copied directly. Huishi is the most convenient option and saves you from typing the text in. The one drawback is that it is not suited to batch recognition and is inconvenient for proofreading; for recognizing files in batches, dedicated OCR software is the better choice.

Others include Shangshu OCR, Hanwang OCR, Mengyi OCR, and Danqing OCR.

Using OCR software

How do I convert a PDF file into text?

The problem splits into two cases:
1. If the PDF file was originally converted from Word:
For more information about this, see:
Other PDF-to-Word tools are also available, such as "PDF-to-Word Tools".
2. If the PDF file was produced from scanned pages, the method above will not work, and several steps are needed:
1. Convert the PDF file into images:
Use "Galcott PDF Converter" to convert the PDF to an image format.
2. Recognize and proofread the images with OCR software:
We recommend "Hanwang OCR 2.5".
FTP: // software@
Although Wang has produced 5300, 5800, and 6800 million records, only the ocr2.5 files of the Chinese King are completely cracked. The batch file processing mode can be used for automatic identification and verification.
3. Output to text:
After recognition and proofreading, you can use my earlier "OCR Assistant" program
to delete redundant line breaks and merge the results into a single text file.
4. Make the final edits in Word.
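The "OCR Assistant" in step 3 is the author's own tool, and its rules are not described. As a hypothetical stand-in, this Python sketch joins the line breaks OCR leaves inside paragraphs. It assumes a blank line, or a line indented with a full-width space or two or more ordinary spaces, starts a new paragraph, and it inserts a space only between two English letters or digits, since Chinese text is joined without spaces:

```python
import re
from typing import List

# A line starting with a full-width space or 2+ ordinary spaces opens a new paragraph.
INDENT = re.compile(r"^(\u3000| {2,})")


def _is_ascii_word_char(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()


def merge_lines(text: str) -> str:
    """Join OCR line breaks inside paragraphs; return one paragraph per line."""
    paragraphs: List[str] = []
    current = ""
    for raw in text.splitlines():
        if not raw.strip():                  # blank line: close the paragraph
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current and INDENT.match(raw):    # indented line: start a new paragraph
            paragraphs.append(current)
            current = ""
        line = raw.strip()
        # English words need a space when re-joined; Chinese text does not.
        if current and _is_ascii_word_char(current[-1]) and _is_ascii_word_char(line[0]):
            current += " " + line
        else:
            current += line
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)
```

Words hyphenated across lines in English text are left as-is; handling them would need one more rule.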

How do I convert a PDG file into text?

The simplest way is to use the OCR feature built into SuperStar, but its quality and speed are .........
Recommended method:
The general principle is to convert the PDG into images first, then recognize and proofread them with dedicated software, and finally output the result as text.
1. Convert the PDG to images
First install the screen capture program SnagIt.
You are not installing it to take screenshots of SuperStar pages; what we need is its virtual printer (be sure to select the virtual printer during installation).
Usage: in SuperStar, open the book you want to convert and choose "Print". In the window that appears, select "SnagIt" as the printer and set the output image to "Black and White" (with "Color", the output files are terrible). When printing finishes, the SnagIt main window opens automatically; save the images there.
2. Text recognition and proofreading
3. ...........
4. ...........
See the description above.
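The advice in step 1 to print in black and white rather than color is easy to justify with arithmetic. Assuming an A4 page rendered at 300 dpi (the source does not give a resolution), a 1-bit black-and-white bitmap is 24 times smaller than a 24-bit color one before any compression:

```python
import math
from typing import Tuple

MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size, with each row padded to a whole number of bytes."""
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


if __name__ == "__main__":
    w, h = page_pixels(210, 297, 300)                    # A4 at 300 dpi → (2480, 3508)
    print("black and white:", raw_size_bytes(w, h, 1))   # → 1087480 bytes (~1.0 MiB)
    print("24-bit color:   ", raw_size_bytes(w, h, 24))  # → 26099520 bytes (~25 MiB)
```

For a book of several hundred pages, that is the difference between a few hundred megabytes and several gigabytes of uncompressed images, and OCR software only needs the black-and-white version anyway.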
Having seen how to convert PDF and PDG files to text, we can see that the key is to convert the source file to an image format before text recognition. I recommend SnagIt plus Hanwang OCR: widely available, fast, and convenient.
If you run into CAJ files from China Journal Net (CNKI), NLC files from the National Library of China, and so on, you don't need to wait .......

Other notes:
If you want to OCR a PDF file, its size should not be an obstacle, because each page of the PDF is output as a separate image file (as long as your disk space allows).
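Because every page becomes its own file, the per-page OCR results have to be put back together in page order, and a plain alphabetical sort puts page10 before page2. This Python sketch sorts numerically instead; the page1.txt, page2.txt, ... naming scheme and UTF-8 text files are assumptions.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """'page10.txt' -> ['page', 10, '.txt'] so that digit runs compare as numbers."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so odd positions always hold digit runs and the keys stay comparable.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def merge_page_texts(folder: Path, pattern: str = "page*.txt") -> str:
    """Concatenate per-page OCR text files in natural page order."""
    files = sorted(folder.glob(pattern), key=lambda f: natural_key(f.name))
    return "\n".join(f.read_text(encoding="utf-8") for f in files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n in (10, 2, 1):
            (Path(d) / f"page{n}.txt").write_text(f"text of page {n}", encoding="utf-8")
        print(merge_page_texts(Path(d)))
```

Zero-padding the page numbers when exporting (page001, page002, ...) avoids the problem entirely, if the export tool allows it.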
If a book is a single PDF file, the job is much easier; if it is split across several PDF files, you have to repeat the operation for each one.
If you only want to OCR part of the content, you can use the "print" method (see Forum = 6 & topic = 289 & show = 0) and select the corresponding pages when printing.
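Selecting "the corresponding pages" usually means typing a range such as 1-3, 5, 8-10 into the print dialog. If you script the export instead, a small parser like this hypothetical Python helper turns such a string into the list of pages to process:

```python
from typing import List


def parse_page_ranges(spec: str, last_page: int) -> List[int]:
    """Expand a print-dialog style range such as '1-3, 5, 8-10' into page numbers."""
    pages: List[int] = []
    seen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= last_page:
            raise ValueError(f"bad range {part!r} for a {last_page}-page document")
        for page in range(start, end + 1):
            if page not in seen:  # keep the first occurrence, drop duplicates
                seen.add(page)
                pages.append(page)
    return pages


if __name__ == "__main__":
    print(parse_page_ranges("1-3, 5, 8-10", last_page=20))  # → [1, 2, 3, 5, 8, 9, 10]
```

Malformed input such as "8-3" or "abc" raises `ValueError` rather than silently printing the wrong pages.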
If you want to split or merge PDF files, you can use the PDF Split-Merge software:
Http:// Url = 8080/down/HB-PDFSM11-fxj.ZIP
In addition, the serial: 1.4 of PDF converter 3861794



Converting PDF Files to Word Documents with Office 2003


After some experimenting, I found that the Microsoft Office Document Imaging component of Office 2003 can convert PDF files to Word documents, which means Word can handle this task. The method is as follows:
Open the PDF file you want to convert in Adobe Reader and choose File > Print. In the Print window, set the printer name to Microsoft Office Document Image Writer. After you confirm, the PDF file is output as a virtual print file in MDI format.
Note: if the "Microsoft Office Document Image Writer" printer is missing, use Add/Remove Components with the Office 2003 installation disc to install the component, and select "Microsoft Draw Converter".
Then run "Microsoft Office Document Imaging" and open the saved MDI file in it. Choose Tools > Send Text to Word, and in the window that appears select "keep the image layout unchanged in the output". After you confirm, the program warns "You must re-run OCR before performing this operation. This may take some time"; ignore the warning and simply confirm.
Note: the recognition rate when converting PDF files to DOC is not perfect, and the original layout is lost in the conversion, so you will need to re-typeset and proofread the result manually.
This method works only with Word 2003; other versions do not include Microsoft Office Document Image Writer.

