A Complete Solution for Exporting HTML Rich Text to Word in Java

Source: Internet
Author: User

I. The Problem

We recently developed a science and technology project information management system in Java. Users fill in project application information through an application template, and the system can export a project application as a Word document.

Existing implementation: a standard JSP template. In short, the data is rendered into a JSP page and the page is then saved as a .doc file to produce the Word export. This approach has several problems:

(1) Because the exported file is really an HTML page, Word opens it in Web Layout view by default;

(2) After the Word document is edited and saved, Word creates an associated folder next to it, and the document's HTML references resources in that folder (styles, pictures, themes, and so on). If only the Word document itself is passed on, the associated resources cannot be found;

(3) Some of our fields are filled in with a rich text editor (Baidu's UEditor) and contain uploaded attachments, mainly pictures. In the HTML, the src attribute of each img tag is the path of the picture on the server. A picture is therefore displayed only while the user's machine can reach the server, so the Word document cannot be archived offline;

(4) The exported document prints badly. The layout is a mess, which is unacceptable.

Because of these problems, we urgently needed another solution.

II. Available Options

After searching online, we narrowed the choice down to two feasible options.

(1) Create a Word template, save it as an MHT file (single file web page), render the data into the template, and produce the Word document.

(2) Create a Word template, save it as an XML file, render the data into the template, and produce the Word document.

Both options are template-based. Producing the template is a one-off job and is much simpler than building the Word layout in code with POI. The only difference is the format the template is saved in: one is an MHT file, the other an XML file. Several fields of the project application are entered through a rich text editor and stored in the database as HTML strings, so we chose the first option, the MHT file, whose main part is itself HTML.

III. Approach and Analysis

1. Storage structure of an MHT file

Open the project application template file provided by the customer in Word, choose Save As > Other Formats, and select the single file web page (MHT) type.

Once it is saved, open the file with a text editor (UltraEdit, Sublime Text, etc.). The content follows a few rules; the key points are:

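(1) The HTML content of the MHT file is declared as us-ascii and stored as quoted-printable text. Every "=" is written as "=3D", so the charset declaration reads charset=3Dus-ascii (hence "3DUS-ASCII"). Chinese strings are not stored literally but encoded into an ASCII form that is unreadable in a text editor;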

(2) An MHT file is a single file that embeds many resources; the picture resources need particular attention. Searching for "image" finds "image001", "image002", and so on. Each picture (for example image002) appears in three places.

First, in a <v:shape> tag in the HTML, which references the picture.

Second, in an embedded resource block, where the content of the picture is stored Base64 encoded.

Third, in an <xml> block at the end of the file, which lists each resource in an HRef attribute.
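As an illustration of the second and third places, the sketch below builds an embedded image part and a file list entry in Java. The class and method names, the boundary string, and the file:/// location are invented for the example. The header order and the o:File element are modelled on what Word typically writes, which is an assumption, so compare them with your own saved MHT file before relying on them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: builds the two non-HTML fragments an embedded image needs. */
class MhtImagePart {

    /** One MIME part: boundary line, headers, blank line, then the Base64 body. */
    static String resourcePart(String boundary, String location, String mimeType, byte[] image) {
        // The MIME encoder wraps its output with CRLF after every 76 characters.
        String body = Base64.getMimeEncoder().encodeToString(image);
        return "--" + boundary + "\r\n"
             + "Content-Location: " + location + "\r\n"
             + "Content-Transfer-Encoding: base64\r\n"
             + "Content-Type: " + mimeType + "\r\n"
             + "\r\n"
             + body + "\r\n";
    }

    /** One entry for the <xml> block; '=' is written as '=3D' (assumed element name). */
    static String fileListEntry(String fileName) {
        return "<o:File HRef=3D\"" + fileName + "\"/>";
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real caller would pass the bytes of the image file.
        byte[] fake = "not a real image".getBytes(StandardCharsets.US_ASCII);
        System.out.print(resourcePart("----=_NextPart_EXAMPLE",
                "file:///C:/example/doc.files/image002.jpg", "image/jpeg", fake));
        System.out.println(fileListEntry("image002.jpg"));
    }
}
```

The boundary passed to resourcePart has to be the one declared in the MHT file's own Content-Type header, and the file name in the location has to match the name used in the <v:shape> tag and in the file list.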

2. Implementation approach

(1) Create the Word template, write the placeholders in the syntax of the chosen template engine (we use FreeMarker), and save it as an MHT file;

(2) Collect and process the data, then render the template with the template engine;

(3) Save the rendered result as a .doc file.

The most important part is the data processing in step (2). The MHT file format imposes the following requirements.

(1) Encode string data to match the us-ascii ("3DUS-ASCII") content of the file;

(2) Process the rich text data, which involves the three places described above. First, convert each img tag in the rich text HTML into the <v:shape> tag format. Second, read each image from its actual storage location, Base64 encode its content, and add the encoded content at the corresponding position in the file. Third, append the reference entry for the resource to the XML block at the end of the MHT file.
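The first of these three steps can be sketched in Java as follows. This is a minimal, regex-based illustration, not the code from the original project: the class name, the resourceDir parameter, and the imageNNN naming are assumptions, and the generated <v:shape> is deliberately simpler than what Word writes (no id, width, or height). A real HTML parser would be more robust than a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites <img> tags and records which source each name maps to. */
class ImgToShape {
    // Matches an <img> tag and captures the value of its quoted src attribute.
    private static final Pattern IMG = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>",
            Pattern.CASE_INSENSITIVE);

    static final class Result {
        final String html;                  // HTML with <img> replaced by <v:shape>
        final Map<String, String> sources;  // generated name -> original src, in order
        Result(String html, Map<String, String> sources) {
            this.html = html;
            this.sources = sources;
        }
    }

    /** File extension of the src path (query string ignored); ".jpg" if there is none. */
    static String extensionOf(String src) {
        String path = src.split("[?#]", 2)[0];
        int dot = path.lastIndexOf('.');
        return dot > path.lastIndexOf('/') ? path.substring(dot) : ".jpg";
    }

    static Result convert(String html, String resourceDir, int firstIndex) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        Map<String, String> sources = new LinkedHashMap<>();
        int n = firstIndex;
        while (m.find()) {
            String src = m.group(1);
            String name = String.format("image%03d%s", n++, extensionOf(src));
            sources.put(name, src);
            String shape = "<v:shape type=\"#_x0000_t75\">"
                    + "<v:imagedata src=\"" + resourceDir + "/" + name + "\"/>"
                    + "</v:shape>";
            m.appendReplacement(out, Matcher.quoteReplacement(shape));
        }
        m.appendTail(out);
        return new Result(out.toString(), sources);
    }

    public static void main(String[] args) {
        Result r = convert("<p><img src=\"/upload/1.png\" alt=\"x\"/></p>", "doc.files", 1);
        System.out.println(r.html);
        System.out.println(r.sources);
    }
}
```

The returned map gives the caller what it needs for the other two steps: for each generated name, load the image behind the original src, Base64 encode it into a resource block, and add the name to the XML block. The rewritten HTML still contains plain "=" characters and has to be encoded like any other HTML content before it is placed in the MHT file.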

IV. Implementation Steps and Precautions

1. Create the Word template with placeholders written in the template engine's syntax, and save it as an MHT file.

Open the saved MHT file in a text editor and make sure that no binding expression has been split. For example, after saving, ${PROJECTSBINFO.XMNAMECN} may become

${projectsbinfo.=
XMNAMECN}

A trailing "=" followed by a line break is how the quoted-printable format wraps long lines. Placeholders in this form must be rejoined manually.
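If a template has many placeholders, the rejoining could also be scripted. The sketch below is one possible way to do it in Java, under the assumption that the only damage is a "=" plus line break inside an otherwise intact ${...} expression; the class and method names are hypothetical. It does not handle a break between "$" and "{", or markup that Word may insert in the middle of a placeholder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: removes soft line breaks that fall inside ${...} placeholders. */
class PlaceholderRepair {
    // A ${...} placeholder; [^}] also matches line breaks, so the match may span lines.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{[^}]*\\}");

    static String joinBrokenPlaceholders(String mht) {
        Matcher m = PLACEHOLDER.matcher(mht);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Drop "=" + newline inside the placeholder only; the rest of the file is untouched.
            String fixed = m.group().replaceAll("=\\r?\\n", "");
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<span>${projectsbinfo.=\r\nXMNAMECN}</span>";
        System.out.println(joinBrokenPlaceholders(broken));
    }
}
```

Soft line breaks outside placeholders are left alone on purpose, since they are a legitimate part of the file.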

Placeholders must also be inserted in the MHT file for the Base64 picture resource blocks and for the href references in the XML block.

2. Collect and process the data

Ordinary attribute data is simple: it is read from the database and needs little processing. The following data needs special attention.

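(1) Image elements in the HTML. Each img tag is handled as described in the analysis above: it is converted to a <v:shape> tag, the image content is Base64 encoded into a resource block, and a reference is added to the XML block.

(2) Because the MHT content is "US-ASCII" encoded, every "=" in it is written as "=3D", so attribute values appear with a 3D prefix. HTML content inserted into the file therefore needs the same substitution.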
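The substitution can be sketched in Java as below. The sketch assumes that the HTML part is quoted-printable text declared as us-ascii, so "=" becomes "=3D", and it writes every non-ASCII character as an HTML numeric character reference, which is valid in HTML whatever the declared charset. The class and method names are hypothetical, and the sketch does not wrap long lines.

```java
/** Hypothetical helper: makes an HTML string safe to insert into the us-ascii MHT content. */
class MhtText {

    static String encode(String html) {
        StringBuilder sb = new StringBuilder(html.length() * 2);
        for (int i = 0; i < html.length(); ) {
            int cp = html.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '=') {
                sb.append("=3D");                        // quoted-printable escape for '='
            } else if (cp > 126) {
                sb.append("&#").append(cp).append(';');  // numeric character reference
            } else {
                sb.appendCodePoint(cp);                  // plain ASCII passes through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: <p class=3D"a">&#39033;&#30446;</p>
        System.out.println(encode("<p class=\"a\">\u9879\u76ee</p>"));
    }
}
```

Plain text fields, as opposed to rich text HTML, would also need the usual HTML escaping of "<", ">" and "&" before this step.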

3. Render the template and save the result in Word format.
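To show the shape of this last step without depending on a library, the sketch below uses a small stand-in for the template engine. It is not FreeMarker: it only replaces simple ${name} placeholders from a map, and the class and method names are hypothetical. With FreeMarker, the render call would instead go through Template.process with a data model and a Writer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Stand-in for the template engine: fills ${name} placeholders and writes a .doc file. */
class SimpleRender {
    private static final Pattern VAR = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    static String render(String template, Map<String, String> data) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = data.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Writes the rendered MHT text to the target path; values must already be pure ASCII. */
    static void saveAsDoc(String template, Map<String, String> data, Path target)
            throws IOException {
        Files.write(target, render(template, data).getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> data = new HashMap<>();
        data.put("PROJECTSBINFO.XMNAMECN", "&#39033;&#30446;");  // already encoded value
        saveAsDoc("<p>${PROJECTSBINFO.XMNAMECN}</p>", data, Paths.get("export.doc"));
    }
}
```

The output file keeps the MHT content unchanged and only carries a .doc extension, which is what lets Word open it as a document.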

Related source code: Http://files.cnblogs.com/files/liaofeifight/word.rar
