Remove redundant code after converting a Word table to HTML

Source: Internet
Author: User

Word can save a document as an HTML file, which makes it quick to display Word content on a web page. This is especially handy for tables, whose markup (tr, td, th, rowspan, colspan and so on) is tedious to write by hand.

However, the HTML that Word generates carries a lot of formatting code by default. How can this redundant code be removed so that only the main content remains?

I originally intended to find a tool online, but found nothing ready-made. The usual recommendation is to strip the code with a text editor's find-and-replace, which cannot be reused. So I wrote a small piece of Node.js code to remove the redundant code.

The main steps are:

    1. Read the text content of the HTML file with Node.js
    2. Extract the table with the substring function
    3. Remove redundant tags with a regular expression
    4. Remove redundant attributes with a regular expression
    5. Remove extra whitespace with a regular expression

    const fs = require('fs');

    // Step 1: read the exported HTML file asynchronously.
    fs.readFile('static/detail/county-hhz.html', function (err, data) {
        if (err) {
            return console.error(err);
        }

        // Step 2: keep only the first <table>...</table> block.
        let content = data.toString();
        content = content.substring(
            content.indexOf('<table'),
            content.indexOf('</table>') + '</table>'.length
        );

        // Step 3: strip the wrapper tags Word adds (<span>, <p>, <o:p>, <font>)
        // but keep the text inside them.
        ['span', 'p', 'o', 'font'].forEach(function (tag) {
            content = content.replace(new RegExp('</?' + tag + '\\b[^>]*>', 'gi'), '');
        });

        // Step 4: remove presentational attributes.
        // Word quotes style values with either kind of quote.
        content = content.replace(/\sstyle=("[^"]*"|'[^']*')/gi, '');
        // The value may be quoted or bare, and must stop before the closing ">".
        content = content.replace(/\s(class|border|cellspacing|valign|width)=("[^"]*"|'[^']*'|[^\s>]*)/gi, '');

        // Step 5: collapse runs of whitespace, then drop whitespace before "<" or ">".
        content = content.replace(/\s+/g, ' ');
        content = content.replace(/\s+([<>])/g, '$1');

        console.log(content);
    });
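
Since the goal is a cleanup that can be reused, the same steps can also be wrapped in a function that takes an HTML string and returns the cleaned table. The following is only a minimal sketch under some assumptions: the name cleanWordTable and the sample markup are hypothetical (not from the original script), only the first table is handled, and regular expressions like these are a pragmatic shortcut rather than a full HTML parser.

```javascript
// Hypothetical helper (not part of the original script): the same cleanup
// steps as a pure function, so they can be reused and tested without a file.
function cleanWordTable(html) {
    // Extract the first <table>...</table> block; return '' if there is none.
    const start = html.indexOf('<table');
    if (start === -1) {
        return '';
    }
    const end = html.indexOf('</table>', start);
    if (end === -1) {
        return '';
    }
    let content = html.substring(start, end + '</table>'.length);

    // Drop <span>, <p>, <o:p> and <font> tags (opening and closing), keeping their text.
    content = content.replace(/<\/?(span|p|o|font)\b[^>]*>/gi, '');

    // Drop presentational attributes; values may be quoted or bare.
    content = content.replace(
        /\s(style|class|border|cellspacing|valign|width)=("[^"]*"|'[^']*'|[^\s>]*)/gi,
        ''
    );

    // Collapse whitespace runs, then remove whitespace before "<" or ">".
    content = content.replace(/\s+/g, ' ');
    content = content.replace(/\s+([<>])/g, '$1');
    return content;
}

// Assumed sample input, loosely imitating what Word exports.
const sample = [
    '<html><body>',
    "<table class=MsoNormalTable border=1 cellspacing=0 style='border-collapse:collapse'>",
    ' <tr>',
    "  <td width=100 rowspan=2 valign=top style='width:75pt'>",
    '  <p class=MsoNormal><span lang=EN-US>Name<o:p></o:p></span></p>',
    '  </td>',
    '  <td colspan=2><p class=MsoNormal>Score</p></td>',
    ' </tr>',
    '</table>',
    '</body></html>'
].join('\n');

const cleaned = cleanWordTable(sample);
console.log(cleaned);
```

In this sample the structural attributes rowspan=2 and colspan=2 survive, while style, class, border, cellspacing, valign and width are removed along with the span, p and o:p wrappers. To process a file, the result of fs.readFile can be passed to the function as a string.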
