Example: Converting a Word Document to HTML with PHP

Source: Internet
Author: User

For the best results when converting Office documents to PDF or HTML, use Microsoft Office on Windows. LibreOffice does not convert perfectly, and WPS has no API.

First check whether the COM module is enabled. If the phpinfo() output contains a com_dotnet section, it is already enabled. If not, edit php.ini and set:
com.allow_dcom = true

Remove the comment character (the semicolon) in front of that line, then restart the web server. The PHP website says the COM module was built in before PHP 5.4.5, but that is not always the case: the PHP 5.3.39 build from the official site does not have the COM module built in.

If the module is not built in, also add the following line to php.ini. This assumes the extension file is present in your ext folder.

extension=php_com_dotnet.dll

Then restart again, and the module is ready.
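To confirm the setup from a script rather than by reading the phpinfo() page, a small check like the following may help. It is only a sketch: the function name com_support_status is hypothetical, and it relies on the standard PHP functions extension_loaded() and php_ini_loaded_file().

```php
<?php
// Hypothetical helper: reports whether the COM extension looks usable.
function com_support_status(): string
{
    if (PHP_OS_FAMILY !== 'Windows') {
        return 'COM is only available on Windows';
    }
    if (!extension_loaded('com_dotnet')) {
        return 'com_dotnet is not loaded; check extension=php_com_dotnet.dll in php.ini';
    }
    return 'com_dotnet is loaded';
}

echo com_support_status(), "\n";
// Show which php.ini file is actually in use, so the right file gets edited.
echo 'php.ini: ', php_ini_loaded_file() ?: '(none)', "\n";
```

Run it through the same web server or CLI binary that will do the conversion, since each may load a different php.ini.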


// Convert a Word document to HTML through the Word COM interface.
function word2html($wordname, $htmlname)
{
    $word = new COM("Word.Application") or die("Unable to instantiate Word");
    $word->Visible = 1;
    $word->Documents->Open($wordname);
    // 8 is the Word save format for HTML (wdFormatHTML)
    $word->Documents[1]->SaveAs($htmlname, 8);
    $word->Quit();
    $word = null;
    unset($word);
}

word2html('D:/www/test/6.docx', 'd:/www/test/6.html');

Notes:

1. The source code of the converted HTML is rather messy.
2. WINWORD.EXE is launched during the conversion.
3. If the page keeps loading without finishing, rename the document and open it again.
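Since Word runs as a separate process, it can help to make sure it is always closed, even when the conversion fails. The following is a minimal sketch under that assumption, not a definitive implementation. The function name word2html_safe and its error handling are hypothetical. The numeric values are Word constants: 0 is wdAlertsNone, 8 is wdFormatHTML, and 10 is wdFormatFilteredHTML, which writes less Office-specific markup.

```php
<?php
// Hypothetical variant of word2html(); name and error handling are assumptions.
function word2html_safe(string $wordPath, string $htmlPath, int $format = 10): void
{
    if (!is_file($wordPath)) {
        throw new InvalidArgumentException("Input file not found: $wordPath");
    }
    if (!class_exists('COM')) {
        throw new RuntimeException('The COM extension (com_dotnet) is not available');
    }

    $word = new COM('Word.Application');
    try {
        $word->Visible = 0;        // keep the Word window hidden
        $word->DisplayAlerts = 0;  // 0 = wdAlertsNone, suppress dialogs
        $doc = $word->Documents->Open($wordPath);
        // 8 = wdFormatHTML, 10 = wdFormatFilteredHTML
        $doc->SaveAs($htmlPath, $format);
        $doc->Close(false);        // close without saving changes to the source
    } finally {
        // Always quit, so no WINWORD.EXE process is left behind.
        $word->Quit();
        $word = null;
    }
}

// Example call (paths taken from the article):
// word2html_safe('D:/www/test/6.docx', 'D:/www/test/6.html');
```

The finally block runs whether or not Open or SaveAs throws, which is the point of the design: a failed conversion should not leave Word running in the background.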

A further example, a function that cleans up the generated HTML:


// Clean up the HTML that Word generates.
// $text is an array of lines, for example the result of file().
// preg_replace is used because the ereg functions were removed in PHP 7.
function lego_clean($text) {

    $text = implode("\r", $text);

    // Normalize white space
    $text = preg_replace('/[[:space:]]+/', ' ', $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);

    // Remove everything before <body>
    $text = strstr($text, "<body");

    // Keep tags, strip attributes
    $text = preg_replace('#<p [^>]*bodytextindent[^>]*>([^\r\n|]*)</p>#i', '<p>$1</p>', $text);
    $text = preg_replace('#<p [^>]*margin-left[^>]*>([^\r\n|]*)</p>#i', '<blockquote>$1</blockquote>', $text);
    // The search string was lost in the source; "&nbsp;" is assumed here
    $text = str_replace("&nbsp;", "", $text);

    // Clean up whatever is left inside <p> and <li>
    $text = preg_replace('#<p [^>]*>#i', '<p>', $text);
    $text = preg_replace('#<li [^>]*>#i', '<li>', $text);

    // Kill unwanted tags
    $text = preg_replace('#</?span[^>]*>#i', '', $text);
    $text = preg_replace('#</?body[^>]*>#i', '', $text);
    $text = preg_replace('#</?div[^>]*>#i', '', $text);
    $text = preg_replace('#<![^>]*>#', '', $text);
    $text = preg_replace('#</?[a-z]:[^>]*>#i', '', $text);

    // Kill style and on mouse* attributes
    $text = preg_replace('/([ \f\r\t\n\'"])style=[^>]+/i', '$1', $text);
    $text = preg_replace('/([ \f\r\t\n\'"])on[a-z]+=[^>]+/i', '$1', $text);

    // Remove empty paragraphs
    $text = str_replace("<p></p>", "", $text);

    // Remove the closing tag (the tag name was lost in the source; "</html>" is assumed here)
    $text = str_replace("</html>", "", $text);

    // Clean up white space again
    $text = preg_replace('/[[:space:]]+/', ' ', $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);

    return $text;
}
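To see the effect of this kind of cleanup on a concrete input, here is a reduced sketch applied to a small sample. It illustrates the same idea with preg_replace and is not a replacement for the function above. The function name strip_word_markup and the sample fragment are hypothetical.

```php
<?php
// Hypothetical helper: a reduced version of the cleanup idea.
function strip_word_markup(string $html): string
{
    // Drop <span>, <div> and namespaced tags such as <o:p>.
    $html = preg_replace('#</?(?:span|div)[^>]*>#i', '', $html);
    $html = preg_replace('#</?[a-z]+:[^>]*>#i', '', $html);
    // Strip all attributes from <p> tags.
    $html = preg_replace('#<p\s[^>]*>#i', '<p>', $html);
    // Collapse runs of white space into a single space.
    $html = preg_replace('/\s+/', ' ', $html);
    // Remove paragraphs that ended up empty.
    $html = str_replace('<p></p>', '', $html);
    return trim($html);
}

// A made-up fragment in the style of Word's HTML output.
$sample = "<p class=MsoNormal style='margin:0'><span lang=EN-US>Hello<o:p></o:p></span></p>\n"
        . "<p class=MsoNormal><o:p></o:p></p>";

echo strip_word_markup($sample), "\n"; // prints <p>Hello</p>
```

Regular expressions are adequate for this predictable, machine-generated markup, but for arbitrary HTML a real parser such as DOMDocument is usually the safer choice.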
