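For a flawless conversion of Office documents to PDF or HTML, it is still best to use Microsoft Office on Windows: LibreOffice does not convert perfectly, and WPS has no API.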
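First, check whether the COM extension is enabled. If phpinfo() shows a com_dotnet section, it is already enabled. If it does not, edit php.ini and remove the comment marker in front of
com.allow_dcom = true
then restart PHP (the web server). According to the official PHP site, the COM extension was built in before PHP 5.4.5, but that is not always true: the PHP 5.3.39 build downloaded from the official site did not have the COM module built in.
If the extension is not built in, add the following line to php.ini, provided that the extension file is present in your ext folder:
extension=php_com_dotnet.dll
Then restart again.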
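The phpinfo() check can also be done from a script. The following is a minimal sketch; the helper name com_available() is made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: report whether the COM extension is usable in this PHP build.
function com_available()
{
    return extension_loaded('com_dotnet') && class_exists('COM');
}

if (com_available()) {
    echo "com_dotnet is loaded\n";
} else {
    // php_ini_loaded_file() returns the path of the php.ini in use, or false if none was loaded
    echo "com_dotnet is missing; php.ini in use: ", var_export(php_ini_loaded_file(), true), "\n";
}
```

If the script reports that the extension is missing, the printed path shows which php.ini needs the extension line.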
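function word2html($wordname, $htmlname)
{
    // new COM() throws com_exception on failure, so "or die" would never run
    try {
        $word = new COM("word.application");
    } catch (com_exception $e) {
        die("Unable to instantiate Word: " . $e->getMessage());
    }
    $word->Visible = 1;         // shows the Word window; set to 0 to keep it hidden
    $doc = $word->Documents->Open($wordname);
    $doc->SaveAs($htmlname, 8); // 8 = wdFormatHTML
    $doc->Close(false);         // close without prompting to save changes
    $word->Quit();
    $word = null;               // release the COM object
}
word2html('D:/www/test/6.docx', 'D:/www/test/6.html');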
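The same COM calls can probably produce PDF as well, since the second argument of SaveAs is a WdSaveFormat value (8 = wdFormatHTML, 10 = wdFormatFilteredHTML, 17 = wdFormatPDF). The sketch below is an assumption-laden illustration, not the original author's code: word_save_format() and word_convert() are hypothetical names, and PDF output assumes a Word version that can save as PDF (Word 2010 or later, or Word 2007 with the PDF add-in).

```php
<?php
// Hypothetical helper: map a target file extension to a Word WdSaveFormat value.
// 8 = wdFormatHTML, 10 = wdFormatFilteredHTML, 17 = wdFormatPDF.
function word_save_format($target, $filtered = false)
{
    $ext = strtolower(pathinfo($target, PATHINFO_EXTENSION));
    if ($ext === 'pdf') {
        return 17;
    }
    if ($ext === 'html' || $ext === 'htm') {
        return $filtered ? 10 : 8;
    }
    throw new InvalidArgumentException("Unsupported target type: $ext");
}

// Hypothetical wrapper using the same COM calls as word2html().
function word_convert($source, $target, $filtered = false)
{
    $format = word_save_format($target, $filtered);
    $word = new COM("word.application");
    $doc = $word->Documents->Open($source);
    $doc->SaveAs($target, $format);
    $doc->Close(false); // close without saving changes to the source
    $word->Quit();
    $word = null;
}

// Example (Windows with Word installed):
// word_convert('D:/www/test/6.docx', 'D:/www/test/6.pdf');
```

Passing true as the third argument selects filtered HTML, which Word writes with less Office-specific markup than format 8.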
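Notes:
1. The source of the generated HTML is fairly messy.
2. The conversion launches winword.exe.
3. If the page keeps loading and never finishes, rename the document and run the conversion again.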
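One plausible cause of a page that never finishes loading (an assumption, not something the notes above confirm) is Word waiting on a dialog that nobody can see, or a winword.exe left running after an error. A minimal sketch of a more defensive variant follows; word2html_quiet() is a hypothetical name.

```php
<?php
// Hypothetical variant of word2html() that hides Word and always quits it.
function word2html_quiet($wordname, $htmlname)
{
    // fail early instead of letting Word report a missing file
    if (!is_file($wordname)) {
        throw new InvalidArgumentException("Source document not found: $wordname");
    }
    $word = new COM("word.application");
    try {
        $word->Visible = 0;        // do not show a window on the server
        $word->DisplayAlerts = 0;  // 0 = wdAlertsNone: suppress alert dialogs
        $doc = $word->Documents->Open($wordname);
        $doc->SaveAs($htmlname, 8); // 8 = wdFormatHTML
        $doc->Close(false);
    } catch (Exception $e) {
        $word->Quit();             // avoid leaving winword.exe running
        throw $e;
    }
    $word->Quit();
    $word = null;
}

// Example (Windows with Word installed):
// word2html_quiet('D:/www/test/6.docx', 'D:/www/test/6.html');
```

Whether this removes the need to rename the document depends on what actually caused the hang, so treat it as something to try rather than a guaranteed fix.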
A supplementary example: a function that cleans up the HTML that Word generates
// $text is the array of lines returned by file(); the cleaned HTML is returned as a string.
// preg_replace() is used because ereg_replace()/eregi_replace() were removed in PHP 7.
function lego_clean($text) {
    $text = implode("\r", $text);
    // normalize white space
    $text = preg_replace('#[[:space:]]+#', ' ', $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);
    // remove everything before <body>
    $text = strstr($text, "<body");
    // keep tags, strip attributes
    $text = preg_replace('#<p [^>]*BodyTextIndent[^>]*>([^\n\r|]*)</p>#', '<p>\\1</p>', $text);
    $text = preg_replace('#<p [^>]*margin-left[^>]*>([^\n\r|]*)</p>#i', '<blockquote>\\1</blockquote>', $text);
    // drop non-breaking-space entities (the literal was lost in the listing; "&nbsp;" is assumed)
    $text = str_replace("&nbsp;", "", $text);
    // clean up whatever is left inside <p> and <li>
    $text = preg_replace('#<p [^>]*>#i', '<p>', $text);
    $text = preg_replace('#<li [^>]*>#i', '<li>', $text);
    // kill unwanted tags
    $text = preg_replace('#</?span[^>]*>#i', '', $text);
    $text = preg_replace('#</?body[^>]*>#i', '', $text);
    $text = preg_replace('#</?div[^>]*>#i', '', $text);
    $text = preg_replace('#<![^>]*>#', '', $text);
    $text = preg_replace('#</?[a-z]:[^>]*>#i', '', $text);
    // kill style and on mouse* attributes
    $text = preg_replace('#([ \f\r\t\n\'"])style=[^>]+#i', '\\1', $text);
    $text = preg_replace('#([ \f\r\t\n\'"])on[a-z]+=[^>]+#i', '\\1', $text);
    // remove empty paragraphs
    $text = str_replace("<p></p>", "", $text);
    // remove closing </html>
    $text = str_replace("</html>", "", $text);
    // clean up white space again
    $text = preg_replace('#[[:space:]]+#', ' ', $text);
    $text = str_replace("> <", ">\r\r<", $text);
    $text = str_replace("<br>", "<br>\r", $text);
    // hand the cleaned markup back to the caller
    return $text;
}
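To see what this kind of cleanup does to typical Word output, here is a small self-contained sketch that applies a reduced set of the same tag and attribute rules to one sample paragraph. The function name strip_word_markup() and the sample string are made up for illustration, and the patterns are written for preg_replace.

```php
<?php
// Hypothetical helper: a reduced version of the cleanup rules, using preg_replace.
function strip_word_markup($html)
{
    $html = preg_replace('#</?span[^>]*>#i', '', $html);      // drop <span> wrappers
    $html = preg_replace('#</?[a-z]:[^>]*>#i', '', $html);    // drop namespaced tags such as <o:p>
    $html = preg_replace('#<(p|li) [^>]*>#i', '<$1>', $html); // strip attributes from <p> and <li>
    $html = str_replace('<p></p>', '', $html);                // remove empty paragraphs
    return $html;
}

// Sample markup in the style Word tends to emit (invented for this example).
$sample = '<p class=MsoNormal style="margin:0"><span lang=EN-US>Hello<o:p></o:p></span></p>'
        . '<p class=MsoNormal></p>';

echo strip_word_markup($sample), "\n"; // prints <p>Hello</p>
```

The order matters: the span and namespaced tags are removed first so that paragraphs which only contained them become empty and can be dropped by the last step.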