Code for exporting Web pages as Word documents in PHP

Source: Internet
Author: User
There are generally two ways to export a doc document from PHP. The first is to use COM: install the COM extension for PHP on the server, create a COM object, and call its methods. A server that has Office installed can instantiate a COM object called word.application, which can generate a Word document. I don't recommend this, because it is inefficient: when I tested it, the server actually launched a Word client while the code ran. An ideal COM component would have no user interface and would do the data conversion in the background, which would perform better, but such components are generally paid products.

The second way is to have PHP write the content of the document directly into a file with the doc suffix. This approach does not rely on any third-party extension and runs more efficiently.
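A minimal sketch of this second approach, assuming the content is plain HTML (which, as discussed next, Word will open even under a doc suffix). The helper name `saveHtmlAsDoc` and the output path are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: wrap an HTML fragment in a full page and save it as .doc.
function saveHtmlAsDoc(string $bodyHtml, string $path): int
{
    $html = "<html><head><meta charset=\"utf-8\"></head><body>"
          . $bodyHtml
          . "</body></html>";
    // file_put_contents returns the number of bytes written, or false on failure.
    $bytes = file_put_contents($path, $html);
    if ($bytes === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $bytes;
}

// Write a small test document into the system temp directory.
$target  = sys_get_temp_dir() . '/hello.doc';
$written = saveHtmlAsDoc('<h1>Report</h1><p>Generated by PHP.</p>', $target);
```

No extension is involved here: the file is ordinary HTML, and only the suffix tells the operating system to hand it to Word.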

Word is quite capable: it can open HTML files and preserve their formatting, and it still recognizes and opens them correctly even when the file suffix is doc. That is convenient for us. There is one problem, though. An HTML file stores only the address of each image, while the image data itself lives elsewhere, so if plain HTML is written into the doc file, the doc file will not contain the pictures. So how do we create doc documents that include images? We can use the MHT format, which is close to HTML.

The MHT format is similar to HTML, except that externally linked files such as images, JavaScript, and CSS are Base64-encoded and stored inside the MHT file itself. As a result, a single MHT file can hold all the resources of a Web page, at the cost of being larger than the HTML alone.
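To make that structure concrete, here is a rough sketch that assembles a tiny MHT document by hand: one HTML part and one Base64-encoded image part, separated by a MIME boundary. The helper `mhtPart`, the boundary string, and the file names are illustrative assumptions; the embedded image is a 1x1 GIF standing in for a real picture.

```php
<?php
// Hypothetical helper: format one MIME part of an MHT document.
function mhtPart(string $boundary, string $type, string $location, string $data): string
{
    return "--$boundary\r\n"
         . "Content-Type: $type\r\n"
         . "Content-Transfer-Encoding: base64\r\n"
         . "Content-Location: $location\r\n"
         . "\r\n"
         // Base64 text is wrapped at 76 characters per line, each ending in CRLF.
         . chunk_split(base64_encode($data), 76, "\r\n");
}

$boundary = 'EXAMPLE_BOUNDARY';   // any string that does not occur inside the parts
$html     = '<html><body><img src="dot.gif"></body></html>';
// A 1x1 GIF, decoded to raw bytes as if it had been read from disk.
$gif      = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

$mht = "MIME-Version: 1.0\r\n"
     . "Content-Type: multipart/related; boundary=\"$boundary\"; type=\"text/html\"\r\n"
     . "\r\n"
     . mhtPart($boundary, 'text/html', 'page.html', $html)
     . mhtPart($boundary, 'image/gif', 'dot.gif', $gif)
     . "--$boundary--\r\n";
```

The Content-Location of the image part repeats the value used in the src attribute, which is how the HTML part and the embedded image are tied together.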

Can Word recognize the MHT format? I saved a Web page as MHT, changed the suffix to doc, and opened it with Word. It worked: Word recognized the MHT content and displayed the pictures.

Since Word can read MHT content under a doc suffix, the remaining question is how to put the pictures into the MHT file. The address of an image in HTML is written in the src attribute of the img tag, so extracting the src values from the HTML code gives us the image addresses. You may get a relative path; that is not a problem, because adding the URL prefix turns it into an absolute path. With the image address, we can fetch the content of the picture file with the file_get_contents function, encode it as Base64 with the base64_encode function, and finally insert it at the appropriate position in the MHT file.
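The extraction step described above might look roughly like the following. This is only a sketch: the helper name `collectImageUrls` and the example URLs are assumptions, and like the approach in this article it only handles src values enclosed in quotation marks. Each resulting URL would then be fetched with file_get_contents and passed through base64_encode.

```php
<?php
// Hypothetical helper: map each quoted <img> src value to an absolute URL.
function collectImageUrls(string $html, string $baseUrl): array
{
    $result = [];
    // Group 1 is the opening quote, group 2 the src value up to the same quote.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $m)) {
        foreach ($m[2] as $src) {
            $src = trim($src);
            if ($src === '') {
                continue;
            }
            // Leave absolute http(s) URLs alone; prefix everything else.
            $isAbsolute = preg_match('#^https?://#i', $src) === 1;
            $result[$src] = $isAbsolute ? $src : $baseUrl . $src;
        }
    }
    return $result;
}

$html = '<p><img src="img/a.png" alt=""> <IMG alt="" SRC=\'http://cdn.example.invalid/b.gif\'></p>';
$urls = collectImageUrls($html, 'http://www.example.invalid/');
```

The array keys keep the original src text, because that is the value the MHT part has to be registered under so that the HTML still finds the image.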

Finally, there are two ways to send the file to the client. One is to generate the doc document on the server, record its address, and then redirect the client to it with header("Location: xx.doc") so that the client downloads it. The other is to send the document directly in the HTTP response: set the Content-Type header to application/msword (the registered MIME type for doc files), set Content-Disposition to attachment followed by the file name, and once the HTTP headers have been sent, send the file content directly to the client, which then downloads it as a doc document.
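A small sketch of the second option, sending the document directly in the response. The helper `docDownloadHeaders` and the file name are hypothetical, and the body is a placeholder for the real MHT content. It uses application/msword, the registered MIME type for doc files.

```php
<?php
// Hypothetical helper: build the response headers for a .doc download.
function docDownloadHeaders(string $fileName, int $length): array
{
    // Strip characters that would break the quoted filename parameter.
    $safeName = str_replace(['"', "\r", "\n"], '', basename($fileName));
    return [
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . $length,
    ];
}

$body    = '<html><body><p>Hello</p></body></html>';   // stands in for the MHT content
$headers = docDownloadHeaders('report.doc', strlen($body));

// Under a web server, send the headers first and then the document itself.
if (PHP_SAPI !== 'cli') {
    foreach ($headers as $line) {
        header($line);
    }
    echo $body;
}
```

Compared with the redirect approach, nothing has to be stored on the server, but the document is regenerated on every request.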


With the principles above, you should have a rough idea of how the implementation works. Below is an export function that converts HTML code into an MHT document. It takes three parameters, the last two of which are optional:
content: the HTML code to convert
absolutePath: if the image addresses in the HTML code are relative paths, this is the absolute URL prefix that the HTML code is missing
isEraseLink: whether to remove hyperlinks from the HTML code
The function returns the content of the MHT file as a string, which you can save with file_put_contents to a file with the doc suffix.
The main job of this function is to parse all the image addresses in the HTML code and download them one by one. After getting the content of each picture, it calls the MhtFileMaker class to add the picture to the MHT file. The details of adding a part are encapsulated in the MhtFileMaker class.
The code is as follows:
/**
 * Get Word document content based on HTML code.
 * The document created is essentially MHT: the function parses the HTML and downloads the picture resources of the page remotely.
 * This function depends on the class MhtFileMaker.
 * It parses img tags and extracts the src attribute values. The src value must be enclosed in quotation marks, otherwise it cannot be extracted.
 *
 * @param string $content HTML content
 * @param string $absolutePath The absolute URL of the Web page. If the image paths in the HTML content are relative, fill in this parameter so that the function can complete them. It must end with /
 * @param bool $isEraseLink Whether to remove links from the HTML content
 * @return string Content of the MHT document
 */
function getWordDocument($content, $absolutePath = "", $isEraseLink = true)
{
    $mht = new MhtFileMaker();
    if ($isEraseLink) {
        // Remove links but keep the text inside them
        $content = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $content);
    }

    $images = array();
    $files = array();
    $matches = array();
    // This algorithm requires the src value to be enclosed in quotation marks.
    // The pattern captures that quoted value for every img tag.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\'](.*?)["\']/i', $content, $matches)) {
        $arrPath = $matches[1];
        for ($i = 0; $i < count($arrPath); $i++) {
            $path = $arrPath[$i];
            $imgPath = trim($path);
            if ($imgPath != "") {
                $files[] = $imgPath;
                if (substr($imgPath, 0, 7) != 'http://' && substr($imgPath, 0, 8) != 'https://') {
                    // Relative link: add the prefix (absolute links need none)
                    $imgPath = $absolutePath . $imgPath;
                }
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html", $mht->GetMimeType("tmp.html"), $content);

    for ($i = 0; $i < count($images); $i++) {
        $image = $images[$i];
        // Download the picture; file_get_contents returns false on failure
        $imgcontent = @file_get_contents($image);
        if ($imgcontent !== false) {
            $mht->AddContents($files[$i], $mht->GetMimeType($image), $imgcontent);
        } else {
            echo "File: " . $image . " does not exist!<br />";
        }
    }

    return $mht->GetFile();
}

How to use:
The code is as follows:
$fileContent = getWordDocument($content, "");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);

Here the $content variable should be the HTML source code, and the second argument should be the URL prefix that completes the relative image paths in that HTML code.
Note that before using this function you need to include the class MhtFileMaker, which generates the MHT document for us.
The code is as follows:
<?php
/***********************************************************************
Class:        Mht File Maker
Version:      1.2 beta
Description:  The class can make .mht file.
***********************************************************************/

class MhtFileMaker {
    var $config = array();
    var $headers = array();
    var $headers_exists = array();
    var $files = array();
    var $boundary;
    var $dir_base;
    var $page_first;

    function MhtFile($config = array()) {
    }

    // Add a raw header line and remember its (lower-cased) name
    function SetHeader($header) {
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }

    function SetFrom($from) {
        $this->SetHeader("From: $from");
    }

    function SetSubject($subject) {
        $this->SetHeader("Subject: $subject");
    }

    function SetDate($date = NULL, $istimestamp = FALSE) {
        if ($date == NULL) {
            $date = time();
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }

    function SetBoundary($boundary = NULL) {
        if ($boundary == NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }

    function SetBaseDir($dir) {
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }

    function SetFirstPage($filename) {
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }

    // Add the first page, then every other file under the base directory
    function AutoAddFiles() {
        if (!isset($this->page_first)) {
            exit('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }

    function AddDir($dir) {
        $handle_dir = opendir($dir);
        while (($filename = readdir($handle_dir)) !== false) {
            if (($filename != '.') && ($filename != '..') && ("$dir/$filename" != $this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }

    function AddFile($filename, $filepath = NULL, $encoding = NULL) {
        if ($filepath == NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }

    // Register one part; content is Base64-encoded unless an encoding is given
    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL) {
        if ($encoding == NULL) {
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }

    function CheckHeaders() {
        // Header names are stored lower-cased by SetHeader
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary == NULL) {
            $this->SetBoundary();
        }
    }

    function CheckFiles() {
        if (count($this->files) == 0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }

    // Assemble the headers and all parts into the final MHT text
    function GetFile() {
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: $file[mimetype]\r\n";
            $contents .= "Content-Transfer-Encoding: $file[encoding]\r\n";
            $contents .= "Content-Location: $file[filepath]\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }

    function MakeFile($filename) {
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }

    function GetMimeType($filename) {
        $pathinfo = pathinfo($filename);
        // Names without an extension fall through to the default type
        $extension = isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
        switch ($extension) {
            case 'htm':  $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt':  $mimetype = 'text/plain'; break;
            case 'cgi':  $mimetype = 'text/plain'; break;
            case 'php':  $mimetype = 'text/plain'; break;
            case 'css':  $mimetype = 'text/css'; break;
            case 'jpg':  $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe':  $mimetype = 'image/jpeg'; break;
            case 'gif':  $mimetype = 'image/gif'; break;
            case 'png':  $mimetype = 'image/png'; break;
            default:     $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>
