Principle
There are generally two ways to export a doc document from PHP. The first is to use COM: install the COM extension for PHP on the server, create a COM object, and call its methods. A server that has Office installed can instantiate the COM object named word.application to generate a Word document, but I don't recommend this because it executes inefficiently (I tested it, and the server actually opens a Word client when the code runs). An ideal COM component would have no user interface and would convert the data in the background, which would perform better, but such extensions are generally commercial.
The second way is to have PHP write the content of the document directly into a file with the .doc extension. This approach does not depend on any third-party extension, and it executes more efficiently.
Word is capable of opening HTML files while preserving their formatting, and it opens them normally even when the file extension is .doc. That is convenient for us. There is one problem, though: an HTML file stores only the address of each picture, while the picture itself is saved elsewhere, so if we write plain HTML into a .doc file, the document will not contain the pictures. How, then, do we create a doc document that contains its pictures? We can use the MHT format, which is very close to HTML.
The MHT format is similar to HTML, except that externally linked files, such as pictures, JavaScript, and CSS, are base64-encoded and stored inside the file itself. A single MHT file can therefore hold all the resources of a web page, at the cost of being larger than the HTML alone.
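To make the layout concrete, here is a minimal sketch that packs one HTML page and one picture into an MHT string by hand. The function name buildMinimalMht, the part name page.html, and the fixed image/png type are assumptions made for illustration only; they are not part of the code presented in this article.

```php
<?php
// Hypothetical helper: packs one HTML page and one PNG image into an MHT string.
function buildMinimalMht(string $html, string $imageBytes, string $imageLocation): string
{
    $eol = "\r\n";
    $boundary = '----=_NextPart_' . md5(uniqid('', true));

    // Top-level headers: the file is a multipart/related MIME container.
    $out  = 'MIME-Version: 1.0' . $eol;
    $out .= 'Content-Type: multipart/related; boundary="' . $boundary . '"; type="text/html"' . $eol;
    $out .= $eol;

    // Part 1: the HTML page itself, base64-encoded in 76-character lines.
    $out .= '--' . $boundary . $eol;
    $out .= 'Content-Type: text/html' . $eol;
    $out .= 'Content-Transfer-Encoding: base64' . $eol;
    $out .= 'Content-Location: page.html' . $eol . $eol;
    $out .= chunk_split(base64_encode($html), 76, $eol);

    // Part 2: the picture. Content-Location must match the src used in the HTML.
    $out .= '--' . $boundary . $eol;
    $out .= 'Content-Type: image/png' . $eol; // assumption: the picture is a PNG
    $out .= 'Content-Transfer-Encoding: base64' . $eol;
    $out .= 'Content-Location: ' . $imageLocation . $eol . $eol;
    $out .= chunk_split(base64_encode($imageBytes), 76, $eol);

    // Closing boundary marks the end of the container.
    $out .= '--' . $boundary . '--' . $eol;
    return $out;
}
```

Each part carries its own Content-Type and Content-Location, which is how the page and its resources are matched up again when the file is opened.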
Can Word recognize the MHT format? I saved a web page as an .mht file, changed its extension to .doc, and opened it in Word. It worked: Word recognizes MHT files and displays their pictures.
Now that we know Word can read MHT, the next question is how to put the pictures into the MHT file. The address of each picture is written in the src attribute of an img tag, so we can collect the picture addresses by extracting the src attribute values from the HTML code. An address may be a relative path; in that case, prepend the URL prefix to turn it into an absolute path. With the address in hand, we read the picture file with the file_get_contents function, encode its content with the base64_encode function, and insert the result at the appropriate place in the MHT file.
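The extraction and encoding steps can be sketched roughly as follows. The helper names extractImageUrls and fetchAsBase64 are hypothetical, and the regular expression assumes that every src value is enclosed in quotes.

```php
<?php
// Hypothetical helper: maps each src value found in the HTML to an absolute address.
function extractImageUrls(string $html, string $baseUrl = ''): array
{
    $urls = [];
    // Assumption: src values are quoted; unquoted values are not matched.
    if (preg_match_all('/<img[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        foreach ($m[1] as $src) {
            $src = trim($src);
            if ($src === '') {
                continue;
            }
            // Absolute addresses are kept as-is; relative ones get the prefix.
            $isAbsolute = preg_match('#^https?://#i', $src) === 1;
            $urls[$src] = $isAbsolute ? $src : $baseUrl . $src;
        }
    }
    return $urls;
}

// Hypothetical helper: reads a file or URL and returns its base64 text,
// split into 76-character lines, or null when it cannot be read.
function fetchAsBase64(string $location): ?string
{
    $bytes = @file_get_contents($location);
    return $bytes === false ? null : chunk_split(base64_encode($bytes), 76, "\r\n");
}
```

Keeping the original src value as the array key is useful because the MHT part for a picture has to be stored under the same address that the HTML refers to.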
Finally, there are two ways to send the file to the client. One is to generate the doc file on the server, record its address, and redirect the client to it with header("Location: xx.doc") so that the client downloads it. The other is to send it directly in the HTTP response: set the Content-Type header to application/msword (the registered MIME type for .doc files) and the Content-Disposition header to attachment followed by the file name, then send the file content right after the headers, and the client downloads it as a doc document.
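A minimal sketch of the second way, sending the document directly in the HTTP response, might look like this. The helper names buildDocDownloadHeaders and sendDocDownload are hypothetical, and the sketch uses application/msword, the registered MIME type for Word .doc files, as the content type.

```php
<?php
// Hypothetical helper: builds the response headers for a .doc download.
function buildDocDownloadHeaders(string $filename, int $length): array
{
    // Drop characters that could break out of the quoted file name.
    $safeName = str_replace(['"', "\r", "\n"], '', $filename);
    return [
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . $length,
    ];
}

// Hypothetical helper: sends the headers, then the document body.
// It must be called before any other output has been sent.
function sendDocDownload(string $docContents, string $filename): void
{
    foreach (buildDocDownloadHeaders($filename, strlen($docContents)) as $line) {
        header($line);
    }
    echo $docContents;
}
```

Nothing is written to the server's disk in this variant, so there is no generated file to clean up afterwards.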
Implementation
With the principles above, you should have a rough picture of the implementation. Below is an export function that converts HTML code into an MHT document. It takes three parameters, the last two of which are optional:
$content: the HTML code to convert
$absolutePath: if the image addresses in the HTML code are relative paths, this is the absolute prefix to put in front of them
$isEraseLink: whether to strip hyperlinks from the HTML code
The return value is the content of the MHT file, which you can save with file_put_contents to a file with the .doc extension.
The main job of this function is to find all the image addresses in the HTML code and download the images one by one. Once it has the content of an image, it calls the MhtFileMaker class to add the image to the MHT file. The details of how the image is added are encapsulated in the MhtFileMaker class.
The code is as follows:
/**
 * Get Word document content based on HTML code.
 * The result is essentially an MHT document: the function scans the HTML for images and downloads each image it references.
 * This function depends on the class MhtFileMaker.
 * It parses img tags to extract the src attribute value. The value must be enclosed in quotes, otherwise it cannot be extracted.
 *
 * @param string $content      HTML content
 * @param string $absolutePath Absolute URL of the web page. If the image paths in the HTML are relative, pass this so that the function can complete them. It must end with "/".
 * @param bool   $isEraseLink  Whether to remove links from the HTML content
 * @return string              Content of the MHT file
 */
function getWordDocument($content, $absolutePath = "", $isEraseLink = true)
{
    $mht = new MhtFileMaker();
    if ($isEraseLink) {
        // Remove links but keep the text inside them
        $content = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $content);
    }
    $images = array(); // resolved addresses to download
    $files = array();  // addresses as written in the HTML
    $matches = array();
    // This algorithm requires the src attribute value to be enclosed in quotes
    if (preg_match_all('/<img[^>]*?\ssrc\s*=\s*["\']([^"\']*)["\']/i', $content, $matches)) {
        foreach ($matches[1] as $path) {
            $imgPath = trim($path);
            if ($imgPath != "") {
                $files[] = $imgPath;
                // Absolute links get no prefix; relative ones are completed
                if (!preg_match('#^https?://#i', $imgPath)) {
                    $imgPath = $absolutePath . $imgPath;
                }
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html", $mht->GetMimeType("tmp.html"), $content);
    for ($i = 0; $i < count($images); $i++) {
        $image = $images[$i];
        $imgcontent = @file_get_contents($image);
        if ($imgcontent !== false) {
            // Store the image under the address used in the HTML
            $mht->AddContents($files[$i], $mht->GetMimeType($image), $imgcontent);
        } else {
            echo "File: " . $image . " does not exist!<br />";
        }
    }
    return $mht->GetFile();
}
How to use it:
$fileContent = getWordDocument($content, "http://www.jb51.net/Music/etc/");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);
Here the $content variable should hold the HTML source code, and the URL that follows is the prefix used to complete relative image paths in the HTML code.
Note that before using this function you need to include the MhtFileMaker class, which generates the MHT document for us.
The code is as follows:
<?php
/***********************************************************************
Class:        MHT File Maker
Version:      1.2 beta
Author:       Wudi <wudicgi@yahoo.de>
Description:  The class can make .mht files.
***********************************************************************/
class MhtFileMaker {
    var $config = array();
    var $headers = array();
    var $headers_exists = array();
    var $files = array();
    var $boundary;
    var $dir_base;
    var $page_first;

    // Unused stub; it is not a constructor because its name differs from the class name
    function MhtFile($config = array()) {
    }

    // Add a raw header line and remember its lower-cased name
    function SetHeader($header) {
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }

    function SetFrom($from) {
        $this->SetHeader("From: $from");
    }

    function SetSubject($subject) {
        $this->SetHeader("Subject: $subject");
    }

    // With $istimestamp set, $date is a Unix timestamp and is formatted as an RFC 2822 date
    function SetDate($date = NULL, $istimestamp = FALSE) {
        if ($date == NULL) {
            $date = time();
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }

    function SetBoundary($boundary = NULL) {
        if ($boundary == NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }

    function SetBaseDir($dir) {
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }

    function SetFirstPage($filename) {
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }

    // Add the first page, then every other file under the base directory
    function AutoAddFiles() {
        if (!isset($this->page_first)) {
            exit('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }

    // Recursively add all files in $dir, skipping the first page
    function AddDir($dir) {
        $handle_dir = opendir($dir);
        while (($filename = readdir($handle_dir)) !== false) {
            if (($filename != '.') && ($filename != '..') && ("$dir/$filename" != $this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }

    // Read a file from disk (or a URL) and add it as one part
    function AddFile($filename, $filepath = NULL, $encoding = NULL) {
        if ($filepath == NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }

    // Add raw content as one part; it is base64-encoded unless an encoding is given
    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL) {
        if ($encoding == NULL) {
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }

    // Make sure a Date header and a boundary exist (header names are stored lower-cased)
    function CheckHeaders() {
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary == NULL) {
            $this->SetBoundary();
        }
    }

    function CheckFiles() {
        if (count($this->files) == 0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }

    // Assemble the headers and all parts into the final MHT content
    function GetFile() {
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: {$file['mimetype']}\r\n";
            $contents .= "Content-Transfer-Encoding: {$file['encoding']}\r\n";
            $contents .= "Content-Location: {$file['filepath']}\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }

    // Write the MHT content to a file
    function MakeFile($filename) {
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }

    // Guess the MIME type from the file extension
    function GetMimeType($filename) {
        $pathinfo = pathinfo($filename);
        $extension = isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
        switch ($extension) {
            case 'htm':  $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt':  $mimetype = 'text/plain'; break;
            case 'cgi':  $mimetype = 'text/plain'; break;
            case 'php':  $mimetype = 'text/plain'; break;
            case 'css':  $mimetype = 'text/css'; break;
            case 'jpg':  $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe':  $mimetype = 'image/jpeg'; break;
            case 'gif':  $mimetype = 'image/gif'; break;
            case 'png':  $mimetype = 'image/png'; break;
            default:     $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>
The above discusses how to export the doc format from PHP by way of MHT files. This method solves one problem in particular: making the exported doc file contain its pictures. If you want to include more content, such as CSS style sheets, you only need to parse the link tags in the HTML code with a regular expression, extract the addresses of the CSS files, read them, base64-encode them, and add them to the MHT file.
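As a rough sketch of that last idea, the style sheet addresses could be collected like this. The helper name extractStylesheetUrls is hypothetical, and the pattern assumes quoted rel and href attribute values.

```php
<?php
// Hypothetical helper: maps each stylesheet href found in the HTML to an absolute address.
function extractStylesheetUrls(string $html, string $baseUrl = ''): array
{
    $urls = [];
    // Collect every <link ...> tag first, then inspect its attributes.
    if (!preg_match_all('/<link\b[^>]*>/i', $html, $tags)) {
        return $urls;
    }
    foreach ($tags[0] as $tag) {
        // Only style sheets are of interest (not icons, feeds, and so on).
        if (!preg_match('/\srel\s*=\s*["\']stylesheet["\']/i', $tag)) {
            continue;
        }
        if (!preg_match('/\shref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m)) {
            continue;
        }
        $href = trim($m[1]);
        // Absolute addresses are kept as-is; relative ones get the prefix.
        $urls[$href] = preg_match('#^https?://#i', $href) ? $href : $baseUrl . $href;
    }
    return $urls;
}
```

Each address returned this way could then be read with file_get_contents and handed to the AddContents method of the class above with the text/css MIME type, in the same way the pictures are added.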