Exporting a Web Page's HTML as a Word Document with PHP
There are generally two ways to export .doc files. The first is COM: install it on the server as a PHP extension, then create a COM object and call its methods. On a server with Microsoft Office installed, you can call the COM object named Word.Application to generate Word documents. I do not recommend this approach because it is slow: when I tested it, the server actually launched a Word client while the code was running. An ideal COM component would have no user interface and convert the data in the background, which would work better, but such extensions are generally commercial.
The second method is to have PHP write the document content directly into a file with a .doc extension. This method does not rely on any third-party extension, and it runs efficiently.
Word is very capable: it can open HTML files and keep their formatting, and it opens them normally even when the extension is .doc. This is convenient for us. There is one problem, however: an HTML file holds only the address of each image, while the image itself is stored elsewhere. If we write HTML into the .doc file, the document therefore cannot contain the images. How can we create a .doc file that includes images? We can use MHT, a format close to HTML.
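To see the limitation concretely, here is a minimal sketch of the second method using plain HTML. The helper name saveHtmlAsDoc and the output path are illustrative assumptions. The resulting file opens in Word, but the img tag only stores a URL, so the picture itself is not inside the document.

```php
<?php
// Hypothetical helper: writes HTML straight into a file with a .doc suffix.
// Word opens it as HTML, but an <img> only points at an external URL,
// so the image bytes are NOT stored in the file.
function saveHtmlAsDoc(string $html, string $path): bool
{
    // Wrap the fragment in a minimal HTML document.
    $doc = '<html><head><meta charset="utf-8"></head><body>'
         . $html
         . '</body></html>';
    return file_put_contents($path, $doc) !== false;
}

$path = sys_get_temp_dir() . '/plain-html-export.doc';
$ok = saveHtmlAsDoc('<h1>Report</h1><img src="http://www.yoursite.com/chart.png">', $path);
```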
MHT is similar to HTML, but externally linked resources such as images, JavaScript and CSS files are Base64-encoded and stored inside the file. A single MHT file can therefore hold every resource of a web page; naturally, it is larger than the equivalent HTML file.
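A single Base64-encoded resource inside an MHT file can be sketched as one MIME part. This is an illustrative sketch: buildMimePart is a hypothetical helper, and the 76-character line wrapping follows the MIME rule (RFC 2045) that the MhtFileMaker class also applies through chunk_split.

```php
<?php
// Sketch: turn raw resource bytes into one Base64-encoded MIME part,
// the way MHT stores images, scripts and stylesheets inline.
function buildMimePart(string $location, string $mimeType, string $bytes): string
{
    // Encoded lines must not exceed 76 characters, hence chunk_split(..., 76, "\r\n").
    $encoded = chunk_split(base64_encode($bytes), 76, "\r\n");
    return "Content-Type: {$mimeType}\r\n"
         . "Content-Transfer-Encoding: base64\r\n"
         . "Content-Location: {$location}\r\n"
         . "\r\n"
         . $encoded;
}

// 100 arbitrary bytes stand in for a real image file here.
$fakeImage = str_repeat("\x89PNG", 25);
$part = buildMimePart('http://www.yoursite.com/logo.png', 'image/png', $fakeImage);
```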
Can Word recognize MHT? I saved a web page as MHT, changed the extension to .doc and opened it in Word. It worked: Word recognizes the MHT file and displays the images.
Since Word can read MHT, the next question is how to put the images into the MHT file. Image addresses in HTML are written in the src attribute of the img tag, so extracting the src values from the HTML gives us the image addresses. If a value is a relative path, that does not matter: prepend the URL prefix to turn it into an absolute path. With the image address, we can use the file_get_contents function to fetch the image file's contents, encode them with base64_encode, and insert the result at the appropriate place in the MHT file.
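The extraction step might look like the following sketch. extractImageUrls is a hypothetical name; like the function in this article, it only handles src values enclosed in quotation marks, and it assumes the base URL ends with a slash.

```php
<?php
// Sketch: collect quoted src values from <img> tags and make them absolute.
// $baseUrl is assumed to end with "/".
function extractImageUrls(string $html, string $baseUrl): array
{
    $urls = [];
    // Only src values in single or double quotes are recognized.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        foreach ($m[1] as $src) {
            $src = trim($src);
            if ($src === '') {
                continue;
            }
            // Leave absolute http(s) URLs alone; prefix everything else.
            $urls[$src] = preg_match('#^https?://#i', $src) ? $src : $baseUrl . $src;
        }
    }
    return $urls; // original src => absolute URL
}

$html = '<p><img src="img/a.png"> <IMG alt="b" SRC=\'https://cdn.example.com/b.gif\' /></p>';
$urls = extractImageUrls($html, 'http://www.yoursite.com/Music/etc/');
```

Keeping the original src as the array key matters later: the MHT part for each image should be labeled with the address exactly as the HTML writes it.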
Finally, there are two ways to send the file to the client. One is to generate the .doc file on the server first, record its address, and then redirect the client with header("Location: xx.doc") so that it downloads the file. The other is to respond directly: set the Content-Type HTTP header to application/msword, set Content-Disposition to attachment followed by the file name, and then send the file content as the response body. The client then downloads the file as well.
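For the direct-response method, the headers could be assembled as in this sketch. buildDownloadHeaders and sendDocument are hypothetical helpers; the sketch uses application/msword, the registered MIME type for .doc files.

```php
<?php
// Sketch of the direct-response method: build the headers, then stream the body.
function buildDownloadHeaders(string $filename, int $length): array
{
    // Strip characters that would break the quoted filename or inject headers.
    $safe = str_replace(['"', "\r", "\n"], '', $filename);
    return [
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $safe . '"',
        'Content-Length: ' . $length,
    ];
}

function sendDocument(string $filename, string $content): void
{
    // header() only works before any output has been sent.
    foreach (buildDownloadHeaders($filename, strlen($content)) as $line) {
        header($line);
    }
    echo $content;
}
```

In a web request you would call sendDocument('report.doc', $fileContent); and then exit, making sure nothing was echoed earlier. For the redirect method, writing the file and calling header("Location: xx.doc") is enough.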
Implementation
Based on the principles above, you should now have a rough picture of the implementation. Below is an export function that converts HTML code into an MHT document. It takes three parameters, the last two of which are optional:
$content: the HTML code to convert.
$absolutePath: if image addresses in the HTML are relative paths, the absolute URL prefix they are missing (it must end with a slash).
$isEraseLink: whether to remove hyperlinks from the HTML (defaults to true).
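Here is a sketch of what the $isEraseLink option does: the link markup is dropped while its text is kept. eraseLinks is a hypothetical helper, and a regular expression is only a rough tool for HTML, so nested or malformed anchors may not be handled.

```php
<?php
// Sketch: remove <a ...>...</a> wrappers but keep whatever they contained.
function eraseLinks(string $html): string
{
    // \b stops <abbr> from matching; /s lets link text span lines; /i matches <A>.
    return preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $html);
}

$out = eraseLinks('See <a href="http://www.yoursite.com/">our site</a> and <A HREF="x.html">this page</A>.');
```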
The return value is the content of the MHT file as a string. You can save it with file_put_contents to a file with a .doc extension.
The function's main job is to find all the image addresses in the HTML and download each image. After fetching an image's content, it calls the MhtFileMaker class to add the image to the MHT file; the details of adding it are encapsulated in MhtFileMaker.
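To show what MhtFileMaker assembles, here is a stripped-down sketch of the same multipart/related layout: the HTML part first, followed by one Base64 part per image, each labeled with a Content-Location equal to the src value as written in the HTML. buildMht is a hypothetical function for illustration, not part of the class below, and its boundary format is an assumption.

```php
<?php
// Minimal sketch of an MHT (multipart/related) document:
// one HTML part followed by Base64-encoded resource parts.
function buildMht(string $html, array $resources): string
{
    $boundary = '----' . strtoupper(md5(uniqid('', true))) . '_MULTIPART_RELATED';
    $out  = "MIME-Version: 1.0\r\n";
    $out .= "Content-Type: multipart/related;\r\n";
    $out .= "\tboundary=\"{$boundary}\";\r\n";
    $out .= "\ttype=\"text/html\"\r\n\r\n";
    $out .= "This is a multi-part message in MIME format.\r\n\r\n";

    // The HTML part comes first so it is treated as the root document.
    $parts = [['tmp.html', 'text/html', $html]];
    foreach ($resources as $location => [$mimeType, $bytes]) {
        $parts[] = [$location, $mimeType, $bytes];
    }
    foreach ($parts as [$location, $mimeType, $bytes]) {
        $out .= "--{$boundary}\r\n";
        $out .= "Content-Type: {$mimeType}\r\n";
        $out .= "Content-Transfer-Encoding: base64\r\n";
        $out .= "Content-Location: {$location}\r\n\r\n";
        $out .= chunk_split(base64_encode($bytes), 76, "\r\n");
    }
    // The closing delimiter has two trailing dashes.
    return $out . "--{$boundary}--\r\n";
}

// $resources maps each src (as written in the HTML) to [MIME type, raw bytes].
$mht = buildMht('<p><img src="img/a.png"></p>', ['img/a.png' => ['image/png', "\x89PNG fake bytes"]]);
```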
The code is as follows:
/**
 * Get Word document content from HTML code.
 * The generated document is essentially an MHT file. The function analyzes the HTML and downloads the image resources the page references.
 * This function depends on the MhtFileMaker class.
 * It finds <img> tags and extracts their src values. Each src value must be enclosed in quotation marks; otherwise it cannot be extracted.
 *
 * @param string $content      HTML content
 * @param string $absolutePath Absolute URL of the web page. If image paths in the HTML are relative, fill in this parameter so the function can prepend it automatically. It must end with a slash (/).
 * @param bool   $isEraseLink  Whether to remove links from the HTML content
 * @return string MHT document content
 */
function getWordDocument($content, $absolutePath = "", $isEraseLink = true)
{
    $mht = new MhtFileMaker();
    if ($isEraseLink) {
        // Remove the links but keep their text (and any images inside them)
        $content = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $content);
    }
    $images = array();   // absolute URLs to download
    $files = array();    // src values exactly as written in the HTML
    $matches = array();
    // This algorithm requires the value after src= to be enclosed in quotation marks.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\'][^>]*>/i', $content, $matches)) {
        $arrPath = $matches[1];
        for ($i = 0; $i < count($arrPath); $i++) {
            $imgPath = trim($arrPath[$i]);
            if ($imgPath != "") {
                $files[] = $imgPath;
                if (!preg_match('#^https?://#i', $imgPath)) {
                    // Relative path: prepend the absolute prefix
                    $imgPath = $absolutePath . $imgPath;
                }
                // Absolute http:// or https:// links are used as-is
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html", $mht->GetMimeType("tmp.html"), $content);
    for ($i = 0; $i < count($images); $i++) {
        $image = $images[$i];
        // Download the image; false means it could not be read
        $imgcontent = @file_get_contents($image);
        if ($imgcontent !== false) {
            // Label the part with the src as written, so the HTML reference matches it
            $mht->AddContents($files[$i], $mht->GetMimeType($image), $imgcontent);
        } else {
            echo "file:" . $image . " not exist!<br />";
        }
    }
    return $mht->GetFile();
}
Usage:
The code is as follows:
$fileContent = getWordDocument($content, "http://www.yoursite.com/Music/etc/");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);
Here $content holds the HTML source code, and the second argument is the URL prefix used to complete the relative image paths in that HTML.
Note: before using this function, you must first include the MhtFileMaker class, which generates the MHT document.
The code is as follows:
<?php
/***********************************************************************
Class:        Mht File Maker
Version:      1.2 beta
Date:         02/11/2007
Author:       Wudi
Description:  The class can make .mht files.
***********************************************************************/
class MhtFileMaker {
    var $config = array();
    var $headers = array();
    var $headers_exists = array();
    var $files = array();
    var $boundary;
    var $dir_base;
    var $page_first;

    // Constructor; $config is stored but not otherwise used
    function __construct($config = array()) {
        $this->config = $config;
    }
    function SetHeader($header) {
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }
    function SetFrom($from) {
        $this->SetHeader("From: $from");
    }
    function SetSubject($subject) {
        $this->SetHeader("Subject: $subject");
    }
    function SetDate($date = NULL, $istimestamp = FALSE) {
        if ($date === NULL) {
            // Default to the current time, which is a timestamp and must be formatted
            $date = time();
            $istimestamp = TRUE;
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }
    function SetBoundary($boundary = NULL) {
        if ($boundary === NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }
    function SetBaseDir($dir) {
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }
    function SetFirstPage($filename) {
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }
    function AutoAddFiles() {
        if (!isset($this->page_first)) {
            exit('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }
    function AddDir($dir) {
        $handle_dir = opendir($dir);
        // Compare with false so a file named "0" does not end the loop
        while (($filename = readdir($handle_dir)) !== false) {
            if (($filename != '.') && ($filename != '..') && ("$dir/$filename" != $this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }
    function AddFile($filename, $filepath = NULL, $encoding = NULL) {
        if ($filepath === NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }
    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL) {
        if ($encoding === NULL) {
            // Base64-encode and wrap at 76 characters per line
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }
    function CheckHeaders() {
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary === NULL) {
            $this->SetBoundary();
        }
    }
    function CheckFiles() {
        return count($this->files) > 0;
    }
    function GetFile() {
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: {$file['mimetype']}\r\n";
            $contents .= "Content-Transfer-Encoding: {$file['encoding']}\r\n";
            $contents .= "Content-Location: {$file['filepath']}\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }
    function MakeFile($filename) {
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }
    function GetMimeType($filename) {
        $pathinfo = pathinfo($filename);
        // Files without an extension fall through to the default type
        $ext = isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
        switch ($ext) {
            case 'htm':  $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt':  $mimetype = 'text/plain'; break;
            case 'cgi':  $mimetype = 'text/plain'; break;
            case 'php':  $mimetype = 'text/plain'; break;
            case 'css':  $mimetype = 'text/css'; break;
            case 'jpg':  $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe':  $mimetype = 'image/jpeg'; break;
            case 'gif':  $mimetype = 'image/gif'; break;
            case 'png':  $mimetype = 'image/png'; break;
            default:     $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>