Three ways to save a web page as a Word file in PHP

Source: Internet
Author: User
I recently had to generate Word documents from PHP, so this article summarizes the three methods I found. They follow from two basic approaches.

I. Two approaches to generating Word in PHP

1. Use COM components on Windows
2. Use PHP to write content into a .doc file
The implementation of each approach is described below.

II. Use COM components on Windows

Principle: COM is available as a PHP extension class. On a server with Office installed, it can drive the word.application COM object to generate documents automatically. See the official PHP manual: http://www.php.net/manual/en/class.com.php

The example from the official manual is as follows:

<?php
// Start Word
$word = new COM("word.application") or die("Unable to instantiate Word");
echo "Loaded Word, version {$word->Version}\n";

// Bring it to the front
$word->Visible = 1;

// Open an empty document
$word->Documents->Add();

// Do some weird stuff
$word->Selection->TypeText("This is a test...");
$word->Documents[1]->SaveAs("Useless test.doc");

// Close Word
$word->Quit();

// Free the object
$word = null;
?>


Personal suggestion: to find out what the methods on the COM object do, you have to look them up in the official documentation, and the editor offers no code completion, which is inconvenient. This approach is also not very efficient, so I do not recommend it.
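Because this approach depends on the COM extension, which exists only on Windows builds of PHP, it can help to check for it before trying to start Word. The following is a minimal sketch under that assumption; the function name canUseWordCom is hypothetical and is not part of PHP or of the manual's example.

```php
<?php
// Hypothetical helper: reports whether the COM approach can work on this host.
function canUseWordCom(): bool
{
    // The COM class only exists when the COM extension is loaded,
    // and that extension is only available on Windows.
    return PHP_OS_FAMILY === 'Windows' && class_exists('COM', false);
}

if (!canUseWordCom()) {
    echo "COM is not available; use one of the file-based methods instead.\n";
}
```

Even when this check passes, Word itself must still be installed on the server for the COM call to succeed.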

III. Use PHP to write content into a .doc file
This approach comes in two variants:

1. Generate MHT format (similar to HTML) and write it to a Word file
2. Write a Word file in pure HTML format


1) Generate MHT format (similar to HTML) and write it to a Word file

The code is as follows:

/**
 * Build the content of a Word document from HTML code.
 * The document created is essentially an MHT file. This function parses the HTML and downloads the image resources that the page references.
 * This function depends on the MhtFileMaker class.
 * It analyzes img tags and extracts the src attribute value. The src value must be enclosed in quotation marks; otherwise it cannot be extracted.
 *
 * @param string $content HTML content
 * @param string $absolutePath Absolute path of the web page. If image paths in the HTML are relative, fill in this parameter so that the function can complete them. It must end with a slash (/).
 * @param bool $isEraseLink Whether to remove links from the HTML content
 */
function getWordDocument($content, $absolutePath = "", $isEraseLink = true)
{
    $mht = new MhtFileMaker();
    if ($isEraseLink) {
        // Remove links but keep their text (the opening-tag part of this pattern is reconstructed)
        $content = preg_replace('/<a\b[^>]*>(\s*.*?\s*)<\/a>/i', '$1', $content);
    }

    $images = array();
    $files = array();
    $matches = array();
    // This algorithm requires the src attribute value to be enclosed in quotation marks.
    // The original pattern was lost from this listing; this reconstruction captures the quoted src value.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\'](.*?)["\'][^>]*>/i', $content, $matches)) {
        $arrPath = $matches[1];
        for ($i = 0; $i < count($arrPath); $i++) {
            $path = $arrPath[$i];
            $imgPath = trim($path);
            if ($imgPath != "") {
                $files[] = $imgPath;
                if (!preg_match('#^https?://#i', $imgPath)) {
                    // Relative link: add the prefix. Absolute links are left as they are.
                    $imgPath = $absolutePath . $imgPath;
                }
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html", $mht->GetMimeType("tmp.html"), $content);

    for ($i = 0; $i < count($images); $i++) {
        $image = $images[$i];
        $imgcontent = @file_get_contents($image);
        if ($imgcontent !== false) {
            $mht->AddContents($files[$i], $mht->GetMimeType($image), $imgcontent);
        } else {
            echo "file: " . $image . " does not exist!\n";
        }
    }

    return $mht->GetFile();
}

This function analyzes all the image addresses in the HTML code and downloads them one by one. After obtaining each image's content, it calls the MhtFileMaker class to add the image to the MHT file. The details of adding a file are encapsulated in the MhtFileMaker class.
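The image-analysis step can be looked at on its own. The following is a small sketch, not the article's code: the function name listImageUrls and the sample HTML are hypothetical, and the pattern is one reasonable way to capture quoted src values.

```php
<?php
// Hypothetical helper, not part of the original code: returns the absolute
// URL of every quoted <img src="..."> value found in $html.
function listImageUrls(string $html, string $baseUrl = ''): array
{
    $urls = [];
    // Capture the quoted src value of each <img> tag (group 2).
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $m)) {
        foreach ($m[2] as $src) {
            $src = trim($src);
            if ($src === '') {
                continue;
            }
            // Absolute URLs are kept; relative ones get the base prefix.
            $urls[] = preg_match('#^https?://#i', $src) ? $src : $baseUrl . $src;
        }
    }
    return $urls;
}

$html = '<p><img src="a.png"> <img alt="x" src=\'http://example.com/b.jpg\'></p>';
print_r(listImageUrls($html, 'http://www.yoursite.com/Music/etc/'));
```

As in the function above, an unquoted src value is not matched, and the base URL is expected to end with a slash.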

Method 1: Remote Call

The code is as follows:

$url = "http://www.***.com";

$content = file_get_contents($url);

$fileContent = getWordDocument($content, "http://www.yoursite.com/Music/etc/");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);
Here, $content should hold the HTML source code, and the second argument should be the base URL used to complete the relative image paths in that HTML.



Method 2: Local generation

The code is as follows:


header("Cache-Control: no-cache, must-revalidate");
header("Pragma: no-cache");
$wordStr = 'php tutorial website --php.net';
$fileContent = getWordDocument($wordStr);
// $intro is not defined in this snippet; it must be set by the surrounding code.
$fileName = iconv("UTF-8", "GBK", 'php tutorial' . '_' . $intro . '_' . rand(100, 999));
header("Content-Type: application/msword");
header("Content-Disposition: attachment; filename=" . $fileName . ".doc");
echo $fileContent;
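The header lines in this listing can be gathered into one place. The sketch below is only an illustration: the helper name wordDownloadHeaders is hypothetical, and restricting the file name to ASCII characters is a simplification chosen here (the listing above converts the name to GBK instead).

```php
<?php
// Hypothetical helper: builds the response headers for a .doc download.
function wordDownloadHeaders(string $baseName): array
{
    // Keep the name to a conservative character set so it is safe in a header.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '_', $baseName);
    return [
        'Cache-Control: no-cache, must-revalidate',
        'Pragma: no-cache',
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $safe . '.doc"',
    ];
}

foreach (wordDownloadHeaders('php tutorial_' . rand(100, 999)) as $line) {
    // header() only works before any body output has been sent.
    if (!headers_sent()) {
        header($line);
    }
}
```

The document body would then be echoed after these headers, as in the listing above.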

Note: Before using this function, you must first include the MhtFileMaker class, which can help us generate Mht documents.

The code is as follows:

<?php
/***********************************************************************
Class:        Mht File Maker
Version:      1.2 beta
Date:         02/11/2007
Author:       Wudi
Description:  The class can make .mht file.
***********************************************************************/

class MhtFileMaker {
    var $config = array();
    var $headers = array();
    var $headers_exists = array();
    var $files = array();
    var $boundary;
    var $dir_base;
    var $page_first;

    // Constructor. The original method was named MhtFile(), which PHP never calls for this class.
    function __construct($config = array()) {

    }

    // Store a raw header line and remember that this header name is set.
    function SetHeader($header) {
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }

    function SetFrom($from) {
        $this->SetHeader("From: $from");
    }

    function SetSubject($subject) {
        $this->SetHeader("Subject: $subject");
    }

    function SetDate($date = NULL, $istimestamp = FALSE) {
        if ($date == NULL) {
            $date = time();
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }

    function SetBoundary($boundary = NULL) {
        if ($boundary == NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }

    function SetBaseDir($dir) {
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }

    function SetFirstPage($filename) {
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }

    // Add the first page, then every other file under the base directory.
    function AutoAddFiles() {
        if (!isset($this->page_first)) {
            exit('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }

    // Recursively add all files in $dir except the first page.
    function AddDir($dir) {
        $handle_dir = opendir($dir);
        while (($filename = readdir($handle_dir)) !== false) {
            if (($filename != '.') && ($filename != '..') && ("$dir/$filename" != $this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }

    function AddFile($filename, $filepath = NULL, $encoding = NULL) {
        if ($filepath == NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }

    // Queue one part; content is base64-encoded unless an encoding is given.
    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL) {
        if ($encoding == NULL) {
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }

    // Fill in the Date header and the boundary if they were not set.
    function CheckHeaders() {
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary == NULL) {
            $this->SetBoundary();
        }
    }

    function CheckFiles() {
        if (count($this->files) == 0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }

    // Assemble the multipart/related message and return it as a string.
    function GetFile() {
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: {$file['mimetype']}\r\n";
            $contents .= "Content-Transfer-Encoding: {$file['encoding']}\r\n";
            $contents .= "Content-Location: {$file['filepath']}\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }

    function MakeFile($filename) {
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }

    // Map a file extension to a MIME type.
    function GetMimeType($filename) {
        $pathinfo = pathinfo($filename);
        // Lower-case the extension and tolerate names that have none.
        $extension = isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
        switch ($extension) {
            case 'htm':  $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt':  $mimetype = 'text/plain'; break;
            case 'cgi':  $mimetype = 'text/plain'; break;
            case 'php':  $mimetype = 'text/plain'; break;
            case 'css':  $mimetype = 'text/css'; break;
            case 'jpg':  $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe':  $mimetype = 'image/jpeg'; break;
            case 'gif':  $mimetype = 'image/gif'; break;
            case 'png':  $mimetype = 'image/png'; break;
            default:     $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>
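To make the output of GetFile() easier to picture, here is a minimal sketch of the same container layout with a single HTML part. It is an illustration only, not the MhtFileMaker code: the function name buildMinimalMht and the default boundary string are hypothetical.

```php
<?php
// Hypothetical sketch of the container format; not the MhtFileMaker code.
function buildMinimalMht(string $html, string $boundary = '----=_SKETCH_BOUNDARY'): string
{
    $out  = "MIME-Version: 1.0\r\n";
    $out .= "Content-Type: multipart/related;\r\n";
    $out .= "\tboundary=\"{$boundary}\";\r\n";
    $out .= "\ttype=\"text/html\"\r\n";
    $out .= "\r\n";
    // Each part starts with "--" followed by the boundary.
    $out .= "--{$boundary}\r\n";
    $out .= "Content-Type: text/html\r\n";
    $out .= "Content-Transfer-Encoding: base64\r\n";
    $out .= "Content-Location: tmp.html\r\n";
    $out .= "\r\n";
    // Base64 body, wrapped at 76 characters per line.
    $out .= chunk_split(base64_encode($html), 76, "\r\n");
    // The closing delimiter carries a trailing "--".
    $out .= "--{$boundary}--\r\n";
    return $out;
}

echo buildMinimalMht('<html><body><p>Hello</p></body></html>');
```

Images would be added as further parts of the same shape, each with its own Content-Type and Content-Location.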

Comment: the drawback of this method is that it does not support batch download, because a page can send only one set of headers. Whether the content is fetched remotely or generated locally, only one download can be declared per response, so even if you generate results in a loop, only one Word file is produced. (You can modify the method above to work around this.)
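One possible workaround is to save each document to disk instead of sending it to the browser. The following is only a sketch under that assumption: the helper name saveDocuments is hypothetical, and the placeholder strings stand in for what getWordDocument() would return.

```php
<?php
// Hypothetical helper: saves each generated document to its own file
// instead of sending it to the browser.
function saveDocuments(array $documents, string $dir): array
{
    $written = [];
    foreach ($documents as $name => $contents) {
        // basename() keeps the name from escaping the target directory.
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . basename($name) . '.doc';
        if (file_put_contents($path, $contents) !== false) {
            $written[] = $path;
        }
    }
    return $written;
}

$paths = saveDocuments(
    ['report1' => 'first document body', 'report2' => 'second document body'],
    sys_get_temp_dir()
);
print_r($paths);
```

Because nothing is sent as a download here, the one-header limit no longer applies; the files can be offered to the user afterwards in whatever way suits the site.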

2) Write a Word file in pure HTML format

Principle:

Use ob_start to buffer the HTML page first (this avoids the one-header-per-page problem and allows batch generation), and then write the buffered content to a .doc file with ordinary file writing.

The code is as follows:

class word
{
    // Start buffering and open the <html> element with the Word namespace.
    function start()
    {
        ob_start();
        echo '<html xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">';
    }

    // Close the document, take the buffered output and write it to $path.
    function save($path)
    {
        echo "</html>";
        $data = ob_get_contents();
        ob_end_clean();

        $this->writefile($path, $data);
    }

    function writefile($fn, $data)
    {
        $fp = fopen($fn, "wb");
        fwrite($fp, $data);
        fclose($fp);
    }
}
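The buffering step that this class relies on can be isolated into a few lines. The sketch below assumes nothing beyond PHP's standard output-control functions; the helper name captureOutput and the output file name are hypothetical.

```php
<?php
// Hypothetical helper: runs $render and returns whatever it printed.
function captureOutput(callable $render): string
{
    ob_start();                  // collect output instead of sending it
    try {
        $render();
    } finally {
        $data = ob_get_clean();  // read the buffer, then discard it
    }
    return $data === false ? '' : $data;
}

$page = captureOutput(function () {
    echo '<html><body>';
    echo '<p>Hello from the buffer</p>';
    echo '</body></html>';
});
file_put_contents(sys_get_temp_dir() . '/buffer-demo.doc', $page);
```

Since the page is captured as a string rather than sent to the browser, no download headers are involved, which is why this variant can produce several files in one request.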


The code is as follows:

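// Sample body. The original markup was stripped from this listing, so the tags below are a minimal stand-in.
$html = '
<p>PHP10086 <a href="http://www.php.net">http://www.php.net</a></p>
<p>PHP10086 <a href="http://www.php.net">http://www.php.net</a></p>
<h1>PHP10086</h1>
<p>The most reliable PHP technology sharing website</p>
';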

// Batch generate
for ($i = 1; $i <= 3; $i++) {
    $word = new word();
    $word->start();
    // $html = "aaa" . $i;
    $wordname = 'php tutorial website --php.net' . $i . ".doc";
    echo $html;
    $word->save($wordname);
    // Flush any remaining output buffer before the next iteration
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}


Personal comment: this method works best, for three reasons:

First, the code is concise and easy to understand.
Second, it supports batch generation of Word files (this is important).
Third, it supports complete HTML code.
