How to Export Word Documents with PHP, with an Example

Last Update:2018-07-07 Source: Internet

Author: User

There are many ways to work with Word documents in PHP. This article presents one of them.

Principle

Generally, there are two ways to export .doc files. The first is to use COM, which is enabled on the server as a PHP extension: you create a COM object and call its methods. On a server with Office installed, the COM object named word.application can generate Word documents. I do not recommend this method because it is relatively slow (in my test, the server actually opened a Word client while the code was running). An ideal COM component would have no user interface and would convert the data in the background. That would give better results, but such components are generally commercial.
The second method is to have PHP write the document content directly into a file with the .doc suffix. This method needs no third-party extensions and runs quickly.
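As a minimal sketch of the second method, the snippet below writes an HTML fragment into a file named with the .doc suffix. The helper name saveHtmlAsDoc and the file name report.doc are assumptions made for illustration; they do not come from the original article.

```php
<?php
// Hypothetical helper: wrap an HTML fragment in a full page and save it
// under a .doc file name. Returns the number of bytes written.
function saveHtmlAsDoc(string $bodyHtml, string $path): int
{
    $html = '<html><head><meta charset="utf-8"></head><body>'
          . $bodyHtml
          . '</body></html>';

    $bytes = file_put_contents($path, $html);
    if ($bytes === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $bytes;
}

// Assumed demo location: the system temporary directory.
$target  = sys_get_temp_dir() . '/report.doc';
$written = saveHtmlAsDoc('<h1>Monthly report</h1><p>No images here.</p>', $target);
```

The file contains plain HTML only, so any img tag in it would still point at an external address.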
Word is quite capable: it can open HTML files and preserve their formatting, and it still opens them normally when the file suffix is .doc. This is convenient for us, but there is a problem. An image in an HTML file is only an address, and the actual image data is stored elsewhere. So if we write plain HTML into the .doc file, the document cannot contain the images themselves. How can we create a .doc file that includes images? We can use the MHT format, which is close to HTML.
MHT is similar to HTML, but externally linked resources such as images, JavaScript, and CSS are stored inside the file, base64-encoded. A single MHT file can therefore hold every resource of a web page. Naturally, it is larger than the HTML file alone.
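To make the layout concrete, here is a hedged sketch of a two-part MHT file assembled by hand: one HTML part and one image part, both base64-encoded and separated by a MIME boundary. The function name buildTinyMht, the boundary string, and the example.invalid addresses are assumptions for illustration; the image is a 1x1 GIF given inline so that the snippet needs no network access.

```php
<?php
// Hypothetical helper: pack one HTML page and one image into a minimal MHT string.
function buildTinyMht(string $html, string $htmlUrl, string $imageBytes, string $imageUrl): string
{
    $boundary = 'DEMO_BOUNDARY_0001'; // assumption: must not occur inside any part
    $eol = "\r\n";

    $parts = [
        ['text/html', $htmlUrl,  $html],
        ['image/gif', $imageUrl, $imageBytes],
    ];

    // Top-level headers describe the whole file as multipart/related.
    $out  = 'MIME-Version: 1.0' . $eol;
    $out .= 'Content-Type: multipart/related; boundary="' . $boundary . '"; type="text/html"' . $eol;
    $out .= $eol;

    // Each resource becomes one part with its own headers and a base64 body.
    foreach ($parts as [$type, $location, $body]) {
        $out .= '--' . $boundary . $eol;
        $out .= 'Content-Type: ' . $type . $eol;
        $out .= 'Content-Transfer-Encoding: base64' . $eol;
        $out .= 'Content-Location: ' . $location . $eol;
        $out .= $eol;
        $out .= chunk_split(base64_encode($body), 76, $eol);
    }

    // The closing delimiter ends with two extra hyphens.
    $out .= '--' . $boundary . '--' . $eol;
    return $out;
}

$gif = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
$mht = buildTinyMht(
    '<html><body><img src="http://example.invalid/dot.gif"></body></html>',
    'http://example.invalid/index.html',
    $gif,
    'http://example.invalid/dot.gif'
);
```

The Content-Location of the image part matches the src value in the HTML part, which is how the viewer connects the two.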
Can Word recognize the MHT format? I saved a web page as MHT, changed the suffix to .doc, and opened it in Word. It worked: Word recognized the MHT content and displayed the images.
Since a .doc file can hold MHT content, the next question is how to put the images into the MHT file. Image addresses in HTML are written in the src attribute of img tags, so extracting the src values gives us the image addresses. If an address is a relative path, prepend the URL prefix to turn it into an absolute one. With the address in hand, we call file_get_contents to read the image data, encode it with base64_encode, and insert the result at the appropriate place in the MHT file.
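The extraction and path-completion steps can be sketched as follows. The function names extractImageSources and toAbsoluteUrl, the regular expression, and the example.invalid addresses are assumptions for illustration, not the article's own code. Like the approach described above, the pattern only handles src values enclosed in quotation marks.

```php
<?php
// Hypothetical helper: collect the quoted src values of all img tags.
function extractImageSources(string $html): array
{
    preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\']/i', $html, $m);
    // Trim whitespace and drop duplicates while keeping the original order.
    return array_values(array_unique(array_map('trim', $m[1])));
}

// Hypothetical helper: leave http(s) addresses alone, prefix everything else.
function toAbsoluteUrl(string $src, string $baseUrl): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    return rtrim($baseUrl, '/') . '/' . ltrim($src, '/');
}

$html = '<p><img src="img/a.png" alt=""><IMG SRC=\'http://cdn.example.invalid/b.jpg\'></p>';

$sources  = extractImageSources($html);
$absolute = array_map(
    function ($src) {
        return toAbsoluteUrl($src, 'http://example.invalid/site/');
    },
    $sources
);

// Encoding step for one image, once its bytes have been read:
// $encoded = chunk_split(base64_encode(file_get_contents($absolute[0])));
```

The download itself is left as a comment so that the sketch runs without network access.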
Finally, there are two ways to deliver the file to the client. One is to generate the .doc file on the server, record its address, and then redirect the client to it with header("Location: xx.doc") so that it downloads the file. The other is to answer the HTTP request directly: in the response headers, set Content-Type to a Word type such as application/msword and set Content-Disposition to attachment followed by the file name. Once the headers are sent, write the file content to the response body and the client downloads it.
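A hedged sketch of the second delivery method is shown below. The function names wordDownloadHeaders and sendAsWordDownload are assumptions for illustration, and application/msword is used as the content type because it is the registered media type for .doc files. The header list is built separately from the sending step so that it can be inspected without producing output.

```php
<?php
// Hypothetical helper: build the response header lines for a Word download.
function wordDownloadHeaders(string $fileName, int $length): array
{
    // Keep only the base name and strip characters that could break the header.
    $safeName = str_replace(['"', "\r", "\n"], '', basename($fileName));

    return [
        'Content-Type: application/msword',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . $length,
    ];
}

// Hypothetical helper: send the headers, then the document body.
// It must be called before any other output is written.
function sendAsWordDownload(string $content, string $fileName): void
{
    foreach (wordDownloadHeaders($fileName, strlen($content)) as $line) {
        header($line);
    }
    echo $content;
}

$headers = wordDownloadHeaders('/tmp/export/test.doc', 1234);
```

In a page script, the call would look like sendAsWordDownload($fileContent, 'test.doc'), where $fileContent holds the generated MHT text.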

Implementation

With the principle above, you should have a rough idea of the implementation. Below is an export function that converts HTML code into an MHT document. It takes three parameters, the last two of which are optional:
$content: the HTML code to convert
$absolutePath: if the image addresses in the HTML code are relative paths, this is the absolute prefix they are missing
$isEraseLink: whether to remove hyperlinks from the HTML code
The return value is the content of the MHT file. You can save it with file_put_contents as a file with the .doc suffix.
The function finds all image addresses in the HTML code and downloads the images one by one. For each image it calls the MhtFileMaker class to add the content to the MHT file. The details of adding are encapsulated in MhtFileMaker.

The code is as follows:

/**
 * Build the Word document content from HTML code.
 * The document is really an MHT file. This function scans the HTML and downloads the images it references.
 * This function depends on the MhtFileMaker class.
 * It reads the src attribute of each img tag. The src value must be enclosed in quotation marks; otherwise it cannot be extracted.
 *
 * @param string $content      HTML content
 * @param string $absolutePath Absolute URL of the web page. If image paths in the HTML are relative, set this so the function can complete them. It must end with a slash (/).
 * @param bool   $isEraseLink  Whether to remove links from the HTML content
 * @return string Content of the MHT file
 */
function getWordDocument($content, $absolutePath = "", $isEraseLink = true)
{
    $mht = new MhtFileMaker();
    if ($isEraseLink) {
        // Remove the links but keep their inner content
        $content = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $content);
    }

    $images = array(); // absolute addresses, used for downloading
    $files = array();  // addresses exactly as written in the HTML
    $matches = array();
    // This pattern requires the src attribute value to be enclosed in quotation marks.
    if (preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\']/i', $content, $matches)) {
        $arrPath = $matches[1];
        for ($i = 0; $i < count($arrPath); $i++) {
            $imgPath = trim($arrPath[$i]);
            if ($imgPath != "") {
                $files[] = $imgPath;
                if (!preg_match('#^https?://#i', $imgPath)) {
                    // Relative path: prepend the prefix
                    $imgPath = $absolutePath . $imgPath;
                }
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html", $mht->GetMimeType("tmp.html"), $content);

    for ($i = 0; $i < count($images); $i++) {
        $image = $images[$i];
        // Read the image once and check the result of that read
        $imgcontent = @file_get_contents($image);
        if ($imgcontent !== false) {
            $mht->AddContents($files[$i], $mht->GetMimeType($image), $imgcontent);
        } else {
            echo "file: " . $image . " does not exist!\n";
        }
    }

    return $mht->GetFile();
}

Usage:

The code is as follows:

$fileContent = getWordDocument($content, "http://www.php.net/Music/etc/");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);

Here, $content should hold the HTML source code, and the URL after it is the prefix used to complete relative image paths in the HTML code.
Note: before using this function, you must include the MhtFileMaker class, which generates the MHT document.

The code is as follows:

<?php
/***********************************************************************
Class:       Mht File Maker
Version:     1.2 beta
Author:      Wudi
Description: The class can make .mht file.
***********************************************************************/

class MhtFileMaker {
    public $config = array();
    public $headers = array();
    public $headers_exists = array();
    public $files = array();
    public $boundary;
    public $dir_base;
    public $page_first;

    // Empty method kept from the original; it is not a constructor.
    function MhtFile($config = array()) {

    }

    // Store a raw header line and remember its lower-cased name
    function SetHeader($header) {
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }

    function SetFrom($from) {
        $this->SetHeader("From: $from");
    }

    function SetSubject($subject) {
        $this->SetHeader("Subject: $subject");
    }

    function SetDate($date = NULL, $istimestamp = FALSE) {
        if ($date == NULL) {
            $date = time();
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }

    function SetBoundary($boundary = NULL) {
        if ($boundary == NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }

    function SetBaseDir($dir) {
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }

    function SetFirstPage($filename) {
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }

    // Add the first page, then every other file under the base directory
    function AutoAddFiles() {
        if (!isset($this->page_first)) {
            exit('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }

    function AddDir($dir) {
        $handle_dir = opendir($dir);
        while (($filename = readdir($handle_dir)) !== false) {
            if (($filename != '.') && ($filename != '..') && ("$dir/$filename" != $this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }

    function AddFile($filename, $filepath = NULL, $encoding = NULL) {
        if ($filepath == NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }

    // Content is base64-encoded unless the caller names another encoding
    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL) {
        if ($encoding == NULL) {
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }

    function CheckHeaders() {
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary == NULL) {
            $this->SetBoundary();
        }
    }

    function CheckFiles() {
        if (count($this->files) == 0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }

    // Assemble the headers and all parts into the final MHT text
    function GetFile() {
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: {$file['mimetype']}\r\n";
            $contents .= "Content-Transfer-Encoding: {$file['encoding']}\r\n";
            $contents .= "Content-Location: {$file['filepath']}\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }

    function MakeFile($filename) {
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }

    // Map a file extension to a MIME type; unknown or missing extensions get a generic type
    function GetMimeType($filename) {
        $pathinfo = pathinfo($filename);
        $extension = isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
        switch ($extension) {
            case 'htm':  $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt':  $mimetype = 'text/plain'; break;
            case 'cgi':  $mimetype = 'text/plain'; break;
            case 'php':  $mimetype = 'text/plain'; break;
            case 'css':  $mimetype = 'text/css'; break;
            case 'jpg':  $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe':  $mimetype = 'image/jpeg'; break;
            case 'gif':  $mimetype = 'image/gif'; break;
            case 'png':  $mimetype = 'image/png'; break;
            default:     $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>

The above describes how to export a .doc file from PHP by way of the MHT format. This method solves one particular problem: including images in the exported .doc file. If you want to include other resources, such as CSS style sheets, use a regular expression to find the link tags in the HTML code, extract the addresses of the CSS files, read them, and add them to the MHT file, where they are base64-encoded like the images.
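The style-sheet extension mentioned above could start from a sketch like the following. The function name extractStylesheetHrefs and both regular expressions are assumptions for illustration. The sketch only finds the addresses; reading each file and adding it to the MHT document would then follow the same steps as for images.

```php
<?php
// Hypothetical helper: return the href of every <link rel="stylesheet"> tag.
function extractStylesheetHrefs(string $html): array
{
    $hrefs = [];
    if (preg_match_all('/<link\b[^>]*>/i', $html, $tags)) {
        foreach ($tags[0] as $tag) {
            // Accept the tag only if it declares rel=stylesheet and has a quoted href.
            if (preg_match('/\brel\s*=\s*["\']?stylesheet\b/i', $tag)
                && preg_match('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m)) {
                $hrefs[] = trim($m[1]);
            }
        }
    }
    return $hrefs;
}

$page = '<link rel="stylesheet" href="css/main.css">'
      . '<link rel="icon" href="favicon.ico">'
      . '<LINK HREF=\'print.css\' REL=\'stylesheet\'>';

$styles = extractStylesheetHrefs($page);
```

Each address found this way can be completed to an absolute URL, read with file_get_contents, and passed to AddContents together with the text/css type that GetMimeType returns for the css extension.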

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
