Converting an HTML page to a Word document and saving it with PHP

Source: Internet
Author: User
Tags: create directory, explode, phpword
This article shows how to convert an HTML page into a Word document and save it using PHP. It works through an example that demonstrates the features and usage of the PHPWord tool, and may be a useful reference for readers who need to do the same.

The details are shared below for your reference:

The example uses a PHP tool called PHPWord.

The idea behind generating a Word file is to compress a set of predefined XML files into a ZIP package and change the file extension to .doc or .docx.

PHPWord therefore needs the zip compression extension (zip.dll) installed in your PHP environment. I wrote a demo to show how it works.
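To make the "XML files in a ZIP package" idea concrete, here is a minimal sketch that builds a one-paragraph .docx by hand with PHP's ZipArchive class. It is not how PHPWord is implemented internally; the function name buildMinimalDocx and the three-part layout are assumptions chosen for illustration, and a real document usually contains more parts (styles, settings, and so on).

```php
<?php
/**
 * Write a minimal .docx to $path containing one paragraph of $text.
 * Returns false when the zip extension is missing or the archive cannot be written.
 * (buildMinimalDocx is a hypothetical helper, not part of PHPWord.)
 */
function buildMinimalDocx(string $path, string $text): bool
{
    // PHPWord has the same requirement: without ext-zip there is no ZipArchive.
    if (!extension_loaded('zip')) {
        return false;
    }

    // Declares the content type of each part in the package.
    $contentTypes = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        . '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        . '<Default Extension="xml" ContentType="application/xml"/>'
        . '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        . '</Types>';

    // Points the package at its main document part.
    $rels = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        . '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
        . '</Relationships>';

    // The document body: a single paragraph with one run of escaped text.
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $document = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        . '<w:body><w:p><w:r><w:t xml:space="preserve">' . $escaped . '</w:t></w:r></w:p></w:body>'
        . '</w:document>';

    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    $zip->addFromString('[Content_Types].xml', $contentTypes);
    $zip->addFromString('_rels/.rels', $rels);
    $zip->addFromString('word/document.xml', $document);

    return $zip->close();
}
```

Calling buildMinimalDocx(sys_get_temp_dir() . '/hello.docx', 'Hello from PHP') should produce a file that can be inspected with any ZIP tool, which is a quick way to see the structure PHPWord generates for you.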

Feature history:

20150507 - Extract <p> tags and <ol> list tags from the HTML
20150508 - Added the ability to fetch the images in an article
20150509 - Added line spacing and filtering of invalid images
20150514 - Added table handling and refactored the code to be object-oriented
20150519 - Added GD library handling of network images

<?php
require_once 'PHPWord.php';
require_once 'SimpleHtmlDom.class.php';

class Word
{
    private $url;
    private $host;                      // scheme://host of the source page
    private $imgDir;                    // directory that downloaded images are saved to
    private $LineTextArr = array();
    public $CurrentDir;
    public $error = array();            // error array
    public $filename = null;
    public $AllowTag = "p,ol,ul,table";

    /** Data statistics **/
    public $DownImg = 0;
    public $expendTime = 0;
    public $HttpRequestTime = 0;
    public $ContentLen = 0;
    public $HttpRequestArr = array();
    public $expendMemory = 0;

    public function __construct($url)
    {
        $startTime = $this->_time();
        $startMemory = $this->_memory();
        $this->url = $url;
        $UrlArr = parse_url($this->url);
        $this->host = $UrlArr["scheme"] . "://" . $UrlArr['host'];
        $this->CurrentDir = getcwd();
        $this->LineTextArr["table"] = array();
        $html = new simple_html_dom($this->url);
        $this->HttpRequestArr[] = $this->url;
        $this->HttpRequestTime++;
        foreach ($html->find($this->AllowTag) as $key => $value) {
            if ($value->tag == "table") {
                $this->ParseTable($value, 0, $this->LineTextArr["table"]);
            } else {
                $this->AnalysisHtmlDom($value);
            }
            $this->error[] = error_get_last();
        }
        $endTime = $this->_time();
        $endMemory = $this->_memory();
        $this->expendTime = round(($endTime - $startTime), 2);              // seconds
        $this->expendMemory = round(($endMemory - $startMemory) / 1000, 2); // KB
        $this->CreateWordDom();
    }

    private function _time()
    {
        // microtime() returns "usec sec"; summing both parts gives seconds as a float
        return array_sum(explode(" ", microtime()));
    }

    private function _memory()
    {
        return memory_get_usage();
    }

    /**
     * Parse a table in the HTML, allowing for nested tables
     * @param $value HtmlDom node
     * @param $i     traversal level
     * @param $Arr   collected rows (passed by reference so the caller sees them)
     **/
    private function ParseTable($value, $i, &$Arr)
    {
        if ($value->firstChild() && in_array($value->firstChild()->tag, array("table", "tbody", "thead", "tfoot", "tr"))) {
            foreach ($value->children as $k => $v) {
                $this->ParseTable($v, $i++, $Arr);
            }
        } else {
            foreach ($value->children as $k => $v) {
                if ($v->firstChild() && $v->firstChild()->tag != "table") {
                    $Arr[$i][] = array("tag" => $v->tag, "text" => trim($v->plaintext));
                }
                if (!$v->firstChild()) {
                    $Arr[$i][] = array("tag" => $v->tag, "text" => trim($v->plaintext));
                }
            }
        }
    }

    /**
     * Parse the elements inside the HTML
     * @param $value HtmlDom node
     **/
    private function AnalysisHtmlDom($value)
    {
        $tmp = array();
        if ($value->has_child()) {
            foreach ($value->children as $k => $v) {
                $this->AnalysisHtmlDom($v);
            }
        } else {
            if ($value->tag == "a") {
                $tmp = array("tag" => $value->tag, "href" => $value->href, "text" => $value->innertext);
            } else if ($value->tag == "img") {
                $src = $this->unescape($value->src);
                $UrlArr = parse_url($src);
                if (!isset($UrlArr['host'])) {
                    $src = $this->host . $value->src;
                    $UrlArr = parse_url($src);
                }
                // a network image that needs to be downloaded
                $src = $this->getImageFromNet($src, $UrlArr);
                if ($src) {
                    $imgsArr = $this->GD($src);
                    $tmp = array("tag" => $value->tag, "src" => $src, "text" => $value->alt, "width" => $imgsArr['width'], "height" => $imgsArr['height']);
                }
            } else {
                $tmp = array("tag" => $value->tag, "text" => strip_tags($value->innertext));
            }
            $this->LineTextArr[] = $tmp;
        }
    }

    /**
     * Get the image size via the GD library; if it is too large, scale it down proportionally
     **/
    private function GD($src)
    {
        // NOTE: the size threshold was lost from the source text; 800 is an assumed placeholder
        $maxSide = 800;
        list($width, $height, $type, $attr) = getimagesize($src);
        if ($width > $maxSide || $height > $maxSide) {
            $width = $width / 2;
            $height = $height / 2;
        }
        return array("width" => $width, "height" => $height);
    }

    /**
     * Convert Unicode escapes back to the original characters
     **/
    public function unescape($str)
    {
        $str = rawurldecode($str);
        preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
        $ar = $r[0];
        foreach ($ar as $k => $v) {
            if (substr($v, 0, 2) == "%u") {
                $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
            } elseif (substr($v, 0, 3) == "&#x") {
                $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
            } elseif (substr($v, 0, 2) == "&#") {
                $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
            }
        }
        return join("", $ar);
    }

    /**
     * Download an image
     * @param $Src    target resource
     * @param $UrlArr parse_url() array for the target URL
     **/
    private function getImageFromNet($Src, $UrlArr)
    {
        $file = basename($UrlArr['path']);
        $ext = explode('.', $file);
        $this->imgDir = $this->CurrentDir . "/" . $UrlArr['host'];
        $_supportedImageTypes = array('jpg', 'jpeg', 'gif', 'png', 'bmp', 'tif', 'tiff');
        if (isset($ext['1']) && in_array($ext['1'], $_supportedImageTypes)) {
            $file = file_get_contents($Src);
            $this->HttpRequestArr[] = $Src;
            $this->HttpRequestTime++;
            $this->_mkdir(); // create the directory, or collect the error
            $imgName = md5($UrlArr['path']) . "." . $ext['1'];
            file_put_contents($this->imgDir . "/" . $imgName, $file);
            $this->DownImg++;
            return $UrlArr['host'] . "/" . $imgName;
        }
        return false;
    }

    /**
     * Create the image directory
     **/
    private function _mkdir()
    {
        if (!is_dir($this->imgDir)) {
            // the mode must be an octal integer; the string "7777" would be read as decimal
            if (!mkdir($this->imgDir, 0777)) {
                $this->error[] = error_get_last();
            }
        }
    }

    /**
     * Build the Word document
     **/
    private function CreateWordDom()
    {
        $PHPWord = new PHPWord();
        $PHPWord->setDefaultFontName('宋体'); // SimSun
        $PHPWord->setDefaultFontSize("11");
        $styleTable = array('borderSize' => 6, 'borderColor' => '006699', 'cellMargin' => 120);
        // New portrait section
        $section = $PHPWord->createSection();
        $section->addText($this->Details(), array(), array('spacing' => 120));
        // Data processing
        foreach ($this->LineTextArr as $key => $lineArr) {
            if (isset($lineArr['tag'])) {
                if ($lineArr['tag'] == "li") {
                    $section->addListItem($lineArr['text'], 0, "", "", array('spacing' => 120));
                } else if ($lineArr['tag'] == "img") {
                    $section->addImage($lineArr['src'], array('width' => $lineArr['width'], 'height' => $lineArr['height'], 'align' => 'center'));
                } else if ($lineArr['tag'] == "p") {
                    $section->addText($lineArr['text'], array(), array('spacing' => 120));
                }
            } else if ($key === "table") {
                // strict comparison: before PHP 8, the integer key 0 == "table" is true
                $PHPWord->addTableStyle('myOwnTableStyle', $styleTable);
                $table = $section->addTable("myOwnTableStyle");
                foreach ($lineArr as $rowKey => $tr) {
                    $table->addRow();
                    foreach ($tr as $ky => $td) {
                        // NOTE: the cell width was lost from the source text; 2000 is an assumed placeholder
                        $table->addCell(2000)->addText($td['text']);
                    }
                }
            }
        }
        $this->downFile($PHPWord);
    }

    public function Details()
    {
        $msg = "Total requests: {$this->HttpRequestTime}; images downloaded: {$this->DownImg}; download time: approximately {$this->expendTime} seconds; memory consumed by the whole run: approximately {$this->expendMemory} KB.";
        return $msg;
    }

    public function downFile($PHPWord)
    {
        if (empty($this->filename)) {
            $UrlArr = parse_url($this->url);
            $this->filename = $UrlArr['host'] . ".docx";
        }
        // Save the file
        $objWriter = PHPWord_IOFactory::createWriter($PHPWord, 'Word2007');
        $objWriter->save($this->filename);
        header("Pragma: public");
        header("Expires: 0");
        header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
        header("Cache-Control: public");
        header("Content-Description: File Transfer");
        // Use the switch-generated Content-Type
        header('Content-Type: application/msword'); // output type
        // Force the download
        $header = "Content-Disposition: attachment; filename=" . $this->filename . ";";
        header($header);
        @readfile($this->filename);
    }
}
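The GD() method in the class halves an oversized image once, which can still leave a very large image too big for the page. As a possible variation, the sketch below scales the dimensions so that the longer side fits a limit while keeping the aspect ratio. The function name fitWithin and the default limit of 600 are assumptions for illustration and do not come from the original code.

```php
<?php
/**
 * Scale a width/height pair down so that the longer side is at most $maxSide,
 * preserving the aspect ratio. Sizes already within the limit are returned unchanged.
 * (fitWithin and the default of 600 are hypothetical, chosen for this sketch.)
 */
function fitWithin(int $width, int $height, int $maxSide = 600): array
{
    if ($width <= 0 || $height <= 0 || $maxSide <= 0) {
        throw new InvalidArgumentException('Dimensions and limit must be positive.');
    }

    $longest = max($width, $height);
    if ($longest <= $maxSide) {
        return array('width' => $width, 'height' => $height);
    }

    // One ratio for both sides keeps the proportions intact.
    $ratio = $maxSide / $longest;

    return array(
        'width'  => (int) round($width * $ratio),
        'height' => (int) round($height * $ratio),
    );
}
```

The returned array has the same 'width' and 'height' keys that GD() returns, so it could be passed to addImage() in the same way.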

The point of the code above is less the Word generation than the use of SimpleHtmlDom, the open source HTML parser mentioned earlier. Reading its source code over the past few days

has led to two directions for further study:

① regular expressions

② a summary of the functions this extension provides

What I learned from reading the source code:

PHP exceptions can be caught, and PHP errors can be captured as well.

error_get_last() // use this function to capture the last PHP error raised on the page
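As a small illustration of the difference, the sketch below catches an exception with try/catch and then reads a suppressed warning back through error_get_last(), which is the same call the class uses to fill its $error array. The helper name captureLastError and the non-existent path are assumptions for this example; error_clear_last() requires PHP 7 or later.

```php
<?php
/**
 * Run $task and return the last PHP error it raised, or null if there was none.
 * (captureLastError is a hypothetical helper written for this sketch.)
 */
function captureLastError(callable $task): ?array
{
    error_clear_last();      // forget errors raised by earlier code (PHP 7+)
    $task();
    return error_get_last(); // array with type, message, file, line; or null
}

// An exception is handled with try/catch.
$caught = null;
try {
    throw new RuntimeException('something failed');
} catch (RuntimeException $e) {
    $caught = $e->getMessage();
}

// A warning is not an exception, but it is still recorded even when
// its output is suppressed with @, so error_get_last() can report it.
$err = captureLastError(function () {
    @file_get_contents('/path/that/does/not/exist');
});

if ($err !== null) {
    echo "Captured error type {$err['type']}: {$err['message']}\n";
}
```

Because error_get_last() only reports the most recent error, clearing it before each step, as the helper does, avoids recording the same stale error repeatedly.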

Summary: That is the entire content of this article. I hope it helps with your learning.
