Converting an HTML page into a Word document and saving it with PHP

Source: Internet
Author: User

This article shows, by example, how to convert an HTML page into a Word document and save it using PHP. It is shared here for reference; the details are as follows.

The tool used here is a PHP library called PHPWord.

A Word document is generated by compressing a set of predefined XML files into a ZIP package and giving it the .docx extension (the legacy .doc format is a different, binary format).

So to use PHPWord, the Zip extension (php_zip.dll on Windows) must be enabled in your PHP environment. I wrote a demo.
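
To make the "zipped XML" idea concrete, here is a minimal sketch that builds a .docx by hand with PHP's ZipArchive class instead of PHPWord. The function name writeMinimalDocx is hypothetical, and the three XML parts are the smallest set I would expect a word processor to accept; treat it as an illustration of the packaging, not as what PHPWord does internally.

```php
<?php
// Sketch only: package three XML parts into a ZIP file named *.docx.
// Requires the zip extension. writeMinimalDocx() is a hypothetical helper.
function writeMinimalDocx(string $path, string $text): bool
{
    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    $xmlHeader = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';

    // Declares the content type of every part in the package.
    $zip->addFromString('[Content_Types].xml', $xmlHeader
        . '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        . '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        . '<Default Extension="xml" ContentType="application/xml"/>'
        . '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        . '</Types>');

    // Points the package at its main document part.
    $zip->addFromString('_rels/.rels', $xmlHeader
        . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        . '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
        . '</Relationships>');

    // The document body: a single paragraph with one run of escaped text.
    $zip->addFromString('word/document.xml', $xmlHeader
        . '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        . '<w:body><w:p><w:r><w:t>'
        . htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8')
        . '</w:t></w:r></w:p></w:body></w:document>');

    return $zip->close();
}
```

Calling writeMinimalDocx('demo.docx', 'Hello') should leave a file that can be unzipped to inspect the same three parts; PHPWord produces a richer package along the same lines.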
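
Before running the demo it can help to confirm that the extension is actually loaded. The following is a small sketch; the helper name missingExtensions is hypothetical, and the list passed to it contains only the extension the article names.

```php
<?php
// Sketch: report which of the required PHP extensions are not loaded.
// missingExtensions() is a hypothetical helper, not part of PHPWord.
function missingExtensions(array $required): array
{
    return array_values(array_filter($required, function ($name) {
        return !extension_loaded($name);
    }));
}

$missing = missingExtensions(array('zip'));
if ($missing) {
    echo 'Missing PHP extensions: ' . implode(', ', $missing) . PHP_EOL;
}
```

If the check prints a name, enable that extension in php.ini before using PHPWord.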

Feature description:

20150507 - Extract <p> tags and <ol> list tags from the HTML
20150508 - Added extraction of the images in the article
20150509 - Added line spacing and filtering of broken images
20150514 - Added table handling and converted the code to object-oriented style
20150519 - Added GD library handling for network images

<?php
require_once 'PHPWord.php';
require_once 'SimpleHtmlDom.class.php';

class Word
{
    private $url;
    private $host;                       // scheme://host of the source page
    private $ImgDir;                     // local directory for downloaded images
    private $LineTextArr = array();
    public $CurrentDir;
    public $error = array();             // collected errors
    public $filename = null;
    public $allowtag = "p,ol,ul,table";
    // Assumption: the image size threshold and the table cell width are
    // missing from the source listing; these two values are placeholders.
    public $maxImageSide = 800;          // pixels
    public $cellWidth = 2000;            // twips
    /** Statistics **/
    public $DownImg = 0;
    public $expendTime = 0;
    public $HttpRequestTime = 0;
    public $ContentLen = 0;
    public $HttpRequestArr = array();
    public $expendmemory = 0;

    public function __construct($url)
    {
        $startTime = $this->_time();
        $startMemory = $this->_memory();
        $this->url = $url;
        $UrlArr = parse_url($this->url);
        $this->host = $UrlArr["scheme"] . "://" . $UrlArr['host'];
        $this->CurrentDir = getcwd();
        $this->LineTextArr["table"] = array();
        $html = new simple_html_dom($this->url);
        $this->HttpRequestArr[] = $this->url;
        $this->HttpRequestTime++;
        foreach ($html->find($this->allowtag) as $key => $value) {
            if ($value->tag == "table") {
                $this->ParseTable($value, 0, $this->LineTextArr["table"]);
            } else {
                $this->AnalysisHtmlDom($value);
            }
            $this->error[] = error_get_last();
        }
        $endTime = $this->_time();
        $endMemory = $this->_memory();
        $this->expendTime = round(($endTime - $startTime), 2);               // seconds
        $this->expendmemory = round(($endMemory - $startMemory) / 1000, 2);  // KB
        $this->CreateWordDom();
    }

    private function _time()
    {
        return array_sum(explode(" ", microtime()));
    }

    private function _memory()
    {
        return memory_get_usage();
    }

    /**
     * Parses a table in the HTML, allowing for nested tables.
     * @param $value HTML DOM node
     * @param $i     traversal level (row index)
     * @param $arr   output array, passed by reference so the rows are kept
     */
    private function ParseTable($value, $i, &$arr)
    {
        if ($value->firstChild() && in_array($value->firstChild()->tag, array("table", "tbody", "thead", "tfoot", "tr"))) {
            foreach ($value->children as $k => $v) {
                $this->ParseTable($v, $i++, $arr);
            }
        } else {
            foreach ($value->children as $k => $v) {
                if ($v->firstChild() && $v->firstChild()->tag != "table") {
                    $arr[$i][] = array("tag" => $v->tag, "text" => trim($v->plaintext));
                }
                if (!$v->firstChild()) {
                    $arr[$i][] = array("tag" => $v->tag, "text" => trim($v->plaintext));
                }
            }
        }
    }

    /**
     * Parses the elements inside the HTML.
     * @param $value HTML DOM node
     */
    private function AnalysisHtmlDom($value)
    {
        $tmp = array();
        if ($value->has_child()) {
            foreach ($value->children as $k => $v) {
                $this->AnalysisHtmlDom($v);
            }
        } else {
            if ($value->tag == "a") {
                $tmp = array("tag" => $value->tag, "href" => $value->href, "text" => $value->innertext);
            } else if ($value->tag == "img") {
                $src = $this->unescape($value->src);
                $UrlArr = parse_url($src);
                if (!isset($UrlArr['host'])) {
                    $src = $this->host . $value->src;
                    $UrlArr = parse_url($src);
                }
                // A network image: it has to be downloaded first
                $src = $this->getImageFromNet($src, $UrlArr);
                if ($src) {
                    $imgsArr = $this->GD($src);
                    $tmp = array("tag" => $value->tag, "src" => $src, "text" => $value->alt, "width" => $imgsArr['width'], "height" => $imgsArr['height']);
                }
            } else {
                $tmp = array("tag" => $value->tag, "text" => strip_tags($value->innertext));
            }
            $this->LineTextArr[] = $tmp;
        }
    }

    /**
     * Reads the image size; if the image is too large, scales it down proportionally.
     */
    private function GD($src)
    {
        list($width, $height, $type, $attr) = getimagesize($src);
        if ($width > $this->maxImageSide || $height > $this->maxImageSide) {
            $width = $width / 2;
            $height = $height / 2;
        }
        return array("width" => $width, "height" => $height);
    }

    /**
     * Converts Unicode escapes back to the original characters.
     */
    public function unescape($str)
    {
        $str = rawurldecode($str);
        // The U (ungreedy) modifier makes the final ".+" consume one character at a time
        preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
        $ar = $r[0];
        foreach ($ar as $k => $v) {
            if (substr($v, 0, 2) == "%u") {
                $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
            } elseif (substr($v, 0, 3) == "&#x") {
                $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
            } elseif (substr($v, 0, 2) == "&#") {
                $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
            }
        }
        return join("", $ar);
    }

    /**
     * Downloads an image.
     * @param $Src    target resource
     * @param $UrlArr parse_url() result for the target URL
     */
    private function getImageFromNet($Src, $UrlArr)
    {
        $file = basename($UrlArr['path']);
        $ext = explode('.', $file);
        $this->ImgDir = $this->CurrentDir . "/" . $UrlArr['host'];
        $_supportedImageTypes = array('jpg', 'jpeg', 'gif', 'png', 'bmp', 'tif', 'tiff');
        if (isset($ext['1']) && in_array($ext['1'], $_supportedImageTypes)) {
            $file = file_get_contents($Src);
            $this->HttpRequestArr[] = $Src;
            $this->HttpRequestTime++;
            $this->_mkdir(); // create the directory, or collect the error
            $imgName = md5($UrlArr['path']) . "." . $ext['1'];
            file_put_contents($this->ImgDir . "/" . $imgName, $file);
            $this->DownImg++;
            return $UrlArr['host'] . "/" . $imgName;
        }
        return false;
    }

    /**
     * Creates the image directory.
     */
    private function _mkdir()
    {
        if (!is_dir($this->ImgDir)) {
            // The mode must be an octal integer, not the string "7777"
            if (!mkdir($this->ImgDir, 0777)) {
                $this->error[] = error_get_last();
            }
        }
    }

    /**
     * Builds the Word document.
     */
    private function CreateWordDom()
    {
        $PHPWord = new PHPWord();
        $PHPWord->setDefaultFontName('宋体'); // SimSun
        $PHPWord->setDefaultFontSize("11");
        $styleTable = array('borderSize' => 6, 'borderColor' => '006699', 'cellMargin' => 120);
        // New portrait section
        $section = $PHPWord->createSection();
        $section->addText($this->Details(), array(), array('spacing' => 120));
        // Process the collected data
        foreach ($this->LineTextArr as $key => $lineArr) {
            if (isset($lineArr['tag'])) {
                if ($lineArr['tag'] == "li") {
                    $section->addListItem($lineArr['text'], 0, "", "", array('spacing' => 120));
                } else if ($lineArr['tag'] == "img") {
                    $section->addImage($lineArr['src'], array('width' => $lineArr['width'], 'height' => $lineArr['height'], 'align' => 'center'));
                } else if ($lineArr['tag'] == "p") {
                    $section->addText($lineArr['text'], array(), array('spacing' => 120));
                }
            } else if ($key === "table") { // strict comparison: before PHP 8, 0 == "table" is true
                $PHPWord->addTableStyle('myOwnTableStyle', $styleTable);
                $table = $section->addTable("myOwnTableStyle");
                foreach ($lineArr as $rowKey => $tr) {
                    $table->addRow();
                    foreach ($tr as $ky => $td) {
                        $table->addCell($this->cellWidth)->addText($td['text']);
                    }
                }
            }
        }
        $this->downFile($PHPWord);
    }

    public function Details()
    {
        $msg = "A total of {$this->HttpRequestTime} requests were made and {$this->DownImg} images were downloaded; the download took about {$this->expendTime} seconds and the whole run used about {$this->expendmemory} KB of memory.";
        return $msg;
    }

    public function downFile($PHPWord)
    {
        if (empty($this->filename)) {
            $UrlArr = parse_url($this->url);
            $this->filename = $UrlArr['host'] . ".docx";
        }
        // Save the file
        $objWriter = PHPWord_IOFactory::createWriter($PHPWord, 'Word2007');
        $objWriter->save($this->filename);
        header("Pragma: public");
        header("Expires: 0");
        header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
        header("Cache-Control: public");
        header("Content-Description: File Transfer");
        // Output type
        header('Content-type: application/msword');
        // Force the download
        $header = "Content-Disposition: attachment; filename=" . $this->filename . ";";
        header($header);
        @readfile($this->filename);
    }
}
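
The GD() step in the listing halves an oversized image once, which can still leave a very large picture too big for the page. As an alternative (not the author's method), here is a sketch that scales an image to fit inside a bounding square while keeping its aspect ratio. The function name fitWithin and the limit passed to it are hypothetical.

```php
<?php
// Sketch: scale width/height proportionally so the longer side is at most $maxSide.
// fitWithin() is a hypothetical helper; the limit is chosen by the caller.
function fitWithin(int $width, int $height, int $maxSide): array
{
    if ($width <= 0 || $height <= 0 || $maxSide <= 0) {
        throw new InvalidArgumentException('Dimensions must be positive.');
    }
    $largest = max($width, $height);
    if ($largest <= $maxSide) {
        // Already small enough: leave the size untouched.
        return array('width' => $width, 'height' => $height);
    }
    $scale = $maxSide / $largest;
    return array(
        'width'  => max(1, (int) round($width * $scale)),
        'height' => max(1, (int) round($height * $scale)),
    );
}
```

For instance, a 1600x800 image with a limit of 800 comes back as 800x400, while a 640x480 image is returned unchanged.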

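The unescape() method in the listing decodes %uXXXX sequences and numeric HTML entities with iconv and pack. A shorter way to get a similar result, sketched here under the assumption that decoding named entities such as &amp; as well is acceptable, is to rewrite %uXXXX as a numeric entity and let html_entity_decode do the rest. The function name decodeUnicodeEscapes is hypothetical.

```php
<?php
// Sketch: decode %uXXXX escapes, percent-escapes and HTML entities to UTF-8.
// Unlike unescape() in the listing, this also decodes named entities (&amp; etc.).
function decodeUnicodeEscapes(string $str): string
{
    // Rewrite %uXXXX as the equivalent numeric entity &#xXXXX;
    $str = preg_replace('/%u([0-9a-fA-F]{4})/', '&#x$1;', $str);
    // Decode ordinary percent-escapes such as %20.
    $str = rawurldecode($str);
    // Turn numeric and named entities into UTF-8 characters.
    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```

With this sketch, decodeUnicodeEscapes('%u4E2D%u6587') returns the two-character string 中文.
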
The focus of the code above is not really the Word generation but the use of simplehtmldom, an open-source HTML parser that I have mentioned before. Reading its code over the past few days

pointed me toward two learning directions:

① Regular expressions

② Cataloguing the functions this extension provides

What I gained from reading the source code:

PHP exceptions can be caught, and PHP errors can be captured as well.

error_get_last() // use this function to capture PHP errors that occurred in the page. Thank you.
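
As a small illustration of the two mechanisms, the sketch below captures a suppressed warning with error_get_last() and catches an exception with try/catch. Both function names (tryReadFile, tryDecodeJson) are hypothetical, and the sketch assumes PHP 7.3 or later and no custom error handler.

```php
<?php
// Sketch: capture a PHP error (a warning) after the fact with error_get_last().
function tryReadFile(string $path): array
{
    error_clear_last();                       // forget any earlier error
    $contents = @file_get_contents($path);    // @ hides the warning from the output
    $lastError = error_get_last();            // ...but it is still recorded here
    $failed = ($contents === false);
    return array(
        'contents' => $failed ? null : $contents,
        'error'    => ($failed && $lastError !== null) ? $lastError['message'] : null,
    );
}

// Sketch: catch an exception with try/catch.
function tryDecodeJson(string $json): array
{
    try {
        $value = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        return array('value' => $value, 'error' => null);
    } catch (JsonException $e) {
        return array('value' => null, 'error' => $e->getMessage());
    }
}
```

The listing above uses the first pattern when it stores error_get_last() in its $error array after mkdir() fails.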


I hope this article helps you with your PHP programming.
