The collection (page-scraping) feature in DedeCMS cannot match dedicated collection software, but compared with the collectors built into other programs it performs very well. It can fetch many web pages that other programs cannot. For example, fetching the 58.com (58 Tongcheng) home page with the Discuz download function returns either a blank page or a warning page, while the DedeCMS downloader retrieves the page completely.
How the Dede collection program works
The principle behind DedeCMS collection is simple: a PHP program uses a socket to simulate an HTTP request and downloads the entire HTML of the page. It has one shortcoming, though: partial collection is not supported. Even if we only want the title of a remote page, the whole page is downloaded. For one or two pages this does not matter, but at volume the downloads eat into server resources and bandwidth. For example, the classified-information site directory channel of Business Mainland (35dalucom) lists more than 600 classified-information sites, and the site's program periodically fetches the title of each one to check whether the site still opens normally and whether its content has changed. With the stock Dede program, every check downloads the whole page by default rather than just the HTML head section, so it is easy to imagine how many server resources are wasted over time, when all we need is the title of the remote page.
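The socket-based download described above can be sketched roughly as follows. This is a minimal illustration, not the DedeCMS implementation: the function names buildGetRequest and downloadPage are hypothetical, and the sketch assumes plain HTTP on port 80 with no redirects, cookies, or HTTPS.

```php
<?php
// Hypothetical helpers for illustration only; not part of DedeCMS.

// Build a minimal GET request. HTTP/1.0 is used so the server
// does not reply with a chunked body.
function buildGetRequest(string $host, string $path): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "User-Agent: Mozilla/5.0\r\n"
         . "Connection: close\r\n\r\n";
}

// Download the complete response (headers and body) over a TCP socket.
function downloadPage(string $host, string $path = '/', int $port = 80, int $timeout = 10): string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException("Connection failed: {$errstr} ({$errno})");
    }
    fwrite($fp, buildGetRequest($host, $path));

    $response = '';
    while (!feof($fp)) {
        $chunk = fgets($fp, 256); // at most 255 bytes per call
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;      // the whole page is accumulated
    }
    fclose($fp);
    return $response;
}
```

Note that the loop keeps reading until the server closes the connection, which is exactly why the full page is always transferred.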
Modifying dedehttpdown.class.php
Adding partial collection to DedeCMS is simple: only two places in the collection program file dedehttpdown.class.php need to change. Open /include/dedehttpdown.class.php in Notepad++ or Dreamweaver:
(1) On line 118, after $this->m_html = ''; add $this->datalimit = 0;
(2) On line 285, after $this->m_html .= fgets($this->m_fp,256); add if ($this->datalimit > 0 && strlen($this->m_html) > $this->datalimit) break; then save the file.
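To see what the second change does, the patched read loop can be reproduced in isolation. This is a sketch under assumptions: readBody and $dataLimit are hypothetical names, and a php://memory stream stands in for the socket that DedeCMS reads from.

```php
<?php
// Hypothetical stand-alone version of the patched loop; not DedeCMS code.
// $dataLimit = 0 means "no limit", matching the default set in step (1).
function readBody($fp, int $dataLimit = 0): string
{
    $html = '';
    while (!feof($fp)) {
        $chunk = fgets($fp, 256);
        if ($chunk === false) {
            break;
        }
        $html .= $chunk;
        // Stop as soon as the collected data exceeds the limit.
        if ($dataLimit > 0 && strlen($html) > $dataLimit) {
            break;
        }
    }
    return $html;
}

// Simulate a large page with an in-memory stream instead of a socket.
$fp = fopen('php://memory', 'r+');
fwrite($fp, str_repeat("<p>line</p>\n", 1000));
rewind($fp);
$partial = readBody($fp, 1024);
fclose($fp);
```

Because the limit is checked only after each fgets call, it is a soft limit: the result can exceed it by up to one read (at most 255 bytes).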
How to use:
$remoteURL = 'http://www.***.com/info/fabu/';
$dh = new DedeHttpDown();
$dh->OpenUrl($remoteURL);
$dh->datalimit = 1024;
$remoteHTML = $dh->GetHtml();
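All we need to do is set $dh->datalimit = 1024 (the number of bytes you want to collect) after calling $dh->OpenUrl($remoteURL). Because the data is read up to 255 bytes at a time and the limit is checked after each read, the result may be slightly longer than the limit. This way the download stops early and saves server resources.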
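Once the first part of the page has been collected, the title can be pulled out of it. The following is a minimal sketch, assuming the title element falls within the collected bytes; extractTitle is a hypothetical helper, not a DedeCMS function, and it does not handle character-set conversion.

```php
<?php
// Hypothetical helper for illustration; not part of DedeCMS.
// Returns the trimmed contents of the first <title> element,
// or null if the (possibly truncated) HTML has no complete title.
function extractTitle(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim($m[1]);
    }
    return null;
}
```

If this returns null for a page that does have a title, the limit was probably too small and cut the head section short, so a larger datalimit value is worth trying.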