Objective
By default, collection works like this: you configure the collection rules, then submit the corresponding parameters in the browser, and the whole collection run completes.
Timed collection differs from manually submitting in the browser mainly in two steps:
First, compose the collection URL and its parameters. Visiting this URL performs the desired collection mode. (Submitting this URL directly in a browser also performs the collection.)
Second, use the operating system's scheduled-task facility to visit this URL at regular intervals, which gives unattended timed collection.
The implementation is described below:
1. Explanation of the collection configuration files
Every collection run uses two collection configuration files (they correspond to the collection rules configured in the admin backend), and both can be opened with a text editor.
The first, /configs/article/collectsite.php, is the overall collection-site configuration. It records which sites collection is allowed from.
It contains entries like the following:
$jieqiCollectsite['1']['name'] = 'Collection site one';
$jieqiCollectsite['1']['config'] = 'abc_com';
$jieqiCollectsite['1']['url'] = 'http://www.abc.com';
$jieqiCollectsite['1']['subarticleid'] = 'floor($articleid/1000)';
$jieqiCollectsite['1']['enable'] = '1';
$jieqiCollectsite['2']['name'] = 'Collection site two';
$jieqiCollectsite['2']['config'] = 'def_net';
$jieqiCollectsite['2']['url'] = 'http://www.def.net';
$jieqiCollectsite['2']['subarticleid'] = '';
$jieqiCollectsite['2']['enable'] = '1';
The parameters mean the following:
['1'] - the 1 here is the collection site's serial number. Different collection sites must not share a serial number.
['name'] - the name of the collection site.
['config'] - the site's English identifier, which also names its collection rule file. For example, if this value is abc_com, the collection rule configuration file is /configs/article/site_abc_com.php.
['url'] - the URL of the collection site.
['subarticleid'] - the formula used to compute an article's sub-serial number on the collection site. This item exists mainly for compatibility with earlier versions; in the new version the article sub-serial number can be obtained through the collection rules themselves.
['enable'] - whether collection is allowed: 1 allows it, 0 forbids it. The default is 1.
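To make the link between a collectsite.php entry and its rule file concrete, here is a minimal JavaScript sketch that mirrors the two example entries as plain objects. The object layout and the helper names ruleFilePath, subArticleId and isEnabled are hypothetical, and only the floor($articleid/1000) expression from the example is evaluated; the system itself presumably evaluates the subarticleid string as a PHP expression.

```javascript
// Hypothetical in-memory copy of the two example entries from collectsite.php.
const collectSites = {
  1: {
    name: "Collection site one",
    config: "abc_com",
    url: "http://www.abc.com",
    subarticleid: "floor($articleid/1000)",
    enable: "1",
  },
  2: {
    name: "Collection site two",
    config: "def_net",
    url: "http://www.def.net",
    subarticleid: "",
    enable: "1",
  },
};

// Rule file path, following the /configs/article/site_<config>.php convention.
function ruleFilePath(site) {
  return `/configs/article/site_${site.config}.php`;
}

// Assumption: only the expression used in the example is modelled; anything else yields null.
function subArticleId(site, articleId) {
  if (site.subarticleid === "floor($articleid/1000)") {
    return Math.floor(articleId / 1000);
  }
  return null;
}

// '1' permits collection, '0' forbids it.
function isEnabled(site) {
  return site.enable === "1";
}

console.log(ruleFilePath(collectSites[1])); // prints /configs/article/site_abc_com.php
```

With this example, article 123456 on site 1 gets sub-serial number 123, while site 2 (empty subarticleid) relies on the collection rules instead.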
As mentioned above, each collection site has its own collection rule file: the PHP files in the /configs/article/ directory whose names begin with site_, such as /configs/article/site_abc_com.php.
Its contents correspond to the collection rules set in the admin backend and are not explained in detail here. What you need to know is that the file has two parts. The earlier part configures the site's content collection rules. The final entries, $jieqiCollect['listcollect']['0'], $jieqiCollect['listcollect']['1'] and so on, configure the site's "batch collection rules", such as collecting by recent updates or by list; you can define several. The number 0 in ['0'] is the serial number of the batch collection rule and must not be repeated within the same site.
2. Composing the collection URL and parameters
Collection here means collecting many articles in bulk, in one of two modes:
A. Batch collection by page, such as collecting the latest-update list or a category list. Each link collects one page.
The link format is as follows:
http://www.jb51.net/modules/article/admin/pagecollect.php?action=collect&siteid=1&collectname=0&startpageid=1&maxpagenum=1&notaddnew=0&jieqi_username=admin&jieqi_userpassword=1234
The parameters mean the following:
www.jb51.net - replace with your own site's domain.
action - string; the action the program executes. The value is always collect.
siteid - number; the serial number of the site to collect from. Which number corresponds to which site is defined in collectsite.php.
collectname - number; the serial number of the page-based batch collection rule, i.e. the number configured as $jieqiCollect['listcollect']['0'] in site_xxxx.php.
startpageid - the page identifier at which collection starts in the list; 1 means the first page. Usually a number, though some sites may use strings.
maxpagenum - number; the total number of pages to collect. The default is 1. Collecting more than one page requires the browser to jump from page to page, which only works when a browser is used under Windows. Under Linux with wget only one page can be collected, so to collect several pages, set up several collection commands.
notaddnew - number; 0 collects all articles, 1 only updates articles that already exist on this site.
jieqi_username - string; the user name (this user must have collection permission on the site).
jieqi_userpassword - string; the user's password.
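Composing this link by hand is error-prone, so a small helper can assemble it from the parameters above. This is a sketch: the function name buildPageCollectUrl and the option names are hypothetical, and the host and credentials are the placeholder values from the example link.

```javascript
// Build a pagecollect.php link; URLSearchParams keeps insertion order and encodes values.
function buildPageCollectUrl(host, opts) {
  const params = new URLSearchParams({
    action: "collect", // fixed value for page-based collection
    siteid: String(opts.siteid),
    collectname: String(opts.collectname),
    startpageid: String(opts.startpageid ?? 1),
    maxpagenum: String(opts.maxpagenum ?? 1),
    notaddnew: String(opts.notaddnew ?? 0),
    jieqi_username: opts.username,
    jieqi_userpassword: opts.password,
  });
  return `http://${host}/modules/article/admin/pagecollect.php?${params.toString()}`;
}

const pageUrl = buildPageCollectUrl("www.jb51.net", {
  siteid: 1,
  collectname: 0,
  username: "admin",
  password: "1234",
});
console.log(pageUrl);
```

Letting URLSearchParams do the encoding means a password containing & or = cannot break the query string.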
B. Batch collection by article serial number
The link format is as follows:
http://www.jb51.net/modules/article/admin/batchcollect.php?action=bcollect&siteid=1&batchids=123,234,345&jieqi_username=admin&jieqi_userpassword=1234
The parameters mean the following:
www.jb51.net - replace with your own site's domain.
action - string; the action the program executes. The value is always bcollect.
siteid - number; the serial number of the site to collect from. Which number corresponds to which site is defined in collectsite.php.
batchids - the serial numbers of the articles on the source site (not the local article numbers). To collect several articles, separate the serial numbers with ASCII commas, e.g. 123,234,345.
jieqi_username - string; the user name (this user must have collection permission on the site).
jieqi_userpassword - string; the user's password.
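The same approach works for the serial-number mode. In this sketch (buildBatchCollectUrl is a hypothetical name) each article number is encoded on its own and the numbers are joined with a literal comma, so the link matches the example format instead of turning the commas into %2C.

```javascript
// Build a batchcollect.php link for a list of source-site article serial numbers.
function buildBatchCollectUrl(host, siteid, ids, username, password) {
  // Encode each id separately, then join with a literal comma as in the example link.
  const batchids = ids.map((id) => encodeURIComponent(String(id))).join(",");
  const query = [
    "action=bcollect", // fixed value for serial-number collection
    `siteid=${encodeURIComponent(String(siteid))}`,
    `batchids=${batchids}`,
    `jieqi_username=${encodeURIComponent(username)}`,
    `jieqi_userpassword=${encodeURIComponent(password)}`,
  ].join("&");
  return `http://${host}/modules/article/admin/batchcollect.php?${query}`;
}

console.log(buildBatchCollectUrl("www.jb51.net", 1, [123, 234, 345], "admin", "1234"));
```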
Note: because the URL is submitted in Internet Explorer, whose maximum URL length is 2083 characters, avoid making a single URL too long; split a large set of articles across several URLs instead.
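Splitting a long list of article numbers can be automated. The sketch below greedily packs ids into as few batchcollect.php links as possible while keeping each link within the 2083-character limit. The helper names batchUrl and splitIntoUrls are hypothetical, ids are assumed to be plain numbers, and a single id that alone exceeds the limit is still emitted as its own link.

```javascript
const MAX_URL_LENGTH = 2083; // Internet Explorer's URL length limit

// Hypothetical helper: the batchcollect.php link for one group of numeric ids.
function batchUrl(host, siteid, ids, username, password) {
  return (
    `http://${host}/modules/article/admin/batchcollect.php?action=bcollect` +
    `&siteid=${siteid}&batchids=${ids.join(",")}` +
    `&jieqi_username=${encodeURIComponent(username)}` +
    `&jieqi_userpassword=${encodeURIComponent(password)}`
  );
}

// Greedily add ids to the current group; start a new link when the next id would overflow.
function splitIntoUrls(host, siteid, ids, username, password, maxLength = MAX_URL_LENGTH) {
  const urls = [];
  let group = [];
  for (const id of ids) {
    const candidate = [...group, id];
    if (group.length > 0 && batchUrl(host, siteid, candidate, username, password).length > maxLength) {
      urls.push(batchUrl(host, siteid, group, username, password));
      group = [id];
    } else {
      group = candidate;
    }
  }
  if (group.length > 0) {
    urls.push(batchUrl(host, siteid, group, username, password));
  }
  return urls;
}
```

Each resulting link can then be scheduled as a separate collection command.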
3. Using system scheduled tasks for timed collection
A. In a Windows environment
On Windows, the system's Task Scheduler can run a program at set times, but you first need to write a batch file whose commands open the browser on the collection URL. Note that such a command only opens the browser; the window is not closed automatically after collection finishes. Automatic closing can be done with JavaScript. The code that closes the window automatically is:
<script type="text/javascript">self.opener = null; setTimeout(function () { window.close(); }, 3000);</script>
The parameter 3000 is the delay before the window closes, in milliseconds; 3000 means 3 seconds.
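If you want a different delay, the snippet can be generated from a value in seconds, converted to the milliseconds that setTimeout expects. This is a small sketch; autoCloseScript and withAutoClose are hypothetical helper names, and withAutoClose only illustrates appending the snippet to a message string such as a language-pack entry.

```javascript
// Build the auto-close snippet for a delay given in seconds.
function autoCloseScript(delaySeconds) {
  const delayMs = Math.round(delaySeconds * 1000); // setTimeout takes milliseconds
  return (
    '<script type="text/javascript">self.opener = null; ' +
    `setTimeout(function () { window.close(); }, ${delayMs});</script>`
  );
}

// Append the snippet to a message so the page showing it closes itself.
function withAutoClose(message, delaySeconds) {
  return message + autoCloseScript(delaySeconds);
}

console.log(autoCloseScript(3));
```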
The auto-close script can be added in either of two places:
One place is the message template /themes/[theme name]/msgwin.html: insert the script between <body> and </body>. As a result, every message page in the entire system closes automatically after 3 seconds.
If you only want the page to close automatically after a successful collection, add the script to the collection message language pack, /modules/article/lang/lang_collect.php, instead. In that file, $jieqiLang['article']['batch_collect_success'] holds the collection-success message. Its original value is:
'Congratulations, all articles have been collected!';
Change it to the following so that the page closes automatically:
'Congratulations, all articles have been collected!<script type="text/javascript">self.opener = null; setTimeout(function () { window.close(); }, 3000);</script>';
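For the batch file mentioned in the Windows steps, a small generator can write one start command per collection URL. This is a sketch under assumptions: the Internet Explorer path in IE_PATH is a typical install location that the original does not specify, and buildBatchFile is a hypothetical name. Inside a .bat file a % sign starts a variable reference, so any % in an encoded URL is doubled, while & is safe because the URL is wrapped in double quotes.

```javascript
// Assumed browser location; adjust to the actual installation.
const IE_PATH = "C:\\Program Files\\Internet Explorer\\iexplore.exe";

// Build the text of a .bat file that opens each collection URL in the browser.
function buildBatchFile(urls, browserPath = IE_PATH) {
  const lines = ["@echo off"];
  for (const url of urls) {
    // Double % so cmd.exe does not treat it as a variable reference.
    const escaped = url.replace(/%/g, "%%");
    // The empty "" is the window title argument that start expects before a quoted program path.
    lines.push(`start "" "${browserPath}" "${escaped}"`);
  }
  return lines.join("\r\n") + "\r\n";
}
```

Saving the result, for example with require("fs").writeFileSync("collect.bat", buildBatchFile(urls)) (the file name is hypothetical), and pointing a Task Scheduler task at that file completes the Windows setup.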