Jieqi scheduled collection: official configuration method and parameter details for batchcollect and pagecollect (page 1/2)

Source: Internet
Author: User

Preface

By default, collection is started by submitting the appropriate parameters in a browser, according to the configured collection rules; the system then completes all subsequent collection and storage operations.
Scheduled collection differs from manual submission in the browser and takes two steps:
1. Compose the collection URL and its parameters. Accessing this URL performs the desired kind of collection. (The same URL can also be submitted directly in a browser.)
2. Add a task that accesses this URL on a schedule to the operating system's scheduled tasks, which gives unattended, timed collection.

For specific implementation methods, see the following:

1. The collection configuration files

Every collection uses two configuration files (they correspond to the collection rule settings in the admin backend). Both can be opened and viewed in a text editor.
The first, /configs/article/collectsite.php, is the master list of collection sites: it records which sites may be collected.
It contains content like the following:

$jieqiCollectsite['1']['name'] = 'site 1';
$jieqiCollectsite['1']['config'] = 'abc_com';
$jieqiCollectsite['1']['url'] = 'http://www.abc.com';
$jieqiCollectsite['1']['subarticleid'] = 'floor($articleid/1000)';
$jieqiCollectsite['1']['enable'] = '1';

$jieqiCollectsite['2']['name'] = 'site 2';
$jieqiCollectsite['2']['config'] = 'def_net';
$jieqiCollectsite['2']['url'] = 'http://www.def.net';
$jieqiCollectsite['2']['subarticleid'] = '';
$jieqiCollectsite['2']['enable'] = '1';

The parameter description is as follows:
['1'] - 1 is the serial number of the site to be collected. Each site must have a different number.
['name'] - the name of the site to be collected.
['config'] - the site's identifier in English. It determines the name of the site's collection rule configuration file. For example, if this value is abc_com, the rule file is /configs/article/site_abc_com.php.
['url'] - the URL of the site to be collected.
['subarticleid'] - how the article sub-number is calculated for the collected site. This item exists for compatibility with earlier versions of the program; in the new version, the article sub-number can be obtained through collection itself.
['enable'] - whether collection is allowed. 1 means yes, 0 means no. The default is 1.

As mentioned above, each site to be collected has its own collection rule configuration file. These are the .php files whose names start with site_ in the /configs/article/ directory, such as /configs/article/site_abc_com.php.
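
As a rough illustration of the two conventions described so far (the rule file name derived from the ['config'] value, and the floor(article ID / 1000) formula from the sample configuration), here is a minimal Python sketch. The function names are hypothetical and are not part of Jieqi; they only restate the conventions in runnable form.

```python
def rule_config_path(config_id: str) -> str:
    """Path of the per-site rule file for a given ['config'] value."""
    return f"/configs/article/site_{config_id}.php"


def sub_article_id(article_id: int) -> int:
    """Mirror of the sample formula floor(article ID / 1000)."""
    return article_id // 1000


print(rule_config_path("abc_com"))
print(sub_article_id(12345))
```

With the sample configuration above, site 1 ('abc_com') would therefore read its rules from /configs/article/site_abc_com.php.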

Its content corresponds to the collection rule settings in the admin backend and is not described in detail here. What you need to know is that the file has two parts. The first part configures the rules for collecting the site's content. The last part, $jieqiCollect['listcollect']['0'], $jieqiCollect['listcollect']['1'] and so on, configures the site's "batch collection rules", such as collection by recent updates or collection by rankings; you can define several. In ['0'], the number 0 is the serial number of the batch collection rule, and it must be unique within one site.

2. Composing the collection URL and parameters

Collection here means batch collection of multiple articles, which has two modes:
1. Batch collection by page, for example collecting the latest-update list or a ranking list. Each URL collects one page.
The URL format is as follows:

http://www.jb51.net/modules/article/admin/pagecollect.php?action=collect&siteid=1&collectname=0&startpageid=1&maxpagenum=1&notaddnew=0&jieqi_username=admin&jieqi_userpassword=1234

The parameter description is as follows:
www.jb51.net - stands for your own website.
action - string; the command the program executes. The value is always collect.
siteid - number; the serial number of the site to be collected. The serial numbers are defined in the configuration file collectsite.php.
collectname - number; the serial number of the batch collection rule to use, that is, the index in $jieqiCollect['listcollect']['0'] in the configuration file site_xxxx.php.
startpageid - the page marker of the list page to collect. It is usually a number, but on some sites it may be a string.
maxpagenum - number; the total number of pages to collect. The default is 1. Collecting more than one page relies on the browser following page redirects, so this parameter only takes effect when the URL is opened in a browser on Windows. On Linux, where the URL is fetched with wget, each request collects only one page; to collect several pages, set up several collection commands.
notaddnew - number; 0 collects all articles, 1 only updates articles that already exist on this site.
jieqi_username - string; the user name (the user must have collection permission on this site).
jieqi_userpassword - string; the user's password.
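
The parameter list above maps directly onto a URL query string. The following Python sketch assembles such a URL with the standard library; the host www.example.com, the function name pagecollect_url and the sample credentials are placeholders chosen for illustration, not values defined by Jieqi.

```python
from urllib.parse import urlencode

# Placeholder base URL; replace the host with your own website.
BASE = "http://www.example.com/modules/article/admin/pagecollect.php"


def pagecollect_url(siteid, collectname, startpageid, username, password,
                    maxpagenum=1, notaddnew=0, base=BASE):
    """Build a page-based batch collection URL from the documented parameters."""
    params = {
        "action": "collect",          # fixed value for page-based collection
        "siteid": siteid,
        "collectname": collectname,
        "startpageid": startpageid,   # usually a number, may be a string
        "maxpagenum": maxpagenum,
        "notaddnew": notaddnew,
        "jieqi_username": username,
        "jieqi_userpassword": password,
    }
    return base + "?" + urlencode(params)


# One URL per list page, for the case where each request collects one page.
urls = [pagecollect_url(1, 0, page, "admin", "1234") for page in range(1, 4)]
for url in urls:
    print(url)
```

Generating one URL per page is one way to cover several list pages when each request can collect only a single page.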

2. Batch collection by article serial number
The URL format is as follows:
http://www.jb51.net/modules/article/admin/batchcollect.php?action=bcollect&siteid=1&batchids=123,234,345&jieqi_username=admin&jieqi_userpassword=1234

The parameter description is as follows:
www.jb51.net - stands for your own website.
action - string; the command the program executes. The value is always bcollect.
siteid - number; the serial number of the site to be collected. The serial numbers are defined in the configuration file collectsite.php.
batchids - the article serial numbers on the site being collected (not the local article serial numbers). To collect several articles, separate the numbers with commas, for example 123,234,345.
jieqi_username - string; the user name (the user must have collection permission on this site).
jieqi_userpassword - string; the user's password.
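
The same approach works for collection by serial number. In this sketch the comma is kept unescaped so that the result looks like the documented URL format; as before, the host, the function name batchcollect_url and the credentials are illustrative placeholders.

```python
from urllib.parse import urlencode

# Placeholder base URL; replace the host with your own website.
BASE = "http://www.example.com/modules/article/admin/batchcollect.php"


def batchcollect_url(siteid, batchids, username, password, base=BASE):
    """Build a URL that collects the given remote article serial numbers."""
    params = {
        "action": "bcollect",  # fixed value for collection by serial number
        "siteid": siteid,
        "batchids": ",".join(str(i) for i in batchids),
        "jieqi_username": username,
        "jieqi_userpassword": password,
    }
    # safe="," keeps the commas between serial numbers literal.
    return base + "?" + urlencode(params, safe=",")


print(batchcollect_url(1, [123, 234, 345], "admin", "1234"))
```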

Note: If the URL is to be submitted in Internet Explorer, the whole URL must not exceed 2083 characters. Avoid making the URL too long; when many articles are involved, split them across several URLs.
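
One way to respect this length limit is to pack serial numbers into a URL until adding another would exceed it, then start a new URL. The sketch below does this greedily; the host, function names and credentials are hypothetical, and the limit is passed as a parameter so it can be lowered if needed.

```python
from urllib.parse import urlencode

MAX_URL_LEN = 2083  # length limit mentioned in the note above
# Placeholder base URL; replace the host with your own website.
BASE = "http://www.example.com/modules/article/admin/batchcollect.php"


def build_url(siteid, ids, username, password):
    """Build one batchcollect URL for a list of serial numbers."""
    params = {
        "action": "bcollect",
        "siteid": siteid,
        "batchids": ",".join(str(i) for i in ids),
        "jieqi_username": username,
        "jieqi_userpassword": password,
    }
    return BASE + "?" + urlencode(params, safe=",")


def split_batch_urls(siteid, ids, username, password, max_len=MAX_URL_LEN):
    """Greedily split ids across several URLs, each at most max_len long."""
    urls, chunk = [], []
    for article_id in ids:
        candidate = chunk + [article_id]
        too_long = len(build_url(siteid, candidate, username, password)) > max_len
        if chunk and too_long:
            # Current chunk is full: emit it and start a new one.
            urls.append(build_url(siteid, chunk, username, password))
            chunk = [article_id]
        else:
            chunk = candidate
    if chunk:
        urls.append(build_url(siteid, chunk, username, password))
    return urls


urls = split_batch_urls(1, range(1, 1001), "admin", "1234")
print(len(urls), "URLs, longest is", max(len(u) for u in urls), "characters")
```

Because the encoded URL contains only ASCII characters, its length in characters equals its length in bytes.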

3. Use System Tasks for Scheduled Collection

I. Practices in Windows

On Windows, scheduled execution can be set up with the system's Scheduled Tasks. First, however, you must create a batch file that uses a command to open the collection URL in a browser. Note that the command only opens the browser; the window is not closed automatically when collection finishes. To have it close automatically after collection, you can use JavaScript. The JavaScript code that closes the window is:

<script language="JavaScript">self.opener=null;setTimeout("window.close();",3000);</script>

Here, 3000 is the delay before the window closes, in milliseconds, so the window closes after 3 seconds.
This code can be added in either of two places:

The first is the prompt message template /themes/<style name>/msgwin.html: add the JavaScript above between <body> and </body>. With this change, every prompt page closes automatically after three seconds.

The second applies if you only want the prompt page to close after a successful collection: add the JavaScript above to the language pack entry for the collection prompt. The file is /modules/article/lang/lang_collect.php, and $jieqiLang['article']['batch_collect_success'] is the prompt shown when collection succeeds. Its original value is:

'Congratulations, all articles have been collected!';

Change it to the following so that the page closes automatically:

'Congratulations, all articles have been collected!<script language="JavaScript">self.opener=null;setTimeout("window.close();",3000);</script>';
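
The batch file mentioned at the start of this section can be written by hand, or generated. The Python sketch below produces its text under one assumption not stated in the source: that the Windows start command is used to open each URL in the default browser. The function name batch_file_text and the example URL are hypothetical.

```python
def batch_file_text(urls):
    """Return the text of a .bat file that opens each URL in the default browser."""
    lines = ["@echo off"]
    for url in urls:
        # Inside a batch file a literal % must be written as %%.
        escaped = url.replace("%", "%%")
        # The empty "" is the window title argument of start; quoting the URL
        # keeps & from being treated as a command separator.
        lines.append('start "" "{}"'.format(escaped))
    return "\r\n".join(lines) + "\r\n"


example = ("http://www.example.com/modules/article/admin/pagecollect.php"
           "?action=collect&siteid=1&collectname=0&startpageid=1")
print(batch_file_text([example]))
```

The resulting text can be saved as a .bat file and registered as a scheduled task; the JavaScript shown above then closes each browser window after collection.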
