Open-source Java CMS: FreeCMS
Address: http://javaz.cn/site/javaz/site_study/info/2015/23312.html
Project address: http://www.freeteam.cn/
Web Page Information Collection
Supported since FreeCMS 2.1
Information on a target webpage can be captured with a simple configuration. Incremental collection, keyword replacement, and scheduled collection are supported. A single collection rule can collect multiple pages (static and dynamic) and multiple information attributes, and static information pages can be approved automatically.
Collection rule management
Click "Collection rules" in the management menu on the left to open the collection rule list.
Add collection rules
Click "add" at the bottom of the collection rule list.
Enter the relevant properties and click "save".
Collection rule attributes
Collection rule attributes are grouped into five tabs: basic, settings, collection address, collection attribute, and keyword replacement.
Generally, you only need to fill in the properties on the basic tab. If you need more advanced settings, use the other tabs.
The following describes the main attributes.
Name: the name of the collection rule.
Collect to topic: the topic that the collected information is added to.
Page encoding: the encoding of the target page. The default is UTF-8.
Collection address: the address of the target webpage. Only one address can be set on the basic tab; multiple addresses can be set on the collection address tab.
Collection scheduling: the schedule for the timed collection operation. This setting is important: the system runs the collection automatically only when collection scheduling is configured.
Content list start and end html: the system extracts information attributes by cutting out the text between two markers in the target webpage, so the start and end html of each target attribute must be set carefully. Choose start and end html that are reasonably unique on the page so that the system can intercept the target attribute correctly. This attribute captures the html of the information list on the target page.
Content address start and end html: after the content list html is obtained with the preceding attribute, this attribute is used to intercept the content addresses from it.
Content title start and end html: after a content address is obtained with the preceding attribute, the system fetches the webpage at that address and intercepts the content title with this attribute. The other content-related attributes are set in the same way.
Status: only collection rules in the enabled status are executed by the system.
Capture images: downloads the images in the information content to local storage.
Automatic approval: sets the collected information directly to the approved status.
Click volume of collected information: by default, the click count of collected information is 0. If this attribute and the click count start and end html are set, the system intercepts the click count of the target information and uses it as the click count of the collected information.
Maximum number of content to be collected: unlimited by default. If this attribute is set, the system counts the items collected by this collection rule and stops collecting once the maximum is reached.
Set the first image as the title image: if the information contains an image, the first image is extracted as the title image and the information is marked as image information.
Clear html tags in the content: removes the html tags from the information content and keeps the plain text.
Whether to collect when the content is empty: You can set whether to collect this information when the content is empty.
Add time of collected information: by default, the add time of collected information is the current time. If this attribute and the add time start and end html are set, the system intercepts the add time of the target information and uses it as the add time of the collected information.
Add time format of collected information: the default format is yyyy-MM-dd. If the target page shows the add time in a different format, set the matching date format here.
Collection start time: the default value is the current time. Before the collection start time is reached, the system does not collect data.
Collection end time: the default value is "Never End". After the collection end time has passed, the system does not collect data.
Content address completion url: some webpages use relative paths and others use absolute paths, so you can set a prefix for the content address.
Image url completion url: some webpages use relative paths and others use absolute paths, so you can set a prefix for the image link address.
Content A tag link completion url: some webpages use relative paths and others use absolute paths, so you can set a prefix for the link address of A tags in the content.
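The interception described for the content list, content address, and address completion attributes can be sketched as follows. This is only an illustration of the start/end marker idea, not FreeCMS source code: the class MarkerExtractor, its methods, the sample page, and the example.com prefix are all hypothetical, and skipping the prefix for addresses that are already absolute is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of FreeCMS.
class MarkerExtractor {

    /** Returns the text between the first start marker and the next end marker, or null if either is missing. */
    static String between(String html, String start, String end) {
        int s = html.indexOf(start);
        if (s < 0) return null;
        s += start.length();
        int e = html.indexOf(end, s);
        if (e < 0) return null;
        return html.substring(s, e);
    }

    /** Returns every fragment enclosed by the start/end markers, in document order. */
    static List<String> allBetween(String html, String start, String end) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int s = html.indexOf(start, from);
            if (s < 0) break;
            s += start.length();
            int e = html.indexOf(end, s);
            if (e < 0) break;
            out.add(html.substring(s, e));
            from = e + end.length();
        }
        return out;
    }

    /** Prepends the prefix unless the address already looks absolute (assumed behavior). */
    static String complete(String address, String prefix) {
        if (address.startsWith("http://") || address.startsWith("https://")) return address;
        return prefix + address;
    }

    public static void main(String[] args) {
        String page = "<div id=\"nav\">...</div><ul class=\"news\">"
            + "<li><a href=\"/a/1.html\">One</a></li>"
            + "<li><a href=\"/a/2.html\">Two</a></li></ul>";
        // Step 1: content list start and end html.
        String list = between(page, "<ul class=\"news\">", "</ul>");
        // Step 2: content address start and end html, then the completion prefix.
        for (String href : allBetween(list, "<a href=\"", "\"")) {
            System.out.println(complete(href, "http://example.com"));
        }
        // prints http://example.com/a/1.html and http://example.com/a/2.html
    }
}
```

The sketch also shows why the markers should be unique: between() stops at the first occurrence of the start html, so a marker that also appears earlier on the page would cut out the wrong fragment.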
Collection addresses are classified into static and dynamic addresses. Static addresses are fixed addresses. Dynamic addresses generally refer to addresses that can be paged: {page} represents the paging variable, and you can set which page the collection starts from and which page it ends at.
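As an illustration of how a dynamic address with the {page} variable expands into concrete addresses, here is a minimal sketch. The class PagedAddress, the method expand, and the example.com address are hypothetical and only show the substitution idea; how FreeCMS implements it internally is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of FreeCMS.
class PagedAddress {

    /** Replaces {page} with each page number from firstPage to lastPage, inclusive. */
    static List<String> expand(String template, int firstPage, int lastPage) {
        List<String> urls = new ArrayList<>();
        for (int p = firstPage; p <= lastPage; p++) {
            urls.add(template.replace("{page}", String.valueOf(p)));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Assumed example address; any address containing {page} works the same way.
        for (String url : expand("http://example.com/news/list_{page}.html", 1, 3)) {
            System.out.println(url);
        }
    }
}
```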
Generally, only the title and content of the information are collected. The system can also collect the description, click volume, author, source, and add time attributes.
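For the add time attribute, the intercepted text has to be parsed with the configured add time format (yyyy-MM-dd by default). A minimal sketch of that step, assuming the standard java.time classes: the class AddTimeParser and the fallback parameter are hypothetical, and falling back to a supplied default date when parsing fails is an assumption, not documented FreeCMS behavior.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of FreeCMS.
class AddTimeParser {

    /** Parses the intercepted add time with the given pattern; an empty pattern means yyyy-MM-dd. */
    static LocalDate parseAddTime(String raw, String pattern, LocalDate fallback) {
        String p = (pattern == null || pattern.isEmpty()) ? "yyyy-MM-dd" : pattern;
        try {
            return LocalDate.parse(raw.trim(), DateTimeFormatter.ofPattern(p));
        } catch (DateTimeParseException e) {
            // Assumed behavior: keep the default add time when the text does not match.
            return fallback;
        }
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(parseAddTime(" 2015-03-02 ", "", today));        // default format
        System.out.println(parseAddTime("02/03/2015", "dd/MM/yyyy", today)); // custom format
    }
}
```

If the target page shows dates such as 02/03/2015, leaving the format at its default would make every parse fail, which is why the format attribute must match the page.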
Using the keyword replacement function, you can replace the keywords in the collected information with the desired keywords.
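Keyword replacement can be pictured as applying a list of "keyword, replacement" pairs to the collected text. The sketch below is an assumption about the general idea only: the class KeywordReplacer is hypothetical, and literal (non-regex) matching applied in the order the rules were entered is an assumed behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of FreeCMS.
class KeywordReplacer {

    /** Applies each keyword -> replacement rule in insertion order, using literal matching. */
    static String apply(String text, Map<String, String> rules) {
        String out = text;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            out = out.replace(rule.getKey(), rule.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("Source Site", "My Site");
        rules.put("reprinted", "collected");
        System.out.println(apply("News reprinted from Source Site", rules));
        // prints: News collected from My Site
    }
}
```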
Edit collection rules
Select the collection rule to be edited and click "edit".
Note: Only one collection rule can be edited at a time.
Enter the relevant properties and click "save".
Collection
Select the collection rule to run and click "collect".
Note: Only one collection rule can be run at a time.
Delete collection rules
Select the collection rule to be deleted and click "delete".
Tip: You can delete multiple collection rules at the same time.
To prevent accidental deletion, the system asks you to confirm. Click "OK" to complete the deletion.
View collection records
Click "Collection records" in the left-side navigation pane.
You can view the collection records for all collected web pages. You can delete a specified collection record; this does not delete the collected information data. Select the collection record to be deleted and click "delete".
Tip: You can delete multiple collection records at the same time.
To prevent accidental deletion, the system asks you to confirm. Click "OK" to complete the deletion.