A content collection system is a useful assistant for content-driven websites. Besides original content, a site often needs material that editors or a collection system gather, sort, and then add to the site. Discuz, Dvbbs, and various CMS products all ship with a built-in collection function for gathering specified content, and the standalone client known as the locomotive collector also collects specified content very well. The goal of these tools is to replace manual work with machines: editors are freed from moving content around and can do higher-value work instead, such as fine-tuning the collected results, SEO optimization, and setting precise collection rules so that the collected content better fits the needs of the website.
The content collection system described below was developed from this idea. It consists of two parts:
1. The collection rule editor used by the editors, and the website they use to review, fine-tune, and publish the collection results.
2. The scheduled collector and scheduled sender deployed on the server.
The editor first configures the site to be collected with the collection rule generator (nicecollector.exe). After collection is complete, the editor uses a website (pickweb) to review, fine-tune, and optimize the collected results and then publish them to your website. All the editors need to do is set collection rules and optimize collection results; everything else is done by machines.
Nicepicker is an HTML analyzer used to extract URLs. Both nicecollector and hostcollector use nicepicker to analyze HTML. Nicecollector is the collection rule configurator. A target website only needs to be set up once.
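To make the idea of an HTML analyzer that extracts URLs more concrete, here is a minimal sketch in Python using only the standard library. It is not nicepicker's actual code, which is not shown in this article; the names `LinkExtractor` and `extract_urls` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Hypothetical helper: return all link targets found in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

A list page of the target site would be passed in as `html`, and the returned URLs would then be filtered by whatever pattern the collection rule defines.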
The approach is similar to the earliest locomotive collector. Here we use the blog garden (cnblogs) as the target site and set up collection of the articles in its essence (featured) area. The collection rules are very simple. When the editors set collection rules, the rules are saved to setting.mdb in the same directory as nicecollector.exe. Generally, once the rules are set you do not need to change them again; you only need to fine-tune them when the HTML DOM structure of the target website changes. Nicecollector is also used to add and configure new target sites.
After editing the collection rules, put setting.mdb in the same directory as hostcollector.exe. Hostcollector performs the actual collection based on the settings in setting.mdb and saves the collection results to the database.
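The step of applying a rule to a page and saving the result can be sketched roughly as follows. This is only an illustration under stated assumptions: the real schema of setting.mdb and of the results database is not shown in this article, so the `CollectionRule` fields, the start/end marker technique, the `results` table, and the use of SQLite are all hypothetical stand-ins.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CollectionRule:
    # Hypothetical fields; the real setting.mdb schema is not documented here.
    site_name: str
    content_start: str  # text that appears just before the article body
    content_end: str    # text that appears just after the article body


def extract_content(html, rule):
    """Return the text between the rule's markers, or None if not found."""
    start = html.find(rule.content_start)
    if start == -1:
        return None
    start += len(rule.content_start)
    end = html.find(rule.content_end, start)
    if end == -1:
        return None
    return html[start:end].strip()


def save_result(conn, rule, url, content):
    """Store one collected item as 'pending' so an editor can review it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(site TEXT, url TEXT, content TEXT, status TEXT)"
    )
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, 'pending')",
        (rule.site_name, url, content),
    )
    conn.commit()
```

Marker-based extraction breaks as soon as the target page layout changes, which matches the remark above that rules only need adjusting when the HTML DOM structure of the target website changes.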
At this point, content collection is complete. Editors can open pickweb, fine-tune and optimize the collection results, and then approve them for sending to their own website.
Sending the collection results to your own website is not done by pickweb. After the editors complete the internal review, posttoforum.exe reads the database and sends the results that passed review to your website. Of course, your website needs an .ashx handler to receive the collection results. Letting posttoforum.exe operate directly on your website's database is not recommended; it is best to expose an API on your website that receives the collection results.
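The sending step described above (read approved results from the database, post each one to a receiving API) might look roughly like this. It is a sketch under assumptions, not posttoforum.exe's actual logic: the `results` table, the `approved` and `sent` status values, the JSON payload, and the endpoint URL are all hypothetical.

```python
import json
import sqlite3
import urllib.request


def load_approved(conn):
    """Fetch rows that passed review but have not been sent yet."""
    cur = conn.execute(
        "SELECT rowid, url, content FROM results WHERE status = 'approved'"
    )
    return [{"id": r[0], "url": r[1], "content": r[2]} for r in cur]


def build_request(endpoint, item):
    """Build a JSON POST request for one collection result."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_approved(conn, endpoint, opener=urllib.request.urlopen):
    """Post each approved row and mark it 'sent' when the API answers 200."""
    sent = 0
    for item in load_approved(conn):
        with opener(build_request(endpoint, item)) as response:
            if response.status == 200:
                conn.execute(
                    "UPDATE results SET status = 'sent' WHERE rowid = ?",
                    (item["id"],),
                )
                sent += 1
    conn.commit()
    return sent
```

Passing the opener as a parameter keeps the sender testable without a network, and going through an HTTP endpoint means the sender never touches the website's own database, in line with the recommendation above.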
Nicecollector, hostcollector, pickweb, and posttoforum together cover collection and sending. Hostrunnerservice is a Windows service used to call hostcollector periodically. To install the Windows service, run installutil /i hostrunnerservice.exe in a console as an administrator.
The hostrunnerservice configuration is also simple:
In runtime.txt, set how many times collection runs each day.
After new content is collected, the editors need to log on to pickweb regularly to optimize, fine-tune, and review it. You can also set new content to pass review by default. Posttoforum is likewise called periodically, to send new content that has passed review. Callsenderservice.exe is similar to hostrunnerservice.exe: it is also a Windows service, and it calls posttoforum.exe on a schedule.
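The periodic calling done by hostrunnerservice and callsenderservice comes down to working out when the next run is due. The sketch below assumes the only setting is a single number of runs per day, spread evenly from midnight; the actual format of runtime.txt is not shown in this article, so that reading and the function names are assumptions.

```python
from datetime import datetime, timedelta


def daily_run_times(runs_per_day):
    """Spread runs evenly across 24 hours, as offsets from midnight."""
    if runs_per_day < 1:
        raise ValueError("runs_per_day must be at least 1")
    interval = timedelta(days=1) / runs_per_day
    return [interval * i for i in range(runs_per_day)]


def next_run(now, runs_per_day):
    """Return the first scheduled time strictly after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    for offset in daily_run_times(runs_per_day):
        candidate = midnight + offset
        if candidate > now:
            return candidate
    # All of today's runs are in the past; the next one is tomorrow's first.
    return midnight + timedelta(days=1)
```

A service loop would sleep until `next_run(datetime.now(), n)` and then launch the collector or the sender as a child process.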
That is basically the entire system. There are two more tools: selfchecker.exe and healthchecker.exe. Selfchecker.exe checks whether the rules in setting.mdb are valid, for example whether a collection rule sets the content collection item. Healthchecker.exe collects the logs generated by hostcollector.exe and posttoforum.exe and sends them to the designated system maintenance personnel.
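A rule check of the kind selfchecker.exe performs could be sketched like this. The rule layout (a dictionary with `list_url` and `items` keys) is hypothetical, since the structure of setting.mdb is not shown here; only the "content collection item must be set" check comes from the description above.

```python
def check_rule(rule):
    """Return a list of problems found in one rule (empty list means valid)."""
    problems = []
    # Hypothetical field: the URL of the list page to start collecting from.
    if not rule.get("list_url", "").startswith(("http://", "https://")):
        problems.append("list_url is missing or not an http(s) URL")
    # The check named in the text: a rule must define a content item.
    if "content" not in rule.get("items", []):
        problems.append("no content collection item is set")
    return problems


def check_all(rules):
    """Map each invalid rule's name to its list of problems."""
    report = {}
    for name, rule in rules.items():
        problems = check_rule(rule)
        if problems:
            report[name] = problems
    return report
```

Running such a check before hostcollector starts avoids wasting a scheduled run on rules that can never produce content.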
This content collection system still needs improvement and optimization in many respects; in its current state it is only a prototype. For example, nicepicker needs to be further abstracted and refactored to provide more interfaces, with plug-ins used for each aspect of HTML analysis so that you can load your own analyzer at every analysis step. Nicecollector needs more comprehensive collection rule settings. Pickweb could gain some default SEO rules, such as batch SEO optimization of titles, content, and other fields.
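One possible shape for the plug-in idea, where each analysis step can load its own analyzer, is a simple pipeline of functions. This is a sketch of a design direction only, not part of the current nicepicker; the class name `AnalyzerPipeline` and the two sample analyzers are invented for illustration.

```python
import re


class AnalyzerPipeline:
    """Runs registered analyzers in order, each transforming the HTML text."""

    def __init__(self):
        self._analyzers = []

    def register(self, name, func):
        # `func` takes an HTML string and returns a transformed string.
        self._analyzers.append((name, func))

    def run(self, html):
        for _name, func in self._analyzers:
            html = func(html)
        return html


def strip_scripts(html):
    """Sample analyzer: remove <script>...</script> blocks."""
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)


def collapse_whitespace(html):
    """Sample analyzer: squeeze runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", html).strip()
```

With this shape, a site-specific analyzer is just another function passed to `register`, so new behavior can be added without changing the pipeline itself.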
Executable File Download:
08_453455_if8l_nroutput.rar (the link has been updated)
Source code download:
08_234324_if8l_nicecollector.rar (the link has been updated)