Project name | Crawling is going on
Project version | Beta version
Person in charge | NEWBE software team, Computer College of Beihang University
Contact information | http://www.cnblogs.com/newbe
Requested release date | 2014-12-27
1 Update content
1.1 Fixing defects
a) Earlier versions did not handle exceptions raised during crawling. An exception terminated the crawl thread abnormally while the resources it held stayed occupied. As more threads failed, all available resources were eventually occupied and the entire software stopped working. The new version handles these exceptions and releases the resources, so the crawl runs without interruption.
b) Database updates now use an asynchronous, mutually exclusive mode. Only one thread accesses the database at a time, which keeps the database data correct and avoids SQLException.
c) Target pages are located more accurately, which reduces the crawl failure rate and the miss rate and improves the efficiency and correctness of the crawler.
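The fixes in a) and b) can be illustrated with a minimal sketch. It is not the released crawler code: the class `CrawlSketch`, the two-permit semaphore standing in for the limited resources, the in-memory list standing in for the database, and the simulated failure on URLs containing "bad" are all assumptions made for illustration. The `finally` block returns the resource even when a crawl fails, and a plain `synchronized` block stands in for the mutually exclusive database update.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and sizes are illustrative, not taken from the release.
class CrawlSketch {
    // Stands in for the limited resources shared by crawl threads.
    private final Semaphore slots = new Semaphore(2);
    // Stands in for the database; guarded by dbLock.
    private final List<String> stored = new ArrayList<>();
    private final Object dbLock = new Object();

    // Only one thread at a time may update the "database".
    void save(String url) {
        synchronized (dbLock) {
            stored.add(url);
        }
    }

    void crawl(String url) throws InterruptedException {
        slots.acquire();
        try {
            if (url.contains("bad")) {
                // Simulated crawl failure (assumption for the demo).
                throw new IllegalStateException("cannot fetch " + url);
            }
            save(url);
        } catch (IllegalStateException e) {
            // Report the failure and let other pages continue.
            System.err.println("skipped: " + e.getMessage());
        } finally {
            // Always give the resource back, even after an exception.
            slots.release();
        }
    }

    // Crawls all URLs on a small thread pool and returns how many were saved.
    int run(List<String> urls) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String url : urls) {
            pool.submit(() -> { crawl(url); return null; });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (dbLock) {
            return stored.size();
        }
    }

    int freeSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        CrawlSketch sketch = new CrawlSketch();
        int saved = sketch.run(Arrays.asList(
                "page-1", "bad-page", "page-2", "bad-page-2", "page-3"));
        System.out.println("saved=" + saved + " freeSlots=" + sketch.freeSlots());
    }
}
```

Without the `finally` block, each failed page in this sketch would keep its permit, and after two failures no thread could acquire a slot, which mirrors the stall described in a).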
1.2 New features
a) Quiz pages can now be crawled. Users can choose which websites to crawl according to their interests, or crawl all the sites provided in the current version, to meet the pipeline group's requirements.
b) The interface layout has been reworked and optimized.
c) A new analysis function counts the types and numbers of crawled files in the database and displays them as a pie chart. The crawl process is also displayed as a dynamic bar chart.
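The counting behind the statistics in c) might look like the sketch below. It assumes that a file's type is its file-name extension and that the names are already available as a list. The release notes do not say how types are determined or how the database is queried, so the class `FileTypeStats` and its methods are hypothetical. The resulting counts are the kind of data a pie chart would be drawn from.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the real tool reads its data from the server database.
class FileTypeStats {
    // Returns the lower-cased extension, or "(none)" when there is no usable one.
    static String typeOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "(none)";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Counts how many files of each type appear in the list.
    static Map<String, Integer> countByType(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fileNames) {
            counts.merge(typeOf(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.html", "b.PDF", "c.html", "d");
        Map<String, Integer> counts = countByType(names);
        // Print each type with its count and its share of all files.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double share = 100.0 * e.getValue() / names.size();
            System.out.printf("%s: %d (%.1f%%)%n", e.getKey(), e.getValue(), share);
        }
    }
}
```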
2 Environment requirements
Operating system requirements | Windows XP, Windows 7, Windows 8
Runtime environment requirements | The latest version of the JRE must be installed
Database requirements | The software connects directly to the server's database over the network; there are no special requirements for a local database
3 Installation Instructions
Copy the JAR file to the local machine and run it.
4 Known defects and limitations
When crawling quiz sites, some sites are small. After all of their pages have been crawled, their threads still hold resources and keep the threads of other websites from acquiring them, so the crawl gradually slows down.
5 Release Address
The code and program for this version are published on the server 219.224.191.25, where you can download and try them yourself.
Release Notes for beta release