Heritrix clicks: 3822
Heritrix is an open-source, scalable web crawler project. Heritrix is designed to strictly follow the exclusion instructions in the robots.txt file and meta robots tags.
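For reference, the "exclusion instructions" are the standard robots.txt directives, and a meta robots label is an HTML tag in a page's head. A generic example of each (not taken from any particular site):

    # robots.txt: forbid all crawlers from two directories
    User-agent: *
    Disallow: /private/
    Disallow: /cgi-bin/

    <!-- meta robots tag: do not index this page or follow its links -->
    <meta name="robots" content="noindex,nofollow">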
Websphinx clicks: 2205
Websphinx is a Java class library and interactive development environment for web crawlers. Web crawlers (also known as robots or spiders) can automatically browse and process web pages. Websphinx consists of two parts: the crawler platform and the Websphinx class library.
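A minimal sketch of using the class library, assuming the classic Websphinx API in which a spider subclasses websphinx.Crawler and overrides visit() and shouldVisit(); treat the exact signatures as assumptions to verify against your Websphinx version:

    import websphinx.Crawler;
    import websphinx.Link;
    import websphinx.Page;

    // Hedged sketch: method names follow the classic Websphinx API as
    // commonly documented; verify them before relying on this.
    public class TitleCrawler extends Crawler {
        // Called for every page the crawler downloads.
        public void visit(Page page) {
            System.out.println(page.getURL() + " : " + page.getTitle());
        }

        // Restrict the crawl to a single host.
        public boolean shouldVisit(Link link) {
            return "example.com".equals(link.getURL().getHost());
        }

        public static void main(String[] args) throws Exception {
            TitleCrawler c = new TitleCrawler();
            c.setRoot(new Link("http://example.com/")); // assumed entry-point setter
            c.run();
        }
    }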
Weblech clicks: 1146
Weblech is a powerful website download and mirroring tool. It supports downloading websites according to functional requirements and can imitate the behavior of a standard web browser as closely as possible. Weblech has a functional console and uses multithreading.
Arale clicks: 995
Arale is designed mainly for personal use and, unlike other crawlers, does not focus on page indexing. Arale can download an entire website or selected resources from it. Arale can also map dynamic pages to static pages.
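To illustrate what mapping dynamic pages to static pages means, here is a generic sketch (not Arale's actual code; all names are hypothetical) that rewrites a parameterized URL into a plain file name before saving:

    // Generic illustration of dynamic-to-static URL mapping;
    // nothing here is part of Arale's API.
    public class StaticMapper {
        // "item.jsp?id=3&lang=en" -> "item_id-3_lang-en.html"
        static String toStaticName(String dynamicUrl) {
            int q = dynamicUrl.indexOf('?');
            if (q < 0) return dynamicUrl; // already static
            String base = dynamicUrl.substring(0, q).replaceAll("\\.\\w+$", "");
            String query = dynamicUrl.substring(q + 1)
                                     .replace('=', '-')
                                     .replace('&', '_');
            return base + "_" + query + ".html";
        }

        public static void main(String[] args) {
            System.out.println(toStaticName("item.jsp?id=3&lang=en"));
            // prints: item_id-3_lang-en.html
        }
    }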
J-spider clicks: 1432
J-spider is a fully configurable and customizable web spider engine. You can use it to check a website for errors (internal server errors, etc.), check its internal and external links, analyze its structure (for example, to create a site map), and download the entire website. You can also write a J-spider plug-in to add the functions you need.
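A loudly hedged sketch of what such a plug-in can look like: the interface, package, and method names below are recalled from JSpider 0.5.x and should be treated as assumptions to check against the J-spider sources before use:

    import net.javacoding.jspider.api.event.JSpiderEvent;
    import net.javacoding.jspider.spi.Plugin;

    // Assumed event-driven plug-in model: J-spider notifies registered
    // plug-ins of each crawl event (page fetched, error found, ...).
    public class CountingPlugin implements Plugin {
        private int events;

        public String getName()        { return "CountingPlugin"; }
        public String getVersion()     { return "0.1"; }
        public String getDescription() { return "Counts crawl events"; }
        public String getVendor()      { return "example"; }

        public void initialize() { events = 0; }
        public void shutdown()   { System.out.println("events seen: " + events); }

        public void notify(JSpiderEvent event) {
            events++;
        }
    }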
Spindle clicks: 1046
Spindle is a web index/search tool built on the Lucene toolkit. It includes an HTTP spider used to create indexes and a search class used to query them. The Spindle project provides a set of JSP tag libraries so that JSP-based sites can add search functionality without developing any Java classes.
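Spindle's own classes are not reproduced here, but as background on the Lucene indexing and searching it builds on, here is a minimal sketch against a modern Lucene API (Spindle predates this API, so this illustrates the technique, not Spindle's code):

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MiniIndex {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(Paths.get("index"));

            // Index one "page": what an HTTP spider would add per fetched URL.
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("url", "http://example.com/", Field.Store.YES));
                doc.add(new TextField("content", "a web spider example page", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Search the index, as Spindle's search class does for JSP pages.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(
                        new QueryParser("content", new StandardAnalyzer()).parse("spider"), 10);
                System.out.println("hits: " + hits.totalHits);
            }
        }
    }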
Arachnid clicks: 899
Arachnid is a Java-based web spider framework. It contains a simple HTML parser capable of analyzing input streams containing HTML content. By implementing a subclass of Arachnid, you can develop a simple web spider and add a few lines of code that are called after each page on a website is parsed. The Arachnid download package contains two example spider applications that demonstrate how to use the framework.
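A sketch of the subclassing pattern just described; Arachnid's real constructor and callback names are not verified here, so the method below is a hypothetical placeholder (the bundled example spiders show the real API):

    import java.net.URL;

    // Hypothetical sketch of the Arachnid subclassing pattern; handlePage()
    // is a placeholder name, NOT Arachnid's verified callback.
    public class MySpider extends Arachnid {
        public MySpider(String base) throws Exception {
            super(base); // assumed: entry URL handed to the framework
        }

        // Assumed callback, run after each page on the site is parsed.
        protected void handlePage(URL url, String title) {
            System.out.println(url + " : " + title);
        }
    }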
Larm clicks: 1387
Larm provides a pure-Java search solution for users of the Jakarta Lucene search engine framework. It includes methods for indexing files, database tables, and websites.
Jobo clicks: 1091
Jobo is a simple tool for downloading entire websites. It is essentially a web spider. Compared with other download tools, its main advantage is the ability to fill in forms automatically (for example, for automatic login) and to use cookies to handle sessions. Jobo also has flexible download rules (based on, for example, a page's URL, size, or MIME type) to restrict downloads.
Snoics-reptile clicks: 454
Snoics-reptile is a pure-Java tool for mirroring websites. Starting from a URL entry point given in a configuration file, it fetches to the local machine all resources on the website that a browser could obtain via GET, including web pages and various types of files, such as images, Flash files, MP3 files, ZIP files, RAR files, and EXE files. The entire website can be stored completely on the hard disk with the original site structure kept intact; to produce a complete mirror, you only need to put the captured website on a web server (such as Apache). A generic sketch of this fetch-and-save approach follows the download list below.
Downloads:
Snoics-reptile2.0.part1.rar
Snoics-reptile2.0.part2.rar
Snoics-reptile2.0-doc.rar
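As a generic illustration of the GET-and-save approach described above (not Snoics-reptile's code; the URL and paths are examples), fetching one resource and storing it under a local directory that mirrors the site structure:

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    // Generic mirror-one-resource sketch, not Snoics-reptile's implementation.
    public class MirrorOne {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com/images/logo.png"); // example URL
            // Reproduce the site's path under a local root: mirror/images/logo.png
            Path target = Paths.get("mirror", url.getPath().substring(1));
            Files.createDirectories(target.getParent());
            try (InputStream in = url.openStream()) { // plain HTTP GET
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            System.out.println("saved " + target);
        }
    }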
Web-harvest clicks: 128
Web-harvest is an open-source Java web data extraction tool. It can collect specified web pages and extract useful data from them. Web-harvest mainly uses technologies such as XSLT, XQuery, and regular expressions to perform text/XML processing.
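A minimal sketch of a Web-harvest extraction configuration; the element names (config, var-def, xpath, html-to-xml, http) follow the commonly documented Web-harvest XML syntax, but verify them against your Web-harvest version:

    <config>
        <!-- fetch a page, normalize it to XML, then pull the title with XPath -->
        <var-def name="title">
            <xpath expression="//title/text()">
                <html-to-xml>
                    <http url="http://example.com/"/>
                </html-to-xml>
            </xpath>
        </var-def>
    </config>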