2. Incremental crawler (Incremental Crawler): unlike a batch crawler, an incremental crawler crawls continuously and regularly revisits the pages it has already fetched. Pages on the Web change constantly: new pages appear, existing pages are deleted, and content is modified. An incremental crawler has to reflect these changes promptly, so during its continuous crawl it is always either fetching new pages or updating pages it already holds. General-purpose commercial search engine crawlers mostly fall into this category.
3. Vertical crawler (Focused Crawler): a vertical crawler concentrates on pages about a specific topic or belonging to a specific industry. A crawler for a health site, for example, only needs to find health-related pages on the Internet and can ignore content from other industries. The defining feature of a vertical crawler, and its main difficulty, is deciding whether a page belongs to the target topic or industry. Downloading every page on the Internet and filtering afterwards is not practical because it wastes too many resources. Instead, the crawler usually has to judge during the crawl itself whether a URL is relevant to the topic and avoid fetching irrelevant pages as far as possible, which is how it saves resources. Vertical search sites and vertical industry sites typically need this type of crawler.
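The revisit behaviour of an incremental crawler can be sketched roughly as follows. This is only a minimal illustration, not how any particular search engine works: the `IncrementalCrawler` class, the injected `fetch` function, and the rule of halving or doubling the revisit interval are all assumptions made for the example. Change detection here is a plain content hash.

```python
import hashlib
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fetcher: returns (body, outgoing_links), or None if the page is gone.
Fetcher = Callable[[str], Optional[Tuple[str, List[str]]]]


class IncrementalCrawler:
    def __init__(self, fetch: Fetcher, base_interval: float = 60.0,
                 min_interval: float = 10.0, max_interval: float = 3600.0):
        self.fetch = fetch
        self.base_interval = base_interval
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.index: Dict[str, str] = {}           # url -> hash of last seen content
        self.interval: Dict[str, float] = {}      # url -> current revisit interval
        self.queue: List[Tuple[float, str]] = []  # min-heap of (due time, url)

    def add(self, url: str, now: float) -> None:
        """Schedule a URL for an immediate first visit if it is not tracked yet."""
        if url not in self.interval:
            self.interval[url] = self.base_interval
            heapq.heappush(self.queue, (now, url))

    def step(self, now: float) -> Optional[str]:
        """Visit one due URL. Returns 'new', 'changed', 'unchanged', 'deleted',
        or None when nothing is due yet."""
        if not self.queue or self.queue[0][0] > now:
            return None
        _, url = heapq.heappop(self.queue)
        result = self.fetch(url)
        if result is None:
            # The page was deleted: drop it from the index and stop revisiting it.
            self.index.pop(url, None)
            self.interval.pop(url, None)
            return "deleted"
        body, links = result
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        old = self.index.get(url)
        if old is None:
            status = "new"
        elif old != digest:
            status = "changed"    # assumed policy: revisit changing pages sooner
            self.interval[url] = max(self.min_interval, self.interval[url] / 2)
        else:
            status = "unchanged"  # assumed policy: revisit stable pages less often
            self.interval[url] = min(self.max_interval, self.interval[url] * 2)
        self.index[url] = digest
        heapq.heappush(self.queue, (now + self.interval[url], url))
        for link in links:
            self.add(link, now)   # newly discovered pages join the same queue
        return status
```

Calling `step` in a loop with the current time gives the continuous crawl described in item 2: each call either fetches a newly discovered page or refreshes one that is already in the index.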
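The idea of judging relevance during the crawl, before a page is downloaded, can be illustrated with a small best-first frontier. This is a sketch under stated assumptions: the `relevance` and `focused_crawl` functions, the injected `fetch_links` callable, and the keyword-overlap score are hypothetical stand-ins for the example. A real vertical crawler would more likely use a trained topic classifier than a keyword count.

```python
import heapq
import re
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical fetcher: returns a page's outgoing links as (url, anchor_text) pairs.
LinkFetcher = Callable[[str], List[Tuple[str, str]]]


def relevance(url: str, anchor_text: str, topic_terms: Set[str]) -> float:
    """Fraction of topic terms that occur in the URL or the anchor text."""
    if not topic_terms:
        return 0.0
    tokens = set(re.findall(r"[a-z0-9]+", (url + " " + anchor_text).lower()))
    return len(tokens & topic_terms) / len(topic_terms)


def focused_crawl(seeds: Iterable[str], fetch_links: LinkFetcher,
                  topic_terms: Set[str], threshold: float = 0.25,
                  max_pages: int = 100) -> List[str]:
    """Visit pages best-first, skipping links that score below the threshold."""
    seeds = list(seeds)
    # heapq is a min-heap, so scores are stored negated; seeds get top priority.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link, anchor in fetch_links(url):
            if link in seen:
                continue
            seen.add(link)
            score = relevance(link, anchor, topic_terms)
            if score >= threshold:
                # Only links that look on-topic are ever queued for download.
                heapq.heappush(frontier, (-score, link))
    return visited
```

Because the score is computed from the URL and anchor text alone, an off-topic page is rejected without being fetched, which is the resource saving that item 3 describes.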