How to install Nutch in Windows


I once had a chance to take on a search engine project, but the negotiations over price eventually fell through. I deeply regret losing such a good opportunity to practice. Still, I do not want to give up on learning and practicing search engine technology, and many people online recommend Nutch. So I decided to learn Nutch, starting with installing and using it. The following is a record of my installation of Nutch on Windows XP SP2.

Environment required by Nutch

JDK 1.4.x or JDK 1.5

Tomcat 4.x or later

Cygwin

Software download addresses:

J2SE 5.0 http://java.sun.com/javase/downloads/index.html

Tomcat 5.5 http://tomcat.apache.org/download-55.cgi

Cygwin http://www.cygwin.com/

Nutch 0.7.2 http://lucene.apache.org/nutch/

Installation steps (the installation directories can be chosen freely):

1. Install the JDK. I read online that Nutch supports JDK 1.4, but I installed JDK 1.5 so that I could install Tomcat 5.5.

My installation path: F:\project\java\jdk5

2. Install Cygwin. There are many guides for this online; I recommend the version that installs from a local package directory.

My installation path: E:\Program Files\cygwin\

3. Install Tomcat. The Nutch instructions say it supports Tomcat 4.3, but I installed Tomcat 5.5.

My installation path: F:\project\Tomcat 5.5

4. Install Nutch 0.7.2

Unzip the downloaded archive to: F:\project\nutch-0.7.2\
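
Inside Cygwin, Windows drives appear under /cygdrive/<drive letter>, so the Windows install paths above have to be translated before they can be used in shell configuration. Cygwin ships the `cygpath` tool for this; the sketch below is only a plain-shell approximation of that mapping, and the function name `win_to_cygdrive` is a hypothetical helper, not part of Cygwin or Nutch.

```shell
#!/bin/sh
# Hypothetical helper: convert a Windows path such as
# F:\project\java\jdk5 into its /cygdrive form.
win_to_cygdrive() {
    win_path=$1
    # The drive letter is the first character; lower-case it.
    drive=$(printf '%s' "$win_path" | cut -c1 | tr 'A-Z' 'a-z')
    # Drop the leading "X:" and turn backslashes into forward slashes.
    rest=$(printf '%s' "$win_path" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

# Example with the JDK path used in this article.
win_to_cygdrive 'F:\project\java\jdk5'
```

On a real Cygwin installation, `cygpath -u 'F:\project\java\jdk5'` should be preferred, since it also respects custom mount points.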

Configuration steps:

1. Configure the environment in Cygwin

Edit E:\Program Files\cygwin\etc\profile so that it contains:

PATH="/usr/local/bin:/usr/bin:/bin:$PATH:/cygdrive/f/project/java/jdk5"

export NUTCH_JAVA_HOME=/cygdrive/f/project/java/jdk5

export JAVA_HOME=/cygdrive/f/project/java/jdk5
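
After reopening the Cygwin shell, it can be worth confirming that the variable really points at a JDK before running Nutch. The following is a minimal sketch of such a check; `looks_like_jdk` is a hypothetical helper, and the assumption that a usable JDK directory contains bin/java (or bin/java.exe on Windows) is mine rather than something the Nutch documentation prescribes.

```shell
#!/bin/sh
# Hypothetical check: does the given directory look like a JDK install?
looks_like_jdk() {
    jdk_dir=$1
    # An empty value would otherwise test /bin/java by accident.
    [ -n "$jdk_dir" ] || return 1
    # Assumption: a JDK provides bin/java (bin/java.exe on Windows).
    [ -x "$jdk_dir/bin/java" ] || [ -x "$jdk_dir/bin/java.exe" ]
}

if looks_like_jdk "${NUTCH_JAVA_HOME:-}"; then
    echo "NUTCH_JAVA_HOME looks usable: $NUTCH_JAVA_HOME"
else
    echo "NUTCH_JAVA_HOME is unset or does not contain bin/java"
fi
```

If the second message appears, the usual causes are a typo in the /cygdrive path or a profile that has not been re-read yet.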

2. Configure Nutch

1) Configure the crawl filter to determine which site addresses to crawl

Open F:\project\nutch-0.7.2\conf\crawl-urlfilter.txt

# accept hosts in MY.DOMAIN.NAME

+^http://([a-z0-9]*\.)*gucas.ac.cn/

Change gucas.ac.cn above to the domain name you want to crawl
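
Before starting a crawl, the accept rule can be tried against a few sample URLs. The sketch below uses `grep -E` as a stand-in for Nutch's Java regular expressions, which is an approximation that should behave the same for a pattern this simple; `url_accepted` and the sample URLs are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# The accept rule from crawl-urlfilter.txt, without its leading "+".
pattern='^http://([a-z0-9]*\.)*gucas.ac.cn/'

# Hypothetical helper: succeeds when the URL matches the pattern.
url_accepted() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Try a host with a subdomain, the bare domain, and an unrelated host.
for url in 'http://www.gucas.ac.cn/' \
           'http://gucas.ac.cn/index.html' \
           'http://www.example.com/'; do
    if url_accepted "$url"; then
        echo "accept $url"
    else
        echo "reject $url"
    fi
done
```

Note that the dots inside gucas.ac.cn are unescaped, so each one matches any character; writing them as `\.` would make the rule stricter.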
