(2.1) Installation of Nutch1.7 under Windows

Source: Internet
Author: User

Hotel Review Sentiment Analysis System (II): Nutch Installation

First, the requirements

1. Nutch is written in Java, so a Java JDK must be downloaded:

http://java.sun.com/javase/downloads/index.jsp

2. Nutch's demo search page is a JSP application, so Tomcat is required as the server:

http://jakarta.apache.org/tomcat/

3. Nutch's scripts are written for the Linux shell, so a shell interpreter is required on the Windows platform. Cygwin provides a Linux-like environment under Windows. (On Linux, Cygwin is not needed.)

http://www.cygwin.com/

4. Nutch itself:

http://lucene.apache.org/nutch/

Second, the environment

    1. Operating system: Windows 7, x86, 32-bit
    2. Java JDK 1.6
    3. Tomcat 7.0
    4. Cygwin 2.850
    5. Nutch 1.7

Third, installation steps

1. Java JDK Installation

Note: The installation path must not contain Chinese characters, and a path without spaces is recommended. The first time, I chose a path with a space, C:\Program Files, and running the crawl command failed with an error saying that the directory C:\Program could not be found.

The cause is the space in the middle of C:\Program Files\. The path is cut at the space, so the script looks for C:\Program instead of C:\Program Files, and there is no Program folder on the C drive.
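The failure can be illustrated with a minimal Python sketch. Python is used here only for illustration and is not part of the Nutch setup. The sketch assumes the script passes the path unquoted, so the shell splits it at whitespace. The helper names `unquoted_words` and `is_safe_install_path`, and the example JDK subdirectory, are hypothetical.

```python
def unquoted_words(command_line):
    """Split a string at whitespace, roughly as a shell does with an unquoted value."""
    return command_line.split()


def is_safe_install_path(path):
    """Return True if the path is pure ASCII and contains no whitespace."""
    return path.isascii() and not any(ch.isspace() for ch in path)


if __name__ == "__main__":
    jdk_path = r"C:\Program Files\Java\jdk1.6"  # hypothetical install location
    print(unquoted_words(jdk_path)[0])          # C:\Program
    print(is_safe_install_path(jdk_path))       # False
    print(is_safe_install_path(r"c:\jdk1.6"))   # True
```

The first word after splitting is exactly the C:\Program directory named in the error, which is why a path such as c:\jdk1.6 avoids the problem.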

After installation, set the environment variables. On Windows 7 this works the same way as on XP, and either system variables or user variables are fine. Assuming your JDK is installed in c:\jdk1.6, configure it as follows:

JAVA_HOME=c:\jdk1.6

CLASSPATH=.;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar (the leading "." must not be omitted, because it represents the current path)

Path=%JAVA_HOME%\bin (append this to the existing Path value rather than replacing it)

After setting the variables, type "cmd" in the Run dialog to open a command line, then enter "java" and "java -version" separately. If the details are displayed without errors, the installation succeeded.


If the version information is not printed, check your configuration carefully.
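One way to check the configuration is a small script like the following sketch. It assumes a Python 3 interpreter is available, that entries are separated by ";" as on Windows, and that the variables are looked up under their upper-case names (Python upper-cases environment variable names on Windows). The function `check_java_env` is a hypothetical helper, not part of the JDK or Nutch.

```python
import os


def check_java_env(env, sep=";"):
    """Return a list of problems found in an environment mapping (empty if none)."""
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return ["JAVA_HOME is not set"]

    problems = []

    # Path must contain %JAVA_HOME%\bin (compared case-insensitively).
    expected_bin = (java_home.rstrip("\\/") + "\\bin").lower()
    path_entries = [p.strip().rstrip("\\/").lower()
                    for p in env.get("PATH", "").split(sep)]
    if expected_bin not in path_entries:
        problems.append("Path does not contain JAVA_HOME\\bin")

    # CLASSPATH must contain "." for the current directory.
    classpath_entries = [p.strip() for p in env.get("CLASSPATH", "").split(sep)]
    if "." not in classpath_entries:
        problems.append("CLASSPATH does not contain '.'")

    return problems


if __name__ == "__main__":
    for problem in check_java_env(os.environ):
        print(problem)
```

Running the script from the same command line prints one line per problem and prints nothing when the three variables are consistent.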

2. Tomcat installation (zip version, no installer)

One point to note: download a Tomcat version that matches your JDK. For example, my JDK version is 1.6. I first installed Tomcat 8.0, and after configuring the paths, double-clicking startup.bat only made a console window flash and close. Tomcat 8.0 requires Java 7 or later, so with JDK 1.6 use Tomcat 7.0.
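The version matching can be written down as a small lookup. The following is a minimal sketch, assuming the minimum Java versions published on the Apache Tomcat "Which Version" page for a few release lines. The table `MIN_JAVA_FOR_TOMCAT` and the function `tomcat_runs_on` are hypothetical names, and the table is deliberately incomplete.

```python
# Minimum Java major version for a few Tomcat release lines
# (taken from the Apache Tomcat "Which Version" page; not exhaustive).
MIN_JAVA_FOR_TOMCAT = {"7.0": 6, "8.0": 7, "8.5": 7, "9.0": 8}


def tomcat_runs_on(tomcat_line, java_major):
    """Return True if the Tomcat release line supports the given Java major version."""
    if tomcat_line not in MIN_JAVA_FOR_TOMCAT:
        raise ValueError("unknown Tomcat release line: %r" % tomcat_line)
    return java_major >= MIN_JAVA_FOR_TOMCAT[tomcat_line]


if __name__ == "__main__":
    print(tomcat_runs_on("8.0", 6))  # False: the combination that failed above
    print(tomcat_runs_on("7.0", 6))  # True: the combination used in this article
```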

Extract Tomcat to a directory whose path contains no Chinese characters, then set the environment variables:

(1) Variable name: TOMCAT_HOME, variable value: H:\tomcat7.0 (the directory Tomcat was extracted to)

(2) Variable name: CATALINA_HOME, variable value: H:\tomcat7.0

(3) Modify the Path variable by appending the following to the end of its value:

;%CATALINA_HOME%\bin;%CATALINA_HOME%\lib

To run Tomcat 7.0, choose Start, then Run, type cmd, and change to Tomcat's bin directory.

Enter startup.bat at the command prompt. A Tomcat console window pops up and prints the startup log.

Then open http://localhost:8080/ in a browser. If the Tomcat welcome page appears, the configuration succeeded.

Tomcat is started and stopped with startup.bat and shutdown.bat, respectively.
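The browser check can also be scripted. The following is a minimal sketch, assuming Tomcat listens on its default address http://localhost:8080/ and that a Python 3 interpreter is available. The function name `tomcat_is_up` is hypothetical.

```python
import urllib.error
import urllib.request


def tomcat_is_up(url="http://localhost:8080/", timeout=3.0):
    """Return True if an HTTP GET on the URL answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Tomcat is running" if tomcat_is_up() else "Tomcat is not reachable")
```

A result of "not reachable" right after running startup.bat may simply mean that Tomcat has not finished starting, so it is worth retrying after a few seconds.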

3. Cygwin installation

Run the setup program.

When asked for a download site, any of the listed mirror addresses will do.

In the package selection step, choose the component packages to download and install. So that the installed Cygwin can compile programs, the GCC compiler is needed. GCC is not installed by default, so it must be selected explicitly. Click the "Devel" branch in the component list. Among its many components, the following must be selected:

binutils, gcc, gcc-mingw, gdb

When you are finished, choose Next.

The installation time depends on the components you selected and on network conditions.
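After the installation, the selected components can be checked from a script. The following is a minimal sketch, assuming a Python 3 interpreter is run inside the Cygwin environment and that the packages provide the commands `bash`, `gcc`, `gdb`, and `ld` (`ld` comes from binutils). The function `missing_tools` is a hypothetical helper.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the names for which no executable is found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Command names assumed to be provided by the packages selected above.
    missing = missing_tools(["bash", "gcc", "gdb", "ld"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found")
```

If a tool is reported as missing, rerunning the Cygwin setup program and selecting the corresponding package adds it without reinstalling everything.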

4. Nutch Installation

Nutch is a web crawler implemented in Java. The crawl results are stored in a database (a series of files and directories under a specified path) for Solr or Lucene to index and retrieve.

Basic features of the common search-related frameworks: Nutch handles crawling, while Solr and Lucene handle indexing and retrieval.

Download apache-nutch-1.7-bin.zip from: http://archive.apache.org/dist/nutch/

After the download completes, unzip the Nutch binary bundle (I unzipped it to H:\nutch\nutch1.7). The directory layout is as follows:

• bin directory: contains a single executable file, nutch

• conf directory: configuration parameters for nutch command execution

• docs directory: Javadoc help

• lib directory: the required JAR class libraries

• plugins directory: the plugin libraries

Set the environment variable:

Variable name: NUTCH_JAVA_HOME

Variable value: %JAVA_HOME% (that is, the JDK's installation directory)

Run Cygwin, change to the directory where Nutch 1.7 was unzipped, and enter bin/nutch. If the command prints its usage information, Nutch was installed successfully.
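Inside Cygwin, Windows drives are mounted under the /cygdrive prefix by default, so the unzip directory has to be entered in that form. The following minimal sketch shows the conversion. It assumes the default /cygdrive prefix, and the function name `to_cygwin_path` is hypothetical (Cygwin also ships a `cygpath` utility for this purpose).

```python
import re


def to_cygwin_path(windows_path, prefix="/cygdrive"):
    """Convert a path such as H:\\nutch\\nutch1.7 to its /cygdrive form."""
    match = re.match(r"^([A-Za-z]):[\\/]*(.*)$", windows_path)
    if not match:
        raise ValueError("expected a path starting with a drive letter: %r" % windows_path)
    drive, rest = match.groups()
    rest = rest.replace("\\", "/").strip("/")
    base = "%s/%s" % (prefix, drive.lower())
    return "%s/%s" % (base, rest) if rest else base


if __name__ == "__main__":
    print(to_cygwin_path(r"H:\nutch\nutch1.7"))  # /cygdrive/h/nutch/nutch1.7
```

With the directory used in this article, the Cygwin session would then change to /cygdrive/h/nutch/nutch1.7 before entering bin/nutch.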
