Nutch notes (i): Quick Start

Source: Internet
Author: User

I recently started using Nutch to crawl the content of a few target sites and then analyze it.

Nutch Notes is a series of summaries of my experience with Nutch. I am writing down what I learn to share it with you, and I welcome your advice.

Enough preamble. The first article is the Quick Start: the goal is to get Nutch running quickly and retrieve the results we want.

The first thing to understand is what Nutch is.

Nutch is an open source search engine built on Lucene. It is a complete solution that includes everything needed to crawl, index, and search.

I: Install the JDK

If you already have the JDK installed and JAVA_HOME set, skip this step.

Installing the JDK

sudo apt-get install sun-java5-jdk

Alternatively, download the .bin installer from the Sun website and run it.

Set JAVA_HOME

vi ~/.bashrc

Append the following lines to the end of the file:

export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
export PATH=$PATH:$JAVA_HOME/bin
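
After reloading the shell configuration (for example with `source ~/.bashrc`), it can help to confirm that the variable took effect. The following is a minimal sketch only; `check_java_home` is a hypothetical helper name, not part of the JDK or Nutch, and it assumes the JDK keeps its launcher at `$JAVA_HOME/bin/java`.

```shell
# Hypothetical helper: report whether JAVA_HOME points at a usable JDK.
check_java_home() {
    # Fail early if the variable is unset or empty.
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    # Assumption: the JDK launcher lives at $JAVA_HOME/bin/java.
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no java executable under $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}
```

Calling `check_java_home` in a fresh terminal prints one of the three messages above; running `java -version` afterwards shows which JDK is actually picked up from PATH.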

II: Download the latest version of Nutch (0.8.1)

wget http://apache.justdn.org/lucene/nutch/nutch-0.8.1.tar.gz

Then extract the archive:

tar zxvf nutch-0.8.1.tar.gz
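
As a quick sanity check, the extraction step can be wrapped so that it also confirms the launcher script is present. This is a rough sketch under stated assumptions: `unpack_nutch` is a hypothetical helper name, and it assumes the archive unpacks into a top-level directory named after the tarball (here `nutch-0.8.1`) that contains `bin/nutch`.

```shell
# Hypothetical helper: unpack a Nutch tarball into the current directory
# and confirm that the bin/nutch launcher script is present.
unpack_nutch() {
    archive="$1"                            # e.g. nutch-0.8.1.tar.gz
    # Assumption: the top-level directory matches the archive name.
    dir="$(basename "$archive" .tar.gz)"
    tar zxf "$archive" || return 1
    if [ -f "$dir/bin/nutch" ]; then
        echo "unpacked: $dir"
    else
        echo "bin/nutch not found under $dir"
        return 1
    fi
}
```

For example, `unpack_nutch nutch-0.8.1.tar.gz` run in the download directory would report the extracted directory name if the layout matches the assumption.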
