Best Guide to Installing and Using Nutch-Related Frameworks

Source: Internet
Author: User

Part One: Nutch 1.2

The steps are much the same as in Part Two, but step 5 (configuring the build path) needs two extra actions. First, in the Package Explorer on the left, right-click the nutch1.2 folder > Build Path > Configure Build Path... > select the Source tab and change Default output folder from nutch1.2/bin to nutch1.2/_bin, so that compiled classes are kept out of Nutch's own bin folder, which holds the nutch launch script. Second, in the Package Explorer, right-click the bin folder under nutch1.2 > Team > Revert to restore its original contents.
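The output-folder change above is stored in the project's .classpath file, so it can also be made without the dialog. The following is a minimal sketch, assuming the usual Eclipse layout in which the default output folder is a `<classpathentry kind="output" path="..."/>` element under `<classpath>`. The helper name `set_output_folder` and the sample file are hypothetical, and reading and writing the real file are left out.

```python
import xml.etree.ElementTree as ET


def set_output_folder(classpath_xml: str, new_path: str = "_bin") -> str:
    """Point the Eclipse default output folder at new_path.

    Assumes the usual .classpath layout, where the default output folder is
    a <classpathentry kind="output" path="..."/> child of <classpath>.
    """
    root = ET.fromstring(classpath_xml)
    outputs = [e for e in root.findall("classpathentry") if e.get("kind") == "output"]
    if outputs:
        # Repoint the existing output entry (normally path="bin").
        for entry in outputs:
            entry.set("path", new_path)
    else:
        # No output entry yet: add one so compiled classes go to new_path.
        ET.SubElement(root, "classpathentry", kind="output", path=new_path)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical minimal .classpath; a real one lists many more entries.
    sample = (
        '<classpath>'
        '<classpathentry kind="src" path="src/java"/>'
        '<classpathentry kind="output" path="bin"/>'
        '</classpath>'
    )
    print(set_output_folder(sample))
```

ElementTree drops comments and the XML declaration when it re-serializes the file, so compare the result with the original before overwriting it. The Team > Revert step still has to be done separately.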

In the original article, yellow highlighting marks differences that are only in the version number, red marks what version 1.2 does not have, and green marks other differences from Part Two. The differences are as follows:

1. Add JARs... > nutch1.2 > lib, select all .jar files > OK

2. crawl-urlfilter.txt

3. Rename crawl-urlfilter.txt.template to crawl-urlfilter.txt

4. In crawl-urlfilter.txt, modify the following lines:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

# skip everything else
-.

5. cd /home/ysc/workspace/nutch1.2
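The rules edited in step 4 follow the format described in the header comment of Nutch's urlfilter files. Each non-comment line is a regular expression prefixed by '+' (include) or '-' (exclude). The first matching pattern decides the outcome, and a URL that matches no pattern is ignored. The following is a minimal Python sketch of that evaluation order. It assumes each pattern is tested with a search anywhere in the URL, which is why the rule carries an explicit ^ anchor, and it uses Python's `re.search` to approximate Java regex behaviour. The function names are hypothetical, and apache.org is only a stand-in for the MY.DOMAIN.NAME placeholder.

```python
import re
from typing import List, Tuple

# One parsed rule: (accept?, compiled regex).
Rule = Tuple[bool, "re.Pattern[str]"]


def parse_rules(text: str) -> List[Rule]:
    """Parse urlfilter text: '+regex' accepts, '-regex' rejects.

    Blank lines and lines starting with '#' are skipped.
    """
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules: List[Rule], url: str) -> bool:
    """The first rule whose regex is found in the URL decides; no match rejects."""
    for accept, pattern in rules:
        if pattern.search(url):  # search anywhere, not a full-string match
            return accept
    return False


# Example filter with apache.org standing in for MY.DOMAIN.NAME (hypothetical).
EXAMPLE_FILTER = r"""
# accept hosts in apache.org
+^http://([a-z0-9]*\.)*apache.org/
# skip everything else
-.
"""

if __name__ == "__main__":
    rules = parse_rules(EXAMPLE_FILTER)
    for url in ("http://nutch.apache.org/", "http://example.com/", "https://nutch.apache.org/"):
        print(url, accepts(rules, url))
```

Because the "-." rule matches any non-empty URL, everything not accepted by the first rule is rejected. Note that https URLs also fall through to that rule, because the accept pattern only names http.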

Nutch 1.2 is a complete search engine, while Nutch 1.5.1 is only a crawler. Nutch 1.2 can either submit its index to Solr or generate a Lucene index directly, whereas Nutch 1.5.1 can only submit its index to Solr. To deploy the Nutch 1.2 search web application on Tomcat:

1. cd /home/ysc

2. wget http://mirrors.tuna.tsinghua.edu.cn/apache/tomcat/tomcat-7/v7.0.29/bin/apache-tomcat-7.0.29.tar.gz

3. tar -xvf apache-tomcat-7.0.29.tar.gz

4. In the Package Explorer on the left, right-click the build.xml file in the nutch1.2 folder > Run As > Ant Build... > select the war target > Run

5. cd /home/ysc/workspace/nutch1.2/build

6. unzip nutch-1.2.war -d nutch-1.2

7. cp -r nutch-1.2 /home/ysc/apache-tomcat-7.0.29/webapps

8. vi /home/ysc/apache-tomcat-7.0.29/webapps/nutch-1.2/WEB-INF/classes/nutch-site.xml

Add the following property inside the <configuration> element:

<property>
  <name>searcher.dir</name>
  <value>/home/ysc/workspace/nutch1.2/data</value>
  <description>
  Path to root of crawl. This directory is searched (in
  order) for either the file search-servers.txt, containing a list of
  distributed search servers, or the directory "index" containing
  merged indexes, or the directory "segments" containing segment
  indexes.
  </description>
</property>

9. vi /home/ysc/apache-tomcat-7.0.29/conf/server.xml

Change

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

to

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" URIEncoding="UTF-8" />

10. cd /home/ysc/apache-tomcat-7.0.29/bin

11. ./startup.sh

12. Visit: http://localhost:8080/nutch-1.2/
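Steps 8 and 9 are plain XML edits, so they can also be scripted. The following is a minimal sketch using only Python's standard library. It assumes that nutch-site.xml has the usual Hadoop-style `<configuration>` root and that server.xml declares its listeners as `<Connector>` elements. The helper names and the trimmed-down sample XML are hypothetical, and file reading and writing are left out.

The last lines show why step 9 matters. Tomcat 7 decodes request URIs as ISO-8859-1 unless URIEncoding is set, so a UTF-8 percent-encoded query, such as a Chinese search term, comes out garbled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote


def set_nutch_property(site_xml: str, name: str, value: str) -> str:
    """Add or update <property><name/><value/></property> under <configuration>."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def set_uri_encoding(server_xml: str, port: str = "8080", encoding: str = "UTF-8") -> str:
    """Set URIEncoding on every <Connector> listening on the given port."""
    root = ET.fromstring(server_xml)
    connectors = [c for c in root.iter("Connector") if c.get("port") == port]
    if not connectors:
        raise ValueError(f"no <Connector> with port={port!r}")
    for connector in connectors:
        connector.set("URIEncoding", encoding)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_nutch_property(
        "<configuration></configuration>",
        "searcher.dir",
        "/home/ysc/workspace/nutch1.2/data",
    ))

    # Hypothetical trimmed-down server.xml; the real file has more elements.
    server = (
        '<Server port="8005" shutdown="SHUTDOWN"><Service name="Catalina">'
        '<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443"/>'
        '</Service></Server>'
    )
    print(set_uri_encoding(server))

    # A UTF-8 percent-encoded query decodes correctly as UTF-8 but is
    # garbled when decoded as ISO-8859-1 (Tomcat 7's default).
    raw = quote("中文")
    print(unquote(raw, encoding="utf-8"))       # 中文
    print(unquote(raw, encoding="iso-8859-1"))  # garbled
```

ElementTree drops comments and the XML declaration when it re-serializes, which matters for the heavily commented server.xml. Keep a backup of each file, or apply the change by hand, before writing the result back. Restart Tomcat afterwards so the connector picks up the new setting.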

For more Nutch 1.2 bug fixes and information, see the resources I have published on CSDN: http://download.csdn.net/user/yangshangchuan
