First, nutch1.2
The steps are much the same as in part two, with two extra actions needed in step 5, configuring the build path: in the Package Explorer on the left, right-click the nutch1.2 folder > Build Path > Configure Build Path... > select the Source tab > Default output folder: change nutch1.2/bin to nutch1.2/_bin. Then, in the Package Explorer on the left, right-click the bin folder under the nutch1.2 folder > Team > Restore
In the original post, differences in the version number are highlighted in yellow, parts that do not apply to version 1.2 in red, and places that differ in green. The steps are as follows:
1, Add JARs... > nutch1.2 > lib, select all .jar files > OK
2, crawl-urlfilter.txt
3, Rename crawl-urlfilter.txt.template to crawl-urlfilter.txt
4, Modify crawl-urlfilter.txt, editing the following rules:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.
5, cd /home/ysc/workspace/nutch1.2
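The accept rule in step 4 can be tried against a few sample URLs before crawling. The following is only a rough sketch: it uses grep -E to approximate the Java regular expression that Nutch evaluates, it checks this single rule rather than the whole filter file, and example.com is a hypothetical domain standing in for MY.DOMAIN.NAME.

```shell
#!/bin/sh
# Hypothetical domain standing in for MY.DOMAIN.NAME (an assumption, not
# from the original post); replace it with the domain you want to crawl.
DOMAIN_REGEX='example\.com'

# Return 0 if the URL matches the "+" accept rule from crawl-urlfilter.txt,
# 1 if it would fall through to the final "-." rule and be skipped.
url_accepted() {
    printf '%s\n' "$1" | grep -Eq "^http://([a-z0-9]*\.)*${DOMAIN_REGEX}/"
}

for url in "http://www.example.com/index.html" \
           "http://example.com/" \
           "http://www.other.org/page.html"; do
    if url_accepted "$url"; then
        echo "accept $url"
    else
        echo "skip   $url"
    fi
done
```

In the filter file the first matching rule decides, so a URL that matches none of the "+" rules ends up at "-." and is dropped.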
nutch1.2 is a complete search engine, whereas nutch1.5.1 is only a crawler. nutch1.2 can either submit its index to Solr or generate a Lucene index directly; nutch1.5.1 can only submit its index to Solr:
1, cd /home/ysc
2, wget http://mirrors.tuna.tsinghua.edu.cn/apache/tomcat/tomcat-7/v7.0.29/bin/apache-tomcat-7.0.29.tar.gz
3, tar -xvf apache-tomcat-7.0.29.tar.gz
4, In the Package Explorer on the left, right-click the build.xml file in the nutch1.2 folder > Run As > Ant Build... > select the war target > Run
5, cd /home/ysc/workspace/nutch1.2/build
6, unzip nutch-1.2.war -d nutch-1.2
7, cp -r nutch-1.2 /home/ysc/apache-tomcat-7.0.29/webapps
8, vi /home/ysc/apache-tomcat-7.0.29/webapps/nutch-1.2/WEB-INF/classes/nutch-site.xml
Add the following configuration:
<property>
<name>searcher.dir</name>
<value>/home/ysc/workspace/nutch1.2/data</value>
<description>
Path to root of crawl. This directory is searched (in
order) for either the file search-servers.txt, containing a list of
distributed search servers, or the directory "index" containing
merged indexes, or the directory "segments" containing segment
indexes.
</description>
</property>
9, vi /home/ysc/apache-tomcat-7.0.29/conf/server.xml
Change
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" />
to
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8" />
10, cd /home/ysc/apache-tomcat-7.0.29/bin
11, ./startup.sh
12, Visit: http://localhost:8080/nutch-1.2/
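The server.xml change in step 9 can also be made without opening an editor. This is a minimal sketch, not part of the original steps: add_uri_encoding is a hypothetical helper name, and it assumes the port-8080 connector is laid out across several lines as shown in step 9, ending in redirectPort="8443" />. The demo runs on a throwaway copy rather than the real file.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): add URIEncoding="UTF-8"
# to the port-8080 HTTP connector of the given server.xml.
# Assumption: the connector starts with <Connector port="8080" and its
# closing line reads redirectPort="8443" />, as in step 9.
add_uri_encoding() {
    server_xml=$1
    tmp="${server_xml}.tmp"
    # Restrict the substitution to the lines from the port-8080 connector
    # up to its closing "/>", so other connectors are left untouched.
    # Write to a temporary file, then replace the original.
    sed '/<Connector port="8080"/,/\/>/ s|redirectPort="8443" */>|redirectPort="8443" URIEncoding="UTF-8" />|' \
        "$server_xml" > "$tmp" && mv "$tmp" "$server_xml"
}

# Demo on a scratch file that imitates the relevant part of server.xml.
demo_dir=$(mktemp -d)
cat > "$demo_dir/server.xml" <<'EOF'
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
EOF

add_uri_encoding "$demo_dir/server.xml"
cat "$demo_dir/server.xml"
rm -rf "$demo_dir"
```

On the real installation the call would be add_uri_encoding /home/ysc/apache-tomcat-7.0.29/conf/server.xml, made before running ./startup.sh. Keeping a backup copy of server.xml first is advisable.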
For more bug fixes and information on nutch1.2, please refer to the resources I have published on CSDN: http://download.csdn.net/user/yangshangchuan