Run Nutch 2.3 in Eclipse

Source: Internet
Author: User
Tags svn


Reference: http://wiki.apache.org/nutch/RunNutchInEclipse


First, environment preparation

1. Download the Nutch 2.3 source code

wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz
or check out the latest development version:
svn co https://svn.apache.org/repos/asf/nutch/branches/2.x


2. Select the database to use; this walkthrough uses HBase
In conf/nutch-site.xml, add the following property:

<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>


3. Enable the HBase dependency in ivy/ivy.xml. The entry already exists but is commented out, so remove the comment markers around it:

<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default"/>
Note that rev="0.5" corresponds to HBase 0.94, and rev="0.3" to HBase 0.90.4.


4. Add the following three properties to conf/nutch-site.xml:

<property>
  <name>http.agent.name</name>
  <value>my Nutch spider</value>
</property>
<property>
  <name>http.robots.agents</name>
  <value>none</value>
</property>
<property>
  <name>plugin.folders</name>
  <value>/users/liaoliuqing/0_search/1_nutch/1_official/apache-nutch-2.3/build/plugins</value>
</property>
Here the value of plugin.folders is $NUTCH_HOME/build/plugins.
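Before generating the Eclipse project, it can help to confirm that the properties from steps 2 and 4 actually made it into the configuration file. The following is a minimal sketch of such a check using grep. The function name check_nutch_props is hypothetical, and the check is purely textual, so it will not notice a property that sits inside an XML comment.

```shell
# Report which of the properties named in steps 2 and 4 appear in a
# Nutch configuration file. check_nutch_props is a hypothetical helper.
check_nutch_props() {
    conf_file="$1"
    missing=0
    for prop in storage.data.store.class http.agent.name \
                http.robots.agents plugin.folders; do
        # -F matches the string literally, so the dots are not wildcards.
        if grep -qF "<name>${prop}</name>" "$conf_file"; then
            echo "found:   $prop"
        else
            echo "missing: $prop"
            missing=1
        fi
    done
    # Non-zero exit status when at least one property is absent.
    return "$missing"
}
```

To use it, pass the configuration file you edited as the only argument; the function exits with a non-zero status if any of the four properties is absent.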


5. Run ant eclipse in the Nutch source directory to generate the Eclipse project files


Second, import the project

1. Import the project into Eclipse


2. In the Build Path, move apache-nutch-2.3/conf to the top by selecting it and clicking the Top button



Third, run the program

1. Open Run As > Run Configurations, then select the project and the main class


2. Fill in the arguments

Program arguments:
/users/liaoliuqing/downloads/seed.txt

VM arguments:
-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
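The seed path given to the program points at a plain text file with one URL per line. A minimal sketch for creating such a file follows. The SEED_DIR default is an assumption, and the URL is inferred from the row key that appears in the HBase scan at the end of this article, not taken from the original seed file.

```shell
# Create a seed file with one URL per line.
# SEED_DIR is an assumed location; override it to match your own layout.
SEED_DIR="${SEED_DIR:-$PWD/seeds}"
mkdir -p "$SEED_DIR"

# The URL below is inferred from the row key in the scan output.
printf '%s\n' 'http://www.163.com/' > "$SEED_DIR/seed.txt"

# Print the absolute path to paste into the run configuration.
echo "$SEED_DIR/seed.txt"
```

The printed path is what goes into the run configuration as the program argument.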


3. Click Run. The output is as follows:

InjectorJob: starting at 2015-01-28 16:27:43
InjectorJob: Injecting urlDir: /users/liaoliuqing/downloads/seed.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2015-01-28 16:27:47, elapsed: 00:00:04


Note that HBase must already be running on the machine before you run the program.
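One quick way to check this from a terminal is to look for the HBase master process with jps, the JDK tool that lists running JVMs. The sketch below assumes a local HBase whose master JVM is listed as HMaster; the function name hbase_master_running is hypothetical, and HBASE_HOME is assumed to point at your HBase installation.

```shell
# Succeed if a local HBase master process is visible to jps.
# Assumes the master JVM is listed under the name "HMaster".
hbase_master_running() {
    jps 2>/dev/null | grep -q 'HMaster'
}

if hbase_master_running; then
    echo "HBase master is running"
else
    # start-hbase.sh ships in the bin directory of the HBase distribution.
    echo "HBase master not found; start it with \$HBASE_HOME/bin/start-hbase.sh"
fi
```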


4. View the data in HBase

hbase(main):003:0> scan 'webpage'
ROW                   COLUMN+CELL
 com.163.www:http/    column=f:fi, timestamp=1422433667377, value=\x00'\x8D\x00
 com.163.www:http/    column=f:ts, timestamp=1422433667377, value=\x00\x00\x01K/\xA7:\x14
 com.163.www:http/    column=mk:_injmrk_, timestamp=1422433667377, value=y
 com.163.www:http/    column=mk:dist, timestamp=1422433667377, value=0
 com.163.www:http/    column=mtdt:_csh_, timestamp=1422433667377, value=?\x80\x00\x00
 com.163.www:http/    column=s:s, timestamp=1422433667377, value=?\x80\x00\x00
1 row(s) in 0.2970 seconds
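The row key in the scan, com.163.www:http/, is the URL http://www.163.com/ with its host name reversed, which is how Nutch 2.x keys pages in the store. The function below is a simplified shell sketch of that transformation, not Nutch's own code (Nutch does this in Java). The name reverse_url_key is hypothetical, and the sketch ignores ports and user info.

```shell
# Build a reversed-host row key from a URL of the form scheme://host/path.
# reverse_url_key is a hypothetical helper; ports and user info are not handled.
reverse_url_key() {
    url="$1"
    scheme="${url%%://*}"      # text before "://"
    rest="${url#*://}"         # text after "://"
    host="${rest%%/*}"         # up to the first "/"
    case "$rest" in
        */*) path="/${rest#*/}" ;;   # keep the path, including a bare "/"
        *)   path="" ;;              # no path component at all
    esac
    # Reverse the dot-separated host labels: www.163.com becomes com.163.www
    reversed=$(printf '%s\n' "$host" | awk -F. '{
        out = $NF
        for (i = NF - 1; i >= 1; i--) out = out "." $i
        print out
    }')
    printf '%s:%s%s\n' "$reversed" "$scheme" "$path"
}

reverse_url_key 'http://www.163.com/'   # prints com.163.www:http/
```

Reversing the host keeps pages from the same domain next to each other in the table, so a key computed this way can be used to look up a single page from the HBase shell.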





