Setting up MyEclipse + Nutch 2.2.1 + MySQL on Windows 7

Source: Internet
Author: User

1. Environment preparation

The first step is to set up the development environment, which is not described in detail here.

The required environment is JDK 1.7, MyEclipse, SVN, and Ant, plus two MyEclipse plugins: Subclipse (update site http://subclipse.tigris.org/update_1.8.x) and IvyDE (update site http://www.apache.org/dist/ant/ivyde/updatesite).

2. Check out the project from SVN

Then click Next through the remaining wizard steps.

Click Finish to complete the import.

3. In ivysettings.xml in the ivy directory, change the repository address to http://mirrors.ibiblio.org/maven2/ (this address is reachable; the other addresses I tried were not)

4. Modify ivy.xml in the ivy directory (add the Java packages needed for MySQL access)

Change the gora-core version to 0.2.1 and uncomment the gora-sql and mysql-connector-java dependencies.

5. cd to the project directory and run ant eclipse (running the Ant build directly inside Eclipse seems to cause problems)

6. Go back to the Eclipse project and refresh it; you will find that the directory structure has changed

7. If errors are still reported, they are caused by the file encoding: right-click the project, choose Properties -> Resource, and set the text file encoding to UTF-8

8. Right-click the project, choose Build Path -> Configure Build Path -> Order and Export, select the conf folder, and move it to the top

9. Modify gora.properties in the conf folder to configure MySQL

###############################
# Default MySQL properties    #
###############################
gora.datastore.default=org.apache.gora.sql.store.SqlStore
gora.datastore.autocreateschema=true
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=123456
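Each setting in gora.properties has to sit on its own line, so a quick check that the file parses as intended can save a confusing startup failure. The following is only a minimal sketch, not part of Nutch or Gora: the class name GoraPropertiesCheck and its method are hypothetical, and it assumes the program is run from the project root so that conf/gora.properties can be found.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Nutch or Gora.
public class GoraPropertiesCheck {

    // The keys that this step of the tutorial sets for MySQL.
    static final String[] REQUIRED_KEYS = {
        "gora.datastore.default",
        "gora.sqlstore.jdbc.driver",
        "gora.sqlstore.jdbc.url",
        "gora.sqlstore.jdbc.user",
        "gora.sqlstore.jdbc.password"
    };

    // Returns the required keys that are absent or empty in the given text.
    public static List<String> missingKeys(String propertiesText) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new StringReader(propertiesText)) {
            props.load(reader);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the working directory is the project root.
        String text = new String(
            Files.readAllBytes(Paths.get("conf", "gora.properties")),
            StandardCharsets.UTF_8);
        System.out.println("Missing keys: " + missingKeys(text));
    }
}
```

If the whole file has been pasted onto a single line that starts with #, java.util.Properties reads it as one comment, and this check reports all five keys as missing.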

10. In the project directory, create a new folder named urls; inside it, create a new file named url and enter the root URL to crawl, such as http://www.qq.com
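The seed file can of course be created by hand. For anyone who prefers to script it, here is a minimal sketch using java.nio.file; the class SeedFileWriter and its method are hypothetical names, and the project directory is assumed to be the current working directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

// Hypothetical helper for illustration; not part of Nutch.
public class SeedFileWriter {

    // Creates <projectDir>/urls/url containing a single root URL on one line.
    public static Path writeSeed(Path projectDir, String rootUrl) throws IOException {
        Path urlsDir = projectDir.resolve("urls");
        Files.createDirectories(urlsDir); // no error if the folder already exists
        Path seedFile = urlsDir.resolve("url");
        Files.write(seedFile, Collections.singletonList(rootUrl), StandardCharsets.UTF_8);
        return seedFile;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: run from the project root.
        Path seed = writeSeed(Paths.get("."), "http://www.qq.com");
        System.out.println("Wrote " + seed.toAbsolutePath());
    }
}
```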

11. Configure nutch-site.xml in the conf directory

<property>
  <name>http.agent.name</name>
  <value>YourNutchSpider</value>
</property>
<property>
  <name>http.accept.language</name>
  <value>ja-jp, en-us,en-gb,en,zh-cn,zh-tw;q=0.7,*;q=0.3</value>
  <description>Value of the "Accept-Language" request header field. This allows selecting non-English language as the default one to retrieve. It is a useful setting for search engines built for certain national groups.</description>
</property>
<property>
  <name>parser.character.encoding.default</name>
  <value>utf-8</value>
  <description>The character encoding to fall back to when no other information is available</description>
</property>
<property>
  <name>plugin.folders</name>
  <value>src/plugin</value>
  <description>Directories where Nutch plugins are located. Each element is a relative or absolute path. If absolute, it is used as is. If relative, it is searched for on the classpath.</description>
</property>
<!-- to solve the null pointer problem in the Utf8 class -->
<property>
  <name>generate.batch.id</name>
  <value>*</value>
</property>
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.sql.store.SqlStore</value>
  <description>The Gora DataStore class for storing and retrieving data. Currently the following stores are available: ....</description>
</property>
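A malformed tag in nutch-site.xml is easy to miss by eye. As a rough sketch of a sanity check, the hypothetical class below parses the file with the JDK's built-in DOM parser and lists each property name and value. It assumes the property elements sit inside the usual configuration root element of nutch-site.xml, and that the program is run from the project root.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Nutch.
public class NutchSiteReader {

    // Parses the XML text and returns property names mapped to their values.
    // Throws an exception if the XML is not well-formed.
    public static Map<String, String> readProperties(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element property = (Element) props.item(i);
            String name = textOf(property, "name");
            if (name != null) {
                result.put(name, textOf(property, "value"));
            }
        }
        return result;
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the working directory is the project root.
        String xml = new String(
            Files.readAllBytes(Paths.get("conf", "nutch-site.xml")),
            StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : readProperties(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```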



12. After completing the above steps, create a run configuration: select org.apache.nutch.crawl.Crawler as the main class, set the program arguments to urls -depth 3 -topN 5, and set the VM arguments to -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
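To make the split between VM arguments and program arguments concrete, the sketch below assembles the equivalent java command line as a list of strings. It is illustrative only: the class CrawlerCommand is a hypothetical name, and the classpath is a placeholder the caller must supply, since the original does not state it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it only builds the command, it does not run it.
public class CrawlerCommand {

    // Builds the command line; "classpath" is a caller-supplied placeholder.
    public static List<String> build(String classpath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // VM arguments: system properties go before the main class.
        cmd.add("-Dhadoop.log.dir=logs");
        cmd.add("-Dhadoop.log.file=hadoop.log");
        cmd.add("-cp");
        cmd.add(classpath);
        // Main class from the run configuration.
        cmd.add("org.apache.nutch.crawl.Crawler");
        // Program arguments: seed directory, crawl depth, and topN.
        cmd.add("urls");
        cmd.add("-depth");
        cmd.add("3");
        cmd.add("-topN");
        cmd.add("5");
        return cmd;
    }

    public static void main(String[] args) {
        String classpath = args.length > 0 ? args[0] : "<classpath>";
        StringBuilder line = new StringBuilder();
        for (String part : build(classpath)) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(part);
        }
        System.out.println(line);
    }
}
```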

Running at this point fails with the error: Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-administrator\mapred\staging\administrator606301699\.staging to 0700

13. The above error generally occurs only on Windows. The usual workaround is to find FileUtil.java under org.apache.hadoop.fs in the hadoop-core-1.2.0 source and modify the checkReturnValue method by commenting out its body

private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission
                                     ) throws IOException {
//    if (!rv) {
//      throw new IOException("Failed to set permissions of path: " + p +
//                            " to " +
//                            String.format("%04o", permission.toShort()));
//    }
}


Then compile it and build the jar, and use it to replace hadoop-core-1.2.0.jar under the project's build/lib.

Another way is to compile only FileUtil.java and replace the corresponding class files in the jar, FileUtil.class and FileUtil$CygPathCommand.class (opening the jar with an archive tool is enough).
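The class-file replacement can also be scripted instead of done with an archive tool. The following is a sketch under stated assumptions rather than the author's method: it uses the JDK's zip file system provider (available since Java 7) to overwrite entries inside the jar in place. The class JarClassReplacer is a hypothetical name, and the directory holding the freshly compiled classes is an assumption passed on the command line.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Collections;

// Hypothetical helper for illustration; back up the jar before running it.
public class JarClassReplacer {

    // The two class files named in this step, as entry paths inside the jar.
    static final String[] ENTRIES = {
        "org/apache/hadoop/fs/FileUtil.class",
        "org/apache/hadoop/fs/FileUtil$CygPathCommand.class"
    };

    // Overwrites one entry of an existing jar with the given file.
    public static void replaceEntry(Path jar, Path newClassFile, String entryName)
            throws IOException {
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipFs =
                 FileSystems.newFileSystem(uri, Collections.<String, String>emptyMap())) {
            Path target = zipFs.getPath(entryName);
            Files.copy(newClassFile, target, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the change back to the jar
    }

    public static void main(String[] args) throws IOException {
        Path jar = Paths.get(args[0]);        // e.g. build/lib/hadoop-core-1.2.0.jar
        Path classesDir = Paths.get(args[1]); // assumed root of the compiled classes
        for (String entry : ENTRIES) {
            replaceEntry(jar, classesDir.resolve(entry), entry);
        }
    }
}
```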

The modified, compiled file is attached: http://files.cnblogs.com/e-life/hadoop-core-1.2.0.rar

14. The next run should then complete without this error
