Nutch 2.3 + Hadoop 2.7.1 + HBase 1.0.1.1 + Solr 5.2.1 deployment (3): Hadoop 2.7 installation

Source: Internet
Author: User

Preconditions:

Hadoop 2.7.1
HBase 0.98.13
Solr 5.2.1 / Apache Solr 4.8.1
http://archive.apache.org/dist/lucene/solr/4.8.1/
Gora 0.6.1


Compiling Gora, then compiling and deploying Nutch

1. Download Gora

The latest version of Gora is 0.6.1. Download a release, or fetch the source with git:

git clone https://github.com/apache/gora.git

2. Modify Gora's pom.xml

The following change may be the key to finally getting Nutch 2.3 to run: build without 1.0.1.1-hadoop2 :)

3. Compile Gora

mvn clean install -DskipTests
mvn install -DskipTests

4. Modify $NUTCH_HOME/conf/nutch-site.xml

<configuration>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>

5. Modify $NUTCH_HOME/ivy/ivy.xml

Change the rev of every "org.apache.gora" dependency to 0.6, for example:

<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
=>
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6" conf="*->default" />

Delete the "org.apache.hadoop" dependencies and add:

<dependency org="org.apache.hadoop" name="hadoop-client" rev="2.7.1" conf="*->default"/> 
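
The Gora rev change in step 5 could be scripted roughly as below. This is only a sketch: the function name bump_gora_rev is hypothetical, it assumes each dependency element sits on a single line as in the example above, and it does not perform the Hadoop dependency swap. The demo runs against a throwaway file with a made-up second dependency, not against the real ivy.xml.

```shell
#!/bin/sh
# Hypothetical helper: set rev="..." on every org.apache.gora dependency line.
# Assumes one <dependency .../> element per line.
bump_gora_rev() {
    file=$1
    new_rev=$2
    # Only lines containing org="org.apache.gora" are rewritten; a .bak copy is kept.
    sed -i.bak "/org=\"org\.apache\.gora\"/ s/rev=\"[^\"]*\"/rev=\"$new_rev\"/" "$file"
}

# Demo on a throwaway file (the org.example line is invented for illustration).
demo=$(mktemp)
cat > "$demo" <<'EOF'
<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
<dependency org="org.example" name="other-lib" rev="1.0" conf="*->default" />
EOF
bump_gora_rev "$demo" 0.6
cat "$demo"
```

Restricting the substitution to lines that mention org.apache.gora leaves every other dependency's rev untouched, and the .bak copy makes it easy to diff the result before building.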

6. Modify $NUTCH_HOME/ivy/ivysettings.xml

<ivysettings>
    <settings defaultResolver="default"/>
    <property name="m2-pattern" value="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]" override="false" />
    <resolvers>
        <chain name="default">
            <filesystem name="local-maven2" m2compatible="true" >
                <artifact pattern="${m2-pattern}"/>
                <ivy pattern="${m2-pattern}"/>
            </filesystem>
            <ibiblio name="central" m2compatible="true"/>
        </chain>
    </resolvers>
</ivysettings>

7. Add the following to $NUTCH_HOME/conf/gora.properties

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
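
As a rough sketch, the property can be added from the shell without duplicating it on repeated runs. The fallback to a temporary directory when NUTCH_HOME is unset is an assumption made only so the snippet can be tried safely. If the file already sets gora.datastore.default to a different store, edit that line by hand instead of appending.

```shell
#!/bin/sh
# Assumption: NUTCH_HOME points at the Nutch source tree; a temp dir is used if it is unset.
NUTCH_HOME=${NUTCH_HOME:-$(mktemp -d)}
mkdir -p "$NUTCH_HOME/conf"
props="$NUTCH_HOME/conf/gora.properties"
line='gora.datastore.default=org.apache.gora.hbase.store.HBaseStore'

# Append the setting only if an identical line is not already present.
touch "$props"
grep -qxF "$line" "$props" || echo "$line" >> "$props"

# Show what the file now says about the default datastore.
grep '^gora.datastore.default' "$props"
```
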

8. Modify $NUTCH_HOME/conf/regex-urlfilter.txt and $NUTCH_HOME/conf/nutch-default.xml as needed

No changes were needed here.

9. Compile Nutch (this takes a long time)

ant runtime

10. Copy the hadoop*.jar files under Gora to runtime/local/lib/

cp /disk/gora-core/lib/hadoop* /disk2/nutch/nutch-2.3/runtime/local/lib/

11. Create the seed URL list

mkdir urls
echo http://nutch.apache.org/ > urls/seek.txt

12. Test run

cd runtime/local/

bin/nutch inject urls/seek.txt


Solr 5.2.1 deployment and operation

1. Download and decompress

2. example/example-DIH contains a complete Solr home configuration; copy it to server/solr.

cp -rf /disk2/solr/solr-5.2.1/example/example-DIH/solr/* /disk2/solr/solr-5.2.1/server/solr/

3. Fix "Error 404: Problem accessing /solr/update. Reason: Not Found"

cd /disk2/solr/solr-5.2.1/server/solr

cp /disk2/solr/solr-5.2.1/example/exampledocs/monitor.xml .

curl http://127.0.0.1:8983/solr/update --data-binary @monitor.xml -H 'Content-type: application/xml'

4. For the Nutch crawl to run, also add these fields to /disk2/solr/solr-5.2.1/server/solr/conf/schema.xml:

<field name="host" type="string" stored="false" indexed="true"/>
<field name="site" type="string" stored="false" indexed="true"/>
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="digest" type="string" stored="true" indexed="false"/>
<field name="segment" type="string" stored="true" indexed="false"/>
<field name="boost" type="float" stored="true" indexed="false"/>
<field name="tstamp" type="date" stored="true" indexed="false"/>
<field name="stamp" type="date" stored="true" indexed="false"/>
<field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>

5. Start Solr:

bin/solr start

6. Open the admin page at http://192.168.1.106:8983/solr

7. Run the crawl from the Nutch runtime/local directory:

bin/crawl urls/seek.txt TestCrawl http://192.168.1.106:8983/solr 2
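
Before starting the crawl, it may help to confirm that schema.xml really declares the fields listed above. The sketch below is illustrative only: the function name missing_fields is hypothetical, and the check is a plain text match on the field declaration rather than real XML parsing. The demo uses a throwaway fragment that declares only two of the fields.

```shell
#!/bin/sh
# Hypothetical helper: print each expected field name that has no <field name="..."> entry.
missing_fields() {
    schema=$1
    for f in host site cache digest segment boost tstamp stamp anchor; do
        grep -q "<field name=\"$f\"" "$schema" || echo "$f"
    done
}

# Demo on a throwaway schema fragment, not the real schema.xml.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<field name="host" type="string" stored="false" indexed="true"/>
<field name="tstamp" type="date" stored="true" indexed="false"/>
EOF
missing_fields "$demo"
```

Against the real file, an empty result from missing_fields means every listed field has a declaration.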


FAQ

The following problems were the frustrating part of the process...

1. Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
Cause: "ant runtime" has not been run.

2. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

You must use the hbase-common*.jar, hbase-client*.jar and hbase-protocol*.jar from HBase 0.98.13. Do not use the ones from HBase 1.0.1.1.

cd /disk2/hbase/hbase-0.98.13-hadoop2/lib
cp hbase-common* /disk2/nutch/nutch-2.3/runtime/local/lib/
cp hbase-client-0.98.13-hadoop2.jar /disk2/nutch/nutch-2.3/runtime/local/lib/
cp hbase-protocol* /disk2/nutch/nutch-2.3/runtime/local/lib/
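
A quick way to spot leftover jars from the wrong HBase release is to scan the lib directory by file name. This is a minimal sketch under the assumption that the version string appears in each jar's file name; the function name mismatched_hbase_jars is hypothetical, and the demo uses empty placeholder files rather than real jars.

```shell
#!/bin/sh
# Hypothetical helper: list hbase-*.jar files in a lib dir whose name lacks the wanted version.
mismatched_hbase_jars() {
    lib_dir=$1
    want=$2
    for jar in "$lib_dir"/hbase-*.jar; do
        [ -e "$jar" ] || continue        # the glob matched nothing
        case $(basename "$jar") in
            *-"$want"*) ;;                 # version string present, nothing to report
            *) basename "$jar" ;;
        esac
    done
}

# Demo with empty placeholder files standing in for real jars.
demo=$(mktemp -d)
touch "$demo/hbase-client-0.98.13-hadoop2.jar" "$demo/hbase-common-1.0.1.1.jar"
mismatched_hbase_jars "$demo" 0.98.13
```

Pointed at runtime/local/lib with 0.98.13 as the second argument, any name it prints is a candidate for the mismatch described in this item and the next.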

3. Exception in thread "main" java.lang.NoSuchFieldError: HBASE_CLIENT_PREFETCH_LIMIT
The cause is the same as above: the HBase jars do not match what Nutch was built against.

4. 13:53:53,238 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Give it the native libraries:

mkdir -p /disk2/nutch/nutch-2.3/runtime/local/lib/native/Linux-amd64-64/
cd /disk2/hadoop/hadoop-2.7.1/lib/native/
cp * /disk2/nutch/nutch-2.3/runtime/local/lib/native/Linux-amd64-64/
cp * /disk2/nutch/nutch-2.3/runtime/local/lib/native/
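
After copying, a small check can confirm that the Hadoop native library actually landed in the target directory. This is a sketch only: the function name has_native_hadoop is hypothetical, and it merely looks for a file whose name starts with libhadoop.so; it does not verify that the library loads on this platform. The demo uses an empty placeholder file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory holds a libhadoop.so* file.
has_native_hadoop() {
    ls "$1"/libhadoop.so* >/dev/null 2>&1
}

# Demo with an empty placeholder standing in for the real shared library.
demo=$(mktemp -d)
touch "$demo/libhadoop.so.1.0.0"
if has_native_hadoop "$demo"; then
    echo "native library found in $demo"
else
    echo "native library missing in $demo"
fi
```
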


There were too many other problems to list here.

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
