Nutch 2.3, Hadoop 2.7.1, HBase 1.0.1.1, Solr 5.2.1 deployment (3), Hadoop 2.7 installation
Zookeeper
1. Download Gora
The latest version of Gora is 0.6.1. Download a release, or use git to obtain the source:
git clone https://github.com/apache/gora.git
2. Modify Gora's pom.xml
The following may be the key to finally getting Nutch 2.3 to run, without 1.0.1.1-hadoop2 :)
3. Compile gora
mvn clean install -DskipTests
mvn install -DskipTests
4. Modify $NUTCH_HOME/conf/nutch-site.xml
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
5. Modify $NUTCH_HOME/ivy/ivy.xml
Change the rev of every "org.apache.gora" dependency to 0.6, for example:
<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
=>
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6" conf="*->default" />
Delete the existing "org.apache.hadoop" dependencies and add:
<dependency org="org.apache.hadoop" name="hadoop-client" rev="2.7.1" conf="*->default"/>
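The rev change above can also be scripted instead of edited by hand. The following is only a sketch: bump_gora_rev is a hypothetical helper name (not part of Nutch), and it assumes each dependency element sits on a single line with both its org and rev attributes, which should be checked against your own ivy.xml first.

```shell
#!/bin/sh
# Sketch: set rev="..." on every line that declares an org.apache.gora dependency.
# bump_gora_rev is a hypothetical helper, not something shipped with Nutch.
# Assumption: each <dependency .../> element is written on one line.
bump_gora_rev() {
    ivy_file=$1   # path to ivy.xml
    new_rev=$2    # e.g. 0.6
    # Keep a backup so the original file can be restored.
    cp "$ivy_file" "$ivy_file.bak"
    # Only lines mentioning org="org.apache.gora" are rewritten; all others pass through.
    sed '/org="org\.apache\.gora"/ s/rev="[^"]*"/rev="'"$new_rev"'"/' \
        "$ivy_file.bak" > "$ivy_file"
}
```

It would be called as bump_gora_rev "$NUTCH_HOME/ivy/ivy.xml" 0.6; the untouched copy stays next to the file as ivy.xml.bak.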
6. Modify $NUTCH_HOME/ivy/ivysettings.xml
<ivysettings>
  <settings defaultResolver="default"/>
  <property name="m2-pattern" value="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]" override="false" />
  <resolvers>
    <chain name="default">
      <filesystem name="local-maven2" m2compatible="true" >
        <artifact pattern="${m2-pattern}"/>
        <ivy pattern="${m2-pattern}"/>
      </filesystem>
      <ibiblio name="central" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
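Because these settings resolve from the local Maven repository first, it can help to confirm that the Gora build from step 3 really was installed there before running Ivy. A minimal sketch, assuming the default ~/.m2/repository location; list_local_gora_revs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the revisions of a Gora module present in a local Maven repository.
# list_local_gora_revs is a hypothetical helper; the repository path is an assumption.
list_local_gora_revs() {
    repo=${1:-"$HOME/.m2/repository"}   # root of the local Maven repository
    module=${2:-gora-hbase}             # module name, e.g. gora-core or gora-hbase
    dir="$repo/org/apache/gora/$module"
    if [ ! -d "$dir" ]; then
        echo "no $module found under $repo" >&2
        return 1
    fi
    # In the Maven layout, each sub-directory of the module is one installed revision.
    for d in "$dir"/*/; do
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}
```

If the rev set in ivy.xml does not appear in the output of list_local_gora_revs, Ivy would fall through to the "central" resolver instead of using the locally built jars.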
7. Add to $NUTCH_HOME/conf/gora.properties
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
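If gora.properties already defines gora.datastore.default for another store, appending a second line leaves two conflicting entries. The sketch below replaces an existing entry instead; set_property is a hypothetical helper name and it handles only simple key=value lines.

```shell
#!/bin/sh
# Sketch: set key=value in a Java-style properties file without creating duplicates.
# set_property is a hypothetical helper; it only understands plain "key=value" lines.
set_property() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    # Copy every line whose (trimmed) key differs from the one being set.
    # Commented lines such as "#key=value" are kept because "#key" is not the key.
    awk -v k="$key" -F= '{
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name != k) print
    }' "$file" > "$file.tmp"
    # Append the new definition and swap the file into place.
    printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}
```

For this step the call would be set_property "$NUTCH_HOME/conf/gora.properties" gora.datastore.default org.apache.gora.hbase.store.HBaseStore, and running it twice leaves a single entry.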
8. Modify $NUTCH_HOME/conf/regex-urlfilter.txt and $NUTCH_HOME/conf/nutch-default.xml as needed
No changes were needed here.
9. Compile Nutch (this takes a long time)
ant runtime
10. Copy the hadoop*.jar files under Gora to runtime/local/lib/
cp /disk/gora-core/lib/hadoop* /disk2/nutch/nutch-2.3/runtime/local/lib/
11. Create the seed URL
mkdir urls
echo http://nutch.apache.org/ > urls/seek.txt
12. Test run
cd runtime/local/
bin/nutch inject urls/seek.txt
Solr 5.2.1 deployment and operation
1. Download and decompress
2. example/example-DIH contains a complete Solr home configuration; copy it to server/solr.
cp -rf /disk2/solr/solr-5.2.1/example/example-DIH/solr/* /disk2/solr/solr-5.2.1/server/solr/
3. Fix "Error 404: Problem accessing /solr/update. Reason: Not Found"
cd /disk2/solr/solr-5.2.1/server/solr
cp /disk2/solr/solr-5.2.1/example/exampledocs/monitor.xml .
curl http://127.0.0.1:8983/solr/update --data-binary @monitor.xml -H 'Content-type: application/xml'
3. For the Nutch crawl to work, also modify /disk2/solr/solr-5.2.1/server/solr/conf/schema.xml:
<field name="host" type="string" stored="false" indexed="true"/>
<field name="site" type="string" stored="false" indexed="true"/>
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="digest" type="string" stored="true" indexed="false"/>
<field name="segment" type="string" stored="true" indexed="false"/>
<field name="boost" type="float" stored="true" indexed="false"/>
<field name="tstamp" type="date" stored="true" indexed="false"/>
<field name="stamp" type="date" stored="true" indexed="false"/>
<field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
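A quick way to see whether a schema.xml already declares the fields listed above is to grep for each name. This is only a sketch: missing_schema_fields is a hypothetical helper name, and it does a plain text match rather than parsing the XML, so fields inside comments would also count as present.

```shell
#!/bin/sh
# Sketch: print the names of fields that a Solr schema.xml does not declare.
# missing_schema_fields is a hypothetical helper; it greps text and does not parse XML.
missing_schema_fields() {
    schema=$1
    shift
    for field in "$@"; do
        # Require whitespace after "<field" so that <fieldType ...> lines do not match.
        if ! grep -q "<field[[:space:]][^>]*name=\"$field\"" "$schema"; then
            echo "$field"
        fi
    done
}
```

For example, missing_schema_fields schema.xml host site cache digest segment boost tstamp stamp anchor prints one line per field that still has to be added, and nothing when all are present.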
4. bin/solr start
5. Open http://192.168.1.106:8983/solr
6. bin/crawl urls/seek.txt TestCrawl http://192.168.1.106:8983/solr 2
FAQ
The following are the problems that caused the most frustration along the way...
1. Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
ant runtime has not been run.
2. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
You must use hbase-common*.jar, hbase-client*.jar and hbase-protocol*.jar from HBase 0.98.13. Do not use the ones from HBase 1.0.1.1.
cd /disk2/hbase/hbase-0.98.13-hadoop2/lib
cp hbase-common* /disk2/nutch/nutch-2.3/runtime/local/lib/
cp hbase-client-0.98.13-hadoop2.jar /disk2/nutch/nutch-2.3/runtime/local/lib/
cp hbase-protocol* /disk2/nutch/nutch-2.3/runtime/local/lib/
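After copying, it is worth checking that no HBase jar of another version is left in runtime/local/lib/, since a mix of versions produces exactly this kind of error. A minimal sketch follows; mismatched_hbase_jars is a hypothetical helper name and it relies on the hbase-<module>-<version>.jar naming seen above.

```shell
#!/bin/sh
# Sketch: list hbase-*.jar files in a lib directory whose version suffix is not the expected one.
# mismatched_hbase_jars is a hypothetical helper; it assumes hbase-<module>-<version>.jar names.
mismatched_hbase_jars() {
    lib_dir=$1   # e.g. runtime/local/lib
    want=$2      # expected version suffix, e.g. 0.98.13-hadoop2
    for jar in "$lib_dir"/hbase-*.jar; do
        # Skip the unexpanded pattern when the directory holds no hbase jars.
        [ -e "$jar" ] || continue
        name=$(basename "$jar" .jar)
        case $name in
            hbase-*-"$want") ;;        # expected version, nothing to report
            *) echo "$name.jar" ;;     # any other version is reported
        esac
    done
}
```

Any file name printed by mismatched_hbase_jars runtime/local/lib 0.98.13-hadoop2 is a candidate for removal before running bin/nutch again.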
3. Exception in thread "main" java.lang.NoSuchFieldError: HBASE_CLIENT_PREFETCH_LIMIT
The cause is the same as above: the HBase and Nutch versions do not match.
4. 13:53:53,238 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Give it the native libraries:
mkdir -p /disk2/nutch/nutch-2.3/runtime/local/lib/native/Linux-amd64-64/
cd /disk2/hadoop/hadoop-2.7.1/lib/native/
cp * /disk2/nutch/nutch-2.3/runtime/local/lib/native/Linux-amd64-64/
cp * /disk2/nutch/nutch-2.3/runtime/local/lib/native/
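To confirm that the copy put the Hadoop native library where expected, a small check like the one below can be run against each target directory. It is a sketch only: has_native_hadoop is a hypothetical helper name, and it merely tests that a libhadoop.so file exists, not that the JVM can load it.

```shell
#!/bin/sh
# Sketch: succeed if a directory contains the Hadoop native library (libhadoop.so*).
# has_native_hadoop is a hypothetical helper; it checks file presence only.
has_native_hadoop() {
    native_dir=$1
    for lib in "$native_dir"/libhadoop.so*; do
        # The unexpanded pattern does not exist as a file, so an empty directory fails.
        if [ -e "$lib" ]; then
            return 0
        fi
    done
    return 1
}
```

For example: has_native_hadoop runtime/local/lib/native/Linux-amd64-64 || echo "libhadoop.so is missing".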
There were too many other problems to list them all.