Runtime error when running Nutch in Eclipse

solrUrl is not set, indexing will be skipped...
crawl started in: crwal
rootUrlDir = urls
threads = 10
depth = 2
solrUrl=null
topN = 2
Injector: starting at 2012-04-20 14:39:30
Injector: crawlDb: crwal/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
    at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
    at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
    ... 18 more
12/04/20 10:14:44 INFO mapred.JobClient:  map 0% reduce 0%
12/04/20 10:14:44 INFO mapred.JobClient: Job complete: job_local_0001
12/04/20 10:14:44 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

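First of all, please don't blame me for pasting so much error output. It is only here so that people searching for this error can find the page more easily.

The fix is to change the following property in nutch-default.xml: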

<property>
  <name>plugin.folders</name>
  <value>./src/plugin</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>

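Just change the value highlighted in red in the original post (the <value> element, set to ./src/plugin above) and the error goes away.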
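To confirm that the setting actually took effect before launching the crawl, a small standalone check can help. The following is a minimal sketch, not part of Nutch: the class name PluginFoldersCheck and the default path conf/nutch-default.xml are assumptions for illustration, and it only tests whether each configured folder exists relative to the working directory (assumed to be the Eclipse project root). Nutch's own lookup may also search the classpath, as the property description says.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; it is not a Nutch class.
class PluginFoldersCheck {

    /** Returns the trimmed <value> of the named <property>, or null if it is absent. */
    static String readProperty(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed location of the config file, relative to the project root.
        String confFile = args.length > 0 ? args[0] : "conf/nutch-default.xml";
        String value;
        try (InputStream in = new FileInputStream(confFile)) {
            value = readProperty(in, "plugin.folders");
        }
        if (value == null || value.isEmpty()) {
            System.out.println("plugin.folders is not defined in " + confFile);
            return;
        }
        // Assumption: several folders may be listed, separated by commas.
        for (String folder : value.split(",")) {
            File dir = new File(folder.trim());
            System.out.println(folder.trim() + " -> "
                    + (dir.isDirectory() ? "found" : "missing")
                    + " at " + dir.getAbsolutePath());
        }
    }
}
```

Run it from the project root. If it reports the folder as missing, the relative path does not match the launch configuration's working directory, which would be worth checking before re-running the crawl.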

Good luck, everyone.


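As a supplement, here are the steps for running Nutch in Eclipse. It took me a whole day to get it working, and I owe thanks to my classmate Beibei.

http://wiki.apache.org/nutch/RunNutchInEclipse  (the authoritative reference, in English)

Preparation

1. Install the Subclipse plugin, the IvyDE plugin, and the Maven plugin.

2. Check out the code from https://svn.apache.org/repos/asf/nutch/trunk

3. Remove src from the source folders, then add src/bin, src/java, src/test, src/testsource, src/plugin/xx/src/java, and src/plugin/xx/src/test as source folders (xx stands for a plugin name).

4. Add the two JAR files; the English wiki page above explains which ones.

5. On the Libraries tab, click Add Class Folder on the right and select Nutch's conf directory.

6. Still on the Libraries tab, choose Add Library > IvyDE Managed Dependencies and select ivy/ivy.xml.

7. Run ant on build.xml.

8. Refresh the nutch project. nutch-site.xml and regex-urlfilter.xml now appear under conf; fill in their configuration.

9. In nutch-default.xml, change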

<property>
  <name>plugin.folders</name>
  <value>./src/plugin</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>

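This step is critical.

10. Create a folder named urls in the project root, put a file named seed.txt in it, and list the URLs of the pages to crawl in seed.txt.

11. Build again with build.xml (ant).

12. Run.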
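Step 10 can also be scripted. The sketch below is one way to do it under stated assumptions: the class SeedFileSetup and its methods are hypothetical helpers rather than Nutch APIs, and the URL in main is only a placeholder to replace with the pages you actually want to crawl. It creates urls/seed.txt under a given project root and reads it back, skipping blank lines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; it is not a Nutch class.
class SeedFileSetup {

    /** Creates <projectRoot>/urls/seed.txt with one URL per line and returns its path. */
    static Path writeSeeds(Path projectRoot, List<String> urls) {
        try {
            Path dir = Files.createDirectories(projectRoot.resolve("urls"));
            Path seed = dir.resolve("seed.txt");
            Files.write(seed, urls, StandardCharsets.UTF_8);
            return seed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the seed file back, dropping blank lines. */
    static List<String> readSeeds(Path seed) {
        try {
            List<String> result = new ArrayList<>();
            for (String line : Files.readAllLines(seed, StandardCharsets.UTF_8)) {
                if (!line.trim().isEmpty()) {
                    result.add(line.trim());
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL (an assumption); list your own pages here.
        Path seed = writeSeeds(Paths.get("."), Arrays.asList("http://nutch.apache.org/"));
        System.out.println("Wrote " + readSeeds(seed).size() + " seed URL(s) to "
                + seed.toAbsolutePath());
    }
}
```

With the seed file in place, the log at the top of this post (rootUrlDir = urls, crawl directory crwal, depth = 2, topN = 2) would correspond to launching org.apache.nutch.crawl.Crawl with program arguments along the lines of "urls -dir crwal -depth 2 -topN 2". That argument string is inferred from the log, so check it against the wiki page for your Nutch version.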
