How to import Nutch into MyEclipse and Eclipse

Source: Internet
Author: User

 

I spent a whole day on this today. At first I tried to import Nutch 1.1 into MyEclipse and Eclipse. I kept trying, debugging, and reading many articles online, only to find that every method they described produced errors. After finally talking it over with others, I tried it my own way and it worked, so the time was not wasted. Below is how to import Nutch 1.0 without errors.

 

Preparations and import steps:

1. Download the Cygwin tool from http://www.cygwin.com and install it. After a successful installation, remember to configure the environment variables under My Computer -> Properties:

Edit the Path variable and add D:\cygwin\bin to it.
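
To confirm that the Path change took effect, a small check like the following can be run from a new console or from the IDE. This is only a sketch: the class and method names are hypothetical, and the directory D:\cygwin\bin is the install location assumed in step 1.

```java
import java.io.File;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Cygwin or Nutch.
class CygwinPathCheck {

    // Returns true if the PATH-style string contains the given directory entry.
    static boolean containsEntry(String pathValue, String separator, String dir) {
        if (pathValue == null) {
            return false;
        }
        return Arrays.stream(pathValue.split(Pattern.quote(separator)))
                .map(String::trim)
                // Windows paths are case-insensitive, so ignore case when comparing.
                .anyMatch(entry -> entry.equalsIgnoreCase(dir));
    }

    public static void main(String[] args) {
        // Assumed install location from step 1; change it if Cygwin lives elsewhere.
        String cygwinBin = "D:\\cygwin\\bin";
        boolean found = containsEntry(System.getenv("PATH"), File.pathSeparator, cygwinBin);
        System.out.println(cygwinBin + (found ? " is on PATH" : " is NOT on PATH"));
    }
}
```

Environment variable changes are only picked up by newly started processes, so restart Eclipse or MyEclipse before running the check.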

2. Download the Nutch software package:

http://labs.renren.com/apache-#//nutch/ (downloading the bin package is generally enough).

3. Decompress the downloaded Nutch package. For example, the decompression location is D:\nutch-1.0.

4. Create a new Java Project in Eclipse or MyEclipse and give it a name of your choice (for example, nutch). Select "Create project from existing source" and point it to your nutch-1.0 directory.

Click Next, switch to the "Libraries" tab, click the "Add Class Folder..." button, and select "conf" from the list. Then, under "Default output folder", click Browse, choose "Create New Folder...", and create a new folder named output. (This step differs from every other method found online.)

 

Note: modify the nutch-site.xml file in the output folder:

 

 

 

XML Code
  <property>
    <name>http.agent.name</name>
    <value>HD nutch agent</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>hpjianhua</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>http://www.163.com</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>hpjianhua@163.com</value>
    <description></description>
  </property>

 

 

 

5. Click Finish to complete the import of Nutch 1.0.

 

Modify Nutch 1.0 in MyEclipse or Eclipse to remove the errors reported for the project:

1. Modify the files under the conf folder:

1.1 Modify the nutch-site.xml file:

 

 

XML Code
  <configuration>
    <property>
      <name>http.agent.name</name>
      <value>nutch</value>
      <description></description>
    </property>
    <property>
      <name>http.agent.description</name>
      <value>hpjianhua</value>
      <description></description>
    </property>
    <property>
      <name>http.agent.url</name>
      <value>http://www.163.com</value>
      <description></description>
    </property>
    <property>
      <name>http.agent.email</name>
      <value>hpjianhua@163.com</value>
      <description></description>
    </property>
  </configuration>
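
Because XML is case-sensitive, a wrongly cased tag or property name in nutch-site.xml is easy to miss. The following is a minimal sketch of a check that reads such a configuration and reports whether http.agent.name has a value. The class AgentConfigCheck and its methods are hypothetical and not part of Nutch; the sketch uses only the standard Java XML parser and reads the XML from a string rather than from the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker for illustration; not part of Nutch.
class AgentConfigCheck {

    // Parses <property><name>..</name><value>..</value></property> entries into a map.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Sample content; in practice, load the text of conf/nutch-site.xml instead.
        String xml = "<configuration>"
                + "<property><name>http.agent.name</name><value>nutch</value></property>"
                + "<property><name>http.agent.url</name><value>http://www.163.com</value></property>"
                + "</configuration>";
        String agent = readProperties(xml).get("http.agent.name");
        System.out.println(agent == null || agent.isEmpty()
                ? "http.agent.name is missing or empty"
                : "http.agent.name = " + agent);
    }
}
```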

 

1.2 Modify the nutch-default.xml file:

 

 

 

XML Code
  <property>
    <name>http.agent.name</name>
    <value>HD nutch agent</value>
    <description>HTTP 'User-Agent' request header. MUST NOT be empty -
    please set this to a single word uniquely related to your organization.
    NOTE: You should also check other related properties:
      http.robots.agents
      http.agent.description
      http.agent.url
      http.agent.email
      http.agent.version
    and set their values appropriately.
    </description>
  </property>

 

 

1.3 Modify the crawl-urlfilter.txt file:

 

 

 

Plain text
    # accept hosts in MY.DOMAIN.NAME
    +^http://([a-z0-9]*\.)*163.com/
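
To see which links such a rule lets through, the regular expression can be tried on a few sample URLs. The sketch below is a simplification: it applies only the single "+" rule (without the leading "+") using java.util.regex, and it assumes a URL is accepted when the pattern is found in it. The class name UrlFilterDemo is hypothetical, and this is not Nutch's own filter code.

```java
import java.util.regex.Pattern;

// Hypothetical demo for illustration; not Nutch's own URL filter.
class UrlFilterDemo {

    // The pattern from the "+" rule, lower-case and without spaces.
    static final Pattern ACCEPT = Pattern.compile("^http://([a-z0-9]*\\.)*163.com/");

    // Simplified rule: a URL is accepted if the pattern is found at its start.
    static boolean accepted(String url) {
        return ACCEPT.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.163.com/",
            "http://news.163.com/index.html",
            "http://www.163.com",       // no trailing slash
            "http://www.example.com/"   // different domain
        };
        for (String url : samples) {
            System.out.println(url + " -> " + (accepted(url) ? "accepted" : "rejected"));
        }
    }
}
```

Note that the rule ends with "/", so a link written without a trailing slash does not match it.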

 

 

 

If you are using Nutch 1.1, you can skip steps 2, 3, and 4 below and go directly to step 5.

 

2. Download the JAR files for the MP3 and RTF plugins from:

http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-mp3/lib

http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/

Copy them to src/plugin/parse-mp3/lib and src/plugin/parse-rtf/lib/ respectively.

 

 

3. Refresh the project a few times, then right-click the project folder and select Build Path -> Configure Build Path... In the window that appears, switch to the Libraries tab, select Add JARs..., and add the downloaded JAR files to the project.

 

4. At this point, the project will generally still show two errors. These two problems were not fixed in the official Nutch 1.0 release because of licensing issues.

 

The next step is the most important part.

 

 

 

Java code
  // 1. Modify RTFParseFactory.java under
  //    src\plugin\parse-rtf\src\java\org\apache\nutch\parse\rtf

  // Add this import:
  import org.apache.nutch.parse.ParseResult;

  // Change the method signature from:
  public Parse getParse(Content content) {
  // to:
  public ParseResult getParse(Content content) {

  // Change the failure return from:
  return new ParseStatus(ParseStatus.FAILED,
                         ParseStatus.FAILED_EXCEPTION,
                         e.toString()).getEmptyParse(conf);
  // to:
  return new ParseStatus(ParseStatus.FAILED,
                         ParseStatus.FAILED_EXCEPTION,
                         e.toString()).getEmptyParseResult(content.getUrl(), getConf());

  // Change the success return from:
  return new ParseImpl(text,
                       new ParseData(ParseStatus.STATUS_SUCCESS,
                                     title,
                                     OutlinkExtractor.getOutlinks(text, this.conf),
                                     content.getMetadata(),
                                     metadata));
  // to:
  return ParseResult.createParseResult(content.getUrl(),
                                       new ParseImpl(text,
                                                     new ParseData(ParseStatus.STATUS_SUCCESS,
                                                                   title,
                                                                   OutlinkExtractor.getOutlinks(text, this.conf),
                                                                   content.getMetadata(),
                                                                   metadata)));

  // 2. Modify TestRTFParser.java under
  //    src\plugin\parse-rtf\src\test\org\apache\nutch\parse\rtf

  // Change:
  parse = new ParseUtil(conf).parseByExtensionId("parse-rtf", content);
  // to:
  parse = new ParseUtil(conf).parseByExtensionId("parse-rtf", content).get(urlString);

After this step, the Eclipse project should no longer report errors.
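
The reason the test needs the extra get call is that the parser no longer returns a single parse; it returns a result object from which the parse is looked up by URL. The sketch below illustrates only that shape. StubParse and StubParseResult are simplified, hypothetical stand-ins written for this example; they are NOT the Nutch classes and leave out everything else those classes do.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the Nutch classes.
class ParseResultSketch {

    // Stand-in for a single parsed document.
    static final class StubParse {
        final String text;

        StubParse(String text) {
            this.text = text;
        }
    }

    // Stand-in for a container that maps each URL to its parse.
    static final class StubParseResult {
        private final Map<String, StubParse> parses = new HashMap<>();

        static StubParseResult create(String url, StubParse parse) {
            StubParseResult result = new StubParseResult();
            result.parses.put(url, parse);
            return result;
        }

        StubParse get(String url) {
            return parses.get(url);
        }
    }

    // Old shape: the parser hands back the parse directly.
    static StubParse oldGetParse(String url, String body) {
        return new StubParse(body);
    }

    // New shape: the parser wraps the parse in a URL-keyed result.
    static StubParseResult newGetParse(String url, String body) {
        return StubParseResult.create(url, new StubParse(body));
    }

    public static void main(String[] args) {
        String url = "http://www.163.com/";
        // With the new shape, the caller must look the parse up by URL.
        StubParse parse = newGetParse(url, "sample text").get(url);
        System.out.println(parse.text);
    }
}
```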

 

5. Create a new folder named urls in the Nutch 1.0 directory, then create a text file named url inside it and write the link to crawl in that file. Note that the link must end with a trailing "/".
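
The folder and file can of course be created by hand. As an alternative, here is a minimal sketch that creates them with java.nio.file. The class SeedFileSetup is hypothetical, the seed link http://www.163.com is taken from the configuration used in this article, and the sketch assumes the seed is a bare host URL, so appending "/" is safe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

// Hypothetical helper for illustration; not part of Nutch.
class SeedFileSetup {

    // Creates <nutchHome>/urls/url containing one seed link and returns the file path.
    static Path writeSeed(Path nutchHome, String seedUrl) {
        // Assumes a bare host URL: add the trailing slash if it is missing.
        String link = seedUrl.endsWith("/") ? seedUrl : seedUrl + "/";
        try {
            Path dir = Files.createDirectories(nutchHome.resolve("urls"));
            return Files.write(dir.resolve("url"),
                    Collections.singletonList(link), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the Nutch directory as the first argument; defaults to the current directory.
        Path home = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Wrote " + writeSeed(home, "http://www.163.com"));
    }
}
```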

 

6. Run Nutch 1.0:

Choose Run -> Run As -> Java Application. In the Select Java Application dialog that pops up, select Crawl - org.apache.nutch.crawl.

Next, choose Run -> Run Configurations... The Crawl item now appears under Java Application on the left. Select it.

Switch to the Arguments tab. The Program arguments box holds the crawl parameters. Fill in: urls -dir crawl -depth 3 -topN 50 (here urls is the folder holding your link file; adjust it to your actual situation).

In the VM arguments box, enter: -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log

Then run it and check whether any output appears on your console.
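
For readers unsure what each argument means: urls is the folder with the seed link file, -dir names the directory the crawl is stored in, -depth is how many link levels to follow from the seed, and -topN caps the number of pages fetched at each level. The sketch below shows one way such a command line can be read, and how a -D VM argument becomes visible as a system property. CrawlArgsSketch is a hypothetical, simplified reader written for this example; it is not Nutch's Crawl class.

```java
// Hypothetical, simplified option reader for illustration; NOT Nutch's Crawl class.
class CrawlArgsSketch {
    String urlDir;     // folder holding the seed file, e.g. "urls"
    String outputDir;  // value of -dir
    int depth = -1;    // value of -depth, -1 if not given
    long topN = -1;    // value of -topN, -1 if not given

    // Reads the options; each flag is expected to be followed by its value.
    static CrawlArgsSketch parse(String[] args) {
        CrawlArgsSketch c = new CrawlArgsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-dir":
                    c.outputDir = args[++i];
                    break;
                case "-depth":
                    c.depth = Integer.parseInt(args[++i]);
                    break;
                case "-topN":
                    c.topN = Long.parseLong(args[++i]);
                    break;
                default:
                    // Anything that is not a flag is treated as the seed folder.
                    c.urlDir = args[i];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        CrawlArgsSketch c = parse("urls -dir crawl -depth 3 -topN 50".split(" "));
        System.out.println("seed folder=" + c.urlDir + ", output=" + c.outputDir
                + ", depth=" + c.depth + ", topN=" + c.topN);
        // A VM argument such as -Dhadoop.log.dir=logs is read as a system property.
        System.out.println("hadoop.log.dir=" + System.getProperty("hadoop.log.dir", "(not set)"));
    }
}
```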

Note: if you run into Java heap size problems, check logs/hadoop.log or the console output for a message similar to java.lang.OutOfMemoryError: Java heap space.

Solution:

In Eclipse, go to Window -> Preferences -> Java -> Installed JREs -> Edit -> Default VM arguments

and set it to -Xms256m -Xmx1024m. -Xms sets the minimum (initial) heap size, and -Xmx sets the maximum heap size.
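
To verify that the heap settings are actually applied to programs launched from the IDE, a small check such as the following can be run as a Java application in the same project. The class name HeapSettingsCheck is hypothetical. The value reported for the maximum heap roughly corresponds to -Xmx and may be slightly lower than the configured figure.

```java
// Hypothetical check for illustration; prints the heap limits of the running JVM.
class HeapSettingsCheck {

    // Converts a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // maxMemory() is the most the JVM will try to use (related to -Xmx).
        System.out.println("Max heap (MB): " + toMegabytes(runtime.maxMemory()));
        // totalMemory() is the heap currently reserved (starts near -Xms).
        System.out.println("Current heap (MB): " + toMegabytes(runtime.totalMemory()));
    }
}
```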
