How to configure Heritrix under Eclipse
Other posts already cover configuring Heritrix 1.14.4 in Eclipse, and much of what follows draws on them, for example http://extjs2.javaeye.com/blog/699751
This post adds some further explanation of the configuration.
The process for configuring Heritrix 1.14.4 in Eclipse is as follows:
1. First download heritrix-1.14.4.zip and heritrix-1.14.4-src.zip (Windows) from http://sourceforge.net/projects/archive-crawler/
2. Create a Java project in Eclipse (it can be named Heritrix).
3. Copy the three folders com, org, and st from src/java in the unzipped heritrix-1.14.4-src.zip to the project's src folder.
4. Copy the conf folder from src in the unzipped heritrix-1.14.4-src.zip to the project root directory.
5. Copy the lib folder from the unzipped heritrix-1.14.4-src.zip to the project root directory.
6. Copy the tlds-alpha-by-domain.txt file from src/resources/org/archive/util in the unzipped heritrix-1.14.4-src.zip to the org.archive.util package in the project.
7. Copy the webapps folder from the unzipped heritrix-1.14.4.zip to the project root directory.
If the folder name is not webapps, you need to make the corresponding change in Heritrix.java.
Java code:
/**
 * @throws IOException
 * @return Returns the directory under which reside the WAR files
 * we're to load into the servlet container.
 */
public static File getWarsdir() throws IOException {
    return getSubDir("webapps");
}
8. Modify the configuration: find the heritrix.properties file under conf and set the following entries.
heritrix.properties:
# Set version
heritrix.version = 1.14.4
# Set user name and password
heritrix.cmdline.admin = admin:admin
# Set port
heritrix.cmdline.port = 8080
9. Add the jar files to the project: put all the jars under lib on the project's build path.
10. After Heritrix is imported, Eclipse reports an error that the class sun.net.www.protocol.file.FileURLConnection cannot be accessed. The sun package is a protected package that by default is meant only for Sun's own software, so Eclipse flags the reference as an error. To downgrade it, go to Window -> Preferences -> Java -> Compiler -> Errors/Warnings -> Deprecated and restricted API, and change Forbidden reference (access rules) to Warning.
11. Add the configuration folder to the classpath. If you run Heritrix and no options are available on the configuration page, this step resolves the issue. In the project, find org.archive.crawler.Heritrix.java, right-click it and choose Run As -> Run Configurations, select the Classpath tab, select User Entries, click Advanced, choose Add Folders, and add the conf folder.
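Before running, it can help to confirm that the folders and properties from the steps above are actually in place. The following is only a minimal sketch and not part of Heritrix: the class name HeritrixLayoutCheck and its methods are hypothetical, and it assumes the folder names conf, lib, and webapps and the three property keys used in this post.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of Heritrix) that checks the project layout described in this post. */
public class HeritrixLayoutCheck {

    /** Folders that steps 4, 5 and 7 copy into the project root. */
    static final String[] REQUIRED_DIRS = {"conf", "lib", "webapps"};

    /** Returns the names of the required folders that are missing under projectRoot. */
    public static List<String> missingDirs(File projectRoot) {
        List<String> missing = new ArrayList<String>();
        for (String name : REQUIRED_DIRS) {
            if (!new File(projectRoot, name).isDirectory()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Loads conf/heritrix.properties from under projectRoot. */
    public static Properties loadProperties(File projectRoot) throws IOException {
        File file = new File(new File(projectRoot, "conf"), "heritrix.properties");
        Properties props = new Properties();
        InputStream in = new FileInputStream(file);
        try {
            props.load(in);
        } finally {
            in.close();
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Project root: first argument, or the current working directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        List<String> missing = missingDirs(root);
        System.out.println("Missing folders: " + missing);
        if (missing.contains("conf")) {
            return; // nothing to read without the conf folder
        }
        Properties props = loadProperties(root);
        String[] keys = {"heritrix.version", "heritrix.cmdline.admin", "heritrix.cmdline.port"};
        for (String key : keys) {
            System.out.println(key + " = " + props.getProperty(key, "<not set>"));
        }
    }
}
```

Run it with the Eclipse project directory as the working directory (or pass the project path as the first argument). An empty "Missing folders" list and the three values from step 8 suggest the copy steps were done as described.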
Click Run to start Heritrix.
Console output:
16:17:09.500 EVENT Starting Jetty/4.2.23
16:17:09.843 EVENT Started WebApplicationContext[/,Heritrix Console]
16:17:09.968 EVENT Started SocketListener on 127.0.0.1:8080
16:17:09.968 EVENT Started
Heritrix version: 1.14.4
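Once the SocketListener line appears, the web console should be accepting connections on the address shown. As a rough sketch (the class name ConsolePortCheck is hypothetical and not part of Heritrix), a plain TCP connect can confirm that something is listening on 127.0.0.1:8080. It does not verify that the listener is actually Heritrix.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Heritrix): checks whether a host and port accept TCP connections. */
public class ConsolePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable on that port.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Closing failed; nothing useful to do here.
            }
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:8080 is taken from the SocketListener line in the console output.
        System.out.println("Console reachable: " + isListening("127.0.0.1", 8080, 2000));
    }
}
```

If this prints false while Heritrix appears to be running, check the port set in heritrix.properties; after that, the console can be opened in a browser at the same address.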
http://www.cnblogs.com/sl-shilong/articles/2829411.html
Problem encountered and its fix:
The Heritrix.java source file contains the statement "import sun.net.www.protocol.file.FileURLConnection;"
The error is as follows:
"The type FileURLConnection is not accessible due to restriction on required library C:\Program Files\Java\jre6\lib\rt.jar"
How can I resolve this?
Addendum: the Heritrix version is 1.14.4.
Programming Xiao Qiang answered on 2012-03-07 11:31
This error is caused by a JRE access restriction. Right-click the Myheritrix project and select Build Path -> Configure Build Path..., then select the Libraries tab, remove the JRE System Library, and add it again to fix the problem. (OK)
Alternatively, select Window -> Preferences -> Java -> Compiler -> Errors/Warnings, find "Forbidden reference (access rules)" under "Deprecated and restricted API", and change the default setting "Error" to "Warning" or "Ignore".
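Both fixes work because the class is present in the JRE; Eclipse only objects to the direct compile-time reference to a sun.* type. The sketch below (the class name FileConnectionProbe is hypothetical) uses only the public java.net API to open a file: URL connection and prints the runtime class behind it. On Sun/Oracle and OpenJDK runtimes this is typically the same sun.net.www.protocol.file.FileURLConnection, but that is an implementation detail and may differ on other JVMs.

```java
import java.io.File;
import java.io.IOException;
import java.net.URLConnection;

/** Hypothetical probe (not part of Heritrix): shows which class backs a file: URL connection. */
public class FileConnectionProbe {

    /** Opens a connection object for the file via the public API and returns its runtime class name. */
    public static String connectionClassName(File file) throws IOException {
        // openConnection() only creates the connection object; it does not read the file.
        URLConnection conn = file.toURI().toURL().openConnection();
        return conn.getClass().getName();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("probe", ".txt");
        tmp.deleteOnExit();
        System.out.println(connectionClassName(tmp));
    }
}
```

Because this code never names the sun.* type, it compiles without touching the access rule. Heritrix.java imports the type directly, which is why one of the two workarounds above is needed.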