How to Install Nutch on Windows
The name "Nutch" has been appearing all over the Internet recently, but I had been too busy with work to study it carefully. What I did know is that Nutch is an open-source project hosted by Apache, and that with it you can build your own search engine, either for an intranet or for the entire web. During the Spring Festival holiday I finally had time to read about it and test it. Before you can use Nutch, you have to install it. Searching for related material, I found that most installation articles assume Linux, and the few that cover Windows are very brief. Because the script commands that ship with Nutch require a Linux-style environment, Cygwin must be installed first to emulate one, and installing and using Cygwin is not trivial in itself. So let me explain how to install Nutch on Windows.
1. Install Cygwin
First, go to http://www-inst.eecs.berkeley.edu/~Instcd/ISO/ and download the ISO file of the Cygwin software. Mount it as a virtual optical drive with the Daemon software and double-click the setup file. The installation wizard appears (Figure 1).
After you click "Next", the wizard asks you to choose how to install Cygwin, as shown in Figure 2:
The figure shows three installation methods:
(1) Install from Internet: download the software from the Internet and install it;
(2) Download without installing: download the installation files from the Internet, but do not install them yet;
(3) Install from local directory: install from a local directory that contains the installation files.
Select the third option, "Install from local directory", and click "Next", as shown in Figure 3:
The wizard asks for the installation path of Cygwin. You can change it in the "Root Directory" text box, then click "Next", as shown in Figure 4:
The wizard asks for the local path where the Cygwin installation files are stored. Set it in "Local Package Directory" and click "Next", as shown in Figure 5:
The wizard displays the list of packages that can be installed, and you can decide which ones to install according to your needs. Click the text next to the "cycle arrow" icon to change the installation mode. The common modes are Default (install only the default items), Install (install all programs, which requires a lot of disk space) and Reinstall (install the programs again). I recommend choosing "Install" so that everything is in place in one step and no missing package causes trouble later, but make sure that at least 2 GB of free space is available. Click "Next" to start the installation (Figure 6).
When the window shown in Figure 7 appears, click "Finish". Cygwin is now installed.
A few more words about Cygwin at this point. Cygwin is an environment that emulates Unix on Windows, and you can use it to get familiar with and learn how to work on Unix systems. Readers who are new to Unix can refer to my article series "Getting Started and Basics of UNIX Operating Systems" and "Easy and Practical Unix". The Unix commands used below are not explained in detail.
2. Install Nutch
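Download the Nutch release package and decompress it to a local directory. I decompressed it to I:\nutch-0.7.1, which is the path used in the next section.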
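If the Nutch package you downloaded is a gzip-compressed tarball (an assumption, since the article does not name the archive format), it could be unpacked from the Cygwin shell roughly as follows. The function name unpack_release and the example paths are hypothetical, and the sketch assumes that tar was selected during the Cygwin installation.

```shell
# Sketch only: unpack a downloaded archive into a target directory.
# Assumes a gzip-compressed tarball and that tar is available in Cygwin.
unpack_release() {
    archive="$1"   # path to the downloaded archive
    dest="$2"      # directory to unpack into (created if missing)
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example call (placeholder paths; adjust to your own download location):
# unpack_release /cygdrive/i/nutch-0.7.1.tar.gz /cygdrive/i
```

A graphical archive tool on Windows works just as well; the only thing that matters for the next section is that you know the directory the package was unpacked to.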
3. Test the Nutch command
Before running Nutch's script commands, you need to set some environment variables. Cygwin provides a file named cygwin.bat that sets the required environment variables automatically. It is located in the Cygwin root directory, and interested readers can open it in UltraEdit or another editor to see what it contains. In fact, once Cygwin is installed, an icon is created on the Windows desktop, as shown in Figure 8:
This icon is a shortcut to the cygwin.bat file in the Cygwin root directory. Double-click it to open a DOS-like window. Since I had decompressed the Nutch package to I:\nutch-0.7.1, I entered the command "cd /cygdrive/I/nutch-0.7.1" in this window (adjust the path to match your own installation), then used "ls -l" to list all subdirectories and files in nutch-0.7.1. Finally, run the command "bin/nutch". If you see the prompt shown in Figure 9, congratulations! Nutch is now installed on Windows!
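The commands above depend on how Cygwin exposes Windows drives: a path such as I:\nutch-0.7.1 appears under /cygdrive. The sketch below makes that mapping explicit. The helper names to_cygdrive and nutch_ready are hypothetical and written only for illustration; Cygwin's own cygpath utility performs the same kind of conversion.

```shell
# Convert a Windows path such as 'I:\nutch-0.7.1' to its /cygdrive form.
# Illustration only; handles simple "X:\dir\sub" paths.
to_cygdrive() {
    win="$1"
    # First character is the drive letter, lowercased here by convention.
    drive=$(printf '%s' "$win" | cut -c1 | tr 'A-Z' 'a-z')
    # Skip the "X:" prefix and turn backslashes into forward slashes.
    rest=$(printf '%s' "$win" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

# Succeeds if the given directory contains the bin/nutch script.
nutch_ready() {
    [ -f "$1/bin/nutch" ]
}

# Typical session in the Cygwin window (path taken from the article):
# cd "$(to_cygdrive 'I:\nutch-0.7.1')"
# ls -l
# nutch_ready . && bin/nutch
```

Checking for bin/nutch before running it gives a clearer failure when the path is mistyped or the package was unpacked somewhere else.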
As for how to use Nutch, that will be covered in a later article :)