Nutch is an open source search engine written entirely in Java. It uses Lucene for full-text search and Hadoop as its distributed computing platform. In fact, all three projects were created by Doug Cutting, and Hadoop was originally just a part of Nutch.
The previous version of Nutch was 0.9, released two years ago. Ever since, people have been asking when the official 1.0 would come out. It was not until a little over a month ago that someone revealed a 1.0 release was on the way, and even then it slipped for a while. Now the Nutch developers are voting on their mailing list on whether to officially release the current RC1. Judging by the votes so far, barring further accidents (the earlier RC0 was one), we should soon see the official 1.0 release.
If you cannot wait, you can download RC1 now. It should be the same as the official release.
By the way, Apache also has a Hadoop-based machine learning platform, Mahout, but it is still quite immature. In addition, a young developer from Jinan University created Redpoll, an open source platform for parallel machine learning algorithms that is also based on Hadoop.
=== Updated on March 19 ===
One of the new features of Nutch 1.0 is that it can be conveniently used with Solr. One of the Nutch developers has written a tutorial on how to configure it.