Nutch2.3
Nutch, started in August 2002, is an open source search engine project from Apache, implemented in Java. Since version 1.2, Nutch has shifted from being a search engine to being a web crawler. It has since split into two major branches, 1.x and 2.x. The biggest difference between them is that 2.x abstracts the underlying data storage layer so that it can support a variety of storage technologies.
Apache Nutch v2.3 has been released, and all users and developers on the 2.x series are advised to upgrade to this version.
This release provides an Apache Wicket-based web administration interface, resolves 143 issues, provides Maven dependencies, upgrades to Gora v0.5, and supports the following underlying storage backends and components:
- Apache Hadoop 1.0.1 & 2.4.0
- Apache Cassandra 2.0.2
- Apache HBase 0.94.14
- Apache Accumulo 1.5.1
- MongoDB 2.12.2
- Apache SOLR 4.8.1
- Apache Avro 1.7.6
Note also that Gora's support for SQL is deprecated.
MongoDB
MongoDB is a non-relational (NoSQL) database that is currently very popular in the IT industry, and its flexible data storage model is well liked by practitioners. MongoDB fits object-oriented thinking well: each record in MongoDB is a document object. Its biggest advantage is that data persistence does not require developers to write SQL statements by hand; CRUD operations are performed simply by calling methods.
Elasticsearch
Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License, and it is the second most popular enterprise search engine. Designed for use in cloud computing, it offers real-time search and is stable, reliable, fast, and easy to install and use.
Kibana
Kibana is a web interface for log analysis on top of Logstash and Elasticsearch. It can be used to search, visualize, and analyze logs efficiently.
Local machine configuration
CentOS 6.5 64-bit
1. JDK and Ant installation
$ mkdir /download
$ cd /download
$ wget http://download.oracle.com/otn-pub/java/jdk/8u40-b26/jdk-8u40-linux-x64.tar.gz
$ wget http://mirror.tcpdiag.net/apache//ant/binaries/apache-ant-1.9.4-bin.tar.gz
$ tar xzf jdk-8u40-linux-x64.tar.gz
$ tar xzf apache-ant-1.9.4-bin.tar.gz
$ mv apache-ant-1.9.4/ /opt/ant
$ mv jdk1.8.0_40/ /opt/jdk1.8.0_40
$ vim /etc/profile
# JDK 1.8.0
export JAVA_HOME=/opt/jdk1.8.0_40
export JRE_HOME=/opt/jdk1.8.0_40/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

# Ant install path
export ANT_HOME=/opt/ant
export PATH=$ANT_HOME/bin:$PATH
To test whether the installation was successful:
# ant
Buildfile: build.xml does not exist!
# java -version
java version "1.8.0_40"
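The same check can be scripted. The following is a minimal sketch, assuming a POSIX shell and that the profile has already been reloaded; the function names check_tool and check_home are illustrative and not part of any installed tool.

```shell
#!/bin/sh
# Report whether a command is reachable through PATH.
# Prints one line and returns 0 if found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 -> $(command -v "$1")"
        return 0
    fi
    echo "missing: $1 is not on PATH"
    return 1
}

# Report whether an environment variable names an existing directory.
# $1 is the variable name (for example JAVA_HOME), $2 its current value.
check_home() {
    if [ -n "$2" ] && [ -d "$2" ]; then
        echo "ok: $1=$2"
        return 0
    fi
    echo "missing: $1 is unset or not a directory"
    return 1
}

# Reload the profile first (". /etc/profile") so the exports take effect.
check_home JAVA_HOME "${JAVA_HOME:-}" || true
check_home ANT_HOME "${ANT_HOME:-}" || true
check_tool java || true
check_tool ant || true
```

A "missing" line for JAVA_HOME or ANT_HOME usually means the exported path does not match the directory the archive was actually moved to.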
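2. MongoDB download, install, start
Download MongoDB from:
http://downloads.mongodb.org/linux/mongodb-linux-i686-2.6.7-rc0.tgz
Note that this link points to the 32-bit 2.6.7-rc0 build, while the commands below expect the 64-bit 2.6.7 archive (mongodb-linux-x86_64-2.6.7.tgz) in /download, which matches the 64-bit system used here.
$ tar xzf /download/mongodb-linux-x86_64-2.6.7.tgz
$ mv mongodb-linux-x86_64-2.6.7/ /opt/mongodb/
$ cd /opt/mongodb/
$ mkdir log/ conf/ data/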
Starting with version 2.6, MongoDB uses a YAML-based configuration file format. The configuration used here is as follows.
$ vim conf/se.yml
net:
  port: 27017
  bindIp: 127.0.0.1
systemLog:
  destination: file
  path: "/opt/mongodb/log/mongodb.log"
  logAppend: true
processManagement:
  fork: true
  pidFilePath: "/opt/mongodb/log/mongodb.pid"
storage:
  dbPath: "/opt/mongodb/data"
  directoryPerDB: true
  smallFiles: true
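Because mongod is configured to fork and log to a file, a missing log or data directory can make startup fail. The sketch below, which assumes the simple one-key-per-line layout of the configuration file above, reads the path, pidFilePath and dbPath values and creates the directories they need. The helper names config_value and prepare_dirs are hypothetical, and the text matching is not a real YAML parser.

```shell
#!/bin/sh
# Print the value of a key such as "dbPath" from a simple mongod config.
# Plain-text match only: assumes one "key: value" pair per line.
# $1 = config file, $2 = key name
config_value() {
    sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1" | tr -d '"' | head -n 1
}

# Create the directories mongod will write to.
# $1 = config file
prepare_dirs() {
    conf=$1
    log_path=$(config_value "$conf" path)
    pid_path=$(config_value "$conf" pidFilePath)
    db_path=$(config_value "$conf" dbPath)
    if [ -n "$log_path" ]; then mkdir -p "$(dirname "$log_path")"; fi
    if [ -n "$pid_path" ]; then mkdir -p "$(dirname "$pid_path")"; fi
    if [ -n "$db_path" ]; then mkdir -p "$db_path"; fi
    return 0
}

# Only act when run from a directory that contains conf/se.yml.
if [ -f conf/se.yml ]; then
    prepare_dirs conf/se.yml
fi
```

Run from /opt/mongodb, this leaves the log and data directories in place before mongod is started.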
Start MongoDB
$ cd /opt/mongodb
$ bin/mongod -f conf/se.yml
Open the mongo shell to check that MongoDB started successfully
$ bin/mongo
> show dbs
local  0.031GB
> exit
bye
3. Elasticsearch download, install
$ wget -P /download https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz
$ cd /download
$ tar xzf /download/elasticsearch-1.4.4.tar.gz
$ mv elasticsearch-1.4.4 /opt/elasticsearch
$ cd /opt/elasticsearch
$ vim config/elasticsearch.yml
cluster.name: hist
node.name: "Hist-node1"
node.master: true
node.data: true
path.conf: /opt/elasticsearch/config
path.data: /opt/elasticsearch/data
http.enabled: true
Start Elasticsearch in the background
$ cd /opt/elasticsearch
$ bin/elasticsearch -d
Stop the Elasticsearch process
$ curl -XPOST http://localhost:9200/_cluster/nodes/_shutdown
# Shut down node BlrmMvBdSKiCeYGsiHijdg
$ curl -XPOST http://localhost:9200/_cluster/nodes/BlrmMvBdSKiCeYGsiHijdg/_shutdown
Check that Elasticsearch is running
$ curl -XGET 'http://localhost:9200'
{
  "status" : 200,
  "name" : "Hist-node1",
  "cluster_name" : "hist",
  "version" : {
    "number" : "1.4.4",
    "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp" : "2015-02-19T13:05:36Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}
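When this check is run from a script, it helps to pull single fields out of the response rather than reading it by eye. The following sketch uses sed to extract a scalar field from JSON on standard input. The helper name json_field is hypothetical, and the text matching is only suitable for flat, simple responses like the one above; a real JSON parser is more robust.

```shell
#!/bin/sh
# Print the first value of a scalar JSON field read from stdin.
# Handles quoted strings and bare values (numbers, true, false).
# $1 = field name
json_field() {
    key=$1
    sed -n \
        -e 's/.*"'"$key"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        -e 's/.*"'"$key"'"[[:space:]]*:[[:space:]]*\([^",}[:space:]][^",}[:space:]]*\).*/\1/p' \
        | head -n 1
}

# Succeeds only if the node answers and reports status 200.
# Assumes curl is installed and Elasticsearch listens on localhost:9200.
es_is_up() {
    [ "$(curl -s 'http://localhost:9200' | json_field status)" = "200" ]
}
```

For example, `curl -s 'http://localhost:9200' | json_field number` prints the version number reported by the node, and `es_is_up && echo running` gives a yes/no answer.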
4. Kibana download, install
$ wget -P /download https://download.elasticsearch.org/kibana/kibana/kibana-4.0.1-linux-x64.tar.gz
$ cd /download
$ tar xzf /download/kibana-4.0.1-linux-x64.tar.gz
$ mv kibana-4.0.1-linux-x64/ /opt/kibana/
$ cd /opt/kibana/
$ bin/kibana
Kibana can now be accessed in a browser at http://127.0.0.1:5601
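Kibana can take a few seconds to start listening, so a script that opens the page right after launch may fail. The following is a minimal retry sketch; the helper name wait_for is hypothetical, and the usage line assumes curl is installed.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only when another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: poll the Kibana port for up to 30 seconds.
#   wait_for 30 1 curl -sf http://127.0.0.1:5601 && echo "Kibana is up"
```

The same helper works for the MongoDB and Elasticsearch ports by swapping the command that is passed in.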
5. Nutch 2.3 download, install
Nutch2.3+mongodb+elasticsearch