Nutch2.3+mongodb+elasticsearch

Source: Internet
Author: User

Nutch2.3

Nutch was born in August 2002 and is an open source search engine project from Apache, implemented in Java. Since version 1.2, Nutch has evolved from a search engine into a web crawler. It later split into two major branches, 1.x and 2.x. The biggest difference between the two is that 2.x abstracts the underlying data storage layer so that it can support a variety of storage technologies.

Apache Nutch v2.3 has been released, and all users and developers on the 2.x series are advised to upgrade to it.
This version provides an Apache Wicket-based web administration interface, resolves 143 issues, provides Maven dependencies, upgrades to Gora v0.5, and supports the following versions of its underlying storage and related components:

    • Apache Hadoop 1.0.1 & 2.4.0
    • Apache Cassandra 2.0.2
    • Apache HBase 0.94.14
    • Apache Accumulo 1.5.1
    • MongoDB 2.12.2 (Java driver version)
    • Apache Solr 4.8.1
    • Apache Avro 1.7.6

Note also that Gora's SQL support is deprecated.

MongoDB

MongoDB is a non-relational (NoSQL) database that is currently very popular in the IT industry, and its flexible data storage model is highly favored by IT practitioners. MongoDB fits object-oriented thinking well: each record in MongoDB is a document object. Its biggest advantage is that developers do not need to write SQL statements by hand for data persistence; CRUD operations are performed simply by calling methods.

Elasticsearch

Elasticsearch is a Lucene-based search server. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, released as open source under the terms of the Apache License, and is the second most popular enterprise search engine. Designed for cloud computing, it offers real-time search and is stable, reliable, fast, and easy to install and use.

Kibana

Kibana is a web interface that provides log analysis for Logstash and Elasticsearch. It can be used to efficiently search, visualize, and analyze logs.

Local machine configuration

CentOS 6.5 64-bit

1. JDK, Ant installation

    $ mkdir /download
    $ cd /download
    $ wget http://download.oracle.com/otn-pub/java/jdk/8u40-b26/jdk-8u40-linux-x64.tar.gz
    $ wget http://mirror.tcpdiag.net/apache//ant/binaries/apache-ant-1.9.4-bin.tar.gz
    $ tar xzf jdk-8u40-linux-x64.tar.gz
    $ tar xzf apache-ant-1.9.4-bin.tar.gz
    $ mv apache-ant-1.9.4/ /opt/ant
    $ mv jdk1.8.0_40/ /opt/jdk1.8.0_40
    $ vim /etc/profile

Add the following to /etc/profile. The paths must match the directories the archives were moved to above:

    # jdk1.8.0
    export JAVA_HOME=/opt/jdk1.8.0_40
    export JRE_HOME=/opt/jdk1.8.0_40/jre
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

    # ant install path
    export ANT_HOME=/opt/ant
    export PATH=$ANT_HOME/bin:$PATH

To test whether the installation was successful:

    # ant
    Buildfile: build.xml does not exist!
    # java -version
    java version "1.8.0_40"
    ... 1.13.6) (rhel-1.13.6.1.el6_6-...
    ... 23.25-b01, mixed mode)
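The same check can be scripted so that every missing tool is reported in one pass. The following is a minimal sketch, assuming a POSIX shell; `check_tools` is a hypothetical helper name introduced here for illustration and is not part of the JDK or Ant.

```shell
#!/bin/sh
# check_tools: report which of the named commands are missing from PATH.
# Returns 0 only if every command was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage for this walkthrough (java and ant come from step 1).
check_tools java ant || echo "re-check JAVA_HOME, ANT_HOME and PATH in /etc/profile"
```

Remember that changes to /etc/profile only take effect in a new login shell, or after running `. /etc/profile` in the current one.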

2. MongoDB download, install, start

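    $ wget -P /download http://downloads.mongodb.org/linux/mongodb-linux-i686-2.6.7-rc0.tgz
    # The URL above names the i686 2.6.7-rc0 archive, while the commands below
    # unpack the x86_64 2.6.7 archive; download the archive that matches your system.
    $ tar xzf /download/mongodb-linux-x86_64-2.6.7.tgz
    $ mv mongodb-linux-x86_64-2.6.7/ /opt/mongodb/
    $ cd /opt/mongodb/
    $ mkdir log conf data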

Starting with version 2.6, MongoDB uses a YAML-based configuration file format. The following configuration can be used as a reference.

    $ vim conf/se.yml

    net:
      port: 27017
      bindIp: 127.0.0.1
    systemLog:
      destination: file
      path: "/opt/mongodb/log/mongodb.log"
      logAppend: true
    processManagement:
      fork: true
      pidFilePath: "/opt/mongodb/log/mongodb.pid"
    storage:
      dbPath: "/opt/mongodb/data"
      directoryPerDB: true
      smallFiles: true

Start MongoDB

    $ cd /opt/mongodb
    $ bin/mongod -f conf/se.yml

Open the mongo shell to check that MongoDB started successfully

    $ bin/mongo
    > show dbs
    ...  0.031GB
    > exit
    bye
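Because the configuration forks mongod into the background, a script cannot simply wait on the process. One non-interactive way to check it is to test the PID recorded in the pid file. This is a minimal sketch, assuming the pid file path from conf/se.yml and that the check runs as the user that started mongod (or as root); `pid_alive` is a hypothetical helper name, not a MongoDB tool.

```shell
#!/bin/sh
# pid_alive FILE: succeed if FILE holds the PID of a process that currently exists.
# kill -0 also fails when we lack permission to signal the process, so run this
# as the user that started the daemon, or as root.
pid_alive() {
  pidfile=$1
  [ -r "$pidfile" ] || return 1           # no pid file: not started, or wrong path
  pid=$(tr -d '[:space:]' < "$pidfile")    # strip the trailing newline
  case $pid in
    ''|*[!0-9]*) return 1 ;;               # empty or non-numeric content
  esac
  kill -0 "$pid" 2>/dev/null               # signal 0 only checks for existence
}

# Path assumed from pidFilePath in conf/se.yml.
if pid_alive /opt/mongodb/log/mongodb.pid; then
  echo "mongod is running"
else
  echo "mongod is not running; see /opt/mongodb/log/mongodb.log"
fi
```

A stale pid file left behind by a crash can point at an unrelated process that reused the PID, so treat a positive result as a hint and confirm with the mongo shell when in doubt.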

3. Elasticsearch download, install

    $ wget -P /download https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz
    $ cd /download
    $ tar xzf /download/elasticsearch-1.4.4.tar.gz
    $ mv elasticsearch-1.4.4 /opt/elasticsearch
    $ cd /opt/elasticsearch
    $ vim config/elasticsearch.yml

    cluster.name: hist
    node.name: "Hist-node1"
    node.master: true
    node.data: true
    path.conf: /opt/elasticsearch/config
    path.data: /opt/elasticsearch/data
    http.enabled: true

Start Elasticsearch in the background

    $ cd /opt/elasticsearch
    $ bin/elasticsearch -d

Terminating the Elasticsearch process

    # Shut down all nodes
    $ curl -XPOST http://localhost:9200/_cluster/nodes/_shutdown
    # Shut down node BlrmMvBdSKiCeYGsiHijdg
    $ curl -XPOST http://localhost:9200/_cluster/nodes/BlrmMvBdSKiCeYGsiHijdg/_shutdown

Check whether Elasticsearch is running

    $ curl -XGET 'http://localhost:9200'
    {
      "status" : 200,
      "name" : "Hist-node1",
      "cluster_name" : "hist",
      "version" : {
        "number" : "1.4.4",
        "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
        "build_timestamp" : "2015-02-19T13:05:36Z",
        "build_snapshot" : false,
        "lucene_version" : "4.10.3"
      },
      "tagline" : "You Know, for Search"
    }
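When this check runs from a script, it helps to pull single fields such as the status or version number out of the response. The following is a rough sketch using only grep and sed; `json_value` is a hypothetical helper, it relies on `grep -o` (available in GNU and BSD grep), and it only handles simple scalar values. A real JSON tool such as jq is the safer choice if it is installed.

```shell
#!/bin/sh
# json_value KEY: print the first scalar value stored under KEY in the JSON on stdin.
# A text-matching sketch for flat responses like the one above, not a JSON parser.
json_value() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*[^,}]*" |
    head -n 1 |
    sed -e 's/^[^:]*:[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^"//' -e 's/"$//'
}

# Sample modelled on the response above; against a live node, pipe curl into it:
#   curl -s -XGET 'http://localhost:9200' | json_value number
response='{ "status" : 200, "name" : "Hist-node1", "version" : { "number" : "1.4.4" } }'
echo "$response" | json_value status
echo "$response" | json_value number
```

With the sample string above, the two calls print 200 and 1.4.4. Asking for a key whose value is an object, such as version, is outside what this sketch handles.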

4. Kibana download, install

    $ wget -P /download https://download.elasticsearch.org/kibana/kibana/kibana-4.0.1-linux-x64.tar.gz
    $ cd /download
    $ tar xzf /download/kibana-4.0.1-linux-x64.tar.gz
    $ mv kibana-4.0.1-linux-x64/ /opt/kibana/
    $ cd /opt/kibana/
    $ bin/kibana

You can now access Kibana in a browser at http://127.0.0.1:5601.
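A freshly started service usually needs a few seconds before its HTTP port answers, so a scripted check is more reliable if it retries. The following is a minimal sketch of a generic retry loop in POSIX shell; `retry` is a hypothetical helper name, and the attempt count and delay in the commented example are arbitrary choices rather than recommended values.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on success, 1 if every try fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Example for this setup (needs curl and a running Kibana, so it is left commented out):
#   retry 10 3 curl -sf -o /dev/null http://127.0.0.1:5601 && echo "Kibana is up"
```

The same helper works for the Elasticsearch check by pointing curl at http://localhost:9200 instead.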

5. Nutch2.3 download, install
