Hadoop Technical Insider: HDFS (Note 1)


Reading notes on Dong Sicheng's Hadoop Technical Insider: In-Depth Analysis of Hadoop Common and HDFS Architecture Design and Implementation Principles.

High Fault Tolerance and Scalability of HDFS

Lucene is a high-performance, pure-Java full-text search engine toolkit that can be easily embedded into applications to provide full-text indexing and search.

Nutch is a search engine application built on top of Lucene. Lucene provides the text indexing and search APIs for Nutch, but on its own it cannot support a web of hundreds of millions of pages (crawling and indexing generate massive file storage demands).

Hadoop advantages:

Convenient: it runs on large clusters of commodity hardware.

Elastic: nodes can be added or removed according to cluster load, so resources are used efficiently.

Robust: it is designed on the assumption that hardware failures are common, and it can handle most such failures gracefully.

Simple: it lets users quickly write efficient parallel code.


1. Hadoop Common provides common utilities for the other Hadoop projects, including: the Configuration system, remote procedure calls (RPC), the serialization mechanism, and the Hadoop abstract file system (FileSystem).
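Hadoop's serialization mechanism centers on the Writable contract: an object writes its fields to a binary stream and reads them back in the same order. Below is a toy Python analogue of that idea (hypothetical sketch; the real interface is Java's `Writable` with `write(DataOutput)`/`readFields(DataInput)`):

```python
import io
import struct

class IntPairWritable:
    """Toy Writable-style object: serializes two ints to a byte stream."""

    def __init__(self, first=0, second=0):
        self.first, self.second = first, second

    def write(self, out):
        # Fixed-length big-endian ints, like Java's DataOutput.writeInt
        out.write(struct.pack(">ii", self.first, self.second))

    def read_fields(self, inp):
        # Fields are read back in exactly the order they were written
        self.first, self.second = struct.unpack(">ii", inp.read(8))

buf = io.BytesIO()
IntPairWritable(3, 7).write(buf)
buf.seek(0)
p = IntPairWritable()
p.read_fields(buf)
print(p.first, p.second)  # 3 7
```

The compact, field-order-based encoding (rather than a self-describing format) is what makes this style of serialization fast and small on the wire.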

2. Avro: a data serialization system, used for network data transfer.

3. ZooKeeper solves consistency problems in distributed systems, such as uniform naming services, state synchronization services, cluster management, and configuration management for distributed applications.
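One way ZooKeeper supports state synchronization is through versioned znodes: a conditional write succeeds only if the node's version still matches what the client last saw. The toy in-memory store below illustrates that compare-and-set idea (illustrative sketch only, not the ZooKeeper client API):

```python
class ZNodeStore:
    """Toy versioned key store mimicking ZooKeeper's znode versioning."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self._nodes:
            raise ValueError("node exists: " + path)
        self._nodes[path] = (data, 0)

    def get(self, path):
        return self._nodes[path]  # returns (data, version)

    def set(self, path, data, expected_version):
        _, version = self._nodes[path]
        if version != expected_version:
            # Stale writer: another client updated the node first
            raise ValueError("version mismatch")
        self._nodes[path] = (data, version + 1)

store = ZNodeStore()
store.create("/config/master", "node-a")
data, ver = store.get("/config/master")
store.set("/config/master", "node-b", ver)       # succeeds, bumps version
try:
    store.set("/config/master", "node-c", ver)   # stale version -> rejected
except ValueError as e:
    print("rejected:", e)
```

Because a stale writer is rejected instead of silently overwriting newer state, clients can coordinate safely even when they race.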

4. HDFS: distributed data storage and management.
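HDFS's core storage idea is simple: files are split into fixed-size blocks, and each block is replicated across several DataNodes for fault tolerance. A minimal sketch of both steps (illustrative only; real HDFS defaults are 128 MB blocks and a replication factor of 3, and the NameNode's placement also considers rack topology and load):

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a file's bytes into fixed-size blocks; last block may be short."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes, replication=3):
    """Trivial round-robin replica placement across DataNodes."""
    plan = {}
    for b in range(num_blocks):
        plan[b] = [datanodes[(b + r) % len(datanodes)]
                   for r in range(replication)]
    return plan

blocks = split_into_blocks(b"x" * 1000, block_size=256)
print(len(blocks))  # 4 (three full blocks + one partial)
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

With every block on several nodes, losing one DataNode loses no data — this is the basis of the high fault tolerance mentioned above.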

5. MapReduce: a distributed computing framework that processes large datasets in parallel through map and reduce phases.
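The MapReduce model can be simulated in a few lines: map each input record to key/value pairs, shuffle (group by key), then reduce each group. This is a single-process sketch of the model, not a Hadoop job (real jobs implement Mapper/Reducer classes in Java):

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit (word, 1) for each word in the line
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for each word
    return key, sum(values)

lines = ["hadoop stores data", "hadoop processes data"]
pairs = [kv for line in lines for kv in map_phase(line)]
result = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because map calls are independent and each reduce sees only one key's values, both phases parallelize naturally across a cluster.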


6. HBase: provides random, real-time read/write access to large-scale data. The stored data can be processed with MapReduce, neatly combining data storage with parallel computing.
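HBase's data model is essentially a sparse, versioned map from (row, column, timestamp) to value, which is what makes random per-cell reads and writes cheap. A toy model of that idea (conceptual sketch, not the HBase client API):

```python
class SparseTable:
    """Toy HBase-style table: (row, column) -> versioned cell values."""

    def __init__(self):
        self._cells = {}  # (row, col) -> list of (timestamp, value)

    def put(self, row, col, value, ts):
        versions = self._cells.setdefault((row, col), [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: -tv[0])  # newest version first

    def get(self, row, col):
        # A read returns the newest version; missing cells cost nothing
        versions = self._cells.get((row, col))
        return versions[0][1] if versions else None

t = SparseTable()
t.put("row1", "cf:name", "alice", ts=1)
t.put("row1", "cf:name", "bob", ts=2)
print(t.get("row1", "cf:name"))  # 'bob' (latest version wins)
```

Because absent cells simply aren't stored, rows with wildly different column sets coexist cheaply — unlike a fixed-schema relational table.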

7. Hive: a data warehouse infrastructure built on Hadoop. It includes data ETL (extract, transform, load) tools, data storage management, and query and analysis of large datasets via an SQL-like language.

8. Pig simplifies job code by compiling Pig Latin scripts into chains of Hadoop jobs.

9. Mahout's main goal is to provide scalable implementations of classic machine-learning algorithms, helping developers build intelligent applications more quickly: clustering, classification, recommendation engines (collaborative filtering), frequent-itemset mining, and other data-mining algorithms.
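The collaborative-filtering idea behind Mahout's recommenders can be shown in miniature: find users with similar rating vectors, then suggest items they liked that the target user hasn't rated. A toy user-based recommender (illustrative sketch with made-up data, not Mahout's API):

```python
import math

ratings = {
    "alice": {"item1": 5, "item2": 3},
    "bob":   {"item1": 4, "item2": 3, "item3": 5},
    "carol": {"item2": 1, "item3": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())))

def recommend(user):
    """Score unseen items by similarity-weighted ratings of other users."""
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, r in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return max(scores, key=scores.get) if scores else None

print(recommend("alice"))  # 'item3', liked by the most similar user (bob)
```

Mahout's contribution is doing exactly this kind of computation at scale, by expressing it as MapReduce jobs over huge rating matrices.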

10. X-RIME: social network analysis tools.


12. Flume: a massive log collection system, with customizable data senders that support different protocols and data streams. It provides simple processing of log data, such as filtering and format conversion, and can write logs to various data targets.

13. Sqoop: exchanges data between structured data stores and Hadoop (Hive), using MapReduce for parallel transfer.

14. Oozie: a workflow engine that abstracts Hadoop computing jobs as actions, builds dependencies between them, and composes them into a directed acyclic graph (DAG) workflow.
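The core of what a workflow engine like Oozie does with action dependencies is a topological sort of the DAG: each action runs only after all of its prerequisites, and a cycle means the definition is invalid. A minimal sketch with hypothetical action names (real Oozie workflows are defined in XML):

```python
from collections import deque

def topo_order(deps):
    """Order actions so each runs after its dependencies; deps maps
    action -> list of actions it depends on."""
    indegree = {a: len(d) for a, d in deps.items()}
    dependents = {a: [] for a in deps}
    for action, ds in deps.items():
        for d in ds:
            dependents[d].append(action)
    ready = deque(a for a, n in indegree.items() if n == 0)
    order = []
    while ready:
        action = ready.popleft()
        order.append(action)
        for nxt in dependents[action]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # all prerequisites finished
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

deps = {"ingest": [], "clean": ["ingest"],
        "train": ["clean"], "report": ["clean", "train"]}
print(topo_order(deps))  # ['ingest', 'clean', 'train', 'report']
```

Actions whose indegree drops to zero at the same time could also be dispatched in parallel, which is how a DAG scheduler overlaps independent jobs.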

 

Eclipse shortcuts:

Ctrl + T: view a class's type hierarchy

Ctrl + Shift + T: find (open) a class

Ctrl + Alt + H: view a method's call hierarchy

Cygwin does not match.
