Reading notes on Dong Sicheng's book Hadoop Technology Insider: In-Depth Analysis of Hadoop Common and HDFS Architecture Design and Implementation Principles
High fault tolerance and scalability of HDFS
Lucene is a search engine development kit: a pure-Java, high-performance full-text search library that can easily be embedded into any application that needs full-text indexing and search.
Nutch is a search engine application built on Lucene. Lucene provides Nutch with its text indexing and search APIs, but by itself it cannot handle a web of hundreds of millions of pages (crawling and indexing at that scale generates massive file-storage demands).
Hadoop advantages:
Convenient: it runs on large clusters of commodity machines.
Elastic: nodes can be added or removed according to cluster load, so resources are used efficiently.
Robust: designed on the assumption that commodity hardware fails frequently, so it gracefully handles most such failures.
Simple: users can quickly write efficient parallel code.
1. Hadoop Common provides common utilities for the other Hadoop projects, including the configuration system (Configuration), remote procedure calls (RPC), the serialization mechanism, and the Hadoop abstract file system (FileSystem); see the Configuration/FileSystem sketch after this list.
2. Avro: a data serialization system for network transmission.
3. ZooKeeper: solves consistency problems in distributed systems, such as a unified naming service, state synchronization, cluster management, and management of configuration items for distributed applications (see the ZooKeeper sketch after this list).
4. HDFS: distributed file system for data storage and management.
5. MapReduce: distributed framework for parallel computation over large datasets (see the WordCount sketch after this list).
6. HBase: provides random, real-time read/write access to large-scale data; the stored data can also be processed with MapReduce, combining data storage with parallel computing (see the HBase sketch after this list).
7. Hive: a data warehouse infrastructure built on Hadoop, including data ETL (extract, transform, load) tools, data storage management, and query/analysis of large datasets through an SQL-like language.
8. Pig: simplifies job code by translating Pig Latin scripts into chains of Hadoop (MapReduce) jobs.
9. Mahout: its main goal is to provide scalable implementations of classic machine-learning algorithms (clustering, classification, recommendation engines based on collaborative filtering, frequent-itemset mining, and other data-mining algorithms) so that intelligent applications can be built more quickly.
X-RIME: social network analysis tools.
12. Flume: a massive-log-collection system with customizable data senders that support different protocols; it can do simple processing of log data in flight (e.g., filtering and format conversion) and write the logs to a variety of data targets.
13. Sqoop: exchanges data between structured data stores (e.g., relational databases) and Hadoop/Hive, using MapReduce for parallel transfer.
14. Oozie: a workflow engine that abstracts Hadoop computation jobs as actions and builds dependencies among them, forming a directed acyclic graph (DAG) workflow.
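
To make items 1 and 4 concrete, here is a minimal sketch that writes and re-checks a file through Hadoop Common's Configuration and abstract FileSystem APIs. The path /user/demo/hello.txt is a made-up example; which concrete file system backs FileSystem (local, HDFS, ...) is decided by fs.defaultFS in the loaded configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Configuration loads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        // FileSystem is the abstract file system; the concrete implementation
        // is picked from fs.defaultFS (file://, hdfs://, ...).
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) { // true = overwrite
            out.writeUTF("hello hdfs");
        }
        System.out.println("exists: " + fs.exists(file));
        fs.close();
    }
}
```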
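
A hedged sketch of item 3's unified naming service idea, using the standard ZooKeeper Java client: a server publishes its address under a znode, and clients resolve the name by reading it. The connect string localhost:2181 and the znode paths are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkNamingDemo {
    public static void main(String[] args) throws Exception {
        // Example connect string and session timeout.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {});

        // The parent znode must exist before children can be created under it.
        if (zk.exists("/services", false) == null) {
            zk.create("/services", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Publish a service address under a well-known name (hypothetical path).
        // EPHEMERAL: the znode disappears when this session dies.
        byte[] addr = "10.0.0.7:9000".getBytes(StandardCharsets.UTF_8);
        zk.create("/services/nameservice", addr,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Any client can now resolve the name to the current address.
        byte[] data = zk.getData("/services/nameservice", false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}
```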
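
For item 5, the canonical WordCount example (essentially the one in Hadoop's own documentation) shows the map and reduce phases and how a job is configured and submitted; input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```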
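
Item 6's random, real-time read/write access, sketched with the HBase client API. The table name webtable and its contents column family are hypothetical, and the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("webtable"))) { // hypothetical table
            byte[] row = Bytes.toBytes("com.example/index.html");

            // Random real-time write: one row keyed by URL.
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("html"),
                          Bytes.toBytes("<html>...</html>"));
            table.put(put);

            // Random real-time read of the same row.
            Result r = table.get(new Get(row));
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("contents"), Bytes.toBytes("html"))));
        }
    }
}
```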
Eclipse shortcuts:
Ctrl+T: view the type hierarchy of a class
Ctrl+Shift+T: find and open a class (Open Type)
Ctrl+Alt+H: view the call hierarchy of a method
Cygwin does not match.