Reading notes on Dong Sicheng's book Hadoop Technology Insider: In-Depth Analysis of Hadoop Common and HDFS Architecture Design and Implementation Principles
High fault tolerance and scalability of HDFS
Lucene is a search engine development kit: a pure-Java, high-performance full-text search library that can easily be embedded into any application that needs full-text indexing and search.
Nutch is a search engine application built on Lucene. Lucene provides Nutch with its text indexing and search APIs, but by itself it cannot handle a web of hundreds of millions of pages (crawling and indexing at that scale generates massive file-storage demands).
Hadoop advantages:
Convenient: it runs on large clusters of commodity machines.
Elastic: nodes can be added or removed according to cluster load, so resources are used efficiently.
Robust: designed on the assumption that commodity hardware fails frequently, so it gracefully handles most such failures.
Simple: users can quickly write efficient parallel code.
1. Hadoop Common provides common utilities for the other Hadoop projects, including the configuration system (Configuration), remote procedure calls (RPC), the serialization mechanism, and the Hadoop abstract file system (FileSystem); see the Configuration/FileSystem sketch after this list.
2. Avro: a data serialization system for network transmission.
3. ZooKeeper: solves consistency problems in distributed systems, such as a unified naming service, state synchronization, cluster management, and management of configuration items for distributed applications (see the ZooKeeper sketch after this list).
4. HDFS: distributed file system for data storage and management.
5. MapReduce: distributed framework for parallel computation over large datasets (see the WordCount sketch after this list).
6. HBase: provides random, real-time read/write access to large-scale data; the stored data can also be processed with MapReduce, combining data storage with parallel computing (see the HBase sketch after this list).
7. Hive: a data warehouse infrastructure built on Hadoop, including data ETL (extract, transform, load) tools, data storage management, and query/analysis of large datasets through an SQL-like language.
8. Pig: simplifies job code by translating Pig Latin scripts into chains of Hadoop (MapReduce) jobs.
9. Mahout: its main goal is to provide scalable implementations of classic machine-learning algorithms (clustering, classification, recommendation engines based on collaborative filtering, frequent-itemset mining, and other data-mining algorithms) so that intelligent applications can be built more quickly.
X-RIME: social network analysis tools.
12. Flume: a massive-log-collection system with customizable data senders that support different protocols; it can do simple processing of log data in flight (e.g., filtering and format conversion) and write the logs to a variety of data targets.
13. Sqoop: exchanges data between structured data stores (e.g., relational databases) and Hadoop/Hive, using MapReduce for parallel transfer.
14. Oozie: a workflow engine that abstracts Hadoop computation jobs as actions and builds dependencies among them, forming a directed acyclic graph (DAG) workflow.
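
To make items 1 and 4 concrete, here is a minimal sketch that writes and re-checks a file through Hadoop Common's Configuration and abstract FileSystem APIs. The path /user/demo/hello.txt is a made-up example; which concrete file system backs FileSystem (local, HDFS, ...) is decided by fs.defaultFS in the loaded configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Configuration loads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        // FileSystem is the abstract file system; the concrete implementation
        // is picked from fs.defaultFS (file://, hdfs://, ...).
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) { // true = overwrite
            out.writeUTF("hello hdfs");
        }
        System.out.println("exists: " + fs.exists(file));
        fs.close();
    }
}
```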
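
A hedged sketch of item 3's unified naming service idea, using the standard ZooKeeper Java client: a server publishes its address under a znode, and clients resolve the name by reading it. The connect string localhost:2181 and the znode paths are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkNamingDemo {
    public static void main(String[] args) throws Exception {
        // Example connect string and session timeout.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {});

        // The parent znode must exist before children can be created under it.
        if (zk.exists("/services", false) == null) {
            zk.create("/services", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Publish a service address under a well-known name (hypothetical path).
        // EPHEMERAL: the znode disappears when this session dies.
        byte[] addr = "10.0.0.7:9000".getBytes(StandardCharsets.UTF_8);
        zk.create("/services/nameservice", addr,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Any client can now resolve the name to the current address.
        byte[] data = zk.getData("/services/nameservice", false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}
```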
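
For item 5, the canonical WordCount example (essentially the one in Hadoop's own documentation) shows the map and reduce phases and how a job is configured and submitted; input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```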
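
Item 6's random, real-time read/write access, sketched with the HBase client API. The table name webtable and its contents column family are hypothetical, and the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("webtable"))) { // hypothetical table
            byte[] row = Bytes.toBytes("com.example/index.html");

            // Random real-time write: one row keyed by URL.
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("html"),
                          Bytes.toBytes("<html>...</html>"));
            table.put(put);

            // Random real-time read of the same row.
            Result r = table.get(new Get(row));
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("contents"), Bytes.toBytes("html"))));
        }
    }
}
```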
Eclipse shortcuts:
Ctrl+T: view the type hierarchy of a class
Ctrl+Shift+T: find and open a class (Open Type)
Ctrl+Alt+H: view the call hierarchy of a method
Cygwin does not match.