What is the development history of Hadoop?
Hadoop originated from MapReduce, a programming model published by Google. The MapReduce framework decomposes an application into many small parallel computing tasks that process very large datasets across a large number of compute nodes. A typical application of this framework is a search-indexing algorithm that runs over web data.
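To make the model concrete, the following is a minimal word-count sketch using Hadoop's Java MapReduce API (the class name and the input/output paths passed on the command line are illustrative, not part of the original text). The map phase emits a count of one for every word it sees, and the reduce phase sums those counts per word, so the work spreads naturally across many nodes.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not already exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}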
Hadoop was initially associated with web indexing and rapidly developed into a leading platform for big data analytics. Cloudera, an enterprise software company, began providing Hadoop-based software and services in 2008.
GoGrid, a cloud computing infrastructure company, partnered with Cloudera in 2012 to accelerate the adoption of Hadoop-based applications. Dataguise, a data security firm, also launched a data protection and risk assessment tool for Hadoop in 2012.
Apache Hadoop Supporting Projects
The Apache Software Foundation maintains several supporting projects for Hadoop:
· Apache Cassandra is a database management system designed for handling large volumes of data. Its key features are fault tolerance, scalability, Hadoop integration, and replication support.
· HBase is a non-relational, fault-tolerant, distributed database designed to store large amounts of sparse data.
· Hive is a data warehouse system built on Hadoop that supports data summarization and ad hoc querying of large datasets.
· Apache Pig consists of a high-level language for writing data analysis programs, together with the infrastructure for evaluating those programs.
· Apache ZooKeeper is a centralized service for distributed applications. It maintains configuration information and provides naming registration, distributed synchronization, and group services (see the sketch after this list).
· Chukwa is a data collection system for monitoring large distributed systems, and it includes a toolkit for analyzing the results it collects.
· The Apache Mahout project is designed to produce scalable machine learning algorithm implementations on the Hadoop platform.
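As an illustration of ZooKeeper's configuration-management role mentioned above, the following is a minimal sketch using its Java client API; the connection string, znode path, and stored value are assumptions chosen for the example, not details from the original text. One process publishes a setting as a persistent znode, and any other process connected to the same ensemble can read it back.

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ConfigExample {
    public static void main(String[] args) throws Exception {
        // The ZooKeeper constructor returns immediately; wait for the session to connect.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Publish a configuration value as a persistent znode (path and value are illustrative).
        String path = "/batch-size";
        byte[] value = "128".getBytes(StandardCharsets.UTF_8);
        if (zk.exists(path, false) == null) {
            zk.create(path, value, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Any client of the same ensemble can read the same value back.
        byte[] stored = zk.getData(path, false, null);
        System.out.println("batch-size = " + new String(stored, StandardCharsets.UTF_8));

        zk.close();
    }
}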