Apache Hadoop has become the driving force behind the growth of the big data industry. Technologies such as Hive and Pig are often mentioned alongside it, but what do they actually do, and why do so many of the related projects have strange names (Oozie, ZooKeeper, Flume)?
Hadoop brought the ability to process large data sets cheaply (data volumes of 10-100 GB or more, spanning a variety of data types, both structured and unstructured). But how does that differ from what came before?
Today's enterprise data warehouses and relational databases are good at handling structured data and can store very large volumes of it, but at considerable cost. That cost constrains the kinds of data that can be processed, and the resulting inertia makes data warehouses slow to adapt when confronted with massive amounts of heterogeneous data. In practice, it means that valuable data sources within the organization are never mined. This is the biggest difference between Hadoop and traditional data processing approaches.
This article looks at the components of the Hadoop system and explains what each one does.
MapReduce -- the core of Hadoop
Google's web search engine owes much of its success to its ranking algorithms, but behind the scenes MapReduce played an enormous role in building the index. The MapReduce framework has become the most influential "engine" behind today's big data processing. Beyond Hadoop, you will also find MapReduce in MPP analytic databases (Sybase IQ, for example, introduced a column-oriented product, and Vertica is another) and in NoSQL systems such as MongoDB.
The important innovation of MapReduce is that it decomposes a query over a large dataset into tasks and runs them in parallel across many nodes. When the data volume is too large for any single server to handle, this is where distributed computing shows its strength. Combined with commodity Linux servers, the technique is a highly cost-effective alternative to large-scale compute arrays. Yahoo saw the potential of Hadoop in 2006 and invited its founder, Doug Cutting, to continue developing the technology; by 2008 Hadoop had already reached a substantial scale. As the project matured in its early stages, it absorbed a number of other components that further improved its usability and functionality.
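To make the "decompose the task and run it on multiple nodes" idea concrete, here is a minimal sketch of the canonical word-count job used in the Hadoop MapReduce tutorials, lightly commented. It is illustrative rather than production code: the mapper emits (word, 1) pairs for its slice of the input, the framework shuffles and groups the pairs by word, and the reducer sums the counts. The class names and input/output paths are placeholders.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: each mapper gets one split of the input and emits (word, 1).
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reduce phase: the framework groups values by key, so each reducer
      // receives every count for one word and sums them.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Running it amounts to packaging the class into a jar and submitting it to the cluster with something like "hadoop jar wordcount.jar WordCount /input /output", where the two paths are HDFS directories; the framework then takes care of splitting the input, scheduling map and reduce tasks across the nodes, and retrying any that fail.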