Hadoop is here, are you ready?

Source: Internet
Author: User

Reprinted from the IT Learning Community: http://bbs.itcast.cn/forum-122-1.html

My current laptop has a Core i5 processor, 4 GB of memory, and a 500 GB hard drive. It is hard to imagine that my first computer had a Pentium 3, 512 MB of memory, and a 20 GB hard drive, and that back then the 20 GB drive had plenty of space to spare. Now, with all kinds of software, movies, music, and instructional videos, even 500 GB is not enough. As the Internet develops, more and more data is produced. It includes not only structured data that can be stored in a database, but also semi-structured and unstructured data such as web pages, email, SMS messages, microblog posts, and logs. About 340 million tweets are posted on Twitter every day, Sina Weibo has more than 100 million users, Baidu handles billions of search requests, Taobao sees tens of millions of transactions, and Unicom's user Internet-access records reach 10 TB in a single day (1 TB = 1024 GB). All this suggests that the big data age has arrived!

What is big data? Look at its "4V" characteristics.

Volume: the amount of data is large. A few tens of gigabytes cannot be called big data, because a traditional RDBMS can handle that much. Once the data reaches hundreds of gigabytes, or even terabytes, an RDBMS or even a data warehouse can no longer process it. That is big data.

Variety: the types of data are diverse. Big data is heterogeneous (differently structured): logs, text, Word, PDF, PPT, Excel, JPG, GIF, AVI, and other image, text, audio, and video files. A traditional RDBMS does not handle these file types and cannot retrieve or analyze them.

Velocity: data must be accessed quickly. Data is the lifeblood of an enterprise and must be processed quickly, which is normally the strength of a traditional RDBMS. Faced with massive data, however, the RDBMS is powerless.

Value: value density is low. The most valuable data has already been transformed into structured data and stored in databases and data warehouses. Big data with low value density has usually not been the focus of database work. But huge amounts of data are not worthless: the Long Tail theory and the "beer and diapers" story both rest on the commercial value of big data. Extracting business value from big data is therefore a new growth point for enterprises, and more and more of them are taking it seriously.

These four Vs make big data difficult to store, difficult to manage, and difficult to exploit. What to do? Enter Hadoop!

Data is stored on disk, so a vast amount of data must be stored across a huge number of disks. That many disks exceeds the file management capabilities of an operating system such as Windows or Linux, which led to the distributed file system (DFS). A distributed file system manages data spread across many disks.
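As a rough illustration of what "managing data spread across many disks" can mean, the sketch below cuts a file into fixed-size blocks and stores each block on more than one disk, so the file can still be read when one disk fails. It is a toy model, not the HDFS API or HDFS's actual placement policy: the disk names, the tiny block size, the replica count, and the round-robin placement are all assumptions made for illustration.

```python
def store(data: bytes, disks: dict, block_size: int = 4, replicas: int = 2) -> dict:
    """Split data into blocks and copy each block onto `replicas` distinct disks.

    `disks` maps a (hypothetical) disk name to a dict of block_id -> block bytes.
    Returns an index mapping block_id -> list of disk names holding that block.
    """
    names = list(disks)
    if replicas > len(names):
        raise ValueError("need at least as many disks as replicas")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    index = {}
    for block_id, block in enumerate(blocks):
        # Round-robin placement: an illustrative choice, not HDFS's real policy.
        targets = [names[(block_id + r) % len(names)] for r in range(replicas)]
        for name in targets:
            disks[name][block_id] = block
        index[block_id] = targets
    return index


def read(index: dict, disks: dict, failed=frozenset()) -> bytes:
    """Reassemble the file, taking each block from any disk that has not failed."""
    parts = []
    for block_id in sorted(index):
        alive = [name for name in index[block_id] if name not in failed]
        if not alive:
            raise IOError(f"block {block_id} is lost: all of its replicas failed")
        parts.append(disks[alive[0]][block_id])
    return b"".join(parts)


if __name__ == "__main__":
    disks = {"disk-a": {}, "disk-b": {}, "disk-c": {}}
    index = store(b"hello big data world!", disks)
    # The file survives the loss of a single disk because every block has two copies.
    print(read(index, disks, failed={"disk-b"}))
```

With two replicas per block, losing any single disk leaves at least one copy of every block; losing both disks that hold the same block makes that block unreadable, which is why redundancy and failure handling are central concerns for a real distributed file system.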
A distributed file system has to deal with distributed reads, writes, and retrieval, as well as data consistency, disk failure, redundancy, and so on. Hadoop's HDFS is a distributed file system dedicated to storing massive amounts of data across many separate disks.

Merely storing data is what an archive does; it is not what companies want. Data only makes sense when it is exploited to produce commercial value. That requires retrieving, querying, and transforming big data in various ways, all of which we call "computation." The most common computations are deduplication and sorting. Some people think it is enough to find one high-performance server and run the job on it. It is not that simple, because disk seek time, disk I/O, and network I/O are all huge costs at big-data scale. So we came up with a way: divide the mass of data into small chunks, let each machine process one chunk, and have all the machines work at the same time. Finally, the results are combined. This is "parallel computing." MapReduce in Hadoop is a parallel processing framework designed specifically for distributed computing. Hadoop, then, solves both the storage and the computation of big data.

Today, Internet giants such as Google, Yahoo, Twitter, and Facebook are already using big data; Google is the originator. In China, Hadoop is being applied more and more widely: Internet companies such as Baidu, Taobao, Tencent, Sina, and Sohu started processing big data years ago. Traditional industries such as telecoms, finance, and banking have also begun to value big data.

With so many companies using big data, the demand for it keeps growing, but very few people know Hadoop. So salaries for such people are quite high.
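The "split into chunks, process every chunk at the same time, then combine the results" idea described above can be sketched in a few lines of Python, using the two computations the article names: deduplication and sorting. This is only a single-machine illustration, not Hadoop's MapReduce API. Worker threads stand in for separate machines, and the function names and the default worker count are assumptions chosen for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Divide the records into at most num_chunks roughly equal pieces."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_chunk(chunk):
    """Work done by one 'machine': deduplicate and sort its own chunk."""
    return sorted(set(chunk))


def merge_results(partials):
    """Combine the sorted partial results, dropping duplicates across chunks."""
    merged = []
    for value in heapq.merge(*partials):
        if not merged or merged[-1] != value:
            merged.append(value)
    return merged


def dedup_and_sort(records, num_workers=4):
    """Deduplicate and sort records by processing chunks concurrently."""
    chunks = split_into_chunks(records, num_workers)
    # Threads only stand in for machines here; a real cluster spreads the
    # chunks across many servers, each reading from its own disks.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return merge_results(partials)


if __name__ == "__main__":
    print(dedup_and_sort([5, 3, 5, 1, 3, 9, 1, 7], num_workers=3))
```

Each worker only ever sees its own chunk, so no single machine has to read or hold the whole data set; the final merge step is cheap because every partial result is already sorted.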





The figure below shows the search results for Hadoop jobs on a job-recruitment website. Most of the jobs offer a monthly salary of more than 10k, and a large proportion of the postings offer more than 20k a month.


Above is a screenshot of the query results. You can see that Hadoop engineers are very well paid: almost all of the positions offer an annual salary of more than 200,000 (20W).





While Hadoop talent is this scarce, think about what mastering the technology could do for you.

