Big data technology is hot, but a shortage of professional talent constrains its development

Source: Internet
Author: User

Introduction: Despite some security concerns, Hadoop is ready for large-scale projects in large enterprises. Hadoop, a top-level open source project at Apache, is mainly used to analyze large datasets and is now widely used by internet companies such as eBay, Facebook, Yahoo, AOL and Twitter. Last month Microsoft, IBM and Oracle also embraced Hadoop.

More and more companies are starting to explore Hadoop as a way to handle the streams of data generated by web logs, clicks and social media. Hadoop can be used for both storage and analysis, helping businesses gain better insight into their customers from big data.

Big data is expanding rapidly in the enterprise, but related talent is in short supply

At present, the shortage of related talent is quite serious. IT executives from JPMorgan Chase and eBay expressed the same view this month at the Hadoop World conference in New York. Hugh Williams, vice president of search and platform at eBay, said at the conference that the company is currently recruiting Hadoop professionals. Larry Feinsmith, a managing director at JPMorgan Chase, said half-jokingly that his company was not only willing to hire qualified professionals but would also pay them 10% more than eBay.

Larry Feinsmith says JPMorgan Chase still relies heavily on traditional relational database systems for transaction processing. But with growing needs in fraud detection, IT risk management and self-service, the older systems can no longer meet current requirements, and the characteristics of Hadoop are well suited to today's business.

JPMorgan Chase now has 150 PB of data stored online across 30,000 databases, and its user account records reportedly total 35 billion. These figures show how central data is to JPMorgan Chase. The advantage of Hadoop, Larry Feinsmith said, is that it is suitable for storing large amounts of unstructured data, which enables organizations to effectively collect and store web logs as well as transaction data and social media data.

Hugh Williams, eBay's vice president of search and platform, says eBay is now using Hadoop and the HBase database for real-time data analysis. It is also using Hadoop to build a new search engine for its website. According to him, eBay has more than 97 million active buyers and sellers, and the site gets close to 2 billion page views a day, along with 250 million searches or queries a day and tens of billions of database calls. He also said eBay now has 9 PB of data stored in Hadoop and Teradata clusters, and that data volumes are growing rapidly.

Hadoop creates firm demand for talent; data mining and related specialists are potential candidates

James Kobielus, an analyst at Forrester Research, believes that in today's business Hadoop is the new generation of data warehouse and should be seen as a source of data. Compared with today's traditional relational database management systems, Hadoop enables enterprises to store and manage massive volumes of both structured and unstructured data.

James Kobielus says more and more companies are eager to hire Hadoop-related workers as demand for technologies such as Hadoop analytics increases. People who can harness Hadoop are great contributors to the business, and they deserve to be paid accordingly. Hadoop requires practitioners with relevant work experience in advanced analytics, such as the ability to do predictive and statistical modeling with a new generation of tools such as MapReduce and the R language. People with technical backgrounds in multivariate statistical analysis, data mining, predictive modeling, natural language processing, content analysis, text analysis and social network analysis are potential candidates for Hadoop roles.

The attention businesses are paying to Hadoop has also created firm demand for professionals who can administer the Hadoop platform. Their job responsibilities include setting up, securing, managing and optimizing Hadoop clusters to ensure that the cluster remains available to the enterprise. Database administrators who previously managed Teradata and Oracle Exadata are now trying to move into Hadoop cluster management roles, and they will find that it is a new world. Storage management professionals are also essential; their work now is to help integrate the Hadoop environment with existing traditional database technologies.

Hadoop professionals are divided into three main categories

Martin Hall, president of Karmasphere, says the current demand for Hadoop professionals falls into three broad categories: data analysts (also known as data scientists), data engineers, and IT data management specialists. Karmasphere's main business is developing software products for Hadoop environments.

Martin Hall believes that the role of data management specialists is to select, install, manage, provision and scale large Hadoop clusters. These professionals decide whether Hadoop should run in the cloud or on premises, which vendor and Hadoop distribution to use, how large the cluster should be, and whether it will be used for running production applications or for quality assurance testing. The position calls for skills similar to those needed to manage traditional relational databases and similar database environments.

Hadoop data engineers, meanwhile, are responsible for building data processing jobs and writing the distributed MapReduce algorithms that data analysts use. Professionals skilled in languages such as Java and C++ will find it easier to get opportunities as enterprises deploy Hadoop at scale.
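To make the idea of a MapReduce algorithm concrete, the following is a minimal sketch in Python. It is not Hadoop code and does not use any Hadoop API; it only simulates the map, shuffle and reduce steps inside a single process, assuming a simple word-count task. All function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for log lines.
    sample = ["search query click", "click search", "click"]
    print(word_count(sample))
```

In a real deployment the map and reduce functions would run in parallel on many machines over data stored in the cluster, which is why the two steps are kept free of shared state here.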

The third category is data scientists with rich experience in tools such as SAS and SPSS and in programming languages such as R. These professionals generate, analyze, share and integrate the intelligence that is gathered and stored in a Hadoop environment.

For now, the shortage of Hadoop talent means that businesses rely more heavily on service providers to deploy the technology for them. One strong sign supporting this view is that, in the professional consulting and systems integration industry, revenue from Hadoop implementation work is much greater than revenue from sales of Hadoop products.

Companies such as Cloudera, MapR, Hortonworks and IBM already offer Hadoop training courses. Enterprises should make the most of these resources and build a Hadoop center of excellence to maximize the benefit to their business.

(Responsible editor: Lu Guang)
