We don't need more data scientists; we need to make big data easy enough to use

Editor's note: The New York Times has declared that the age of big data has arrived, and data scientists have been dubbed the sexiest job around. But Scott Brave, founder and CTO of the e-commerce consulting company Jack, says we don't need more data scientists; what we need is to make big data easier to use. Here is his view:

Big data is so hot this year that the New York Times and other media have declared that the big data era has arrived. Mining big data can yield insights, along with the incentives and structure needed to make informed decisions and act on them. The miners digging in these gold mines are data scientists, which is why data science has been dubbed the hottest career of the future. However, almost every article on big data today concludes that there is a serious shortage of data scientists. A 2011 McKinsey survey suggested that many organizations lack this kind of skilled talent.

But there is little discussion of how to get around this bottleneck and put big data directly in the hands of business leaders. The software industry has solved a similar problem before, and we can follow its example.

To achieve this goal, we first need to understand the role data scientists play in big data. For now, big data is a melting pot of distributed data architectures and tools such as Hadoop, NoSQL, Hive, and R. In this high-tech environment, data scientists act as interpreters and intermediaries between these systems and the domain experts on the business side.

Broadly, data scientists play three roles: data architecture, machine learning, and analytics. While these roles are important, not every company needs a highly specialized data team like Google's or Facebook's. The power of big data can be delivered directly to business users, as long as we build products that fit a specific purpose and keep the technical complexity as low as possible.

As an example, consider the web content management revolution at the turn of the century. Websites were all the rage, but domain experts kept hitting a wall because IT was the bottleneck. Every time new content was added, it had to be laid out, and sometimes hard-coded, by IT specialists. How was this problem solved? We generalized and abstracted these basic needs into content management systems (CMSs), and then made those systems simple enough for non-technical people to use. That broke the bottleneck.

Next, let's look at the three roles of data scientists in the context of e-commerce.

Data architecture

The key to reducing complexity is limiting the scope. Almost every e-commerce business cares about capturing user behavior (browsing activity, purchases, offline transactions, and social data), and almost every one has a product catalog and customer profiles.

By limiting the scope to these basics, you can create templates for standard data inputs, which greatly simplifies data capture and pipeline integration. Following the 80/20 rule (80% of big data use cases can be implemented with 20% of the technology), we do not need to package every data architecture and tool out there (Hadoop, HBase, Hive, Pig, Cassandra, and Mahout).
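To make the idea of a "standard data input template" concrete, here is a minimal sketch in Python. It assumes a hypothetical fixed event vocabulary drawn from the behaviors listed above (views, purchases, offline transactions, social activity); the names `EventType`, `UserEvent`, and `parse_event` are illustrative only and do not describe any particular product. The point is that once the scope is fixed, data capture reduces to validating records against one schema rather than building a custom pipeline.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class EventType(Enum):
    # Hypothetical fixed vocabulary covering common e-commerce behaviors.
    VIEW = "view"
    PURCHASE = "purchase"
    OFFLINE_TRANSACTION = "offline_transaction"
    SOCIAL = "social"


@dataclass(frozen=True)
class UserEvent:
    """One record in the hypothetical standard input template."""
    user_id: str
    product_id: str
    event_type: EventType
    timestamp: datetime
    amount: float = 0.0  # monetary value; only meaningful for transactions


REQUIRED_FIELDS = ("user_id", "product_id", "event_type", "timestamp")


def parse_event(raw: dict) -> UserEvent:
    """Validate a raw record against the template and normalize its types."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return UserEvent(
        user_id=str(raw["user_id"]),
        product_id=str(raw["product_id"]),
        # EventType(...) raises ValueError for behaviors outside the template.
        event_type=EventType(raw["event_type"]),
        timestamp=datetime.fromisoformat(raw["timestamp"]),
        amount=float(raw.get("amount", 0.0)),
    )


if __name__ == "__main__":
    event = parse_event({
        "user_id": 42,
        "product_id": "sku-1",
        "event_type": "purchase",
        "timestamp": "2012-06-01T10:30:00",
        "amount": "19.99",
    })
    print(event.event_type, event.amount)  # EventType.PURCHASE 19.99
```

Rejecting unknown event types is a deliberate design choice in this sketch: it keeps the template narrow, which is exactly what makes it usable without a data engineer.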

Machine learning

Fine, the data architecture can be productized. But perhaps data scientists are still necessary when requirements are highly customized? Even so, much of this work can be abstracted, such as recommendation engines and personalization systems. For example, a large part of a data scientist's work is crafting "features" that combine input data so that machines can learn effectively. The process goes roughly like this: data scientists prepare the data, feed it into the machine, and click "Start". The data scientist's real job is to help the machine look at the world in a meaningful way.

However, if you focus on a single domain, feature creation can also be templated. For example, every e-commerce site has the concepts of a purchase flow and user segmentation. If domain experts could encode their ideas directly into the system, couldn't we do without the data scientist as translator and intermediary?
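As a rough illustration of templated feature creation, the Python sketch below assumes a hypothetical purchase-flow template (an ordered list of funnel stages) and a hypothetical segmentation template (spend thresholds) that a domain expert could edit as plain data. The stage names, thresholds, and the functions `funnel_depth`, `segment`, and `build_features` are all invented for illustration; this is not the author's system, only one way the idea could look.

```python
from collections import defaultdict

# Hypothetical purchase-flow template: ordered funnel stages that a domain
# expert could edit without touching any model code.
FUNNEL_STAGES = ["view", "add_to_cart", "checkout", "purchase"]

# Hypothetical segmentation template: (segment name, minimum total spend),
# checked from the highest threshold down.
SEGMENT_RULES = [("vip", 500.0), ("regular", 50.0), ("browser", 0.0)]


def funnel_depth(actions):
    """Index of the furthest funnel stage a user reached, or -1 if none."""
    return max(
        (FUNNEL_STAGES.index(a) for a in actions if a in FUNNEL_STAGES),
        default=-1,
    )


def segment(total_spend):
    """Return the first segment whose spend threshold the user meets."""
    for name, threshold in SEGMENT_RULES:
        if total_spend >= threshold:
            return name
    return "unknown"


def build_features(events):
    """Turn (user_id, action, amount) rows into one feature dict per user."""
    actions = defaultdict(list)
    spend = defaultdict(float)
    for user_id, action, amount in events:
        actions[user_id].append(action)
        if action == "purchase":
            spend[user_id] += amount
    return {
        user: {
            "funnel_depth": funnel_depth(acts),
            "total_spend": spend[user],
            "segment": segment(spend[user]),
        }
        for user, acts in actions.items()
    }
```

The design choice here is that the expert's knowledge lives in `FUNNEL_STAGES` and `SEGMENT_RULES` rather than in code, so changing how users are described to a learning algorithm means editing a list, not writing a new feature pipeline.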

Analysis

Automatically extracting the most valuable insights from data is never easy. But it is possible to provide a lens tailored to a single domain, one that lets business experts experiment the way data scientists do. This seems to be one of the easier problems to solve, since the market already offers a variety of domain-specific analytics products.

But these products are either too restrictive or too hard for domain experts to approach, and there is clearly room to make their interfaces friendlier. We also need to consider how the machine learns from the results of the analysis. This is the key feedback loop, and business experts want to be able to adjust it. That is another opportunity to provide a templated interface.

As with CMSs, these solutions are not a panacea. But applying generalized technical solutions to a common set of data problems can ease the data scientist bottleneck. Once domain experts can work directly with machine learning systems, we can enter a new era of big data, one in which people and machines learn from each other. Perhaps then big data will solve more problems than it creates.

(Responsible editor: The good of the Legacy)
