The scramble for data scientists and the establishment of graduate schools of analytics in the United States


Data scientists in high demand

From a technical point of view, falling hard disk prices and the arrival of NoSQL database technology mean that large volumes of data can now be stored far more cheaply and efficiently than in the past. In addition, the advent of distributed processing technologies such as Hadoop, which run on general-purpose servers, has made statistical processing of large, unstructured data much faster and cheaper.

However, even if the tools are perfect, data cannot produce value by itself. What is needed are people who can use these tools to find gold in the mountains of data, communicate the value of that data to decision makers in an understandable form, and ultimately apply it to the business. Those who possess these skills are the "data scientists" who are now being fought over amid the wave of big data in the United States.

The attention paid to data scientists stems from the gradual realization that a group of such professionals stands behind Google, Amazon, and Facebook. These web companies do not just store large amounts of data; they turn it into gold mines, for example search results, targeted advertising, accurate product recommendations, and lists of people you may know.

"Data science" is a term that has existed for a long time, but "data scientist" is a new term that emerged only a few years ago. There are several accounts of its origin. One of them appears in the book "Beautiful Data" (Toby Segaran and Jeff Hammerbacher, O'Reilly), which says the following about Facebook's data scientists.

"At Facebook, we found that traditional titles such as business analyst, statistician, engineer, and research scientist did not exactly define the role of our team. The work of the role is varied: on any given day, a member of the team might implement a multistage processing pipeline in Python, design a hypothesis test, perform a regression analysis on data samples using R, design and implement an algorithm for a data-intensive product or service on Hadoop, or present the results of our analysis to other members of the organization in a clear and concise manner. To capture the skills needed to accomplish this multifaceted work, we created the role of 'data scientist'."

Just a few years ago, data scientist was not an established profession, but in the blink of an eye it has come to be hailed as "the most important talent in the IT industry for the next 10 years".

Hal Varian (born 1947), Google's chief economist and a professor at the University of California, Berkeley, said the following in a conversation with James Manyika, a director at McKinsey, in October 2008 (the Chinese version is taken from the official Chinese edition of the McKinsey Quarterly).

"I keep saying that the most interesting job in the next 10 years will be statistician. People think I'm joking. But who would have guessed that computer engineer would be the most interesting job of the 1990s? In the next 10 years, the ability to take data, to understand it, to process it, to extract value from it, to visualize it, and to communicate it will become an extremely important skill, not only at the professional level but also at the educational level (including education for primary and secondary school students, high school students, and undergraduates). Because we now have essentially free and ubiquitous data, the scarce factor that complements it is the ability to understand that data and extract value from it."

Professor Varian used the term "statisticians" in this conversation and did not use the term "data scientist" at the time, but what he describes is exactly the data scientist discussed here.

The skills required of data scientists

There is no fixed definition of the data scientist profession, but in general the term refers to the following kind of talent.

"A data scientist is a person who uses statistical analysis, machine learning, and distributed processing technology to extract information that is meaningful to the business from large volumes of data, conveys it to decision makers in an understandable form, and creates new data application services."

The skills required of data scientists are as follows.

(1) Computer science

In general, data scientists are expected to have a professional background in programming and computer science. In short, this means the skills needed to process large-scale data, such as Hadoop, Mahout, and machine learning.
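To make the idea of Hadoop-style distributed processing more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern in plain Python. It runs in a single process and does not use Hadoop's actual API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Made-up sample input, not taken from the article.
    sample = ["big data needs big tools", "data scientists use tools"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many servers, with the framework handling the shuffle between them; the sketch only shows how the work is divided.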

(2) Mathematics, statistics, data mining, etc.

In addition to literacy in mathematics and statistics, data scientists need to be skilled with mainstream statistical analysis software such as SPSS and SAS. Among these tools, R, an open source programming language and runtime environment for statistical analysis, has attracted much attention recently. The strength of R is not only that it contains a wealth of statistical analysis libraries, but also that it can generate high-quality charts to visualize results and can be run with simple commands. In addition, it has a package extension mechanism called CRAN (the Comprehensive R Archive Network), which lets you use functions and datasets that are not supported in the standard installation by importing extension packages.
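As a small illustration of the kind of statistical analysis described above, the following sketch fits a straight line to data samples by ordinary least squares. It is written in Python with only the standard library rather than in R, and the function name `fit_line` and the numbers are hypothetical, chosen purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Made-up data that lies exactly on the line y = 2x + 1.
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]
    slope, intercept = fit_line(xs, ys)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In R, a comparable fit is typically obtained with a single call to `lm(y ~ x)`, which is one example of the "simple commands" mentioned above.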

(3) Data visualization

The quality of information depends largely on how it is expressed. Analyzing the meaning contained in the figures, developing web prototypes, and using external APIs to combine charts, maps, dashboards, and other services so that the results of the analysis can be visualized is one of the most important skills for data scientists.

