It's no secret that data scientists are in short supply. The explosion of data and the corresponding explosion of tools, along with the knock-on effects of Moore's and Metcalfe's laws, mean there is more data, more connections, and more technology than ever before. At last year's Hadoop World, there was a feeding frenzy for data scientists, which only barely outstripped the demand for the more technically oriented data architects. In plain terms, that means:
1. Potential MacArthur grant recipients who have a passion and insight for data, the mathematical and statistical skill to work out the algorithms, and the artistry to paint the picture that all the data points to. That is what we mean by data scientists.
2. People who understand the platform side of big data, also known as data architects or data engineers.
The data architect side will be the more straightforward problem to solve. Understanding big data platforms (Hadoop, MongoDB, Riak) and emerging Advanced SQL products (Exadata, Netezza, Greenplum, Vertica, and recent upstarts such as Calpont) is a technical skill that can be taught through well-defined courses. The law of supply and demand will solve this problem, just as it did when the 1999 bubble created demand for Java programmers.
Behind all the noise about Hadoop programmers, there is a similar, but much quieter, rush to recruit data scientists. Although some data scientists dismiss "data scientist" as a buzzword, the demand is real.
However, data science will be a harder problem to crack. It is all about connecting the dots, which is not as easy as it sounds. The V's of big data (volume, variety, velocity, and value) require someone who can discover insights in the data; traditionally, that role was performed by the data miner. But data miners dealt with a limited range of problems and with bounded (known) data sets, which made the problem more two-dimensional.
The variety of big data, in both form and source, introduces an element of the unknown. Interpreting big data requires investigative savvy, communication skills, creativity and artistry, and the ability to think counter-intuitively. And don't forget that all of this rests on a solid background in statistics and machine learning, coupled with technical knowledge of the tools and programming languages of the trade.
Sometimes it seems that we are looking for Einstein, or someone wiser still.
Nature abhors a vacuum
Just as nature abhors a vacuum, there is now a rush not only to define what a data scientist is, but also to develop programs that could teach the discipline, software packages that to some extent encapsulate it, or some other home for these skills. EMC and other vendors are stepping up to provide training, not only on platforms but also on data science. Kaggle offers an innovative cloud-based, crowdsourced approach to data science: it provides a predictive modeling platform and then stages 24-hour competitions in which aspiring data scientists develop the best solution to a specific problem (reminiscent of the $1 million Netflix prize for devising a smarter algorithm to predict viewers' tastes).
With data science talent so scarce, we expect consulting firms to buy up talent and then "rent" it out to multiple clients. With the exception of a handful of offshore firms, few systems integrators (SIs) have stepped up to launch big data practices (the logical home for data scientists), but we expect that to change quickly.
Opera Solutions, which has been in the predictive analytics consulting business since 2004, is taking the next step down the packaging route. Having raised $84 million in Series A funding last year, the company now has nearly 200 data scientists, making it one of the largest concentrations of talent this side of Google. Opera's predictive analytics solutions are designed for a variety of platforms, both SQL and Hadoop, and today the company joins the stream of SAP Sapphire announcements by releasing its offering for the HANA in-memory database. Andrew Brust has written a good in-depth analysis of the details of this announcement.
From SAP's perspective, Opera's predictive analytics solutions are a logical fit for HANA, because they involve the kind of complex problems (for example, one calculation triggering other calculations) that its new in-memory database platform was designed for.
There is too much value at stake to expect that Opera will remain the only large pool of data scientists that other companies can hire. Ironically, however, barriers to entry will keep the field of competitors narrow and highly concentrated. Of course, as market demand grows, the definition of "data scientist" will inevitably be watered down, so that more and more companies can claim to have one, or many.
The law of supply and demand will kick in for data scientists, but supply will not ramp up as quickly as it will for the platform-focused data architect or engineer. Inevitably, the supply of data scientists will have to be augmented by software that automates the interpretation of machine learning, but such software can only go so far: there is a limit to how much creativity and counter-intuitive insight you can program into a machine.
(Responsible editor: The good of the Legacy)