Several novice programmers have won Kaggle predictive modeling contests after taking a few days of the free "Machine Learning" course on Coursera.
The big data talent scare that the industry itself whipped up has raised expectations of, and demand for, big data and advanced analytics talent. Data scientist has become the sexiest career overnight, and its practitioners are courted like sports stars. Data scientists are portrayed as near-godlike figures: proficient in mathematics, computing, sociology, physics and other disciplines, blessed with deep insight into the world, well versed in how businesses operate, and as rare as a protected species. All of this leaves IT practitioners who are interested in big data analytics looking up at the role as if at an unscalable peak.
But a growing body of evidence shows that even junior programmers can become good data scientists.
It turns out that all it takes is a cloud server and a few days of Andrew Ng's "Machine Learning" course on Coursera, and even a novice programmer can become a star who strikes gold in data.
This is no fantasy: the predictive analytics competition site Kaggle offers many examples. In September this year, Carter S., an insurance risk-model designer who trained as a lawyer, won prize money in a Kaggle competition for the first time. Carter used an approach of his own devising called "Overkill Analytics." Instead of building complex big data models, it combines a large number of simple models and relies on the processing power of today's hardware (such as Hadoop clusters) to analyze large data sets by "brute force." Carter's case shows that big data does not necessarily mean a "big model," and that the technical barrier to big data analytics is lower than commonly imagined. That said, Carter had taught himself natural language processing and social analysis on the job, and he was no stranger to linear regression.
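The article describes Overkill Analytics only at a high level: many simple models, combined, with raw computing power doing the heavy lifting. As a rough illustration of the "many simple models" idea, and not Carter's actual method, the Python sketch below uses bagging: it fits many one-variable least-squares lines, each on a bootstrap resample of the data, and averages their predictions. The function names (`fit_line`, `fit_many_simple_models`, `predict`) and parameters are hypothetical, and the sketch runs on a single machine rather than a Hadoop cluster.

```python
import random
from statistics import mean

# Hypothetical illustration of "combine many simple models" via bagging;
# this is NOT Carter's actual Overkill Analytics implementation.


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; falls back to a constant if x has no spread."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar, 0.0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_many_simple_models(xs, ys, n_models=200, seed=0):
    """Fit n_models lines, each on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Combine the simple models by averaging their individual predictions."""
    return mean(a + b * x for a, b in models)


if __name__ == "__main__":
    # Synthetic noisy data, made up purely for the demo.
    rng = random.Random(42)
    xs = [i / 10 for i in range(100)]
    ys = [3.0 + 0.5 * x + rng.gauss(0, 0.2) for x in xs]
    models = fit_many_simple_models(xs, ys)
    print(round(predict(models, 4.0), 2))
```

Each individual line is deliberately crude. Whatever stability the result has comes from averaging many of them, which mirrors the trade-off the paragraph describes: simple models plus computing power in place of one elaborate model.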
Carter already had some data analysis knowledge and experience before he became a big data scientist. What about people with no data analysis experience at all? Can someone with only undergraduate-level training become a data scientist?
The answer is yes. Luis Tandalla, a student at the University of New Orleans, took a few lessons on Coursera last year and then, in a Kaggle competition crowded with experts, won the prize in a contest sponsored by the Hewlett Foundation. Tandalla developed a model that accurately scores short-answer test questions. Keep in mind that Tandalla knew nothing about AI or machine learning until he signed up for Coursera.
Having had a taste of success, Tandalla became fired up about learning. He enrolled in more Coursera courses, including natural language processing and probabilistic models, and began studying other areas of data analysis. Tandalla will graduate in May 2013, but he says he is considering graduate study in machine learning. Founding his own predictive analytics software company has now become his goal and his dream.
Tandalla is not an isolated case. In the Hewlett Foundation contest, the second- and third-place finishers had also taken Coursera's machine learning course. One of them, 39-year-old Xavier Conort of Singapore, retrained as a data scientist last year and is now the top-ranked player on Kaggle.
The secret of Coursera's success
Andrew Ng
Coursera's courses are so productive largely because of Stanford professor Andrew Ng's teaching philosophy. Ng credits the atmosphere of Silicon Valley: without close contact with Silicon Valley's best scientists, he says, Coursera's curriculum could not have been so successful. In addition, Ng's course focuses on practical applications. Students learn techniques in the course of solving real problems, and he spends more time on applying the techniques than on the algorithms themselves.
In Ng's view, learning algorithms in isolation is a mistake; it is like learning the syntax of a programming language without ever trying to write a useful program. A similar teaching philosophy can be seen on Udacity, another well-known free online course platform: Google vice president and Stanford professor Sebastian Thrun teaches the Python programming language in his Computer Science 101 course by walking students through building a working search engine.
Via GigaOM
It's not hard to be a data scientist