Over the past two weeks I began studying Spark's machine learning algorithms for work. I have not actually started on the algorithms themselves yet, but I have done a lot of preparation. This post organizes what I learned early on, what I am learning now, and what I plan to learn into a complete learning path for Spark machine learning. The books recommended here lean toward the accessible and the practical.

Basic Knowledge

Linux Basics and Practice
Linux learning recommended "bird Brother's Linux private Dishes" Foundation, this book is the introduction of thousands of Linux learners of books, humorous, humorous, deep, note the actual combat. In my early years, I looked at the third edition, which was updated to the fourth edition this June.
Traditional Chinese site: http://cn.linux.vbird.org/
Simplified Chinese site: https://wizardforcel.gitbooks.io/vbird-linux-basic-4e/content/

Network Knowledge
When operating a cluster, especially in a production environment, the network between nodes is critical. I recommend the server volume of "Brother Bird's Linux Private Kitchen" and "Wireshark Network Analysis Is That Simple". The author of the latter is a lead engineer in EMC's network storage department; he draws on the problems he ran into in his own work and introduces the basics of networking in a tongue-in-cheek style (unfortunately, I have not finished reading it).

Hadoop Basics and Principles
For Hadoop, my early reading was "Hadoop: The Definitive Guide". I do not recommend this book: it is hard going, the quality of the translation is limited, and beginners are likely to give up. When I started learning Hadoop it was probably the 1.0 era, when the troika was HDFS, MapReduce, and HBase. With Hadoop's rapid adoption in industry over the past few years, it has since entered the 2.0 era, which incorporates the YARN resource manager that originated at Yahoo. However it evolves, HDFS and MapReduce remain the heart of Hadoop, and it is a good idea to build a Hadoop cluster yourself (the Linux knowledge from earlier pays off here).

Spark Basics and Principles
For Spark, I recommend "Learning Spark: Lightning-Fast Big Data Analysis" together with the guides on the official website. The book is written by several core contributors to the Spark open source community. The RDD chapter is the core of it: where MapReduce writes intermediate results to HDFS between steps, Spark can keep an RDD in memory, and the speed difference speaks for itself. Here too it is best to build a cluster yourself; the blog posts I wrote earlier may help:
Cluster setup: http://blog.csdn.net/iigeoxiaoyang/article/details/53020066
Development example: http://blog.csdn.net/iigeoxiaoyang/article/details/53260101

Development Language
A language for machine learning should first of all support functional programming well, and second have powerful third-party scientific computing libraries.

Python
In scientific computing, Python is undoubtedly the leading language, and Spark supports it as well. Python's third-party libraries include NumPy (numerical arrays), SciPy (scientific computing routines such as optimization, integration, and statistics), Matplotlib (plotting), and so on.

Scala
Python, as the leading language in scientific computing, has a huge collection of scientific libraries, but I chose Scala for two reasons: first, Scala's functional design is better; second, Scala runs on the JVM, and its speed advantage is obvious in production environments.
Recommended for basic syntax: http://twitter.github.io/scala_school/zh_cn/ (Twitter's Scala School)
Recommended video 1: http://www.imooc.com/learn/613 (imooc Scala video tutorial). This is a condensed version of the English-language course listed below; each episode runs about 7 minutes and mainly conveys the ideas behind functional programming in Scala.
Functional design ideas: https://www.zhihu.com/question/28292740 (Zhihu). Together with the video above, this covers most of the essence of functional programming in Scala.
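To make the functional style a little more concrete, here is a minimal sketch of my own (not taken from the links above). It is written in Python so that it runs without a JVM; the function names are made up for illustration. The idea is to build a result from small pure functions and higher-order functions (filter, map, reduce) without mutating the input.

```python
from functools import reduce


def is_even(x):
    """Pure predicate: the result depends only on the argument."""
    return x % 2 == 0


def square(x):
    """Pure transformation: no side effects, no shared state."""
    return x * x


def sum_of_even_squares(numbers):
    """Compose filter -> map -> reduce instead of a loop that mutates a total."""
    evens = filter(is_even, numbers)
    squares = map(square, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)


# The input is an immutable tuple and is left untouched by the pipeline.
data = (1, 2, 3, 4, 5, 6)
result = sum_of_even_squares(data)
print(result)  # 56
```

In Scala the same pipeline can be written as a single chain on a collection, for example `numbers.filter(isEven).map(square).foldLeft(0)(_ + _)`, which is the shape that Spark's RDD operations also follow.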
Recommended video 2: https://www.coursera.org/specializations/scala (Functional Programming Principles in Scala). The course is taught by Martin Odersky, the designer of the Scala language, and has Chinese subtitles; this is the one for studying Scala in depth.

Theoretical Knowledge

Linear Algebra
"If you are not familiar with the concepts of linear algebra, then studying the natural sciences today seems almost like being illiterate." (Swedish mathematician Lars Gårding)
That may be overstating it, but linear algebra is at least a foundation of machine learning. I recommend the linear algebra course by Professor Gilbert Strang of MIT.
Video: http://open.163.com/special/opencourse/daishu.html (I have watched 19 lectures so far). The course explains very well many concepts I never understood in school, such as the column space, null space, and row space of a matrix, and linear transformations.
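As a small illustration of two of these concepts, here is a sketch of my own (not from the course) in plain Python. The matrix A and the helper mat_vec are made up for the example. A has two columns that are multiples of each other, so its column space is a line and its null space is not just the zero vector.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Second column is twice the first, so the columns are linearly dependent.
A = [[1, 2],
     [2, 4]]

# Null space: x = (2, -1) is sent to the zero vector, because
# 2 * (column 1) - 1 * (column 2) = 0.
x = [2, -1]
print(mat_vec(A, x))  # [0, 0]

# Column space: A times any vector is a combination of the columns of A,
# so here the result is always a multiple of (1, 2).
v = [3, 5]
b = mat_vec(A, v)
print(b)  # [13, 26]
```

Note that x = (2, -1) is also perpendicular to the row (1, 2), which is the usual relationship between the null space and the row space.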
Professor David C. Lay's "Linear Algebra and Its Applications", a book that explains the linear properties of matrices, is a good reference to keep on hand and consult at any time.

Spark Advanced Data Analytics and Machine Learning
This is where machine learning really begins. I recommend "Advanced Analytics with Spark" and "Machine Learning with Spark". The former is written by data scientists at Cloudera and consists mainly of case studies from current industry practice. Once you understand a case, it is best to work through it hands-on; I ran the Chapter 8 case on my own cluster, and you can refer to the blog post I wrote earlier: http://blog.csdn.net/iigeoxiaoyang/article/details/53020066. I have not read the latter yet.
My machine learning study is only just beginning; I will update this post as it progresses.