This article is a pre-sale recommendation for a quality computer book: Spark Machine Learning
When machine learning meets Spark, the most popular parallel computing framework...
Editor's recommendation
Apache Spark is a distributed computing framework optimized for low-latency tasks and in-memory data storage. Among existing parallel computing frameworks, Spark is one of the few that balances speed, scalability, in-memory processing, and fault tolerance while also simplifying programming and providing a powerful, flexible, and expressive API.
This book introduces the basics of Spark, from using the Spark API to load and manipulate data to feeding that data into a variety of machine learning models. Through detailed examples and practical applications, it also explains common machine learning models, including recommendation systems, classification, regression, clustering, and dimensionality reduction. Finally, it covers some advanced topics, such as processing large-scale text data and using Spark Streaming for online machine learning and model evaluation.
If you are a Scala, Java, or Python developer interested in machine learning and data analytics and want to use the Spark framework to apply common machine learning techniques at scale, this book is written for you. Some basic knowledge of Spark is helpful, but practical experience with it is not required.
By studying this book, you will be able to:
Write your first Spark program in Scala, Java, or Python;
Create and configure a Spark development environment on your local machine and on Amazon EC2;
Obtain public machine learning datasets and use Spark to load, process, clean, and transform the data;
Use Spark's machine learning library (MLlib) to write programs with common machine learning models such as collaborative filtering, classification, regression, clustering, and dimensionality reduction;
Write Spark functions to evaluate the performance of your machine learning models;
Understand methods for processing large-scale text data, including feature extraction and using text data as input to machine learning models;
Explore online learning methods and use Spark Streaming for online learning and model evaluation.
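To make the idea of an evaluation function concrete, here is a minimal, framework-free sketch in plain Python of two common regression metrics, mean squared error (MSE) and root mean squared error (RMSE), computed over (actual, predicted) pairs. It assumes predictions have already been paired with true values; in Spark the same map-then-reduce computation would typically run over an RDD of such pairs, and MLlib also provides a `RegressionMetrics` class that reports these values. The function names and sample ratings below are hypothetical and are not taken from the book.

```python
import math
from functools import reduce


def squared_errors(pairs):
    # "Map" step: turn each (actual, predicted) pair into a squared error.
    return [(actual - predicted) ** 2 for actual, predicted in pairs]


def mean_squared_error(pairs):
    errors = squared_errors(pairs)
    if not errors:
        raise ValueError("need at least one (actual, predicted) pair")
    # "Reduce" step: sum the squared errors, then divide by the count.
    return reduce(lambda a, b: a + b, errors) / len(errors)


def root_mean_squared_error(pairs):
    # RMSE is the square root of MSE, expressed in the same units as the ratings.
    return math.sqrt(mean_squared_error(pairs))


if __name__ == "__main__":
    # Hypothetical (actual rating, predicted rating) pairs.
    pairs = [(5.0, 4.5), (3.0, 3.5), (4.0, 4.0), (1.0, 2.0)]
    print(mean_squared_error(pairs))       # 0.375
    print(root_mean_squared_error(pairs))
```

Keeping the map and reduce steps separate mirrors how the calculation would be distributed: each pair's error can be computed independently, and only the final sum needs to be combined.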
Synopsis
Each chapter of the book is built around case studies: with machine learning algorithms as the main thread, it explores practical applications of Spark through worked examples. The book avoids daunting mathematical formulas. Instead, it starts from preparing and correctly understanding the data and comprehensively covers recommendation systems, regression, clustering, dimensionality reduction, and other classic machine learning algorithms, along with their practical applications.
About the Author
Nick Pentreath
Nick is a co-founder of Graphflow, a big data and machine learning company focused on user-centric recommendation systems and customer service intelligence technologies. Nick has a background in financial markets, machine learning, and software development. He worked at Goldman Sachs, then as a research scientist at Cognitive Match Limited (London), an online advertising startup, and later led the data science and analytics team at MXit, Africa's largest social network. Nick is a member of the Apache Spark Project Management Committee.
Spark Machine Learning - Interactive Publishing Network