Introduction to Apache Spark MLlib

Source: Internet
Author: User
Tags: lapack, spark, mllib

MLlib is a distributed machine learning library built on Spark. It takes advantage of Spark's in-memory computing, which suits iterative algorithms well, to improve performance substantially. Because Spark's operators are highly expressive, developing large-scale machine learning algorithms also becomes much less complex.
MLlib implements a set of commonly used machine learning algorithms and utilities on the Spark platform. It is the underlying component of AMPLab's MLbase machine learning project.

MLbase is a machine learning platform; for details, see http://www.cnblogs.com/zlslch/p/5726445.html

Within MLbase, MLI is an interface layer that provides many constructs, and MLlib is the underlying algorithm implementation layer, as shown in Figure 1.
  

Figure 1 MLbase

MLlib includes components for classification and regression, clustering, collaborative filtering, and dimensionality reduction, together with the underlying optimization library, as shown in Figure 2.

Figure 2 MLlib component diagram

Figure 2 gives a high-level view of MLlib's overall components and the libraries it depends on.
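
As one illustration of the components listed above, the following is a minimal sketch of the clustering component. It assumes Spark's RDD-based org.apache.spark.mllib API is on the classpath. The object name KMeansSketch, the toy data, and the local[2] master are assumptions made for this example and do not come from the original article.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical example object; not part of MLlib itself.
object KMeansSketch {

  // Trains k-means on toy data and returns the cluster ids of two probe points.
  def run(sc: SparkContext): (Int, Int) = {
    // Toy data: two well-separated groups of 2-D points.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.2), Vectors.dense(8.9, 9.0)
    )).cache() // keep the data in memory across k-means iterations

    val model = KMeans.train(points, 2, 20) // k = 2, maxIterations = 20
    (model.predict(Vectors.dense(0.05, 0.05)), model.predict(Vectors.dense(9.0, 9.1)))
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption for illustration; a real job would target a cluster.
    val conf = new SparkConf().setAppName("KMeansSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (near, far) = run(sc)
      println(s"cluster ids: $near, $far")
    } finally {
      sc.stop()
    }
  }
}
```

Calling cache() keeps the points in memory, so each k-means iteration reads them from memory instead of recomputing them. This is the in-memory, iterative pattern that MLlib relies on.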
  

A brief introduction to the underlying components:
  BLAS/LAPACK layer: LAPACK (Linear Algebra PACKage) is a library written in Fortran that, as the name implies, solves common linear algebra problems. The other package that must be mentioned is BLAS (Basic Linear Algebra Subprograms); LAPACK itself is built on top of BLAS. Many hardware vendors provide BLAS/LAPACK implementations optimized for their processors.
  netlib-java (official website: https://github.com/fommil/netlib-java/) is a Java wrapper around the underlying BLAS and LAPACK libraries.
  Breeze (official website: https://github.com/scalanlp/breeze) is a numerical processing library written in Scala that provides APIs for vectors, matrix operations, and so on.
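
To give a feel for the Breeze layer described above, here is a small sketch of its vector and matrix API, assuming the Breeze library is on the classpath. The object name BreezeSketch and the sample values are made up for illustration.

```scala
import breeze.linalg.{DenseMatrix, DenseVector}

// Hypothetical example object showing a few basic Breeze operations.
object BreezeSketch {
  val x = DenseVector(1.0, 2.0, 3.0)
  val y = DenseVector(4.0, 5.0, 6.0)
  // A 2 x 3 matrix built from row tuples.
  val a = DenseMatrix((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))

  val dot: Double = x dot y              // inner product: 1*4 + 2*5 + 3*6
  val sum: DenseVector[Double] = x + y   // element-wise addition
  val ax: DenseVector[Double] = a * x    // matrix-vector product (length 2)

  def main(args: Array[String]): Unit = {
    println(s"x . y = $dot")
    println(s"x + y = $sum")
    println(s"A * x = $ax")
  }
}
```

Operations like the matrix-vector product are the kind of work that Breeze can hand down to BLAS through netlib-java when a suitable implementation is available.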

Library dependencies: MLlib uses Breeze, a linear algebra library for Scala, and Breeze in turn depends on netlib-java. netlib-java depends on native Fortran routines, so to use them the gfortran runtime library must be pre-installed on each node (download address: https://github.com/mikiobraun/jblas/wiki/Missing-Libraries). Because of licensing issues, the official MLlib dependency set does not include netlib-java's native libraries. If no native library is available in the runtime environment, the user will see a warning message. To use the netlib-java native libraries in your program, add the com.github.fommil.netlib:all:1.1.2 dependency to your project, or follow the guide (URL: https://github.com/fommil/netlib-java/blob/master/README.md#machine-optimised-system-libraries) to build your own project. To use the Python interface, NumPy 1.4 or later is required. (Note: APIs marked Experimental or DeveloperApi in the MLlib source may be adjusted or changed in future releases; the official documentation provides a migration guide between versions.)
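
One way to see which BLAS implementation netlib-java actually loaded is to ask it at runtime. The sketch below assumes netlib-java is on the classpath; BlasCheck is a hypothetical helper name. The sbt line in the comment is an assumption based on the netlib-java README and should be checked against the version you use.

```scala
import com.github.fommil.netlib.BLAS

// Assumed build.sbt line for pulling in the native bindings (verify against the README):
//   libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly()

// Hypothetical helper; not part of netlib-java or MLlib.
object BlasCheck {

  // Fully qualified class name of the BLAS backend that netlib-java selected.
  def backendName(): String = BLAS.getInstance().getClass.getName

  // Dot product computed through the BLAS level-1 routine ddot.
  def dot(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    BLAS.getInstance().ddot(x.length, x, 1, y, 1) // stride 1 for both arrays
  }

  def main(args: Array[String]): Unit = {
    // A name containing "F2j" suggests the pure-Java fallback is in use;
    // a name containing "Native" suggests a native library was found.
    println(s"BLAS backend: ${backendName()}")
    println(s"ddot result: ${dot(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))}")
  }
}
```

Running this on a worker node is a quick way to confirm whether the gfortran runtime and native libraries were picked up, or whether the fallback implementation is being used.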
