11 Open Source Machine Learning Projects Worth Bookmarking

Source: Internet
Author: User

Spam filtering, face recognition, recommendation engines: when you have a large dataset and want to use it for predictive analysis or pattern recognition, machine learning is the way to go. In this field, computers learn from, analyze, and act on data without being explicitly programmed, and more and more developers are paying attention to it.

The rise of machine learning matters not only because hardware is getting cheaper and more powerful, but also because a surge of free software makes machine learning easy to deploy on a single machine or on a large-scale cluster. The diversity of machine learning libraries means that whatever language or environment you prefer, you can probably find one that suits you.

1. Scikit-learn


Python has become the preferred programming language for mathematics, natural science, and statistics because of its ease of use and its rich set of libraries. Scikit-learn builds on existing Python packages for math and science: NumPy, SciPy, and matplotlib. The resulting library can be used from an interactive workbench or embedded in other software and reused. The toolkit is available under a BSD license, so it is completely open and reusable.
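As a rough illustration of what embedding Scikit-learn in other software can look like, here is a minimal sketch that trains a classifier and scores it on held-out data. The choice of the bundled iris dataset, the logistic regression model, and the split parameters are assumptions made for this example, not something the article prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with Scikit-learn (assumed here for convenience).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier; max_iter is raised so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score the model on data it has not seen during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

The same fit/predict pattern applies to most Scikit-learn estimators, so swapping in a different model usually means changing only the import and the constructor line.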

Project:scikit-learn
Github:https://github.com/scikit-learn/scikit-learn

2. Shogun


One of the oldest and most respected machine learning libraries, Shogun was created in 1999 and is written in C++, but it is not limited to C++. Thanks to the SWIG library, Shogun can be used easily from Java, Python, C#, Ruby, R, Lua, Octave, and Matlab.

Although respected, Shogun has competitors. Another C++ machine learning library, mlpack, appeared in 2011 and claims to be faster and easier to use than competing libraries, thanks to a more complete API set.

Project:shogun
Github:https://github.com/shogun-toolbox/shogun

3. Accord Framework/AForge.net


Accord, a machine learning and signal processing framework for .NET, is an extension of an earlier, similar project, AForge.net. "Signal processing" here refers to a range of machine learning algorithms for images and audio, such as seamlessly stitching pictures together or performing face detection. It includes a set of vision processing algorithms that operate on image streams (such as video) and can be used to implement functions such as tracking moving objects. Accord also provides libraries for more conventional machine learning, from neural networks to decision-tree systems.

Project:Accord Framework/AForge.net
Github:https://github.com/accord-net/framework/

4. Mahout


The Mahout framework has long been tied to Hadoop, but many of its algorithms can also run outside Hadoop. That is useful for stand-alone applications that might eventually migrate to Hadoop, or for Hadoop projects that might be spun off into stand-alone applications.

One drawback of Mahout: few of its algorithms currently support the high-performance Spark framework; most instead use the increasingly outdated MapReduce framework. The project currently does not accept MapReduce-based algorithms, and developers who want higher performance turn to MLlib instead.
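For readers unfamiliar with the MapReduce model mentioned above, the following is a plain-Python sketch of its three phases (map, shuffle, reduce), applied to counting how often two items appear together. It does not use Mahout or Hadoop APIs; the function names and the toy data are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user_items):
    """Emit ((item_a, item_b), 1) for every pair of items a single user liked."""
    for items in user_items.values():
        for pair in combinations(sorted(set(items)), 2):
            yield pair, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each item pair."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical toy data: user -> items they liked.
user_items = {
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["lamp", "pen"],
}

cooccurrence = reduce_phase(shuffle(map_phase(user_items)))
print(cooccurrence)
```

In a real cluster the map and reduce steps run in parallel on different machines and the shuffle moves data between them over disk and network, which is one reason in-memory frameworks such as Spark tend to be faster for iterative algorithms.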

Project:mahout

5. MLlib


MLlib, Apache's own machine learning library for Spark and Hadoop, is designed for large scale and high speed and claims to include all the common algorithms and useful data types. As with any Hadoop project, Java is the primary language for working with MLlib, but Python users can connect MLlib with the NumPy library (also used by Scikit-learn), and Scala users can write code against MLlib. If setting up a Hadoop cluster is impractical, MLlib can be deployed on Spark without Hadoop, as well as on EC2 or Mesos.

Project:mllib
