Machine Learning Classification

Source: Internet
Author: User

From a macro perspective, machine learning can be categorized from different angles.

    • Whether the system is trained with human supervision (supervised, unsupervised, semi-supervised, and reinforcement learning).
    • Whether the system can learn incrementally (online learning versus batch learning).
    • Whether the system works by comparing new data with known data, or by finding patterns in the training data to build a predictive model (instance-based versus model-based learning).

The above classifications are not mutually exclusive. Here's a detailed description of each machine learning category.

Supervised/unsupervised Learning

Just as parents differ in how closely they supervise their children, with some always hovering nearby and others letting their children roam free, machine learning systems differ in how much supervision they receive during training. By this criterion there are four categories: supervised, unsupervised, semi-supervised, and reinforcement learning.

Supervised learning

In supervised learning, the dataset you provide to the algorithm (the training data) already contains the expected results (labels).

A typical supervised learning task is classification. Spam filtering is a good example: the training set contains the messages themselves along with their labels (spam or not spam), and the trained model classifies new messages so that a filtering strategy can be applied.
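The spam-filtering idea above can be sketched with a tiny k-nearest neighbors classifier. This is only an illustration under stated assumptions: the two features (number of links and number of exclamation marks per message), the toy training set, and the function name `knn_classify` are all hypothetical and do not come from the original article.

```python
import math
from collections import Counter

# Hypothetical labeled training set:
# (number of links, number of exclamation marks) -> label
training_data = [
    ((8, 6), "spam"),
    ((7, 9), "spam"),
    ((9, 7), "spam"),
    ((0, 1), "ham"),
    ((1, 0), "ham"),
    ((2, 1), "ham"),
]

def knn_classify(features, data, k=3):
    """Return the majority label among the k training examples closest to `features`."""
    by_distance = sorted(data, key=lambda item: math.dist(features, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Classify a new, unlabeled message.
new_message = (6, 8)
prediction = knn_classify(new_message, training_data)
print(prediction)
```

A real spam filter would use far richer features and much more data; the point here is only that the labels in the training set drive the prediction for a new message.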

Another typical task is to predict a target value, such as a car's price, from a set of features or attributes (such as mileage, brand, and manufacturing date). These features are called predictors. This task is called regression. To train the system, we need to provide a large amount of data containing both the predictors and the labels (car prices).
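As a minimal sketch of regression, the snippet below fits a straight line to car prices using ordinary least squares with a single predictor (mileage). The data values and the names `fit_line` and `predict_price` are made up for illustration; a realistic model would use several predictors and far more examples.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: predictor (mileage in km) and label (price).
mileage_km = [10_000, 40_000, 80_000, 120_000]
price = [20_000, 17_000, 13_000, 9_000]

slope, intercept = fit_line(mileage_km, price)

def predict_price(km):
    """Predict the price of a car from its mileage using the fitted line."""
    return slope * km + intercept

print(predict_price(60_000))
```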

Some regression algorithms can also be used for classification, and vice versa. For example, logistic regression can output a value that represents the probability of belonging to a category (for example, a 20% probability that a message is spam).
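To illustrate how logistic regression yields a probability rather than a hard label, here is a small sketch trained with plain gradient descent on one hypothetical feature (the number of links in a message). The data, the learning rate, the number of epochs, and the helper names are assumptions chosen for the example, not values from the original article.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(spam | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical training data: link count per message, and 1 = spam, 0 = not spam.
link_counts = [0, 1, 2, 3, 6, 7, 8, 9]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(link_counts, is_spam)

def spam_probability(links):
    """Estimated probability that a message with `links` links is spam."""
    return sigmoid(w * links + b)

p_few_links = spam_probability(1)
p_many_links = spam_probability(8)
print(p_few_links, p_many_links)
```

Turning the probability into a class is then a matter of choosing a threshold (0.5 is a common default), which is how a regression-style output ends up being used for classification.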

The following are some commonly used supervised learning algorithms:

    1. K-nearest Neighbors
    2. Linear Regression
    3. Logistic Regression
    4. Support Vector Machines (SVMs)
    5. Decision Trees and Random Forests
    6. Neural Networks

Unsupervised learning

As you might guess, unsupervised learning does not need labeled data: the system learns on its own, like a child learning without parental supervision. The following are some common unsupervised learning algorithms.

    • Clustering
      • K-means
      • Hierarchical Cluster Analysis (HCA)
      • Expectation Maximization
    • Visualization and dimensionality reduction
      • Principal Component Analysis (PCA)
      • Kernel PCA
      • Locally Linear Embedding (LLE)
      • t-distributed Stochastic Neighbor Embedding (t-SNE)
    • Association Rule Learning
      • Apriori
      • Eclat

For example, suppose you have a lot of data about your blog's visitors. You can use a clustering algorithm to detect groups of visitors with similar behavior. You don't need to tell the algorithm how to group them; it finds the associations automatically. For example, it might find that 40% of visitors are males who like comic books and mostly read your blog at night, while 20% are young science fiction enthusiasts who often visit on weekends, and so on. If you use a hierarchical clustering algorithm, it can further subdivide each group into smaller groups. This helps you target your blog posts at different groups of users.
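The visitor-grouping idea can be sketched with a bare-bones k-means implementation. Everything specific here is an assumption made for illustration: each visitor is described by two hypothetical features (age, usual visiting hour), the starting centroids are picked by hand, and the iteration count is fixed instead of checking for convergence.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: (age, usual visiting hour on a 24-hour clock).
visitors = [(22, 23), (25, 22), (24, 21), (41, 10), (45, 9), (43, 11)]

# Hand-picked starting centroids; real implementations usually choose them randomly.
final_centroids, groups = k_means(visitors, [visitors[0], visitors[3]])
print(final_centroids)
print(groups)
```

Note that no labels are supplied anywhere: the two groups emerge only from the distances between the points.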

Visualization algorithms take complex, unlabeled data as input and output a 2D or 3D representation that can be plotted. These algorithms try to preserve as much of the data's structure as possible (for example, keeping separate clusters from overlapping in the plot), so that we can understand how the data is organized and perhaps discover unknown patterns.

Dimensionality reduction is used to simplify data without losing too much information. One approach is to merge multiple correlated features into one. For example, a car's mileage and its age are closely related, so a dimensionality reduction algorithm can merge the two into a single feature that reflects the vehicle's wear and tear. This is called feature extraction. Feature extraction is a useful technique that makes subsequent machine learning algorithms run faster (because the data takes up less disk space and memory) and, in some cases, perform more accurately.
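The mileage-and-age example can be sketched with a two-feature version of Principal Component Analysis. For two features, the direction of greatest variance has a closed-form angle, so no linear algebra library is needed. The data and the function name `first_principal_component` are hypothetical, and the units (tens of thousands of km, years) were chosen so that both features have comparable scales; real data would normally be standardized first.

```python
import math

def first_principal_component(xs, ys):
    """Project two centered features onto their direction of greatest variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx) / (n - 1)
    var_y = sum(d * d for d in dy) / (n - 1)
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    # One combined value per car: the extracted "wear" feature.
    scores = [a * direction[0] + b * direction[1] for a, b in zip(dx, dy)]
    var_scores = sum(s * s for s in scores) / (n - 1)
    explained_ratio = var_scores / (var_x + var_y)
    return direction, scores, explained_ratio

# Hypothetical data: mileage in units of 10,000 km, and age in years.
mileage_10k_km = [1, 3, 5, 8, 12]
age_years = [1, 2, 4, 6, 9]

direction, wear_scores, explained_ratio = first_principal_component(
    mileage_10k_km, age_years
)
print(direction, wear_scores, explained_ratio)
```

Because the two features are strongly correlated, the single extracted feature keeps most of the variance of the original pair, which is the sense in which little information is lost.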

Another unsupervised learning task is anomaly detection, such as monitoring for unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset. The model is trained on normal data; when it is then applied to new data, it can tell us whether that data looks normal.
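One very simple way to sketch this "train on normal data, then score new data" idea is a z-score check: learn the mean and standard deviation of normal transaction amounts, then flag any new amount that lies too many standard deviations away. The amounts, the threshold of 3, and the name `is_anomaly` are assumptions for illustration; production fraud detection uses far more sophisticated models.

```python
import statistics

# Hypothetical "normal" transaction amounts used to train the detector.
normal_amounts = [12.5, 30.0, 22.0, 18.5, 25.0, 27.5, 15.0, 20.0]

# "Training" here just means estimating what normal looks like.
mean_amount = statistics.mean(normal_amounts)
stdev_amount = statistics.stdev(normal_amounts)

def is_anomaly(amount, threshold=3.0):
    """Flag an amount whose distance from the mean exceeds `threshold` standard deviations."""
    z_score = abs(amount - mean_amount) / stdev_amount
    return z_score > threshold

print(is_anomaly(23.0))
print(is_anomaly(950.0))
```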

The last unsupervised learning task we will cover is association rule learning, whose goal is to dig into large amounts of data and discover relationships hidden between attributes. For example, applying association rules to the sales logs of a large supermarket might reveal a buying habit: people who buy barbecue sauce and chips also tend to buy steak. Merchants can therefore place these items close to each other.
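The supermarket example can be made concrete with the two measures that association rule learning is usually built on: support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). The sketch below only evaluates one candidate rule on a hypothetical sales log; it is not an implementation of Apriori or Eclat, which add an efficient search over all candidate itemsets.

```python
# Hypothetical sales log: one set of purchased items per transaction.
transactions = [
    {"barbecue sauce", "chips", "steak"},
    {"barbecue sauce", "chips", "steak", "soda"},
    {"chips", "soda"},
    {"barbecue sauce", "chips"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Candidate rule: {barbecue sauce, chips} -> {steak}
rule_support = support({"barbecue sauce", "chips", "steak"}, transactions)
rule_confidence = confidence({"barbecue sauce", "chips"}, {"steak"}, transactions)
print(rule_support, rule_confidence)
```

A rule is typically kept only if both its support and its confidence exceed thresholds chosen by the analyst.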

To be continued...
