"Pattern Recognition and Machine Learning" -- 4.1 Measures of pattern class separability


Feature Selection and Extraction

Feature selection and extraction is a key problem in pattern recognition. The earlier discussion of classifier design assumed that a set of samples was already given, with each dimension of a sample being one of its features. The choice of these features is very important, because it strongly affects both the design and the performance of the classifier. If the features differ greatly between categories, it is relatively easy to design a classifier that performs well.

Feature selection and extraction is an important topic in building a pattern recognition system. In many practical problems it is difficult to find the most important features, or objective constraints prevent them from being measured effectively. As a result, two things tend to happen. On the one hand, for psychological reasons, people want to measure as many features as conditions allow. On the other hand, to highlight useful information and suppress useless information, they deliberately add derived features computed from the measurements, such as ratios, exponentials, or logarithms. If all of these measured values are used directly as classification features without analysis, classification not only takes more time but can also become less accurate, which leads to the "curse of dimensionality" problem.

To design a classifier that works well, the original set of measurements has to be analyzed and then selected from or transformed to form effective recognition features. The feature dimension should be reduced while a given classification accuracy is maintained; this is "dimensionality reduction", and it lets the classifier work quickly, accurately, and efficiently. The key is that the features supplied have good separability, so that the classifier can distinguish the classes easily. This requires selecting features: ambiguous features that are hard to tell apart should be removed, and the features supplied should not be redundant, which means removing features that are strongly correlated with others and add no further classification information.
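The idea of removing strongly correlated features can be illustrated with a small sketch. It assumes a greedy filter based on the Pearson correlation coefficient; the function names, the `threshold` value of 0.95, and the toy data are all hypothetical choices made for illustration, not a method given in the text.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant feature is treated as uncorrelated here
    return cov / math.sqrt(var_a * var_b)

def drop_redundant_features(samples, threshold=0.95):
    """Return the indices of the features kept by a greedy correlation filter.

    samples: list of rows, each row a list of feature values.
    A feature is dropped when its absolute correlation with any feature
    already kept exceeds `threshold` (an assumed cut-off).
    """
    columns = list(zip(*samples))  # one tuple of values per feature
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: feature 1 is almost exactly twice feature 0, feature 2 is unrelated.
samples = [
    [1.0, 2.0, 0.3],
    [2.0, 4.1, 0.9],
    [3.0, 6.0, 0.2],
    [4.0, 7.9, 0.8],
]
kept = drop_redundant_features(samples)
print(kept)  # [0, 2]
```

A filter like this only looks at redundancy between features. It says nothing about whether a kept feature actually separates the classes, which needs a separability criterion.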

Note

  In principle, feature selection and extraction should be carried out before the classifier is designed. However, experience in teaching pattern recognition suggests that the topic is easier to understand when it is presented after classifier design has been discussed.

Feature selection and feature extraction:

  Feature selection picks, according to some criterion, a subset of the n measurements {x1, x2,..., xn} to use as lower-dimensional (m-dimensional, m < n) classification features. Feature extraction instead applies some transformation to (x1, x2,..., xn) to produce m features (y1, y2,..., ym) (m < n), which serve as new classification features (also called secondary features). In both cases the aim is to reduce the dimensionality of the feature space while preserving as much discriminative information as possible, so that classification remains effective.
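Selecting a subset "according to a criterion" can be sketched as follows. The text does not say which criterion is meant, so this example assumes a simple per-feature Fisher-style score for two classes (squared difference of class means divided by the sum of class variances). The function names and the toy data are hypothetical.

```python
def fisher_score(values_a, values_b):
    """Score one feature: (difference of class means)^2 / (sum of class variances)."""
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / len(values_a)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / len(values_b)
    gap = (mean_a - mean_b) ** 2
    denom = var_a + var_b
    if denom == 0:
        # Both classes are constant on this feature.
        return 0.0 if gap == 0 else float("inf")
    return gap / denom

def select_features(class_a, class_b, m):
    """Return the (sorted) indices of the m features with the highest score."""
    n = len(class_a[0])
    scores = [
        fisher_score([row[j] for row in class_a], [row[j] for row in class_b])
        for j in range(n)
    ]
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:m])

# Toy data with n = 3 features: feature 1 has the same mean in both classes.
class_a = [[1.0, 5.0, 0.2], [1.2, 3.0, 0.1], [0.8, 4.0, 0.3]]
class_b = [[3.0, 4.5, 0.9], [3.2, 3.5, 1.1], [2.8, 4.0, 1.0]]
selected = select_features(class_a, class_b, m=2)
print(selected)  # [0, 2]
```

Scoring features one at a time ignores interactions between them, so this is only one of many possible criteria.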

Take automatic cell recognition as an example:

A batch of images containing both normal and abnormal cells is obtained through image input, and the task is to decide from these images which cells are normal and which are abnormal. The first step is to identify a group of features that represent the nature of a cell, for example measurements such as the total density of the cell, properties of the nucleus, and the shape of the cell.

The original features obtained this way can be numerous (dozens or even hundreds). In other words, the original feature space has a very high dimension, which must be reduced (compressed) before classification. One approach is to pick the most representative features from the original ones, which is called feature selection. The other is to use a mapping (or transformation) to turn the original features into a smaller number of new features, which is called feature extraction.
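The difference between the two approaches can be made concrete with a short sketch. It assumes the mapping is linear, y = W x, with a hand-picked matrix `W`; the text does not specify the form of the transformation, and the names `select` and `extract_features` and the numbers are hypothetical.

```python
def select(x, indices):
    """Feature selection: keep a subset of the original features unchanged."""
    return [x[i] for i in indices]

def extract_features(x, W):
    """Feature extraction: map an n-dimensional x to m new features y = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

x = [2.0, 4.0, 7.0]  # original sample, n = 3

# Selection keeps original measurements 0 and 2 as they are.
x_selected = select(x, [0, 2])

# Extraction builds m = 2 new features; the first is the average of
# x[0] and x[1], the second simply copies x[2] (an assumed, illustrative W).
W = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
y = extract_features(x, W)

print(x_selected)  # [2.0, 7.0]
print(y)           # [3.0, 7.0]
```

Seen this way, selection is the special case of a linear mapping in which every row of `W` contains a single 1 and zeros elsewhere.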

4.1 Measures of pattern class separability (1): distance and scatter matrices

Background knowledge:
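The section title names distance and scatter matrices, but the text gives no definitions. As a rough sketch only, the code below assumes the common textbook definitions: a within-class scatter matrix S_w (the prior-weighted average of the class covariance matrices) and a between-class scatter matrix S_b (the prior-weighted outer products of the class-mean deviations from the overall mean), with priors estimated from class sizes. All function names and the toy data are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[a * b for b in v] for a in u]

def add_scaled(A, B, scale):
    """Return A + scale * B for two matrices of the same shape."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scatter_matrices(classes):
    """Return (S_w, S_b) for a list of classes, each a list of sample vectors."""
    total = sum(len(c) for c in classes)
    d = len(classes[0][0])
    overall = mean_vector([x for c in classes for x in c])
    S_w = [[0.0] * d for _ in range(d)]
    S_b = [[0.0] * d for _ in range(d)]
    for c in classes:
        prior = len(c) / total  # class prior estimated from sample counts
        m = mean_vector(c)
        for x in c:
            diff = [xi - mi for xi, mi in zip(x, m)]
            S_w = add_scaled(S_w, outer(diff, diff), prior / len(c))
        gap = [mi - oi for mi, oi in zip(m, overall)]
        S_b = add_scaled(S_b, outer(gap, gap), prior)
    return S_w, S_b

# Two compact classes that lie far apart.
classes = [
    [[0.0, 0.0], [2.0, 0.0]],
    [[4.0, 4.0], [6.0, 4.0]],
]
S_w, S_b = scatter_matrices(classes)
print(S_w)  # [[1.0, 0.0], [0.0, 0.0]]
print(S_b)  # [[4.0, 4.0], [4.0, 4.0]]
```

Under these definitions, a large between-class scatter relative to the within-class scatter (here trace 8 versus trace 1) indicates classes that are well separated.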
