Machine learning processes, conventional algorithms, dimensionality reduction methods

Source: Internet
Author: User

1. Scenario analysis: A. Data exploration (data size, missing or garbled values, ETL operations, field types, and whether a target column is included)

B. Scenario abstraction (using the existing data to identify the business scenarios it can be applied to). Machine learning is mainly used for binary classification, multi-class classification, clustering, and regression scenarios.

C. Algorithm selection (determine the range of candidate algorithms, then try several algorithms and analyze the results from multiple angles to find the one best suited to the business)

2. Data preprocessing: sampling, de-noising, normalization to the range (0, 1), and data filtering. If data mining is like cooking a dish, data preprocessing is selecting and washing the vegetables: if this step is done poorly, the taste of the whole dish suffers.
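As a rough illustration of two of the preprocessing steps named above (data filtering and normalization to (0, 1)), here is a minimal sketch. The function names and the sample rows are hypothetical and not from the original article, and min-max scaling is only one possible way to normalize.

```python
def drop_missing(rows):
    """Data filtering: keep only rows in which no field is missing (None)."""
    return [row for row in rows if all(field is not None for field in row)]


def min_max_normalize(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical (age, income) rows; the second row has a missing income.
raw = [(18, 1200.0), (35, None), (52, 8800.0), (27, 3100.0)]
clean = drop_missing(raw)
ages = min_max_normalize([age for age, _ in clean])
print(clean)
print(ages)
```

The smallest age maps to 0.0 and the largest to 1.0; everything else lands in between in proportion to its distance from the minimum.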

3. Feature engineering: feature abstraction (turning source data into a form that algorithms can process), feature importance assessment, feature derivation (deriving new, more valuable features from existing ones), and feature dimensionality reduction. Two common dimensionality reduction methods are principal component analysis (PCA), which maps high-dimensional data to a low-dimensional space through a linear projection, and linear discriminant analysis (LDA).
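The linear projection that PCA performs can be sketched in a few lines. This is a simplified illustration, not a full PCA: it finds only the first principal component, and it uses power iteration on the covariance matrix, which is an assumption made here to keep the example dependency-free. The sample points are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction) of the leading principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Linear projection of each centered row onto one direction (d dims -> 1 dim)."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Hypothetical 2-D points lying close to the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
mean, pc1 = first_principal_component(points)
reduced = project(points, mean, pc1)
print(pc1)      # direction of largest variance, roughly along (1, 2)
print(reduced)  # one coordinate per point instead of two
```

Each 2-D point is replaced by a single coordinate along the direction of largest variance, which is the sense in which the data is mapped to a lower-dimensional space.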

Feature abstraction has to handle several kinds of source fields: timestamps, two-valued (binary) categories, ordered multi-valued categories, unordered multi-valued categories (either collapsed into fewer values, which discards information, or one-hot encoded), text, and image or speech data (first converted into a matrix structure).
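The field types listed above can each be turned into numbers in a simple way. The sketch below is one possible set of encodings, not the article's own method; the function names, the record, and the category lists are all hypothetical.

```python
from datetime import datetime, timezone


def encode_binary(value, positive):
    """Two-valued field -> 0 or 1."""
    return 1 if value == positive else 0


def encode_ordinal(value, ordered_levels):
    """Ordered multi-valued field -> its rank, so the ordering is preserved."""
    return ordered_levels.index(value)


def encode_one_hot(value, categories):
    """Unordered multi-valued field -> one 0/1 column per category."""
    return [1 if value == c else 0 for c in categories]


def encode_timestamp(seconds):
    """Unix timestamp -> [hour of day, day of week (Monday = 0)] in UTC."""
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return [dt.hour, dt.weekday()]


# Hypothetical source record.
record = {"created": 1_700_000_000, "member": "yes", "level": "gold", "city": "Hangzhou"}
features = (
    encode_timestamp(record["created"])
    + [encode_binary(record["member"], "yes")]
    + [encode_ordinal(record["level"], ["bronze", "silver", "gold"])]
    + encode_one_hot(record["city"], ["Beijing", "Hangzhou", "Shanghai"])
)
print(features)  # [22, 1, 1, 2, 0, 1, 0]
```

One-hot encoding avoids implying an order between cities, at the cost of one extra column per category.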

4. Model building, evaluation, tuning
5. Result output and analysis

Common algorithms

Deep learning

The backpropagation algorithm, also known as the BP algorithm, is a supervised learning algorithm whose core idea is the chain rule of differentiation. It is commonly used to solve the optimization problem in neural networks. Unlike the way shallow algorithms are optimized, BP uses the chain rule to compute the gradient of every layer at each iteration.
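To make the chain-rule idea concrete, here is a minimal sketch of backpropagation through a tiny network with one hidden unit. The network shape, the tanh activation, the squared-error loss, and the parameter values are all assumptions chosen for illustration. The analytic gradients are compared against finite differences as a sanity check.

```python
import math


def forward(params, x, target):
    """Forward pass: x -> hidden unit (tanh) -> linear output -> squared-error loss."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = math.tanh(z)
    y = w2 * h + b2
    loss = 0.5 * (y - target) ** 2
    return loss, (h, y)


def backward(params, x, target):
    """Backward pass: apply the chain rule layer by layer, output first."""
    w1, b1, w2, b2 = params
    _, (h, y) = forward(params, x, target)
    dy = y - target          # dL/dy
    dw2 = dy * h             # dL/dw2 = dL/dy * dy/dw2
    db2 = dy                 # dL/db2 = dL/dy * 1
    dh = dy * w2             # dL/dh  = dL/dy * dy/dh
    dz = dh * (1.0 - h * h)  # dL/dz  = dL/dh * tanh'(z)
    dw1 = dz * x             # dL/dw1 = dL/dz * dz/dw1
    db1 = dz                 # dL/db1 = dL/dz * 1
    return [dw1, db1, dw2, db2]


def numeric_gradient(params, x, target, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((forward(up, x, target)[0] - forward(down, x, target)[0]) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]  # hypothetical initial w1, b1, w2, b2
analytic = backward(params, x=1.5, target=1.0)
numeric = numeric_gradient(params, x=1.5, target=1.0)
print(analytic)
print(numeric)
```

The gradient of the hidden layer reuses `dy`, which was already computed for the output layer. That reuse is what lets backpropagation obtain every layer's gradient in a single backward sweep.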

The core idea of an autoencoder is to train a function f such that f(x) is approximately equal to x, that is, to obtain a function whose output reproduces its input as closely as possible.
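A very small autoencoder shows the "f(x) approximately equals x" idea. The sketch below assumes a linear autoencoder with a single code unit and tied weights (the same vector `w` is used to encode and decode), trained by plain gradient descent. These design choices, the data, and the hyperparameters are assumptions made for illustration; practical autoencoders are usually nonlinear and much larger.

```python
def encode(w, x):
    """Compress a vector to one number (the code)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(w, code):
    """Expand the code back to a vector using the same (tied) weights."""
    return [wi * code for wi in w]


def train(data, w, lr=0.1, epochs=500):
    """Gradient descent on the summed loss 0.5 * ||f(x) - x||^2."""
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x in data:
            code = encode(w, x)
            residual = [ri - xi for ri, xi in zip(decode(w, code), x)]  # f(x) - x
            rw = sum(ri * wi for ri, wi in zip(residual, w))
            for j in range(len(w)):
                # w appears in both encoder and decoder, hence two gradient terms.
                grad[j] += code * residual[j] + rw * x[j]
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


# Hypothetical 2-D inputs that all lie on one line, so one code unit is enough.
data = [(-0.4, -0.8), (-0.2, -0.4), (0.2, 0.4), (0.4, 0.8)]
w = train(data, [0.5, 0.5])
reconstructed = [decode(w, encode(w, x)) for x in data]
max_error = max(abs(r - xi) for rec, x in zip(reconstructed, data) for r, xi in zip(rec, x))
print(w)
print(max_error)
```

Because the inputs have only one degree of freedom, the single-number code loses nothing and the reconstruction error shrinks toward zero.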

Machine learning algorithms and the common structures of deep learning are worth studying systematically. The common algorithms are as follows:

Machine Learning Algorithms:

Classification algorithms: KNN, NB, LR, RF, SVM, etc.

Clustering algorithms: K-means, DBSCAN

Regression algorithms: linear regression

Text analysis algorithms: HMM for word segmentation, TF-IDF for keyword extraction, and the LDA topic model (latent Dirichlet allocation, not to be confused with linear discriminant analysis)

Recommendation algorithms: collaborative filtering (CF), user-based (UCF) and item-based (ICF)

Graph algorithms: label propagation, shortest path
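As one example from the list above, here is a minimal K-means sketch. It makes simplifying assumptions that are not in the original article: the first k points are used as initial centroids (real implementations usually pick them more carefully), and the number of iterations is fixed. The sample points are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it is
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are given to the algorithm; the grouping comes only from distances between points, which is what distinguishes clustering from classification.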

Purposes of the commonly used dimensionality reduction methods: ensure the independence of the feature vectors by reducing the correlation between them, reduce the amount of computation, reduce noise, and remove fields that are meaningless or contribute little to the result, which cuts unnecessary interference.

Deep learning common structures:

Deep neural network (DNN)

Convolutional neural network (CNN): convolution, down-sampling, and full connection. Mainly used for spatial data; the input layer format must be uniform.

Recurrent neural network (RNN): commonly used for time-series problems. The input layer format can be non-uniform.
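The three CNN stages mentioned above (convolution, down-sampling, full connection) can be sketched on a tiny image. This is a simplified illustration with one hand-picked kernel and no training; the image, the kernel, the use of max pooling for down-sampling, and the fully connected weights are all hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution as used in CNNs (sliding dot product, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(fmap, size=2):
    """Down-sampling: keep the maximum of each non-overlapping size x size block."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def fully_connected(vector, weights, bias):
    """Full connection: weighted sum of every input plus a bias."""
    return sum(v * w for v, w in zip(vector, weights)) + bias


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds where brightness increases left to right

feature_map = conv2d(image, kernel)               # 4x4
pooled = max_pool(feature_map)                    # 2x2
flat = [value for row in pooled for value in row]
score = fully_connected(flat, [0.5, 0.5, 0.5, 0.5], bias=-1.0)
print(feature_map)
print(pooled)  # [[2, 0], [2, 0]]
print(score)   # 1.0
```

The output size depends directly on the input size, which illustrates why the text says the CNN input layer needs a uniform format.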
