1. Scenario analysis: A. Data exploration (size of the data, missing or garbled values, ETL operations, field types, and whether the target column is present)
B. Scenario abstraction (mining the existing data for business scenarios the algorithms can be applied to). Machine learning is mainly used for four kinds of problems: binary classification, multi-class classification, clustering, and regression.
C. Algorithm selection (narrowing down the candidate algorithms, then trying several of them and analyzing the results from multiple angles to find the one best suited to the business)
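Step A starts with getting to know the data. A minimal data-exploration sketch with pandas (the toy columns, including the `churn` target, are purely illustrative):

```python
import pandas as pd
import numpy as np

# Toy dataset standing in for raw business data (hypothetical columns).
df = pd.DataFrame({
    "age":    [25, 31, np.nan, 47, 52],
    "income": [3200, 4100, 3900, np.nan, 6100],
    "churn":  [0, 0, 1, 1, 0],          # the target column
})

print(df.shape)                # size of the data
print(df.dtypes)               # field types
missing = df.isna().sum()      # missing values per column
print(missing)
has_target = "churn" in df.columns
print(has_target)
```

In a real project this stage would also cover garbled values and the ETL operations mentioned above; the point is simply to know the data before abstracting a scenario from it.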
2. Data preprocessing: sampling, de-noising, normalization to (0, 1), and data filtering. If data mining is a dish, preprocessing is washing and chopping the vegetables; doing this step badly spoils the whole dish.
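Min-max normalization to the (0, 1) range, for example, is a simple linear rescaling (a NumPy sketch; the sample values are arbitrary):

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D array linearly into the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)

scaled = min_max_scale([10, 20, 30, 50])
print(scaled)  # all values now lie between 0 and 1
```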
3. Feature engineering: feature abstraction (turning source data into data an algorithm can understand), feature importance evaluation, feature derivation (deriving new, more valuable features from existing ones), and feature dimensionality reduction. Principal component analysis (PCA) maps high-dimensional data into a low-dimensional space through a linear projection; linear discriminant analysis (LDA) is another common linear method.
Common feature-abstraction cases: timestamps, binary-valued features, multi-valued ordered features (encoded as ordered integers), multi-valued unordered features (one-hot encoding, since integer codes would impose a false order and lose information), text, and image or speech data (first converted into a matrix representation).
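These encoding choices can be sketched with pandas (all column names and category orders here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "signup": pd.to_datetime(["2021-01-03", "2021-02-14"]),
    "size":   ["small", "large"],   # multi-valued, ordered
    "city":   ["Paris", "Tokyo"],   # multi-valued, unordered
})

# Timestamp -> numeric components an algorithm can use.
df["signup_month"] = df["signup"].dt.month

# Ordered categories -> integer codes that preserve the order.
order = {"small": 0, "medium": 1, "large": 2}
df["size_code"] = df["size"].map(order)

# Unordered categories -> one-hot columns (no artificial order imposed).
df = pd.get_dummies(df, columns=["city"])
print(df.columns.tolist())
```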
4. Model building, evaluation, and tuning
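A minimal sketch of this step with scikit-learn on synthetic data (the model choice, dataset, and parameter grid are illustrative, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data stands in for real business data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Tuning: grid-search the regularization strength with cross-validation.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
grid.fit(X_tr, y_tr)                 # model building

acc = grid.score(X_te, y_te)         # evaluation on held-out data
print(grid.best_params_, acc)
```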
5. Results output and analysis
General algorithms
Deep learning
Backpropagation, also known as the BP algorithm, is the core supervised learning algorithm for neural networks, and its central idea is the chain rule of differentiation. BP is typically used to solve the optimization problem in neural networks: unlike shallow models, which can often be solved for directly, a deep network relies on BP to compute the gradient of each layer iteratively via the chain rule.
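The chain rule at the heart of BP can be illustrated on a two-weight toy network, checking the hand-derived gradient against a finite-difference estimate (all numbers here are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny two-layer net: x -> w1 -> sigmoid -> w2 -> output.
x, t = 0.5, 1.0            # input and target
w1, w2 = 0.8, -0.3         # weights

# Forward pass.
h = sigmoid(w1 * x)
y = w2 * h
loss = 0.5 * (y - t) ** 2

# Backward pass: each gradient is a product of local derivatives (chain rule).
dL_dy  = y - t
dL_dw2 = dL_dy * h
dL_dh  = dL_dy * w2
dL_dw1 = dL_dh * h * (1 - h) * x   # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))

# Sanity check: dL/dw1 against a finite-difference estimate.
eps = 1e-6
loss_p = 0.5 * (w2 * sigmoid((w1 + eps) * x) - t) ** 2
numeric = (loss_p - loss) / eps
print(dL_dw1, numeric)
```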
The core idea of the autoencoder is to train a function f such that f(x) is approximately equal to x, i.e., to obtain a function that makes the output as close to the input as possible, forcing the network to learn a compressed representation in between.
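A minimal linear autoencoder sketch in NumPy, trained by gradient descent so that the reconstruction approaches the input (the sizes, learning rate, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # toy data with 4 features

# Linear autoencoder: encode 4 -> 2, decode 2 -> 4, train f so f(X) ~ X.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))
lr = 0.01

def loss(W_enc, W_dec):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

first = loss(W_enc, W_dec)
for _ in range(500):
    H = X @ W_enc                      # code (compressed representation)
    R = H @ W_dec                      # reconstruction
    G = 2 * (R - X) / X.size           # gradient of the mean-squared error
    grad_dec = H.T @ G                 # chain rule, decoder weights
    grad_enc = X.T @ (G @ W_dec.T)     # chain rule, encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(first, loss(W_enc, W_dec))       # reconstruction error should drop
```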
Below is a systematic summary of common machine learning algorithms and common deep learning structures:
Machine Learning Algorithms:
Classification algorithms: KNN, NB, LR, RF, SVM, etc.
Clustering algorithms: K-means, DBSCAN
Regression algorithms: linear regression
Text analysis algorithms: word segmentation (HMM), keyword extraction (TF-IDF), topic model (LDA)
Recommendation algorithms: collaborative filtering, CF (user-based UCF / item-based ICF)
Graph algorithms: label propagation, shortest path
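As one example from the list above, the shortest-path problem can be solved with Dijkstra's algorithm (a minimal pure-Python sketch; the toy graph is illustrative):

```python
import heapq

def dijkstra(graph, start):
    """Shortest distances from start in a weighted graph (adjacency dict)."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                       # stale heap entry, skip
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

g = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 6}, "C": {"D": 3}}
d = dijkstra(g, "A")
print(d)   # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```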
Commonly used dimensionality-reduction methods aim to keep the remaining features as independent as possible: they reduce correlation (and with it computational noise) and remove fields that carry little or no meaning, cutting unnecessary interference.
Common deep learning structures: the deep neural network (DNN); the convolutional neural network, CNN (convolution, down-sampling/pooling, fully connected layers), mainly used for spatial data, with a uniform input-layer format; and the recurrent neural network (RNN), commonly used for sequential behavior, whose input-layer format need not be uniform.
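PCA, the linear-projection method mentioned above, can be sketched as a centered SVD in NumPy (the synthetic data is illustrative: 3-D points lying near a 2-D plane):

```python
import numpy as np

rng = np.random.default_rng(0)
# 3-D data that actually lies close to a 2-D plane (correlated features).
Z = rng.normal(size=(200, 2))
X = Z @ np.array([[1.0, 0.5, 0.2],
                  [0.0, 1.0, 0.8]]) + 0.01 * rng.normal(size=(200, 3))

# PCA: center the data, then project onto the top right-singular vectors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T                      # linear map into 2-D space

# Fraction of total variance kept by the top two components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(X2.shape, round(explained, 4))
```

Because the data was built to be nearly planar, the top two components retain almost all of the variance, which is exactly the situation where dropping the third dimension removes noise rather than signal.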
In summary: the machine learning process, common algorithms, and dimensionality-reduction methods.