Terms related to data mining (Glossary)

Source: Internet
Author: User
Artificial Neural Networks
Non-linear predictive models that learn through training and resemble biological neural networks in structure.
CART (Classification and Regression Trees)
A decision tree technique used to classify a dataset. It provides a set of rules that can be applied to a new, unclassified dataset to predict which records will have a given outcome. It segments a dataset by creating two-way (binary) splits. It requires less data preparation than CHAID.
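As a rough illustration of a two-way split, the sketch below scans one numeric field for the threshold that best separates two outcomes. It assumes Gini impurity as the split criterion, which is a common choice for CART but is not stated in this glossary; the function names and the income data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: income (in thousands) and a credit outcome.
incomes = [20, 25, 30, 45, 50, 60]
outcomes = ["bad", "bad", "bad", "good", "good", "good"]
threshold, impurity = best_binary_split(incomes, outcomes)
```

A full tree would apply the same search recursively to each side of the chosen split.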
CHAID (Chi-Square Automatic Interaction Detection)
A decision tree technique used to classify a dataset. It provides a set of rules that can be applied to a new, unclassified dataset to predict which records will have a given outcome. It segments a dataset by using chi-square tests to create multi-way splits. It requires more data preparation than CART.
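The chi-square test behind a multi-way split can be sketched as follows. This only computes the Pearson chi-square statistic for one candidate split; a real CHAID implementation also merges categories and applies significance thresholds, which are omitted here. The counts are hypothetical, and every row and column is assumed to have a non-zero total.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the split variable and the outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical three-way split by region; columns are (responded, did not respond).
counts = [[30, 70], [50, 50], [70, 30]]
stat = chi_square(counts)
```

A larger statistic suggests that the candidate split separates the outcomes more strongly than chance would.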
Classification
The process of dividing a dataset into mutually exclusive groups such that the members of each group are as close as possible to one another and the members of different groups are as far apart as possible, where distance is measured with respect to the specific variable you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with the values "good" and "bad".
Clustering
The process of dividing a dataset into mutually exclusive groups such that the members of each group are as close as possible to one another and the members of different groups are as far apart as possible, where distance is measured with respect to all available variables.
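One way to make this concrete is k-means, sketched below. The glossary does not name a specific clustering algorithm, so k-means, Euclidean distance, and the starting centroids are all assumptions made for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # A centroid with no assigned points keeps its previous position.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical two-variable records; distance uses every variable, not a target field.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Unlike classification, no target variable appears anywhere in the code: the groups come only from how close the records are to one another.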
Data Cleansing
A process that ensures that all values in the dataset are consistent and correctly recorded.
Data Mining
The extraction of hidden predictive information from large databases.
Data Navigation
The process of viewing different dimensions, slices, and levels of detail of a multidimensional database. See OLAP (On-Line Analytical Processing).
Data Visualization
The visual interpretation of complex relationships in multidimensional data.
Data Warehouse
A database system that stores and delivers large amounts of data.
Decision Tree
A tree-shaped structure that represents a series of decisions. These decisions generate rules for classifying a dataset. See CART and CHAID.
Dimension
In a flat or relational database, each field in a record represents a dimension. In a multidimensional database, a dimension is a set of similar entities. For example, a multidimensional sales database might include the dimensions product, time, and city.
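The sales example can be sketched with a small in-memory structure. This is only an illustration of the idea of dimensions, not how a multidimensional database stores data; the figures and the helper names `slice_cube` and `roll_up` are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("product", "month", "city")

# Hypothetical sales facts keyed by one value per dimension.
sales = {
    ("tea", "2024-01", "Paris"): 120,
    ("tea", "2024-01", "Rome"): 80,
    ("tea", "2024-02", "Paris"): 100,
    ("coffee", "2024-01", "Paris"): 200,
    ("coffee", "2024-02", "Rome"): 150,
}

def slice_cube(cube, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    return {
        key: value
        for key, value in cube.items()
        if all(key[DIMENSIONS.index(dim)] == wanted for dim, wanted in fixed.items())
    }

def roll_up(cube, dimension):
    """Sum the measure over every dimension except the one named."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

paris = slice_cube(sales, city="Paris")
by_product = roll_up(sales, "product")
```

Fixing one dimension gives a slice, and summing over the others gives a total per member of the remaining dimension.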
Exploratory Data Analysis
The use of graphical and descriptive statistical techniques to learn about the structure of a dataset.
Genetic Algorithms
Optimization techniques that use processes such as genetic recombination, mutation, and natural selection in a design based on the concepts of natural evolution.
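The three processes named above can be seen in a toy example. The sketch below maximizes the number of 1 bits in a bit string (the "OneMax" problem), which is chosen only because it is easy to follow; the population size, mutation rate, and the use of elitism are assumptions rather than part of any standard definition.

```python
import random

def fitness(bits):
    """Toy objective (OneMax): the number of 1 bits."""
    return sum(bits)

def evolve(population, generations, rng, mutation_rate=0.05):
    """Return the fittest individual after repeated selection, crossover, and mutation."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]      # natural selection: keep the fitter half
        children = [ranked[0]]                    # elitism: the best survives unchanged
        while len(children) < len(population):
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, len(mother))   # recombination: single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
best = evolve(start, generations=40, rng=rng)
```

Because the best individual is always carried forward, the fitness of the result can never be lower than that of the best starting individual.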
Linear Model
An analytical model that assumes linear relationships in the coefficients of the variables being studied.
Non-Linear Model
An analytical model that does not assume linear relationships in the coefficients of the variables being studied.
Linear Regression
A statistical technique used to find the best-fitting linear relationship between a target (dependent) variable and its predictors (independent variables).
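For a single predictor, "best-fitting" is usually taken to mean least squares, and under that assumption the line has a closed-form solution. The sketch below uses hypothetical data that lies exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if all x values are equal
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical predictor and target values.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
```

With noisy data the same formulas return the line that minimizes the sum of squared vertical distances to the points.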
Logistic Regression
A generalization of linear regression that predicts the proportions (probabilities) of a categorical target variable, such as type of customer, in a population.
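A minimal sketch for a two-category target is shown below. It assumes a single predictor and fits the model by plain gradient descent on the log loss, which is one of several possible fitting methods; the learning rate, step count, and customer data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: number of prior purchases, and whether the customer is a repeat buyer.
purchases = [1, 2, 3, 4, 5, 6]
repeat = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(purchases, repeat)
p_low = sigmoid(w * 1 + b)   # estimated proportion of repeat buyers at 1 purchase
p_high = sigmoid(w * 6 + b)  # estimated proportion of repeat buyers at 6 purchases
```

The output is a proportion between 0 and 1 rather than an unbounded number, which is what distinguishes it from ordinary linear regression.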
Nearest Neighbor
A technique that classifies each record in a dataset based on a combination of the classes of the k records most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
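A minimal sketch of the idea follows. It assumes Euclidean distance as the similarity measure and a majority vote as the way to combine the neighbors' classes; both are common choices but neither is fixed by the definition, and the records are hypothetical.

```python
import math
from collections import Counter

def knn_classify(history, record, k=3):
    """Predict a class for `record` by majority vote among its k nearest historical records."""
    neighbors = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, known class).
history = [
    ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
    ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high"),
]
near_low = knn_classify(history, (2, 2), k=3)
near_high = knn_classify(history, (7, 9), k=3)
```

Choosing an odd k for a two-class problem avoids tied votes.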
Multidimensional Database
A database designed for on-line analytical processing. It is structured as a multidimensional hypercube with one axis per dimension.
OLAP (On-Line Analytical Processing)
Array-oriented database applications that allow users to view, navigate through, manipulate, and analyze multidimensional databases.
Outlier
A data item whose value falls outside the bounds that enclose most of the other corresponding values in the sample. It may indicate anomalous data and should be examined carefully, because it may carry important information.
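One common way to define the "bounds" is the interquartile range (IQR) rule, sketched below. The 1.5 multiplier is a convention rather than part of the definition, and the readings are hypothetical.

```python
import statistics

def find_outliers(values, factor=1.5):
    """Return the values lying more than `factor` IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
```

Flagged values are candidates for review, not automatic deletions: a flagged value may be a recording error or a genuinely important observation.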
Predictive Model
A structure and process for predicting the values of specified variables in a dataset.
Prospective Data Analysis
Data analysis that predicts future trends, behaviors, or events based on historical data.
Retrospective Data Analysis
Data analysis that provides insight into trends, behaviors, or events that have already occurred.
Rule Induction
The extraction of useful if-then rules from data based on statistical significance.
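Two simple statistics often used to judge whether a candidate if-then rule is useful are support and confidence, sketched below. These measures are borrowed from association-rule mining as an assumption about what "useful" means here, and the shopping baskets are hypothetical.

```python
def rule_stats(records, condition, conclusion):
    """Support and confidence of the rule 'if condition then conclusion'."""
    matching = [r for r in records if condition(r)]
    both = [r for r in matching if conclusion(r)]
    support = len(both) / len(records)                           # how often the whole rule holds
    confidence = len(both) / len(matching) if matching else 0.0  # how often 'then' follows 'if'
    return support, confidence

# Hypothetical transaction data.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence = rule_stats(
    baskets,
    condition=lambda basket: "bread" in basket,
    conclusion=lambda basket: "butter" in basket,
)
```

A rule induction procedure would generate many candidate rules and keep only those whose statistics pass chosen thresholds.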
Time Series Analysis
The analysis of a sequence of measurements taken at successive time intervals. Time is usually the dominant dimension of the data.
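A simple moving average is one of the most basic time series techniques and is sketched below as an illustration. The window length and the monthly figures are hypothetical.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive measurements, in time order."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Hypothetical monthly measurements, already ordered by time.
monthly_sales = [10, 12, 14, 13, 15, 17]
trend = moving_average(monthly_sales, window=3)
```

Smoothing over a window makes the underlying trend easier to see by damping month-to-month fluctuations.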
