The previous article introduced Weka, an open source data mining package, for association rule mining. Weka is convenient and practical, but it cannot handle large data sets: once the data no longer fits in memory, giving it more time does not help, so distributed computing is needed. Mahout is a Hadoop-based
As market competition intensifies, China Telecom faces growing pressure, and customer churn is increasing. According to the statistics, the number of fixed-line PHS this year has exceeded the number of accounts. In such a grim market, the urgent task is to make every effort to reduce customer loss. It is therefore necessary to establish a set of models that can predict the customer churn rate in time by using data
1 What is data mining?
The most commonly accepted definition of "data mining" is the discovery of "models" for data.
1.1 Statistical modeling
Statisticians were the first to use the term "data min
Purpose of collecting web logs
Web log mining applies data mining techniques to the log data generated on the web server as users visit a site, in order to discover users' access patterns and interests. Such information is potentially useful for site construction and unde
Spatial Data
Multimedia Data
For example, image data
Description-based retrieval system: keywords, titles, dimensions, etc.
Content-based retrieval system: color composition, texture, shape, object and wavelet transformation.
Time series data and sequence data
Trend Analysis
I plan to organize the basic concepts and algorithms of data mining, including common algorithms for association rule mining, classification, and clustering, so stay tuned. Today we cover the most basic knowledge of association rule mining.
Association rules minin
Among data mining algorithms, association rule mining is an important one, driven in particular by market basket analysis. Association rules are applied in many real businesses, and this article gives a short summary of association rule mining. First, like clustering algorithms, association rule
Tags: using SP data, BS, users, technical objects, different methods
First:
Data type
Different attributes of an object are described by different data types, such as age --> int; birthday --> date. Different types of data must be treated differently in data mining.
Second:
rule algorithm---Apriori
First, a few technical terms.
Mining dataset: the collection of data to be mined. This one is easy to understand.
Frequent patterns: patterns that occur frequently in the mining dataset, such as itemsets, substructures, and subsequences. How should this be understood? In short, mining data
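The snippet above names Apriori but is cut off before describing it. As a rough illustration only, the following is a minimal Python sketch of the usual level-wise idea: count single items, keep the frequent ones, then repeatedly join frequent (k-1)-itemsets into k-item candidates and prune any candidate that has an infrequent subset. The function name `apriori`, the parameter `min_count`, and the sample transactions are assumptions made for this sketch, not taken from the original article.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset seen in at least min_count transactions.

    Hypothetical helper for illustration; not the original article's code.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that give k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Made-up sample data for the sketch.
sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(sample, min_count=3)
```

With this sample, every single item and every pair meets the threshold, while the triple does not. The sketch rescans all transactions at each level, which is simple to read but is exactly the cost that motivates alternatives on large data.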
I. Concepts
Association Rule Mining: discovering interesting and frequent patterns, associations, and correlations between item sets of a large amount of data, such as the food database and relational database.
Measures of the interestingness of association rules: support, confidence
k-itemset: a set of k items
Frequency of the item set: number of transactions that contain the item set
Frequent Item Se
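To make the definitions above concrete, here is a small Python sketch that computes the frequency (support count) of an itemset, its support as a fraction of all transactions, and the confidence of a rule A -> B as count(A and B) / count(A). The function names and the shopping-basket data are hypothetical and chosen only for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    a_count = support_count(a, transactions)
    if a_count == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support_count(a | frozenset(consequent), transactions) / a_count


# Made-up transactions for the sketch.
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
)]

pair_count = support_count({"diapers", "beer"}, baskets)    # 3 of 5 baskets
pair_support = support({"diapers", "beer"}, baskets)        # 3 / 5
rule_conf = confidence({"diapers"}, {"beer"}, baskets)      # 3 / 4
```

Here {diapers, beer} is a 2-itemset; it counts as frequent whenever its support count reaches whatever minimum threshold the analyst chooses.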
transaction by user: shell + IP + hostname, according to each user's login (a record where all three match belongs to the same user). Based on this, the basic principle of the mining 2 algorithm for frequent patterns in user input command sequences is realized.
The FP-growth algorithm mainly solves the problem of finding frequent itemsets, that is, sets of items whose number of occurrences across multiple sets reaches a given threshold. An FP-tree is a compressed representation of input
Several basic concepts and two basic algorithms for association rules were described in the previous few articles. In commercial applications, however, writing algorithms matters less than understanding the data, having a grasp of the data, and using tools. The preceding articles were about understanding the algorithms; this article will introduce the open source
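As an illustration of why an FP-tree is a compressed representation, the sketch below builds one in Python: infrequent items are dropped, each transaction is sorted by descending item frequency, and transactions that share a prefix share tree nodes, with a count on each node. The class `FPNode`, the functions `build_fp_tree` and `count_nodes`, and the sample data are assumptions made for this sketch; the header table and node links used by full FP-growth are left out for brevity.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build a simplified FP-tree; returns (root, frequent item counts)."""
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = FPNode(None)
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes only bump the count
            node = child
    return root, freq


def count_nodes(node):
    """Number of item nodes below the given node."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Made-up transactions for the sketch: 15 item occurrences in total.
data = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
        ["a", "d", "e"], ["a", "b", "c"]]
tree, item_counts = build_fp_tree(data, min_count=2)
total_nodes = count_nodes(tree)
```

For this sample the 15 item occurrences are stored in 11 nodes, because four transactions share the node for "a" and two of them also share the node for "b".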
What is the difference between data mining, machine learning, and artificial intelligence (AI)? What is the relationship between data science and business analytics?
At first I thought there was no need to explain this question; in the end, data
only 1. So the count of a conditional pattern base is determined by the minimum count among the nodes on the path.
From the conditional pattern base, we can build the conditional FP-tree for that item, for example i5:
From the conditional FP-tree, we can enumerate all combinations to obtain the mined frequent patterns (the item itself, such as i5, is also included here; the frequent patterns mined for each item mus
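The rule stated above, that each conditional pattern base takes the minimum node count on its path, can be sketched in Python as follows. Each path is written as a list of (item, count) pairs from the root down to the target item; the prefix of the path becomes a pattern base whose count is the smallest count on that path, which in an FP-tree is the count of the target node itself. The function names, the item names, and the counts are hypothetical; this is not the i5 example from the original article, whose figure is not available here.

```python
def conditional_pattern_base(paths):
    """Turn root-to-item paths into (prefix items, count) pattern bases."""
    base = []
    for path in paths:
        prefix = tuple(item for item, _ in path[:-1])
        count = min(c for _, c in path)  # minimum node count on the path
        if prefix:
            base.append((prefix, count))
    return base


def conditional_item_counts(base, min_count):
    """Sum counts per item over the base; keep items meeting min_count.

    The surviving items are the ones that would appear in the conditional FP-tree.
    """
    totals = {}
    for prefix, count in base:
        for item in prefix:
            totals[item] = totals.get(item, 0) + count
    return {i: c for i, c in totals.items() if c >= min_count}


# Made-up paths ending in a target item "z"; counts decrease toward the leaf.
paths_to_z = [
    [("x", 7), ("y", 4), ("z", 1)],
    [("x", 7), ("y", 4), ("w", 2), ("z", 1)],
]
base_for_z = conditional_pattern_base(paths_to_z)
kept_items = conditional_item_counts(base_for_z, min_count=2)
```

In this sketch "w" occurs only once together with "z", so it falls below the threshold and is excluded, leaving "x" and "y" with a count of 2 each.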
Let me start with the problem. I do not know whether everyone has had this experience, but I run into it often.
Example 1: some websites send me e-mail every few days, and the content is never anything I am interested in. It disturbs me a great deal, and I have come to detest it.
Example 2: I added an MSN robot for one of its features, and several times a day a window suddenly pops up recommending a pile of things I do not want to know about. It was so annoying that I had to block it.
Every audience member just wants to see what he is interested in, rather than some
courses in the field of Java technology. Primarily Java-related technologies: Struts, Spring, Hibernate, Oracle, SQL Server, Hadoop, Memcache, HTML, JavaScript, ActiveMQ.
1. Deep mining of big data
2. Big data storage
3. Big data processing solutions
4. Pure distributed database: Cassandra
5. The combination of cloud computing and database technology
6. HDFS
7. Ganglia
8.
Theory and method of data-spatial data mining technology
Gejoco
(Information Institute of Southwest Agricultural University 400716)
This paper briefly discusses the theory and characteristics of spatial database technology and spatial data mining technology, this paper a
IPython is an interactive Python shell.
Anaconda is a packaged toolbox, similar to how Eclipse is bundled into J2EE and Android editions; the tools can be installed individually, or a ready-made version can be downloaded.
SymPy is a powerful symbolic computation tool.
Built on the NumPy library, the SciPy library adds many functions commonly used in mathematics, science, and engineering computation. Examples include linear algebra, numerical solutions of ordinary differential equations, signal proce
With the arrival of the big data era, the importance of data mining has become apparent. Several simple data mining algorithms form the lowest tier, and they are briefly summarized here using the Microsoft Data Case Library. Appl