Playing with Big Data: Big Data Mining Technology (Apriori Algorithm, Tanagra Tool, Decision Tree)

Last Update:2016-04-12 Source: Internet

Author: User

1. What is this course? (Overview)

1.1. Background of the course

"Big data" is currently one of the hottest terms in the IT industry, and the data warehousing, data analysis, and data mining practices that extract business value from big data have become a focus of commercial interest. Big data is not far from everyday life: from the massive user information of Weibo down to the monthly sales list of a community supermarket, all of it holds potential business value. Data volumes have grown so quickly that they far exceed people's ability to analyze them by hand, so science, commerce, and other fields urgently need intelligent, automated data analysis tools. Against this background, data mining technology makes the analysis of massive data far more manageable.

1.2. Introduction to the course content

This course covers data mining technology, taught "in depth" but explained "in simple terms".

"In depth" means starting from the principles of data mining and its classical algorithms. The aims are, first, to understand each algorithm and know which scenarios it suits; second, to learn the classical ideas behind the algorithms so that you can apply them to other practical projects; and third, to understand the algorithms well enough to use them in your own project development.

"In simple terms" means putting the algorithms to practical use. The course demonstrates their application in three ways: first, data mining with Microsoft's SQL Server and Excel tools; second, data mining with well-known open source tools such as Weka, KNIME, and Tanagra; third, data mining algorithms implemented in C#.

By application scenario, data mining techniques are usually divided into three categories: classification, association analysis, and clustering.
This course mainly introduces the classical ideas behind these three kinds of algorithms and some well-known implementations, and explains how to apply them using business analysis tools, open source tools, or programming.

1.3. Course Outline

1) Data Mining Overview and Data: This chapter explains the origins, application scenarios, and basic processing methods of data mining, and introduces the basic concepts of data sets and data.
2) Visualization and Multidimensional Data Analysis (practice lesson): This chapter explains the basic methods of data visualization and demonstrates visual processing of multidimensional (OLAP) data with Excel PivotTables and SQL Server Analysis Services, respectively.
3) Classifiers and Decision Trees: This chapter explains the basic concepts and uses of classifiers, and analyzes in detail how one classical classifier algorithm, the decision tree, is implemented.
4) Other Classifiers (Part 1): This chapter explains two other classic classifier algorithms: rule-based classifiers and distance-based classifiers.
5) Other Classifiers (Part 2): This chapter explains some other common classifier algorithms, such as improved distance-based classifiers, Bayesian classifiers, artificial neural networks, support vector machines, and ensemble methods.
6) Application of Decision Trees (practice lesson): This chapter demonstrates data mining with open source tools such as Weka Explorer, KNIME, and Tanagra. Several kinds of algorithms are compared in the demonstration, such as the CART decision tree, the C4.5 decision tree, the KNIME decision tree, naive Bayes classification, ensemble methods (bagging), artificial neural networks, and rule-based classification.
7) Association Analysis: This chapter explains the common association analysis algorithms, namely the Apriori algorithm and the FP-growth algorithm.
8) Shopping Cart Data Analysis (practice lesson): This chapter demonstrates shopping cart data analysis with Microsoft's solutions, including association analysis in SQL Server Analysis Services (SSAS) and Excel combined with the SSAS add-in. Finally, the Weka KnowledgeFlow tool is used for association analysis, for comparison with the practice in chapter 6.
9) Clustering Algorithms: This chapter explains the basic principles of clustering and common algorithms, including k-means, hierarchical clustering, and density-based clustering.
10) C# Source Code Implementation of a Clustering Algorithm (practice lesson): This chapter demonstrates how to implement a clustering algorithm in C# source code.

1.5. Introduction of the instructor

Allen: Has 2 years of cloud computing experience at a Global 500 company and many years of development experience. Proficient in SQL Server databases, with some research on data processing, and also experienced in developing web pages and desktop applications with C++, C#, and jQuery.

2. Why do you need this course?

2.1. What do enterprises need?

Data mining is a relatively new technology, and the demand for it has not yet been fully tapped. Even so, many enterprises already show strong demand for it. (Note: the following positions are from 51job.)

Position 1: Database engineer
Position 2: Software development engineer
Position 3: Market analyst

For more recruitment information, see www.51job.com.

2.2. Course learning objectives (what do we offer?)

Goal one: Help learners understand the key data mining techniques.
Goal two: Help learners quickly grasp the application scenarios of the various data mining techniques.
Goal three: Help learners quickly master how common data mining tools are used.
Goal four: Help learners who already have some foundation get started with writing data mining code.

2.3. Course features

Feature one: The instructor explains the material in depth yet in simple terms, starting from theory and principles but always returning to practical application. This serves both learners who want to deepen their understanding and those who focus on practical use.
Feature two: Each practical application has its own focus and is demonstrated with software or tools of several different styles: data mining products from mainstream software vendors (Microsoft's SQL Server Analysis Services), open source software and tools (Weka, KNIME, Tanagra), and mining algorithms implemented in C# code. This accommodates learners' preferences for different software.
Feature three: The course combines theory with practice, and the case data is reasonably representative. All case data is provided so that learners can modify and debug it to consolidate what they have learned.

2.4. Course highlights

Highlight one: Data mining is a cutting-edge technology. Chinese-language textbooks and courses on it are few, and courses like this one are uncommon in China.
Highlight two: The course combines theory with hands-on practice, in depth yet accessible. It serves beginners as well as more experienced learners; the explanations are detailed yet to the point, and the technical content is never vague.
Highlight three: The code is typed by hand, line by line, leading learners step by step from beginner to proficient.
Highlight four: The practice demonstrations cover many software packages and tools, accommodating learners with different habits.
Highlight five: The course is short but complete, like the proverbial sparrow that is small yet has all its vital organs. The pace is compact and the content is informative.

3. The course sounds good, but can I learn it?
This course involves many data mining algorithms. To understand them better, learners are advised to have some grounding in basic algorithms. Apart from the practice sessions on SQL Server Analysis Services, most demos avoid importing data from a database, so no particular database knowledge is required. To follow the code implementation in the last chapter, basic knowledge of C# is required.

1. To learn basic algorithms implemented in Java, we recommend: http://www.ibeifeng.com/goods.php?id=329
2. To learn basic algorithms implemented in C#, we recommend: http://www.ibeifeng.com/goods.php?id=69
3. To learn the basics of C#, we recommend: http://www.ibeifeng.com/goods.php?id=7

4. How should I study this course? Some advice.

4.1. Suggested schedule

The course has 10 lessons in total. Because the content is fairly compact, we suggest studying one lesson per day so that you can understand the content in depth.

4.2. Study requirements

Follow the course progress, watch carefully, and practice on your own with the course's test data in the corresponding software or tools. (Excel, SQL Server, and Visual Studio cannot be provided for copyright reasons, but the other, open source software is available.) If you have the background, we suggest trying to implement each algorithm in code after learning it, and learning to carry the ideas over to other cases.

4.3. Instructor's advice

1. After watching a video, put it aside and think carefully about the principles and ideas of each algorithm. If your memory of it is not firm, go back and watch the video again, and repeat until you truly understand and master it.
2. For the hands-on project parts, you must do the work yourself; do not just listen.
3. On many topics the open source community holds differing views. Learn to use search engines, and spend time in the relevant communities.
4. Finally, I wish you success in your studies.

5. What can I do after finishing this course?

After completing the course, try asking yourself a few questions:

1. Is there untapped data in your life and work?
2. Which data mining models would match that not-yet-explored data?
3. Can you try to use data mining methods to discover some of the underlying patterns?

This course is designed for people heading toward data, data analysis, and data mining work. It helps you learn the ideas of data mining and is not tied to one specific technical specialty. After mastering this technology, you will be able to raise your business data analysis methods and capabilities to a higher level.

6. Student FAQs

FAQ one: What software is used in this tutorial? Is the software provided with the tutorial, and which versions are used?

Instructor answer: The software for this course falls into two parts. The first part is Microsoft Office Excel, SQL Server Analysis Services (SSAS), and Visual Studio. Visual Studio is mainly used to demonstrate a C# implementation of a hierarchical clustering algorithm and is not limited to a specific version. The required pairings of Excel and SQL Server versions are as follows:

Excel 2007 with SQL Server 2005
Excel 2007 with SQL Server 2008
Excel 2010 with SQL Server 2012

The second part is open source software and tools; download links for them are provided in the course.

FAQ two: What background does this course require?

Instructor answer: The course suits learners interested in data analysis. Some knowledge of basic algorithms, databases, and so on is suggested. The open source software and tools have English interfaces, but these consist mainly of single words, so there is no special requirement for English.

FAQ three: Where is this technology generally used?
Instructor answer: Data mining is a technique in the field of data analysis, not a specific tool. First, the ideas of data analysis and mining can be borrowed in any data analysis scenario in life or work. Second, the methods of analysis and mining can be widely used in market positioning, customer relationship analysis, project development, and other fields. Mastering it will let you handle data with greater ease, and your career (and earning) prospects will be bright.

FAQ four: What jobs can this technology lead to?

Instructor answer: More and more companies are joining the big data trend. The main positions are data analysis specialist and data mining engineer, but many other jobs treat data mining skills as a plus when selecting candidates. The importance of data mining is expected to grow with the development of the Internet and cloud computing. Positions you could hold include database engineer, software development engineer, and market analysis specialist.
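
The course outline names Apriori as one of the association analysis algorithms used for shopping cart data. As a rough illustration only (this is not the course's own code, which relies on C# and tools such as Weka), the following minimal Python sketch shows Apriori's level-wise search for frequent itemsets. The function name `apriori_frequent_itemsets`, the toy baskets, and the `min_support` threshold are all assumptions invented for this example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support_count} for every itemset that appears in
    at least `min_support` transactions (hypothetical helper for illustration)."""
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the baskets for this level.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Toy shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori_frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these five baskets and `min_support=3`, the search stops at pairs: the only candidate triple that survives pruning, bread, milk, and diapers, occurs in just two baskets. The pruning step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are already known to be frequent, so most candidates are discarded before any counting.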

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
