This is a foundational course for big data engineers and cloud computing engineers, and one that every computing professional should master.
Without a solid grasp of data structures and algorithms, you will find it hard to master efficient, professional processing tools, and harder still to handle complex large-scale data processing scenarios.
Consider the following questions:
1. On social networking sites (such as Weibo and Facebook), the relationships between people form a huge volume of data. How would you model and process it?
2. What does a database index do? Why are indexes organized with data structures such as hashes, B+ trees, and heap tables?
3. Why does the Linux virtual memory management module use red-black trees for VMA lookup?
4. Why can search engines return search results in milliseconds?
5. How would you design a city's roads so that the whole city is connected at minimum cost?
If you are still unsure about the questions above, or if your answers only sound plausible, then this course is for you.
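As a small taste of the road-design question, here is a minimal sketch of Kruskal's algorithm backed by a union-find structure. It assumes junctions are numbered from 0 and that each candidate road has an integer cost; the class and method names (`RoadNetworkSketch`, `cheapestConnectingRoads`, and so on) are hypothetical and chosen only for illustration, not taken from the course source code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not course code.
class RoadNetworkSketch {

    // A candidate road between junctions u and v with a construction cost.
    static final class Road {
        final int u;
        final int v;
        final int cost;

        Road(int u, int v, int cost) {
            this.u = u;
            this.v = v;
            this.cost = cost;
        }
    }

    // Union-find (disjoint sets) with path halving and union by size.
    static final class UnionFind {
        private final int[] parent;
        private final int[] size;

        UnionFind(int n) {
            parent = new int[n];
            size = new int[n];
            for (int i = 0; i < n; i++) {
                parent[i] = i;
                size[i] = 1;
            }
        }

        int find(int x) {
            while (parent[x] != x) {
                parent[x] = parent[parent[x]]; // path halving keeps trees shallow
                x = parent[x];
            }
            return x;
        }

        // Merges the two sets; returns false if a and b were already connected.
        boolean union(int a, int b) {
            int rootA = find(a);
            int rootB = find(b);
            if (rootA == rootB) {
                return false;
            }
            if (size[rootA] < size[rootB]) {
                int tmp = rootA;
                rootA = rootB;
                rootB = tmp;
            }
            parent[rootB] = rootA;
            size[rootA] += size[rootB];
            return true;
        }
    }

    // Kruskal's algorithm: scan roads from cheapest to most expensive and keep
    // each road that joins two junction groups that were not yet connected.
    static List<Road> cheapestConnectingRoads(int junctionCount, List<Road> candidates) {
        List<Road> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt(r -> r.cost));
        UnionFind components = new UnionFind(junctionCount);
        List<Road> chosen = new ArrayList<>();
        for (Road road : sorted) {
            if (components.union(road.u, road.v)) {
                chosen.add(road);
            }
        }
        return chosen;
    }

    static int totalCost(List<Road> roads) {
        int sum = 0;
        for (Road road : roads) {
            sum += road.cost;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Road> candidates = Arrays.asList(
                new Road(0, 1, 1), new Road(1, 2, 2), new Road(0, 2, 3),
                new Road(2, 3, 4), new Road(1, 3, 5));
        List<Road> chosen = cheapestConnectingRoads(4, candidates);
        // prints: 3 roads, total cost 7
        System.out.println(chosen.size() + " roads, total cost " + totalCost(chosen));
    }
}
```

If the candidate roads cannot connect every junction, this sketch returns a minimum spanning forest rather than a single tree, so a caller would need to check that the result has exactly `junctionCount - 1` roads.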
By the end of this course, you will be able to answer not only the questions above, but also the following:
1. Why HBase uses the Bloom filter algorithm to decide whether a block is already cached.
2. Why ZooKeeper uses the concepts of trees and nodes to describe dependencies and coordination in distributed systems.
3. Why LevelDB uses skip lists and the LSM-tree structure to optimize performance.
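To make the Bloom filter question more concrete, here is a minimal sketch of the data structure itself. It is not HBase's implementation: the class name `BloomFilterSketch`, the FNV-1a hash, the seed constant, and the double-hashing scheme are all assumptions chosen for brevity. The point it illustrates is that a Bloom filter can answer "definitely absent" or "possibly present" using only a small bit array.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Illustrative sketch only: not HBase's implementation; all names are hypothetical.
class BloomFilterSketch {
    private final BitSet bits;
    private final int bitCount;
    private final int hashCount;

    BloomFilterSketch(int bitCount, int hashCount) {
        if (bitCount <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("bitCount and hashCount must be positive");
        }
        this.bits = new BitSet(bitCount);
        this.bitCount = bitCount;
        this.hashCount = hashCount;
    }

    // 32-bit FNV-1a hash, perturbed by a seed so two hash values can be derived.
    private static int fnv1a(byte[] data, int seed) {
        int hash = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Double hashing: the i-th bit position is (h1 + i * h2) mod bitCount.
    private int position(byte[] data, int i) {
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x5BD1E995);
        return Math.floorMod(h1 + i * h2, bitCount);
    }

    // Sets every bit position derived from the key.
    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(data, i));
        }
    }

    // false means "definitely never added"; true means "possibly added".
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(data, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch seen = new BloomFilterSketch(1024, 3);
        seen.add("block-17");
        System.out.println(seen.mightContain("block-17")); // true: no false negatives
        // For a key that was never added the answer is usually false,
        // but a false positive is possible by design.
        System.out.println(seen.mightContain("block-99"));
    }
}
```

The trade-off is space for certainty: more bits and a suitable number of hash functions lower the false positive rate, while a negative answer is always exact.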
In addition, many of the classic ideas in data structures and algorithms are well worth understanding and are useful to anyone with a strong interest in computing.
First, the course development environment
Operating system: Linux CentOS 7
IDE: IntelliJ IDEA 14
Main references: Algorithms, 4th edition (Princeton, English edition); Introduction to Algorithms, 3rd edition (English edition)
Other references: Linux kernel source code, JDK source code, English Wikipedia, etc.
Description language: Java
Second, introduction to the course content
The importance of data structures and algorithms in computer science and IT is self-evident.
They are not only a foundational subject that computing professionals should master, but also a skill in which practitioners working on databases and data processing should be proficient.
University data structure courses are often too theoretical, light on practice, and short on up-to-date knowledge and cases. This course is designed for big data engineers and cloud computing engineers and has the following features:
1. It emphasizes engineering applications and avoids mathematical notation where possible. Where notation is needed, it uses the more semantically expressive form and explains it in detail.
2. For each data structure, it highlights real engineering needs and draws on practice and successful use cases (such as operating systems, databases, big data processing frameworks, and microblogs) to guide the use of the data structure and pinpoint its value, so that students can put the knowledge into practice.
3. For hard-to-understand algorithms and some extremely important ideas, such as recursion and divide and conquer, it uses step-by-step slide illustrations, slide sketches, pseudocode, commented source code, and single-step debugging and tracing of the source code, so that students can understand, master, and apply the algorithms.
4. To ensure the authority of the cited material and to reflect day-to-day R&D practice at real big data companies, the references are mainly well-regarded English-language books and papers and blogs by senior engineers or the original developers, accompanied by Chinese explanations, so that students learn from the best professional material available.
5. Full source code is provided throughout. Because students' proficiency may vary, the popular Java language is used to describe and write the code so that all students can read and learn from it.
Third, the main content of the course:
1. Overview of data structures and algorithms
2. Arrays, linked lists, queues, stacks, and other linear lists
3. Binary trees, BSTs, AVL trees, and recursive and non-recursive binary tree traversal
4. B+ trees
5. Skip lists
6. Graphs, graph storage, graph traversal
7. Graphs: lazy and eager Prim's algorithm, Kruskal's algorithm and MST, the single-source shortest path problem and Dijkstra's algorithm
8. Union-find, indexed priority queues, binary heaps
9. Introduction to genetic algorithms and the TSP
10. Internal sorting algorithms (direct insertion, selection, Shell sort, heap sort, quicksort, merge sort, etc.) and their optimization in practice
11. External sorting and its optimization (file encoding, data encoding, I/O modes and JVM features, multithreading, multithreaded merging, etc.)
12. Hash tables, tries, inverted indexes, and an introduction to distributed indexes (MapReduce)
Lecturer Hao:
He studied at Zhong Ke and CAS, and is familiar with the development, architecture, design, and optimization of server-side systems, distributed systems, and big data processing frameworks.
Senior development engineer and big data engineer.
First, Introduction
Lecture 1: What is a data structure?
Lecture 2: What is an algorithm?
Second, linear lists
Lecture 3: Linear lists (arrays, linked lists, queues, stacks)
Lecture 4: Linux work queues and the JDK thread pool
Third, trees
Lecture 5: Nonlinear structures, trees, binary trees
Lecture 6: Balanced trees, AVL trees
Lecture 7: B+ trees and database indexes
Fourth, graphs
Lecture 8: The concept and storage of graphs
Lecture 9: Graph traversal
Lecture 10: Minimum spanning trees (MST), Prim's algorithm, Kruskal's algorithm
Lecture 11: Single-source shortest paths and Dijkstra's algorithm
Lecture 12: Approximate solution of the TSP with a genetic algorithm
Fifth, sorting
Lecture 13: Selection sort, insertion sort, Shell sort
Lecture 14: Heap sort, priority queues
Lecture 15: Quicksort and its optimization
Lecture 16: Merge sort and its optimization
Lecture 17: Merge sort and external sorting
Lecture 18: Optimization and extension of external sorting
Sixth, searching
Lecture 19: Hash tables, binary search, tries, ternary search trees, search engines and inverted indexes, centralized and distributed indexes, introduction to MapReduce
Course objectives:
1. Master the data structures and algorithms used in practical data processing
2. Develop a data processing mindset
3. Build the ability to implement algorithms
4. Broaden your perspective: understand the role and value of data structures and algorithms in operating systems, the Internet, databases, and massive data processing scenarios
5. Put knowledge into practice: learn to use data structures, algorithms, and related knowledge to analyze and solve practical problems
6. Lay the foundation for a deep, comprehensive, and solid grasp of big data processing technology
Big Data Is Just That Capricious, Season 1: Data Structures and Algorithms (front-line experience, authoritative references, up-to-date knowledge, practical focus, full source code)