Data Structures and Algorithms

Last Update:2017-05-29 Source: Internet

Author: User


First, what the study of data structures covers:

1 Logical Structure:

A, linear structure: there is a one-to-one linear relationship between the data elements in the structure;

B, tree structure: There is a one-to-many hierarchical relationship between data elements;

C, graph structure: there are many-to-many, arbitrary relationships between data elements.

2 Extensions and basic algorithms of data structures:

A, string: short for character string, a finite sequence of characters;

B, array: an array is a data type that uses a sequential storage structure;

C, search: a data structure is only meaningful when combined with algorithms. Searching is a typical application of data structures in algorithms and is used constantly in practice;

3 Storage structure:

A storage structure (physical structure) is the image of a logical structure in the computer, that is, the implementation (storage representation) of the logical structure. It contains the representation of the data elements and the representation of the relationships between them.

The relationship between the logical structure and the storage structure: the storage structure is the image of the logical structure together with the images of the elements themselves. The logical structure is the abstraction and the storage structure is the implementation; together they establish the structural relationships between the data elements.

There are generally two storage methods: sequential storage and linked storage.
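The difference between the two storage methods can be sketched in Python. This is only an illustration under assumptions: the names Node, linked_from and linked_to_list are hypothetical and do not come from the original article. The same logical sequence (10, 20, 30) is stored once sequentially and once as a chain of linked nodes.

```python
from typing import Optional


class Node:
    """One element of a linked structure: the value plus a link to the next node."""

    def __init__(self, value: int, next_node: "Optional[Node]" = None) -> None:
        self.value = value
        self.next = next_node


def linked_from(values: list) -> Optional[Node]:
    """Build a singly linked chain that holds the values in order."""
    head: Optional[Node] = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def linked_to_list(head: Optional[Node]) -> list:
    """Follow the links from head and collect the values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


# Sequential storage: neighbours in the logical order are neighbours in the array.
sequential = [10, 20, 30]

# Linked storage: the order is carried by explicit links, not by position.
linked = linked_from([10, 20, 30])
```

Both variables represent the same logical structure; only the way the "next element" relationship is stored differs.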

4 Set of operations:

Data structures are studied so that the desired operations can be carried out in the computer. The operations that can be applied to the data elements form the operation set of the data structure, so the operation set is an important part of the structure.

Second, the basic concepts of data structures:

1. Data

Data is the collection of numbers, characters, and all other symbols that describe objective things and can be entered into and processed by a computer. In short, data is the information stored in the computer.

2. Data elements and data items

A data element is the basic unit of data, an individual member of a data collection, and is usually considered and processed as a whole in the computer. A data element can consist of one or more data items, which are the smallest units with independent meaning (they cannot be divided further). For example, each student's record is a data element, and it contains several data items such as the student number and the name.

3. Data Objects

A data object is a collection of data elements of the same nature and is a subset of the data. For example, the integer data object is the set N = {0, ±1, ±2, ...}, and the alphabetic character data object is the set C = {'A', 'B', 'C', ..., 'Z'}.

4. Data type

A data type is a collection of values of the same nature together with the set of operations defined on that collection. The value collection determines the range of values of the type, and the operation collection determines which operations the type allows.

5. Abstract data type

An abstract data type (ADT) is a data type defined by a class of logical relationships. Its definition depends only on a set of logical properties, regardless of how it is represented and implemented inside the computer.
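As a rough illustration of this idea, the sketch below defines a hypothetical ADT (CounterADT, a name invented for this example) purely by its operations and then gives two different internal representations. Code that uses only the ADT operations, such as count_up, cannot tell the two apart.

```python
from abc import ABC, abstractmethod


class CounterADT(ABC):
    """Hypothetical ADT: a counter defined only by its operations."""

    @abstractmethod
    def increment(self) -> None:
        """Increase the count by one."""

    @abstractmethod
    def value(self) -> int:
        """Return the current count."""


class IntCounter(CounterADT):
    """Represents the count as a single integer."""

    def __init__(self) -> None:
        self._n = 0

    def increment(self) -> None:
        self._n += 1

    def value(self) -> int:
        return self._n


class TallyCounter(CounterADT):
    """Represents the count as a list of tally marks."""

    def __init__(self) -> None:
        self._marks = []

    def increment(self) -> None:
        self._marks.append("|")

    def value(self) -> int:
        return len(self._marks)


def count_up(counter: CounterADT, times: int) -> int:
    """Use only the ADT operations, never the internal representation."""
    for _ in range(times):
        counter.increment()
    return counter.value()
```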

Three, algorithms and their description

1, the concept and characteristics of an algorithm

A, finiteness: an algorithm must end after a finite number of steps, and each step must complete in a finite amount of time;

B, definiteness: each step of the algorithm must have an exact meaning and must not be ambiguous;

C, feasibility: each step of the algorithm must be effectively executable and produce a definite result;

D, input: the data the algorithm obtains from the outside when it executes. An algorithm can have zero or more inputs;

E, output: the result of the algorithm's processing of the data. An algorithm has one or more outputs; an algorithm without output is meaningless.

2, the requirements of algorithm design

A, correctness: for every legal input, the algorithm produces a result that meets the requirements;

B, readability: how easy the algorithm is to read and understand. A readable algorithm is easier to communicate, debug, and modify;

C, robustness: for illegal input, the algorithm gives an appropriate response rather than producing unpredictable consequences;

D, efficiency and low storage requirements: efficiency refers to the execution time of the algorithm. Among several algorithms that solve the same problem, the one with the shorter execution time is more efficient. Storage requirements refer to the maximum amount of memory needed while the algorithm runs; the smaller the storage requirement, the better.

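3, the analysis of algorithms

A, the time complexity of an algorithm: the execution time of an algorithm is roughly the sum of the execution times of all its statements. It is therefore estimated by how the number of statement executions grows with the problem size n, and written in big-O notation, for example O(n) or O(n^2).

B, the space complexity of an algorithm: space complexity measures the storage space the algorithm needs while it runs, also as a function of the problem size n.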
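To make "sum of statement executions" concrete, the following sketch simply counts how many times a loop body runs. The function names are made up for this example; the point is only to compare how the counts grow with n.

```python
def count_linear(n: int) -> int:
    """Count how often the body of a single loop over n items runs: O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def count_quadratic(n: int) -> int:
    """Count how often the body of two nested loops over n items runs: O(n^2)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps
```

Doubling n doubles the first count but quadruples the second, which is what the notations O(n) and O(n^2) express.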

Four, linear list

1, definition: a linear list (linear table) is a finite sequence of n nodes, written (a1, a2, a3, ..., an);

2, characteristics: a linear list has two major characteristics, homogeneity and order. Homogeneity means that all data elements of the same linear list must have the same data type and length. Order means that the position of each data element in the list depends only on its ordinal number, so the relative positions of the data elements are linear.

A non-empty linear list has the following characteristics:

A, there is exactly one start node a1, which has no direct predecessor and exactly one direct successor a2;

B, there is exactly one terminal node an, which has no direct successor and exactly one direct predecessor an-1;

C, every other (internal) node has exactly one direct predecessor and exactly one direct successor.

3, basic operations

A, length: return the number of elements in the linear list;

B, traverse: scan (read) each element of the list from left to right (or in reverse);

C, search by number: find the i-th element in the list;

D, search by value: find the element of the list that has a given value;

E, insert: insert a new element at position i (before the i-th element);

F, delete: delete the i-th element of the list;

G, sort: rearrange the elements of the list in ascending (or descending) order of their key values.
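The operations above can be sketched on top of a Python list, which is a sequentially stored structure. The class name SeqList and its method names are assumptions made for this example; positions are 1-based to match the (a1, ..., an) notation.

```python
class SeqList:
    """Linear list stored sequentially; positions are 1-based as in the text."""

    def __init__(self, items=()):
        self._data = list(items)

    def length(self) -> int:
        return len(self._data)

    def traverse(self) -> list:
        """Return the elements from left to right."""
        return list(self._data)

    def get(self, i: int):
        """Search by number: return the i-th element."""
        if not 1 <= i <= len(self._data):
            raise IndexError("position out of range")
        return self._data[i - 1]

    def locate(self, value) -> int:
        """Search by value: return the 1-based position, or 0 if absent."""
        for pos, item in enumerate(self._data, start=1):
            if item == value:
                return pos
        return 0

    def insert(self, i: int, value) -> None:
        """Insert value before the i-th element (i = length + 1 appends)."""
        if not 1 <= i <= len(self._data) + 1:
            raise IndexError("position out of range")
        self._data.insert(i - 1, value)

    def delete(self, i: int):
        """Delete and return the i-th element."""
        if not 1 <= i <= len(self._data):
            raise IndexError("position out of range")
        return self._data.pop(i - 1)

    def sort(self, descending: bool = False) -> None:
        """Rearrange the elements in ascending (or descending) order."""
        self._data.sort(reverse=descending)
```

Inserting or deleting at position i shifts every later element, which is the usual cost of sequential storage.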

Five, stacks and queues

Stacks and queues are special forms of linear lists.

1, stack: a stack is a special linear list in which insertion and deletion are allowed at only one end. The end that allows insertion and deletion is called the top of the stack, and the other end is called the bottom. Inserting an element at the top is called "pushing" (entering the stack), and deleting the top element is called "popping" (leaving the stack). The stack is also known as a last-in, first-out (LIFO) list. An analogy: when examinees hand in exam papers onto a pile, the paper handed in last lies on top and is the first one the teacher picks up;

Basic operations of the stack:

A, stack initialization: set the stack to an empty stack;

B, empty test: return true if the stack is empty, otherwise return false;

C, length of the stack: return the number of elements in the stack;

D, push: put an element on the top of the stack;

E, pop: remove the top element from the stack;

F, read stack top: return the top element without removing it.
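A minimal sketch of these stack operations, assuming a Python list as the underlying storage (the class and method names are chosen for this example only):

```python
class Stack:
    """LIFO stack backed by a Python list; the end of the list is the top."""

    def __init__(self) -> None:
        self._items = []          # initialization: an empty stack

    def is_empty(self) -> bool:
        return not self._items

    def size(self) -> int:
        return len(self._items)

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Read the top element without removing it."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1, which is the last-in, first-out behaviour described above.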

2, queue: like the stack, a queue is a special linear list. Deletion (dequeue) is allowed only at the head of the queue, and insertion (enqueue) only at the tail. Because the element that entered first is the first to leave, a queue is called a first-in, first-out (FIFO) list.

Basic operations of the queue:

A, queue initialization: set the queue to an empty queue;

B, empty test: return true if the queue is empty, otherwise return false;

C, length of the queue: return the number of elements in the queue;

D, read queue head: return the value of the element at the head of the queue;

E, enqueue: insert an element at the tail of the queue;

F, dequeue: remove the element at the head of the queue.
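The queue operations can be sketched in the same way. This example assumes collections.deque from the Python standard library as the storage, because it supports efficient removal at the left end; the class and method names are illustrative.

```python
from collections import deque


class Queue:
    """FIFO queue: elements enter at the tail and leave at the head."""

    def __init__(self) -> None:
        self._items = deque()     # initialization: an empty queue

    def is_empty(self) -> bool:
        return not self._items

    def size(self) -> int:
        return len(self._items)

    def front(self):
        """Read the head element without removing it."""
        if not self._items:
            raise IndexError("front of empty queue")
        return self._items[0]

    def enqueue(self, item) -> None:
        self._items.append(item)

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()
```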

Six, tree

The data elements of a tree structure are characterized by branching and layering. Tree structures are common in the real world: family genealogies and all kinds of organizational charts can be drawn as trees. In computing, the directory tree of an operating system and the organization of information in a database also use tree structures.

1, the basic concept of the tree:

A tree is a finite set T of n nodes. When n = 0, T is an empty tree; otherwise, any non-empty tree T has the following two properties:

A, there is exactly one distinguished node that has no predecessor, called the root node;

B, the remaining nodes can be divided into m disjoint subsets T1, T2, ..., Tm, each of which is itself a tree and is called a subtree of the root;

2, tree representation methods:

A, graphical notation: drawn in the shape of a tree;

B, nested-set (Venn diagram) notation, as shown in Figure 6.3 (a);

C, indented notation, as shown in Figure 6.3 (b);

D, generalized-list (nested parentheses) notation, as shown in Figure 6.3 (c).

3, common terms for trees:

A, node: an element of the tree, such as A, B, C, D;

B, degree of a node: the number of subtrees the node has (the degree of A is 3, the degree of C is 1);

C, degree of the tree: the maximum degree of any node in the tree;

D, leaf: a node of degree zero is called a leaf (E, F, K, L, H, I, J);

E, branch node: a node whose degree is not zero; every node other than a leaf is a branch node;

F, children and parents: the root of a node's subtree is called a child of that node; accordingly, the node is called the parent of the child;

G, siblings: children of the same parent are siblings of each other;

H, ancestors and descendants: the ancestors of a node are all nodes on the branches from the root to that node. Accordingly, any node in a subtree rooted at a node is called a descendant of that node;

I, level of a node: levels are counted from the root. The root is at level 1, its children are at level 2, and so on; the level of any node is its parent's level plus 1;

J, cousins: nodes whose parents are on the same level are cousins of each other;

K, depth of the tree: the maximum level of any node in the tree is called the depth of the tree. The depth of the tree shown in Figure 6.2 (b) is 4;

L, ordered and unordered trees: if the subtrees of each node are regarded as ordered from left to right, the tree is called an ordered tree; otherwise it is an unordered tree;

M, forest: a forest is a finite set of m disjoint trees. For each node in a tree, the collection of its subtrees is a forest; conversely, if the roots of all trees in a forest are given a common parent node, a tree is obtained.

4, the basic operations of the tree:

A, initialization: initialize the tree T to an empty tree;

B, empty test: determine whether a tree is empty; return true if it is empty, otherwise return false;

C, find root: return the root node of the tree;

D, find parent: return the parent node of node x; if x is the root node, return null;

E, find children: return the child nodes of node x; if x is a leaf node, that is, it has no children, return null;

F, insert subtree: attach the tree rooted at y as a subtree of node x in tree T;

G, delete subtree: delete a subtree of node x in tree T;

H, traverse: starting from the root node, visit all nodes of the tree in a certain order;

5, the storage structure of the tree

To store a tree, the data of each node must be stored and the logical relationships between the nodes must be represented unambiguously. A common way to store a tree in an array is the parent (array) notation.

The parent (array) notation is a sequential storage structure for the tree. It uses a one-dimensional array to store the tree: the nodes are stored in the array in order from top to bottom and left to right, and each array element contains the data of the node itself and the position of the node's parent, that is, the parent's subscript.
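The parent notation can be sketched as a list of (data, parent subscript) pairs. The small tree used here is a made-up example, not one of the figures the article refers to, and the helper names are assumptions. Using -1 as the root's parent subscript is a common convention, also assumed here.

```python
# Each entry is (data, parent_index); the root's parent index is -1.
# Nodes are listed level by level, top to bottom and left to right.
tree = [
    ("A", -1),
    ("B", 0), ("C", 0), ("D", 0),
    ("E", 1), ("F", 1),
    ("G", 2),
]


def parent(tree, i):
    """Return the data of node i's parent, or None for the root."""
    p = tree[i][1]
    return None if p == -1 else tree[p][0]


def children(tree, i):
    """Return the data of all children of node i (requires a full scan)."""
    return [data for data, p in tree if p == i]


def degree(tree, i):
    """Degree of a node: the number of its subtrees (children)."""
    return len(children(tree, i))


def level(tree, i):
    """Level of node i, counting the root as level 1."""
    lvl = 1
    while tree[i][1] != -1:
        i = tree[i][1]
        lvl += 1
    return lvl


def depth(tree):
    """Depth of the tree: the maximum level of any node."""
    return max(level(tree, i) for i in range(len(tree)))
```

Finding a parent is a single array access in this representation, while finding children needs a scan over the whole array.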

A, binary tree: a binary tree is a finite set of n nodes that is either empty or consists of a root node and two disjoint binary trees, called the left subtree and the right subtree of the root;

Its characteristic is that each node has at most two subtrees (degree <= 2), and the subtrees are distinguished as left and right; their order cannot be swapped.

The basic operations of a binary tree: initialize, test whether the binary tree is empty, find the root node, find the parent of a node, find the height of the binary tree, find the left child of a node, find the right child of a node, and traverse the binary tree;

Storage structure of a binary tree: a one-dimensional array can be used, in which the node numbered i in a complete binary tree is stored in the array element with subscript i.

Binary tree traversal: pre-order (root first) traversal "DLR", in-order (root in the middle) traversal "LDR", and post-order (root last) traversal "LRD".
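The array storage and the three traversals can be combined in one sketch. It assumes the usual numbering of a complete binary tree starting at 1, so that node i has its left child at subscript 2i and its right child at 2i + 1; the example tree and function names are made up for illustration.

```python
# Index 0 is unused so that node i has its left child at 2*i
# and its right child at 2*i + 1.
nodes = [None, "A", "B", "C", "D", "E", "F", "G"]


def _exists(nodes, i):
    """True if subscript i holds a node."""
    return i < len(nodes) and nodes[i] is not None


def preorder(nodes, i=1):
    """DLR: root, then left subtree, then right subtree."""
    if not _exists(nodes, i):
        return []
    return [nodes[i]] + preorder(nodes, 2 * i) + preorder(nodes, 2 * i + 1)


def inorder(nodes, i=1):
    """LDR: left subtree, then root, then right subtree."""
    if not _exists(nodes, i):
        return []
    return inorder(nodes, 2 * i) + [nodes[i]] + inorder(nodes, 2 * i + 1)


def postorder(nodes, i=1):
    """LRD: left subtree, then right subtree, then root."""
    if not _exists(nodes, i):
        return []
    return postorder(nodes, 2 * i) + postorder(nodes, 2 * i + 1) + [nodes[i]]
```

For this tree, A is the root, B and C are its children, D and E are the children of B, and F and G are the children of C.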

c, binary tree application: the Huffman tree

The Huffman tree, also known as the optimal binary tree, is the binary tree with the minimum weighted path length.

The efficiency of a search algorithm depends not only on the position of an element in the binary tree but also on how often the element is accessed. If frequently accessed elements need fewer comparisons, the algorithm becomes more efficient; this is the problem the Huffman tree solves.

Several basic concepts:

1. Path: the branches traversed from one node of the tree to another node form the path between the two nodes. Not every pair of nodes has a path; for example, there is no path between sibling nodes, but there is a path from the root node to any node.

2. Path length: the number of branches on a path is called the path length between the two nodes;

3. Path length of the tree: the sum of the path lengths from the root node to each node in the tree;

4. Weight of a node: a number with some meaning assigned to a node of the tree is called the weight of the node;

5. Weighted path length of a node: the product of the path length between the node and the root and the weight of the node;

6. Weighted path length of the tree (WPL): the sum of the weighted path lengths of all leaf nodes in the tree;

7. Huffman tree (optimal binary tree): among all binary trees with n leaf nodes whose weights are w1, w2, ..., wn, the binary tree with the minimum weighted path length WPL is called the optimal binary tree or Huffman tree;
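The article does not say how a Huffman tree is built. The sketch below uses the standard greedy construction (repeatedly merge the two trees with the smallest weights), which is an addition for illustration, together with a function that computes the WPL as defined above. The tuple layout (weight, left, right) and the function names are assumptions; at least one weight is assumed.

```python
import heapq
from itertools import count


def huffman_tree(weights):
    """Build a Huffman tree; each node is a tuple (weight, left, right)."""
    tie = count()  # tie-breaker so heapq never compares the node tuples
    heap = [(w, next(tie), (w, None, None)) for w in weights]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two trees with the smallest root weights.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (w1 + w2, t1, t2)))
    return heap[0][2]


def wpl(node, depth=0):
    """Weighted path length: sum of weight * path length over all leaves."""
    weight, left, right = node
    if left is None and right is None:
        return weight * depth
    return wpl(left, depth + 1) + wpl(right, depth + 1)
```

For the weights 7, 5, 2, 4 the resulting tree has WPL 7*1 + 5*2 + 2*3 + 4*3 = 35, while a balanced tree with all four leaves at path length 2 would have WPL 36.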

Seven, search

Search algorithms: sequential search, binary search, block search, binary sort tree (binary search tree) search, and hash table search.

1, sequential search algorithm: the time complexity is O(n);

2, binary search algorithm: also called half-interval search. Its key requirement is that the lookup table is an ordered table, so that each comparison can exclude half of the remaining data. Time complexity: O(log2 n), so binary search is much more efficient than sequential search;

3, block search algorithm: also called indexed sequential search, a compromise between sequential search and binary search. It does not require all records in the table to be ordered, only that the table be ordered block by block (every key in one block is smaller than every key in the next block). The basic idea: first search the index table, which is an ordered table, using binary search or sequential search, to determine which block the target record is in; then search sequentially within that block, because the records inside a block are unordered. The efficiency of block search lies between that of sequential search and binary search.
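Binary search and block search might be sketched as follows. The index layout used for block search, a list of (maximum key in block, start, end) tuples with an exclusive end, is an assumption made for this example, as are the sample data and function names.

```python
def binary_search(table, key):
    """Search an ordered table; return the index of key, or -1 if absent."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        if table[mid] == key:
            return mid
        if table[mid] < key:
            low = mid + 1      # key can only be in the upper half
        else:
            high = mid - 1     # key can only be in the lower half
    return -1


def block_search(table, index, key):
    """Search a block-ordered table; return the index of key, or -1."""
    # Step 1: find the first block whose maximum key is >= key.
    # (A sequential scan of the index is used here; binary search also works.)
    for max_key, start, end in index:
        if key <= max_key:
            # Step 2: the block itself is unordered, so scan it in order.
            for pos in range(start, end):
                if table[pos] == key:
                    return pos
            return -1
    return -1


# Three blocks of six records; every key in a block is smaller than
# every key in the next block, but each block is unordered inside.
table = [22, 12, 13, 8, 9, 20,
         33, 42, 44, 38, 24, 48,
         60, 58, 74, 49, 86, 53]
index = [(22, 0, 6), (48, 6, 12), (86, 12, 18)]
```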

4, hash table search: hash table search works by establishing a relationship between a record's key and its storage address. The hash function is this correspondence between key and address; with it, a record can be located with very few comparisons, which improves search efficiency.

Principles for constructing a hash function: A, the function itself is simple to compute; B, for any key k in the key set, h(k) is equally likely to map to each address, that is, the keys of the records are spread as evenly as possible over the addresses by the hash function, in order to minimize conflicts.

Hash search must address two main issues: A, construct a hash function that is simple to compute and produces as few conflicts as possible; B, provide a method for handling conflicts.

How to construct a hash function:

A, direct addressing method;

B, mid-square method;

C, digit analysis method;

D, division-remainder method;

E, random number method.

Ways to handle conflicts:

A, open addressing method;

B, chaining (zipper) method.
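As one possible combination of the methods listed above, the sketch below uses the division-remainder method as the hash function and chaining to handle conflicts. The class name, the table size of 7, and the restriction to integer keys are assumptions for this example.

```python
class ChainedHashTable:
    """Hash table for integer keys: h(k) = k mod size, conflicts are chained."""

    def __init__(self, size: int = 7) -> None:
        self._size = size
        self._buckets = [[] for _ in range(size)]

    def _hash(self, key: int) -> int:
        # Division-remainder method.
        return key % self._size

    def insert(self, key: int, value) -> None:
        bucket = self._buckets[self._hash(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # key already present: overwrite
                return
        bucket.append((key, value))        # conflict or empty slot: extend chain

    def search(self, key: int):
        """Return the value stored under key, or None if it is absent."""
        for k, v in self._buckets[self._hash(key)]:
            if k == key:
                return v
        return None
```

The keys 10 and 17 both hash to address 3 when the size is 7, so they end up in the same chain and are told apart by comparing keys within that chain.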

Eight, sort

Insertion sort: direct insertion sort and Shell sort;

Exchange sort: bubble sort and quick sort;

Selection sort: direct selection sort;

Merge sort.

1 Direct insertion sort: each record is inserted into an already ordered sequence so that the sequence remains ordered after the insertion. Time complexity: O(n^2);

2 Binary insertion sort: uses binary search to find the insertion position, which reduces the number of keyword comparisons (about log2 n per insertion), but the number of record moves is the same as in direct insertion sort, so the time complexity is still O(n^2);

3 Shell sort: divide the records into groups by a gap, insertion-sort each group, and shrink the gap until it is 1. The time complexity depends on the gap sequence and is about O(n^1.3);

4 Bubble sort: compare the keywords of adjacent records in the sequence and swap them when they are out of order, so that records with smaller keywords move forward and records with larger keywords move backward. Time complexity: O(n^2);

5 Quick sort: use two pointers i and j and take the first element as the pivot (benchmark). Pointer j scans from right to left until it finds an element smaller than the pivot and moves that element to position i; then pointer i scans from left to right until it finds an element larger than the pivot and moves it to position j. The two scans alternate until i = j. At that point everything to the left of i is no larger than the pivot and everything to the right is no smaller, so the pivot is placed at position i. The same procedure is then applied recursively to the left and right parts. Average time complexity: O(n log2 n);

6 Direct selection sort: find the smallest of the n records and swap it with the first record, then find the smallest of the remaining n-1 records and swap it with the second record, and so on. Time complexity: O(n^2);

7 Merge sort: repeatedly merges ordered subsequences into longer ordered sequences until one ordered sequence remains. Time complexity: O(n log2 n).
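Four of the sorts described above can be sketched compactly. These are illustrative versions, not the original author's code: insertion_sort, bubble_sort and quick_sort sort a list in place, quick_sort follows the two-pointer scheme with the first element as pivot, and merge_sort returns a new list.

```python
def insertion_sort(a):
    """Direct insertion sort, in place."""
    for k in range(1, len(a)):
        item, pos = a[k], k - 1
        while pos >= 0 and a[pos] > item:
            a[pos + 1] = a[pos]        # shift larger records one step right
            pos -= 1
        a[pos + 1] = item


def bubble_sort(a):
    """Bubble sort, in place; stops early if a pass makes no swap."""
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for k in range(end):
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                swapped = True
        if not swapped:
            break


def quick_sort(a, low=0, high=None):
    """Quick sort, in place, with the first element of the range as pivot."""
    if high is None:
        high = len(a) - 1
    if low >= high:
        return
    pivot, i, j = a[low], low, high
    while i < j:
        while i < j and a[j] >= pivot:   # j scans from right to left
            j -= 1
        a[i] = a[j]
        while i < j and a[i] <= pivot:   # i scans from left to right
            i += 1
        a[j] = a[i]
    a[i] = pivot                         # i == j: final position of the pivot
    quick_sort(a, low, i - 1)
    quick_sort(a, i + 1, high)


def merge_sort(a):
    """Merge sort; returns a new sorted list."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```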


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
