Data Structure (C#) Concepts

Source: Internet
Author: User

Chapter 1

1. Data

Data is the carrier of information about the external world. It can be recognized, stored, and processed by computers, and it is the raw material that computer programs work on. Programs process many kinds of data: numeric data such as integers, real numbers, and complex numbers, and non-numeric data such as characters, text, graphics, images, and sound.

2. Data Element and Data Item

A data element is the basic unit of data and is usually considered and processed as a whole in a computer program. Data elements are also called elements, nodes, vertices, or records. A data element can be composed of several data items. A data item is the smallest indivisible unit of data that has an independent meaning; it is also called a field (or domain). There are two types of data items: elementary items and composite items.

3. Data Object

A data object is a set of data elements of the same nature. It is a subset of data.

4. Data Type

Data type is a concept from high-level programming languages: it is a set of values together with the set of operations defined on those values. Data types fall into two categories. Atomic (non-structured) types, such as the basic types in C# (integer, real, and so on), cannot be decomposed. Structured types are made up of several components and can be decomposed; each component can itself be either atomic or structured.

5. Data Structure

A data structure is a collection of data elements that have one or more specific relationships with each other. In any problem, data elements are not isolated; the relationships among them are called the structure. According to the characteristics of these relationships, there are usually four basic kinds of data structure: (1) set; (2) linear structure; (3) tree structure; (4) graph structure.

Formally, a data structure DS is an ordered pair DS = (D, R), where D is a finite set of data elements and R is a finite set of relationships between the data elements in D.

The data structure includes the logical structure and physical structure of the data. The physical structure of data, also known as the storage structure, is the representation and storage of data in computers, including the representation and storage of data elements and the representation and storage of the relationship between data elements.

Data storage structures include the sequential storage structure and the linked storage structure. The sequential storage structure expresses the logical relationship between data elements through their relative positions in memory: logically adjacent data elements are generally stored in physically adjacent storage units. In C#, arrays are used to implement the sequential storage structure, because the storage space allocated to an array is contiguous. The linked storage structure does not require logically adjacent data elements to be stored in adjacent locations. A data element in the linked storage structure is called a node. Each node carries an extra address field that stores the address of the adjacent node, and this is how the logical relationship between nodes is realized. This address is called a reference, and the address field is called the reference field.

6. Algorithms

From the above section, we know that algorithms are closely related to data structures and programs. When designing a program, first determine the corresponding data structure, and then design the corresponding algorithms based on the data structure and the needs of the problem. Due to space limitations, we will only introduce the features of the algorithm, the evaluation criteria of the algorithm, and the time complexity of the algorithm.

1.2.1 Algorithm Features

An algorithm is a description of the steps for solving a specific type of problem: a finite sequence of instructions, each of which represents one or more operations. An algorithm should have the following five features:

1) Finiteness: an algorithm always terminates after a finite number of steps, that is, its execution time is bounded.

2) Definiteness: each step of an algorithm must have a precise meaning with no ambiguity, so the same input always produces the same output.

3) Input: an algorithm has zero or more inputs, which are the quantities given before the algorithm starts. These inputs are data objects in some data structure.

4) Output: an algorithm has one or more outputs, which have a specific relationship to the inputs.

5) Feasibility: each step of an algorithm can be carried out by executing already-implemented basic operations a finite number of times.

An algorithm is similar to a program, but the two are different. A program does not have to satisfy finiteness: an operating system, for example, never stops as long as the system is not damaged. In addition, a program must be written in a computer language, because its instructions have to be executable by a machine, whereas an algorithm can also be described in natural language, flowcharts, or pseudocode.

7. Time Complexity of an Algorithm

The time complexity of an algorithm describes how its running time grows with the problem size. An algorithm consists of control structures and primitive operations, and its execution time depends on the combined effect of the two.
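As a rough illustration of how control structures and primitive operations combine, the sketch below counts how often the innermost statement of a doubly nested loop runs. The class and method names are hypothetical and are not taken from the source.

```csharp
public static class ComplexityDemo
{
    // Counts how many times the primitive operation (the innermost
    // statement) executes for a problem of size n.
    public static long CountNestedLoopOps(int n)
    {
        long count = 0;
        for (int i = 0; i < n; i++)          // control structure: outer loop
        {
            for (int j = 0; j < n; j++)      // control structure: inner loop
            {
                count++;                     // primitive operation being counted
            }
        }
        return count;                        // equals n * n, so the time is O(n^2)
    }
}
```

Doubling n quadruples the count, which is what a time complexity of O(n^2) expresses.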

8. Concept of Set

A set is a whole made up of definite, distinct members (or elements). Members are drawn from a larger range called the base type. The number of members in a set is called its cardinality. Each member of a set is either a basic element of the base type or is itself a set. A set whose members all belong to another set is called a subset of that set. A set with no elements is called the empty set (also known as the null set).

Set features

1) Determinacy: for any object, it can be decided exactly whether or not it is an element of the set;

2) Distinctness: elements in a set cannot be repeated;

3) Unorderedness: the elements of a set have no order.

9. Recursion

When solving some problems, an algorithm needs to call itself to complete part of its work. An algorithm that calls itself directly or indirectly is called a recursive algorithm. Depending on how the call is made, recursion is divided into direct recursion and indirect recursion.
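The two kinds of recursion can be sketched as follows. The examples (factorial and an even/odd test) are illustrative choices, not taken from the source, and IsEven and IsOdd assume a non-negative argument.

```csharp
using System;

public static class RecursionDemo
{
    // Direct recursion: Factorial calls itself.
    public static long Factorial(int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        return n <= 1 ? 1 : n * Factorial(n - 1);
    }

    // Indirect recursion: IsEven calls IsOdd, which in turn calls IsEven.
    // Both assume n >= 0.
    public static bool IsEven(int n)
    {
        return n == 0 || IsOdd(n - 1);
    }

    public static bool IsOdd(int n)
    {
        return n != 0 && IsEven(n - 1);
    }
}
```

In both cases the argument shrinks on every call and a base case stops the recursion, which keeps the algorithm finite.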

10. Interface

1) Interface Definition

An interface defines a contract, and any class or struct that implements the interface must honor it. Simply put, an interface is a protocol that classes follow when they interact. The ability to treat one object as several types is usually called multiple inheritance. The Common Language Runtime (CLR) supports single implementation inheritance and multiple interface inheritance. Single implementation inheritance means that a type can have only one base type. Multiple interface inheritance means that a type can implement several interfaces. An interface is an abstraction of the interaction between classes: the parts of classes that need to interact are abstracted and defined as interfaces, which gives better control over the logical interaction between classes.

An interface contains only member declarations, not member implementations; the implementation is supplied by the class or struct that implements the interface. An interface does not derive from System.Object or from any type derived from it. It is simply an abstract type that contains a set of virtual methods. In C#, interface members can be methods, properties, indexers, and events. An interface contains no instance fields and no instance constructors, so it cannot be instantiated. A class that implements an interface must implement every member of the interface exactly as it is declared.
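A minimal sketch of this idea is shown below. The interface IShape and the types Circle and Square are hypothetical names chosen for illustration; they do not come from the source.

```csharp
using System;

// Hypothetical interface: it only declares members and implements nothing.
public interface IShape
{
    string Name { get; }
    double Area();
}

// A class that implements the interface must implement every member.
public class Circle : IShape
{
    private readonly double radius;

    public Circle(double radius) { this.radius = radius; }

    public string Name { get { return "Circle"; } }

    public double Area() { return Math.PI * radius * radius; }
}

// A struct can implement the same interface.
public struct Square : IShape
{
    private readonly double side;

    public Square(double side) { this.side = side; }

    public string Name { get { return "Square"; } }

    public double Area() { return side * side; }
}
```

Code written against IShape, for example a method taking an IShape parameter, works with both types without knowing which one it receives.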

11. Interfaces and Abstract Classes

Abstract classes and interfaces are similar in definition and function, so choosing between them requires comparing their differences. An abstract class cannot be instantiated and must be inherited from; its members may or may not be implemented. A subclass can inherit from only one abstract class. Abstract classes are mainly used for closely related objects; use an abstract class if you want to design a large functional unit or create multiple versions of a component. An interface is a fully abstract set of members with no implementation. A class or struct can implement multiple interfaces. Interfaces are best suited to providing common functionality to unrelated classes; use an interface if you want to design small, concise functional blocks. Once an interface has been created, it cannot be changed; if you need a new version, you must create a new interface.
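The contrast can be sketched in a few lines. ShapeBase, IResizable, and Rectangle are hypothetical names used only for this illustration.

```csharp
// Hypothetical abstract class: it cannot be instantiated, and it mixes an
// implemented member (Describe) with an unimplemented one (Area).
public abstract class ShapeBase
{
    public abstract double Area();

    public string Describe()
    {
        return GetType().Name + " with area " + Area();
    }
}

// Hypothetical interface describing one small, unrelated capability.
public interface IResizable
{
    void Scale(double factor);
}

// A class has exactly one base class but may implement several interfaces.
public class Rectangle : ShapeBase, IResizable
{
    private double width;
    private double height;

    public Rectangle(double width, double height)
    {
        this.width = width;
        this.height = height;
    }

    public override double Area() { return width * height; }

    public void Scale(double factor)
    {
        width *= factor;
        height *= factor;
    }
}
```

Rectangle inherits the shared Describe implementation from the abstract class, while IResizable could equally be implemented by types that have nothing to do with shapes.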

12. Generic Programming

Generics are the most powerful feature of .NET Framework 2.0. The main idea of generics is to separate algorithms completely from data structures, so that an algorithm defined once can act on many data structures and development becomes highly reusable. With generics you can define a type-safe data structure without committing to actual data types. This significantly improves performance and produces higher-quality code, because data processing algorithms can be reused without duplicating type-specific code.

Generics address three issues: (1) performance; (2) type safety; (3) productivity.

13. Benefits of generics

Generics enable code reuse: types and internal data can change without causing code bloat, whether you use value types or reference types. Code can be developed, tested, and deployed once and then reused with any type (including future types), all with compiler support and type safety. Because generic code does not force value types to be boxed and unboxed, and does not force reference types to be downcast, performance improves significantly. For value types, performance generally improves by 200%; for reference types, you can expect up to a 100% improvement when accessing the type (although the performance of the application as a whole may or may not improve).
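The following sketch contrasts a non-generic collection with a generic one and shows one generic algorithm reused across types. GenericsDemo and its methods are hypothetical names; ArrayList and List<T> are the standard .NET collection classes.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class GenericsDemo
{
    // One generic algorithm works for any type T and is checked at compile time.
    public static void Swap<T>(ref T a, ref T b)
    {
        T temp = a;
        a = b;
        b = temp;
    }

    // Non-generic collection: each int was boxed to object when added,
    // and must be cast (unboxed) when read back.
    public static int SumBoxed(ArrayList items)
    {
        int sum = 0;
        foreach (object o in items)
        {
            sum += (int)o;   // fails at run time if an element is not an int
        }
        return sum;
    }

    // Generic collection: ints are stored directly, with no boxing and no casts.
    public static int SumGeneric(List<int> items)
    {
        int sum = 0;
        foreach (int x in items)
        {
            sum += x;
        }
        return sum;
    }
}
```

With ArrayList, adding a string by mistake only shows up as an exception at run time; with List<int> the compiler rejects it.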

Chapter 2

1. Linear List

A linear list (linear table) is the simplest, most basic, and most commonly used data structure. It is an abstraction of the linear structure, which is characterized by a one-to-one relationship between the data elements. This one-to-one relationship is a positional one: (1) except for the first data element, every data element has exactly one data element before it; (2) except for the last data element, every data element has exactly one data element after it. In other words, the data elements are arranged one after another, so a linear list can be thought of as a data structure that is a sequence of data elements.

2. Linear List Definition

A linear list is a finite sequence of n (n ≥ 0) data elements of the same type. Two points in this definition deserve attention. First, "finite" means that the number of data elements in a linear list is limited, and each data element has its own position. Second, "same type" means that all data elements in a linear list belong to the same type.

3. Sequential List Definition

In a computer, the simplest and most natural way to store a linear list is to place its elements one after another in consecutive storage units; this is the sequential storage of a linear list. Sequential storage means storing the data elements of a linear list in a block of memory with contiguous addresses. A linear list stored this way is called a sequential list. Its defining characteristic is that logically adjacent data elements are also physically adjacent in memory.

4. Linked Storage

A linear list stored in linked form is called a linked list. A linked list does not require logically adjacent data elements to be physically adjacent in storage. As a result, inserting into or deleting from a linked list does not require moving data elements, but the linked list loses the random access that a sequential list offers.

5. Linked List

A linked list uses an arbitrary group of storage units to store the data elements of a linear list (these storage units may or may not be contiguous).

Besides its own information, each data element must therefore also store the address of the data element adjacent to it. These two parts together form the storage image of the data element, which is called a node. The field that stores the data element's information is called the data field of the node, and the field that stores the address of the adjacent data element is called the reference field of the node. The nodes of a linear list are thus linked into a chain through their reference fields, which is where the name "linked list" comes from.

If the reference field of each node stores only the address of the node's direct successor, the linked list is called a singly linked list.
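A node with a data part and a reference to its successor might be sketched as below. Node<T>, SinglyLinkedList<T>, and their members are hypothetical names chosen for this illustration, not the book's implementation.

```csharp
using System;

// A node: a data part plus a reference to the direct successor.
public class Node<T>
{
    public T Data;
    public Node<T> Next;

    public Node(T data) { Data = data; }
}

public class SinglyLinkedList<T>
{
    private Node<T> head;   // null when the list is empty

    // Inserting at the head only rewires references; no element is moved.
    public void AddFirst(T item)
    {
        Node<T> node = new Node<T>(item);
        node.Next = head;
        head = node;
    }

    // Reaching position index means walking the chain from the head,
    // which is why a linked list has no random access.
    public T GetAt(int index)
    {
        Node<T> current = head;
        for (int i = 0; i < index && current != null; i++)
        {
            current = current.Next;
        }
        if (index < 0 || current == null)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
        return current.Data;
    }
}
```

AddFirst runs in constant time regardless of the list length, while GetAt takes time proportional to the index.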

Chapter 3 Stack and Queue

Stacks and queues are two important data structures that are widely used in software design. They are also linear structures: the data elements and the logical relationships between them are exactly the same as in a linear list. The difference is that operations on a linear list are unrestricted, while operations on stacks and queues are restricted. Stack operations can be performed only at one end of the list; queue insertions are performed at one end of the list and all other operations at the other end. Stacks and queues are therefore called operation-restricted linear lists.

Stacks are divided into sequential stacks and linked stacks. A sequential stack is represented by an array, and a linked stack is represented by a singly linked list, of which it is a simplification.
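A sequential stack backed by an array can be sketched as follows. SeqStack<T> and its members are hypothetical names, and the fixed capacity is an assumption made to keep the example short.

```csharp
using System;

public class SeqStack<T>
{
    private readonly T[] data;
    private int top = -1;   // index of the top element; -1 means the stack is empty

    public SeqStack(int capacity)
    {
        data = new T[capacity];
    }

    public bool IsEmpty { get { return top == -1; } }

    public bool IsFull { get { return top == data.Length - 1; } }

    // All operations happen at one end of the array: the top.
    public void Push(T item)
    {
        if (IsFull) throw new InvalidOperationException("Stack is full.");
        data[++top] = item;
    }

    public T Pop()
    {
        if (IsEmpty) throw new InvalidOperationException("Stack is empty.");
        return data[top--];
    }
}
```

Because only the top index changes, elements leave in the reverse of the order in which they were pushed.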

2. Queue

A queue is a linear list in which insertions are restricted to the tail of the list and all other operations to the head. The end used for insertion is called the rear, and the end used for other operations is called the front. A queue that contains no data elements is called an empty queue.

A queue is usually written as Q = (a1, a2, ..., an), where Q is the first letter of the English word "queue", a1 is the front element, and an is the rear element. The n elements enter the queue in the order a1, a2, ..., an and leave it in the same order: a1 first and an last. Queue operations therefore follow the First In First Out (FIFO), or Last In Last Out (LILO), principle, and a queue is also called a FIFO list or LILO list.

For a circular queue stored in an array of size maxsize, the queue is empty when rear == front and full when (rear + 1) % maxsize == front. The number of data elements in the circular queue is (rear - front + maxsize) % maxsize.

Advancing the rear indicator by 1 becomes: rear = (rear + 1) % maxsize.

Advancing the front indicator by 1 becomes: front = (front + 1) % maxsize.
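The conditions above can be collected into a small circular queue. This is a sketch under stated assumptions: CircularQueue<T> is a hypothetical name, front is taken to be the index of the head element, rear the index one past the last element, and one array slot is left unused so that "full" and "empty" can be told apart. The source may use a slightly different convention for the two indicators.

```csharp
using System;

public class CircularQueue<T>
{
    private readonly T[] data;
    private int front;   // index of the head element
    private int rear;    // index one past the last element

    // maxsize must be at least 2, because one slot always stays unused.
    public CircularQueue(int maxsize)
    {
        data = new T[maxsize];
    }

    public bool IsEmpty { get { return rear == front; } }

    public bool IsFull { get { return (rear + 1) % data.Length == front; } }

    public int Count { get { return (rear - front + data.Length) % data.Length; } }

    public void Enqueue(T item)
    {
        if (IsFull) throw new InvalidOperationException("Queue is full.");
        data[rear] = item;
        rear = (rear + 1) % data.Length;    // wrap around at the end of the array
    }

    public T Dequeue()
    {
        if (IsEmpty) throw new InvalidOperationException("Queue is empty.");
        T item = data[front];
        front = (front + 1) % data.Length;  // wrap around at the end of the array
        return item;
    }
}
```

With maxsize = 4 the queue holds at most three elements, which is the cost of distinguishing the full state from the empty state without an extra counter.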

2.1 Linked Queue

The other way to store a queue is linked storage; such a queue is called a linked queue. Like a linked stack, a linked queue is usually represented by a singly linked list and is a simplification of it, so the node structure of a linked queue is the same as that of a singly linked list.

3. Tree

A binary tree is a finite set of n (n ≥ 0) nodes of the same type. A binary tree with n = 0 is called an empty binary tree. For any non-empty binary tree (n > 0):

(1) There is exactly one special node, called the root of the binary tree, and the root has no parent;

(2) If n > 1, the nodes other than the root are divided into two disjoint sets TL and TR, each of which is itself a binary tree; TL and TR are called the left subtree and the right subtree of the binary tree, respectively.

(1) Full binary tree: if a binary tree contains only nodes of degree 0 and degree 2, and all nodes of degree 0 are on the same level, it is a full binary tree. A full binary tree of depth k has $2^k - 1$ nodes.

(2) Complete binary tree: a binary tree of depth k with n nodes is a complete binary tree if and only if its nodes correspond one-to-one to the nodes numbered 1 to n in the full binary tree of depth k. In a complete binary tree, leaf nodes appear only on the two lowest levels, and for every node the maximum level of the descendants in its left branch is equal to, or one greater than, that of the descendants in its right branch.

3.1 Properties of Binary Trees

Property 1 A non-empty binary tree has at most $2^{i-1}$ nodes on level i (i ≥ 1).

Property 2 If the depth of the empty tree is defined as 0, a binary tree of depth k has at most $2^k - 1$ nodes (k ≥ 0).

Property 3 A complete binary tree with n nodes (n > 0) has depth $k = \lfloor \log_2 n \rfloor + 1$.

Property 4 For any non-empty binary tree, if the number of nodes of degree 0 is n0 and the number of nodes of degree 2 is n2, then n0 = n2 + 1.

Property 5 For a complete binary tree with n nodes whose nodes are numbered from 1 in top-to-bottom, left-to-right order, the node numbered i satisfies:

(1) If i > 1, the parent of node i is numbered i/2 ("/" denotes integer division); if i = 1, the node is the root and has no parent.

(2) If 2i ≤ n, the left child of node i is numbered 2i; if 2i > n, the node has no left child.

(3) If 2i + 1 ≤ n, the right child of node i is numbered 2i + 1; if 2i + 1 > n, the node has no right child.
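Property 5 translates directly into index arithmetic. In the sketch below the class name is hypothetical, and returning 0 to mean "no such node" is a convention chosen for the example.

```csharp
public static class CompleteBinaryTreeIndex
{
    // Nodes are numbered 1..n from top to bottom and left to right.
    // Each method returns 0 when the requested node does not exist.

    public static int Parent(int i)
    {
        return i > 1 ? i / 2 : 0;                    // integer division
    }

    public static int LeftChild(int i, int n)
    {
        return 2 * i <= n ? 2 * i : 0;
    }

    public static int RightChild(int i, int n)
    {
        return 2 * i + 1 <= n ? 2 * i + 1 : 0;
    }
}
```

For a tree with n = 6 nodes, node 3 has left child 6 and no right child, and the parent of node 5 is node 2. This arithmetic is what makes the sequential storage of a complete binary tree practical.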

3.2 Binary Tree Traversal

1. Preorder traversal (DLR)

The basic idea of preorder traversal is: first visit the root node, then traverse its left subtree in preorder, and finally traverse its right subtree in preorder.

2. Inorder traversal (LDR)

The basic idea of inorder traversal is: first traverse the left subtree of the root node in inorder, then visit the root node, and finally traverse its right subtree in inorder.

3. Postorder traversal (LRD)

The basic idea of postorder traversal is: first traverse the left subtree of the root node in postorder, then traverse the right subtree of the root node in postorder, and finally visit the root node.
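The three recursive traversals differ only in where the root is visited. The sketch below uses a hypothetical TreeNode<T> class with left and right child references; "visiting" a node is modeled by appending its data to a list.

```csharp
using System.Collections.Generic;

// Hypothetical node of a binary linked list: data plus two child references.
public class TreeNode<T>
{
    public T Data;
    public TreeNode<T> Left;
    public TreeNode<T> Right;

    public TreeNode(T data, TreeNode<T> left = null, TreeNode<T> right = null)
    {
        Data = data;
        Left = left;
        Right = right;
    }
}

public static class Traversal
{
    // DLR: root, then left subtree, then right subtree.
    public static void PreOrder<T>(TreeNode<T> node, List<T> visited)
    {
        if (node == null) return;
        visited.Add(node.Data);
        PreOrder(node.Left, visited);
        PreOrder(node.Right, visited);
    }

    // LDR: left subtree, then root, then right subtree.
    public static void InOrder<T>(TreeNode<T> node, List<T> visited)
    {
        if (node == null) return;
        InOrder(node.Left, visited);
        visited.Add(node.Data);
        InOrder(node.Right, visited);
    }

    // LRD: left subtree, then right subtree, then root.
    public static void PostOrder<T>(TreeNode<T> node, List<T> visited)
    {
        if (node == null) return;
        PostOrder(node.Left, visited);
        PostOrder(node.Right, visited);
        visited.Add(node.Data);
    }
}
```

For a tree with root A, left child B (with children D and E), and right child C, the three methods visit the nodes in the orders A B D E C, D B E A C, and D E B C A respectively.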

4. Level-order traversal

The basic idea of level-order traversal is that nodes are visited level by level, and a node reached first is visited first, which matches the order of queue operations. Level-order traversal therefore uses a queue: put a reference to the root node into the queue, and while the queue is not empty, repeat the following three steps:

(1) take a node reference out of the queue and visit that node;

(2) if the left subtree of the node is not empty, put a reference to its left subtree into the queue;

(3) if the right subtree of the node is not empty, put a reference to its right subtree into the queue.
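The three steps above can be sketched with the standard Queue<T> class. BinNode<T> and LevelOrderDemo are hypothetical names, and visiting a node is modeled by appending its data to the result list.

```csharp
using System.Collections.Generic;

// Hypothetical binary tree node used only for this example.
public class BinNode<T>
{
    public T Data;
    public BinNode<T> Left;
    public BinNode<T> Right;

    public BinNode(T data, BinNode<T> left = null, BinNode<T> right = null)
    {
        Data = data;
        Left = left;
        Right = right;
    }
}

public static class LevelOrderDemo
{
    public static List<T> LevelOrder<T>(BinNode<T> root)
    {
        List<T> visited = new List<T>();
        if (root == null) return visited;

        Queue<BinNode<T>> queue = new Queue<BinNode<T>>();
        queue.Enqueue(root);                       // start with the root reference

        while (queue.Count > 0)
        {
            BinNode<T> node = queue.Dequeue();     // step (1): take out and visit
            visited.Add(node.Data);
            if (node.Left != null) queue.Enqueue(node.Left);     // step (2)
            if (node.Right != null) queue.Enqueue(node.Right);   // step (3)
        }
        return visited;
    }
}
```

Because the queue is first in, first out, every node on one level is visited before any node on the next level.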

5.4 Huffman Tree

5.4.1 Basic Concepts

First, several basic concepts used to define the Huffman tree are given.

(1) Path: the branches from one node of the tree to another node form the path between the two nodes.

(2) Path length: the number of branches on the path.

(3) Path length of a tree: the sum of the path lengths from the root to every node. Among binary trees with the same number of nodes, the complete binary tree has the shortest path length.

(4) Weight of a node: in some applications, a meaningful number assigned to a node of the tree.

(5) Weighted path length of a node: the product of the path length from the root to that node and the weight of the node.

(6) Weighted path length of a tree (WPL): the sum of the weighted path lengths of all leaf nodes in the tree:

$$WPL = \sum_{k=1}^{n} w_k l_k$$

where $w_k$ is the weight of the k-th leaf node and $l_k$ is the path length of the k-th leaf node. In the binary tree shown in Figure 5.17 (A), the path length of node B is 1, that of nodes C and D is 2, that of nodes E, F, and G is 3, that of node H is 4, and that of node I is 5, so the path length of the tree is 1 + 2*2 + 3*3 + 4 + 5 = 23. If the weights of nodes B, C, D, E, F, G, H, and I are 1, 2, 3, 4, 5, 6, 7, and 8 respectively, the weighted path lengths of these nodes are 1*1, 2*2, 2*3, 3*4, 3*5, 3*6, 4*7, and 5*8. Only leaf nodes count toward the weighted path length of the tree, which is 3*5 + 3*6 + 5*8 = 73.

So what is a Huffman tree?

A Huffman tree, also called an optimal binary tree, is the binary tree with the minimum weighted path length among the binary trees built over a group of leaf nodes with fixed weights.

The Huffman algorithm constructs such a tree. It is described as follows:

(1) From the n given weights {w1, w2, ..., wn}, construct a set F = {T1, T2, ..., Tn} of n binary trees, each consisting of only a root node;

(2) Select from F the two binary trees whose roots have the smallest weights and use them as the left and right subtrees of a new binary tree; the weight of the new root is the sum of the weights of the roots of its left and right subtrees;

(3) Delete these two trees from F and add the new binary tree to F;

(4) Repeat steps (2) and (3) until F contains only one binary tree; that binary tree is the Huffman tree.
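The four steps can be sketched as follows. HuffmanNode and Huffman are hypothetical names, and re-sorting the forest on every round is a simplification chosen for readability rather than efficiency.

```csharp
using System.Collections.Generic;

public class HuffmanNode
{
    public int Weight;
    public HuffmanNode Left;
    public HuffmanNode Right;
}

public static class Huffman
{
    public static HuffmanNode Build(IEnumerable<int> weights)
    {
        // Step (1): a forest F of trees that consist of a root only.
        List<HuffmanNode> forest = new List<HuffmanNode>();
        foreach (int w in weights)
        {
            forest.Add(new HuffmanNode { Weight = w });
        }
        if (forest.Count == 0) return null;

        // Step (4): repeat until a single tree remains.
        while (forest.Count > 1)
        {
            // Step (2): pick the two trees whose roots have the smallest weights.
            forest.Sort((a, b) => a.Weight.CompareTo(b.Weight));
            HuffmanNode left = forest[0];
            HuffmanNode right = forest[1];
            HuffmanNode parent = new HuffmanNode
            {
                Weight = left.Weight + right.Weight,
                Left = left,
                Right = right
            };

            // Step (3): remove the two trees and add the merged tree.
            forest.RemoveRange(0, 2);
            forest.Add(parent);
        }
        return forest[0];
    }

    // Weighted path length: the sum of weight * path length over all leaves.
    public static int Wpl(HuffmanNode node, int depth = 0)
    {
        if (node == null) return 0;
        if (node.Left == null && node.Right == null) return node.Weight * depth;
        return Wpl(node.Left, depth + 1) + Wpl(node.Right, depth + 1);
    }
}
```

For the weights {1, 2, 3, 4} the merges produce roots of weight 3, 6, and 10, and the resulting tree has a weighted path length of 1*3 + 2*3 + 3*2 + 4*1 = 19.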

The tree structure is a very important non-linear structure. Its data elements are called nodes, and they have one-to-many relationships that are both hierarchical and branching. Tree structures include trees and binary trees.

A tree is defined recursively: it consists of a root node and several mutually disjoint subtrees, each of which has the same structure as a tree. Unless stated otherwise, a tree means an unordered tree. There are usually four ways to represent a tree logically: intuitive representation, concave representation, generalized list representation, and nested representation. There are three storage methods for a tree: parent representation, child linked list representation, and child-sibling representation.

The definition of a binary tree is also recursive: a binary tree consists of a root node and two mutually disjoint subtrees, each of which has the same structure as a binary tree. A binary tree is an ordered tree. Important binary trees include full binary trees and complete binary trees. A binary tree has five basic forms. There are three storage structures for binary trees: the sequential storage structure, the binary linked list storage structure, and the ternary linked list storage structure. This book provides a C# implementation of the binary linked list storage structure. Binary tree traversal methods include preorder traversal (DLR), inorder traversal (LDR), postorder traversal (LRD), and level-order traversal.

A forest is a set of m (m ≥ 0) trees. Trees, forests, and binary trees can be converted into one another. A tree can be traversed in two ways: preorder and postorder. A forest can also be traversed in two ways.

A Huffman tree is the binary tree with the minimum weighted path length for a set of leaf nodes with fixed weights. Huffman trees can be used to solve optimization problems and are widely used in data communication and other fields.

1. Intuitive Representation

This looks like a tree in daily life. The whole figure resembles an upside-down tree that keeps branching out from the root node, with the root at the top and the leaves at the bottom.

2. Concave Representation

Each node corresponds to a rectangle, and all rectangles are right-aligned. The root node is represented by the longest rectangle, nodes on the same level have rectangles of the same length, and the deeper the level, the shorter the rectangle.

3. Generalized List Representation

In generalized list form, the root node is written first, and its subtrees are enclosed in a pair of parentheses and separated by commas. The generalized list of the tree is as follows:

(A(B(E, F, G), C(H), D(I, J)))

4. Nested Representation

It is similar to the Venn diagram representation of sets in mathematics.

Chapter 6 Graph

A graph is a non-linear structure that is more complex than a tree. In a tree, nodes have one-to-many relationships with clear hierarchy and branching: a node may be related to several nodes on the next level, but to only one node on the previous level. In a graph, the data elements are called vertices and have many-to-many relationships: the relationships between vertices are arbitrary, and any two vertices may be related, so there is no clear hierarchy among the vertices. Such relationships are very common in the real world.

According to the definition of the minimum spanning tree, constructing a minimum spanning tree of an undirected connected network with n vertices must satisfy the following three conditions:

(1) The minimum spanning tree must contain all n vertices;

(2) The minimum spanning tree has exactly n-1 edges;

(3) The minimum spanning tree contains no cycle.

There are many methods for constructing a minimum spanning tree. Two typical ones are Prim's algorithm and Kruskal's algorithm.

2. Prim's Algorithm

Let G = (V, E) be an undirected connected network, where V is the set of vertices and E is the set of edges. Introduce two new sets U and T, where U holds the vertices of the minimum spanning tree of G and T holds its edges. The idea of Prim's algorithm is as follows. Initially U = {u1} (assuming the minimum spanning tree is built starting from vertex u1) and T = {}. Among all weighted edges (u, v) with u ∈ U and v ∈ V - U, select the edge with the minimum weight, add vertex v to U, and add edge (u, v) to T. Repeat until U = V, at which point the minimum spanning tree is complete: U contains all of its vertices and T contains all of its edges.
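A compact sketch of Prim's algorithm on an adjacency matrix follows. The names PrimDemo and MstWeight are hypothetical, the use of int.MaxValue to mark a missing edge is an assumption of this example, and for brevity the method returns only the total weight of the edges in T rather than the edges themselves.

```csharp
using System;

public static class PrimDemo
{
    // cost[i, j] is the weight of edge (i, j), or int.MaxValue if there is no edge.
    // Builds the tree starting from vertex 0 and returns its total weight.
    public static int MstWeight(int[,] cost)
    {
        int n = cost.GetLength(0);
        if (n == 0) return 0;

        bool[] inU = new bool[n];       // inU[v] is true once v has joined the set U
        int[] lowCost = new int[n];     // cheapest known edge from U to each vertex
        for (int v = 0; v < n; v++) lowCost[v] = cost[0, v];
        inU[0] = true;

        int total = 0;
        for (int added = 1; added < n; added++)
        {
            // Select the vertex outside U that has the cheapest edge into U.
            int best = -1;
            for (int v = 0; v < n; v++)
            {
                if (!inU[v] && (best == -1 || lowCost[v] < lowCost[best])) best = v;
            }
            if (lowCost[best] == int.MaxValue)
            {
                throw new ArgumentException("The network is not connected.");
            }

            inU[best] = true;           // add the vertex to U
            total += lowCost[best];     // add the edge to T (only its weight is kept)

            // Edges from the new vertex may be cheaper for the remaining vertices.
            for (int v = 0; v < n; v++)
            {
                if (!inU[v] && cost[best, v] < lowCost[v]) lowCost[v] = cost[best, v];
            }
        }
        return total;
    }
}
```

Each round scans all vertices twice, so this version runs in O(n^2) time, which suits dense networks stored as matrices.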

3. Kruskal's Algorithm

The basic idea of Kruskal's algorithm is to consider the edges of an undirected connected network with n vertices in ascending order of weight. If adding the selected edge does not form a cycle in the spanning tree, add it to the tree; if it would form a cycle, discard it. Continue until the tree contains n-1 edges.
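Kruskal's idea might be sketched as below. The cycle test uses a simple parent array to track which vertices are already connected; that technique is a choice made for this example and is not described in the source. KruskalDemo and its members are hypothetical names, and the method returns only the total weight of the selected edges.

```csharp
using System.Collections.Generic;

public static class KruskalDemo
{
    // Follows parent links to the representative of the component containing v.
    private static int Find(int[] parent, int v)
    {
        while (parent[v] != v) v = parent[v];
        return v;
    }

    // edges: (U, V, Weight) triples of an undirected connected network with n vertices.
    public static int MstWeight(int n, List<(int U, int V, int Weight)> edges)
    {
        // Consider the edges in ascending order of weight.
        List<(int U, int V, int Weight)> sorted = new List<(int U, int V, int Weight)>(edges);
        sorted.Sort((a, b) => a.Weight.CompareTo(b.Weight));

        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // every vertex starts on its own

        int total = 0;
        int used = 0;
        foreach (var e in sorted)
        {
            int ru = Find(parent, e.U);
            int rv = Find(parent, e.V);
            if (ru == rv) continue;      // both ends already connected: the edge would form a cycle
            parent[ru] = rv;             // merge the two components
            total += e.Weight;
            used++;
            if (used == n - 1) break;    // the tree is complete
        }
        return total;
    }
}
```

Sorting dominates the running time, so this approach tends to suit sparse networks with relatively few edges.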

The topological sorting algorithm is described below:

(1) Select a vertex with in-degree 0 in the directed graph (that is, a vertex with no predecessor) and output it;

(2) Delete that vertex and all arcs leaving it from the graph;

(3) Repeat (1) and (2) until no vertex with in-degree 0 can be found; the topological sort is then finished.

If vertices remain in the graph but none of them has in-degree 0, the AOV network contains a cycle; otherwise it contains no cycle.
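The steps above can be sketched with in-degree counters instead of physically deleting arcs: lowering the in-degree of a successor has the same effect as removing the arc that leads to it. TopoSortDemo and the adjacency-list layout are assumptions made for this example.

```csharp
using System.Collections.Generic;

public static class TopoSortDemo
{
    // adj[u] lists the vertices reached by the arcs leaving vertex u.
    // Returns a topological order, or null if the graph contains a cycle.
    public static List<int> Sort(List<int>[] adj)
    {
        int n = adj.Length;
        int[] inDegree = new int[n];
        foreach (List<int> heads in adj)
        {
            foreach (int v in heads) inDegree[v]++;
        }

        // Vertices with in-degree 0 have no predecessor and are ready for output.
        Queue<int> ready = new Queue<int>();
        for (int v = 0; v < n; v++)
        {
            if (inDegree[v] == 0) ready.Enqueue(v);
        }

        List<int> order = new List<int>();
        while (ready.Count > 0)
        {
            int u = ready.Dequeue();
            order.Add(u);                          // step (1): output the vertex
            foreach (int v in adj[u])              // step (2): "delete" the arcs leaving u
            {
                inDegree[v]--;
                if (inDegree[v] == 0) ready.Enqueue(v);
            }
        }

        // Vertices left over with non-zero in-degree mean the graph has a cycle.
        return order.Count == n ? order : null;
    }
}
```

For the arcs 0→1, 0→2, 1→3, and 2→3 this yields the order 0, 1, 2, 3; a graph with arcs 0→1 and 1→0 yields null.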

According to the definition of a heap, a heap has the following two properties:

(1) The root of a max-heap is the node with the largest key in the heap, and the root of a min-heap is the node with the smallest key. The record at the root of the heap is called the heap-top record.

(2) In a max-heap, the keys along the path from the root to any leaf are in descending order; in a min-heap, the keys along the path from the root to any leaf are in ascending order.

The heap sort process is as follows. Given n records, first build them into a heap according to their keys and output the heap-top record, which is the record with the largest (or smallest) key among the n records. Then adjust the remaining n-1 records into a heap and output the heap-top record, which is the record with the second largest (or second smallest) key. Repeating this process yields a sequence ordered by key.

Therefore, two problems need to be solved to achieve heap sorting:

(1) how to build a sequence of n records into a heap according to their keys;

(2) how to adjust the remaining n-1 records into a new heap after the heap-top record has been output.
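Both problems can be handled by one "sift down" routine, as in the sketch below. It sorts an int array in ascending order using a max-heap stored in the array itself, with 0-based indices so that the children of position i are 2i + 1 and 2i + 2. HeapSortDemo and SiftDown are hypothetical names.

```csharp
public static class HeapSortDemo
{
    // Moves data[root] down within data[0..end] until the max-heap property holds.
    private static void SiftDown(int[] data, int root, int end)
    {
        while (2 * root + 1 <= end)
        {
            int child = 2 * root + 1;                        // left child
            if (child + 1 <= end && data[child + 1] > data[child])
            {
                child++;                                     // right child is larger
            }
            if (data[root] >= data[child]) return;           // already a heap
            int temp = data[root];
            data[root] = data[child];
            data[child] = temp;
            root = child;
        }
    }

    public static void Sort(int[] data)
    {
        int n = data.Length;

        // Problem (1): build the initial heap, starting from the last non-leaf node.
        for (int i = n / 2 - 1; i >= 0; i--)
        {
            SiftDown(data, i, n - 1);
        }

        // Problem (2): move the heap top to the end, then re-adjust the remainder.
        for (int end = n - 1; end > 0; end--)
        {
            int temp = data[0];
            data[0] = data[end];
            data[end] = temp;
            SiftDown(data, 0, end - 1);
        }
    }
}
```

"Outputting" the heap-top record is modeled here by swapping it into the tail of the array, which is why no auxiliary array is needed.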

Sorting is an important operation in computer programming: it is the process of arranging a sequence of records in order of their keys. Sorting methods are classified into internal sorting and external sorting according to the memory involved. Internal sorting keeps the records in main memory and adjusts their relative positions there, with no data exchange between internal and external storage. External sorting keeps records in external storage, so data must be exchanged between internal and external storage while the relative positions of records are adjusted. Sorting methods are also classified as stable or unstable: a stable sorting method leaves the relative order of records with equal keys unchanged, whereas an unstable sorting method may change it.

This chapter mainly introduces common internal sorting methods, including three simple ones: direct insertion sort, bubble sort, and simple selection sort. Direct insertion sort and bubble sort take $O(n)$ time in the best case and $O(n^2)$ time in the average and worst cases, and both are stable. Simple selection sort performs $O(n^2)$ comparisons in every case, and its usual swap-based implementation is not stable.

On average, quick sort performs best, with time complexity $O(n \log_2 n)$, so it is most suitable when the keys of the sequence to be sorted are randomly distributed. In the worst case, however, the time complexity of quick sort is $O(n^2)$. Quick sort is an unstable sorting method.

The time complexity of heap sort is the same in the best, average, and worst cases, namely $O(n \log_2 n)$, and heap sort needs less auxiliary space than quick sort. Heap sort is also an unstable sorting method. The time complexity of merge sort is likewise $O(n \log_2 n)$ in the best, average, and worst cases; it needs more auxiliary space than heap sort, but it is a stable sorting method. All of the sorting methods above work by comparing keys and moving records.

In general, sorting uses the sequential storage structure (radix sort is an exception). When there are many records, the linked storage structure can be used, but quick sort and heap sort are difficult to implement on a linked list.
