Data Structure Overview


Part 1: Data Structure Overview

 

Original article: Part 1: An Introduction to Data Structures

 

Introduction:
This document describes how to use data structures on the .NET platform. The article is divided into six parts, and this is the first. The series examines several data structures, some of which are included in the .NET Framework Base Class Library and some of which we will build ourselves. If you are not familiar with the term, think of a data structure as an abstract structure or class that is used to organize data and to provide operations on that data. The most common and best-known data structure is the array, which holds a contiguous collection of data items that are accessed by index.

Before reading on, let's look at what these six parts cover. If you have ideas, or think something is missing from this series, I hope you will contact me by e-mail (mitchell@4guysfromrolla.com) to share your thoughts. Time permitting, I will be glad to work your suggestions into the appropriate part or, if necessary, add a Part 7 to the series.

Part 1 first introduces the importance of data structures in algorithm design. How good or bad a data structure is comes down to its performance, so we will analyze the performance of data structures rigorously. This part also describes two common data structures in the .NET Framework, the Array and the ArrayList, and examines their operations and their efficiency.

Part 2 continues the analysis of the ArrayList in more detail and introduces the Queue and Stack classes. Like the ArrayList, the Queue and the Stack each store a contiguous collection of data, and both are part of the .NET Framework Base Class Library. Unlike the ArrayList, which can access any item directly, the Stack and the Queue can only read items in a predetermined order (first in, first out for the Queue; last in, first out for the Stack). We will use sample programs to test the Queue and the Stack, and implement them by extending the ArrayList class. Then we will analyze the Hashtable, which allows direct access to data as the ArrayList does. The difference is that it is indexed by a key (a string, for example).

The ArrayList is an ideal data structure for reading and storing data directly, and it is also a candidate when data must be searched. In Part 3 we will examine the binary tree, which searches data more efficiently than the ArrayList. The .NET Framework does not include a built-in binary tree, so we need to build it ourselves.

The efficiency of searching a binary tree depends on the order in which the data was inserted into the tree. If we insert ordered or nearly ordered data, the tree is in fact no more efficient than an ArrayList. To get the best of both, Part 4 examines an interesting randomized data structure, the skip list. The skip list retains the search efficiency of the binary tree, yet the order in which data is inserted has little effect on its efficiency.

In Part 5 we turn to data structures that are commonly used to represent graphs. A graph is a collection of nodes together with the edges between those nodes. For example, a map can be represented as a graph: each city is a node, and each highway is an edge connecting two nodes. Many practical problems can be abstracted into graphs, so graphs are a frequently used data structure.

Finally, Part 6 discusses data structures for representing sets and disjoint sets. A set is an unordered collection of items. Disjoint sets are sets that share no elements with one another, that is, their intersection is empty. Sets and disjoint sets are used often in programming, and we will describe them in detail in that part.

Data Structure Performance Analysis

When thinking about a particular application or programming problem, most developers (myself included) focus on the algorithm that solves the problem at hand, or on adding a cool feature that enriches the user's experience. We seldom hear of anyone getting excited about the data structure they are using. Yet the data structure used by a particular algorithm can greatly affect its performance. The most common example is finding an element in a data structure. In an array, the time the search takes is proportional to the number of elements in the array. With a binary tree or a skip list, the time grows sub-linearly with the number of elements. When we need to search a large amount of data, the choice of data structure matters a great deal to the program's performance, and the difference can amount to seconds or even minutes.

Since the data structure an algorithm uses affects the algorithm's efficiency, it is important to compare the efficiency of different data structures and pick the better one. As developers, the first thing to ask is how a data structure's performance changes as the amount of stored data grows. In other words, each time a new element is added to the data structure, how does that affect the running time of operations on it?

Consider the following situation. A program uses the System.IO.Directory.GetFiles(path) method to get the list of files in a particular directory as a string array. Suppose you need to search this array to determine whether the list contains an XML file (that is, a file whose extension is .xml). One approach is to scan (traverse) the entire array and set a flag when an XML file is found. The code might look like this:

using System;
using System.Collections;
using System.IO;

public class MyClass
{
   public static void Main()
   {
      // Get the files in the directory as an array of path strings
      string[] fs = Directory.GetFiles(@"C:\Inetpub\wwwroot");
      bool foundXML = false;
      int i = 0;

      // Scan the array until a file with the .xml extension is found
      for (i = 0; i < fs.Length; i++)
         if (String.Compare(Path.GetExtension(fs[i]), ".xml", true) == 0)
         {
            foundXML = true;
            break;
         }

      if (foundXML)
         Console.WriteLine("XML file found - " + fs[i]);
      else
         Console.WriteLine("No XML files found.");
   }
}

Now consider the worst case. When the list contains no XML file, or the only XML file is the last item in the list, we end up searching every element of the array. To analyze the array's efficiency we must ask ourselves: "Suppose the array has n elements. If I add one more element, so that it has n + 1 elements, what is the new running time?" (The term "running time" does not mean the absolute time the program takes to run. It refers to the number of steps the program needs to complete the task. For an array, the running time is taken to be the number of steps needed to access array elements.) Searching an array for a value may require accessing every element of the array. If the array contains n + 1 elements, n + 1 checks are performed. In other words, the time spent searching an array is proportional to the number of elements in the array.
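The worst case described above can be made concrete with a small sketch. The names LinearSearchDemo and CountChecks are invented here for illustration and are not part of the original article; the method simply counts how many array elements are examined before the scan stops.

```csharp
using System;

public static class LinearSearchDemo
{
    // Returns the number of array elements examined while looking for target.
    public static int CountChecks(string[] items, string target)
    {
        int checks = 0;
        foreach (string item in items)
        {
            checks++;
            if (item == target)
                break;
        }
        return checks;
    }

    public static void Main()
    {
        string[] files = { "a.txt", "b.doc", "c.png", "d.xml" };
        Console.WriteLine(CountChecks(files, "d.xml"));  // match is last: 4 checks
        Console.WriteLine(CountChecks(files, "none"));   // no match: 4 checks
        Console.WriteLine(CountChecks(files, "a.txt"));  // match is first: 1 check
    }
}
```

With n elements, both worst cases cost n checks, and adding one more element raises that to n + 1.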

Analyzing the efficiency of a data structure as its size tends toward infinity is called asymptotic analysis. The notation most commonly used in asymptotic analysis is big-Oh notation, in which the performance of scanning an array is written O(n). The O identifies the big-Oh notation, and the n indicates that the number of steps grows linearly with the length of the array.

To calculate the running time of an algorithm in a code block, follow these steps:

1. Determine which steps make up the algorithm's running time. As mentioned above, for arrays the typical steps are reads from and writes to the array; for other data structures the steps may differ. In particular, count the steps that involve the data structure itself rather than lower-level operations performed by the computer. In the code block above, the running time should count only the number of times the array is accessed. There is no need to count the time taken to create and initialize variables or to compare two strings for equality.
2. Find the lines of code that perform the steps you are counting. Mark each of these lines with a 1.
3. For each line marked with a 1, determine whether it is inside a loop. If it is, change the 1 to 1 multiplied by the maximum number of times the loop can execute. If two or more loops are nested, keep multiplying for each enclosing loop.
4. Find the largest value marked on any line. That value is the running time.

Now let's apply these steps to the code block above. First, we identify the lines of code that count toward the running time and, following step 2, mark the two lines that access the array fs. One line passes the array element as a parameter to the String.Compare() method. The other uses it in the Console.WriteLine() method. We mark both lines with a 1. Then, following step 3, the String.Compare() line is inside a loop, and the maximum number of iterations is n (because the array length is n), so we change that line's mark from 1 to n. Finally, the running time is the largest mark, n, written O(n).
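The multiplication rule in step 3 can be checked by counting array accesses directly. This is only a sketch: StepCountDemo and its method names are hypothetical, and the counters exist purely to make the marks visible.

```csharp
using System;

public static class StepCountDemo
{
    // One array access inside a single loop: marked 1 * n.
    public static long SingleLoopAccesses(int[] data)
    {
        long accesses = 0;
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];   // the counted step
            accesses++;
        }
        return accesses;
    }

    // One array access inside two nested loops: marked 1 * n * n.
    public static long NestedLoopAccesses(int[] data)
    {
        long accesses = 0;
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            for (int j = 0; j < data.Length; j++)
            {
                sum += data[j];   // the counted step
                accesses++;
            }
        return accesses;
    }

    public static void Main()
    {
        int[] data = new int[100];
        Console.WriteLine(SingleLoopAccesses(data));  // 100, i.e. n
        Console.WriteLine(NestedLoopAccesses(data));  // 10000, i.e. n * n
    }
}
```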

O(n), or linear time, is just one of many possible running times. Others include O(log_2 n), O(n log_2 n), O(n^2), O(2^n), and so on. We don't need to go into the details of big-Oh notation here. It is enough to know that the smaller the value in the parentheses, the better the performance of the data structure. For example, an O(log n) algorithm is more efficient than an O(n) algorithm, because log n < n.

Note:

A little mathematics is needed here. log_a b = y is another way of writing a^y = b. Thus log_2 4 = 2, because 2^2 = 4. log_2 n grows much more slowly than n itself. In Part 3 we will examine the binary tree, whose search has a time complexity of O(log_2 n).
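To get a feel for how slowly log_2 n grows, the sketch below counts how many times a number can be halved before it reaches 1, which for n >= 1 equals log_2 n rounded down. LogGrowthDemo and HalvingSteps are illustrative names only.

```csharp
using System;

public static class LogGrowthDemo
{
    // Number of times n can be halved (integer division) before reaching 1.
    public static int HalvingSteps(int n)
    {
        int steps = 0;
        while (n > 1)
        {
            n /= 2;
            steps++;
        }
        return steps;
    }

    public static void Main()
    {
        Console.WriteLine(HalvingSteps(4));        // 2
        Console.WriteLine(HalvingSteps(1024));     // 10
        Console.WriteLine(HalvingSteps(1000000));  // 19
    }
}
```

A million elements need only about 20 halvings, which is why an O(log_2 n) search scales so much better than an O(n) scan.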

Throughout this series, we will work out the asymptotic running time of the operations of each new data structure, and compare it with the running time of similar operations on the other data structures.

Array: A Linear, Directly Accessible, Homogeneous Data Structure

In programming, arrays are the simplest and most widely used data structure. Arrays in all programming languages share the following properties:
1. The contents of an array are stored in contiguous memory.
2. All elements of an array must be of the same data type, which is why arrays are called homogeneous data structures.
3. Array elements can be accessed directly. (Many data structures do not have this property. One example is the skip list, described in Part 4 of this series. To access a particular element of a skip list, you must search through other elements until you find the one you are looking for. With an array, if you know you want the i-th element, you can access it directly with arrayName[i].) Many languages start array subscripts at 0, so in those languages the i-th element is accessed as arrayName[i-1].

The following are the common operations on arrays:
1. Allocation
2. Accessing elements
3. Redimensioning (resizing)

When an array is first declared in C#, it has a null value. The following code creates an array variable named booleanArray whose value is null:

bool[] booleanArray;

Before using the array, you must allocate space for a specific number of elements, as shown below:

booleanArray = new bool[10];

The general form is:

arrayName = new arrayType[allocationSize];

This allocates a contiguous block of memory in the CLR-managed heap, large enough to hold allocationSize elements of type arrayType. If arrayType is a value type (int, for example), allocationSize unboxed arrayType values are created. If arrayType is a reference type (string, for example), allocationSize arrayType references are created. (If you are not familiar with the differences between value types and reference types, or between the managed heap and the stack, see "Understanding .NET's Common Type System".)

To help drive home how the .NET Framework stores the contents of an array internally, consider the following example:


bool[] booleanArray;
FileInfo[] files;

booleanArray = new bool[10];
files = new FileInfo[10];

Here, booleanArray is an array of the value type System.Boolean, while files is an array of the reference type System.IO.FileInfo. Figure 1 shows the CLR-managed heap after these four lines of code have executed.



 
Figure 1: sequential storage of array elements in the managed heap

Keep in mind that the ten elements of the files array are references to FileInfo instances. Figure 2 drives this point home by showing the memory layout after we assign FileInfo instances to some of the elements of the files array.
 


Figure 2: sequential storage of array elements in the managed heap
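The difference between the two arrays can also be observed from code. The following is a minimal sketch: ArrayDefaultsDemo and CountNullSlots are hypothetical names, and "example.txt" is a made-up file name (FileInfo does not require the file to exist).

```csharp
using System;
using System.IO;

public static class ArrayDefaultsDemo
{
    // Counts the slots of a reference-type array that do not yet refer to an object.
    public static int CountNullSlots(object[] items)
    {
        int nulls = 0;
        foreach (object item in items)
            if (item == null)
                nulls++;
        return nulls;
    }

    public static void Main()
    {
        bool[] booleanArray = new bool[10];   // ten unboxed bool values, all false
        FileInfo[] files = new FileInfo[10];  // ten references, all null

        Console.WriteLine(booleanArray[0]);        // False
        Console.WriteLine(CountNullSlots(files));  // 10

        // Only now does the first slot refer to an actual FileInfo instance.
        files[0] = new FileInfo("example.txt");
        Console.WriteLine(CountNullSlots(files));  // 9
    }
}
```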

All arrays in .NET support reading and writing of their elements. The syntax for accessing an array element is as follows:

// Read an array element
bool b = booleanArray[7];

// Write to an array element, that is, assign a value
booleanArray[0] = false;

The running time of an array access is written O(1), because it is constant. That is, no matter how many elements the array holds, the time it takes to look up an element by index is the same. The running time is constant because array elements are stored contiguously: to locate an element, you only need to know the array's starting position in memory, the size of each element, and the index of the element.
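The calculation behind a constant-time lookup can be modeled as a single multiplication and addition. This is a simplified model rather than how the CLR is actually implemented; ElementAddressDemo, ElementLocation, and the numbers used are assumptions made for illustration.

```csharp
using System;

public static class ElementAddressDemo
{
    // Simplified model: location = start + index * elementSize.
    // The cost is the same regardless of how many elements the array holds.
    public static long ElementLocation(long start, int elementSize, int index)
    {
        return start + (long)index * elementSize;
    }

    public static void Main()
    {
        long start = 1000;    // made-up starting position of the array
        int elementSize = 4;  // for example, a 4-byte Int32
        Console.WriteLine(ElementLocation(start, elementSize, 0));  // 1000
        Console.WriteLine(ElementLocation(start, elementSize, 7));  // 1028
    }
}
```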

In managed code, array lookups are slightly more involved than this, because the CLR checks on every array access that the index is within the array's bounds. If the index is out of bounds, an IndexOutOfRangeException is thrown. This bounds check helps ensure that we do not accidentally step past the end of the array into another region of memory. It does not change the asymptotic running time of an array access, because the time needed to perform the bounds check does not grow as the number of array elements grows.

Note: For applications that perform a large number of array accesses, the index bounds check has a slight impact on performance. With unmanaged code, this bounds check can be bypassed. For more information, see Chapter 14 of Applied Microsoft .NET Framework Programming by Jeffrey Richter.
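The bounds check is easy to observe. In the sketch below, BoundsCheckDemo and TryRead are hypothetical names; the helper catches the IndexOutOfRangeException that the CLR throws for an invalid index.

```csharp
using System;

public static class BoundsCheckDemo
{
    // Reads array[index], reporting failure instead of letting the exception escape.
    public static bool TryRead(bool[] array, int index, out bool value)
    {
        try
        {
            value = array[index];
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            value = false;
            return false;
        }
    }

    public static void Main()
    {
        bool[] booleanArray = new bool[10];
        bool value;
        Console.WriteLine(TryRead(booleanArray, 9, out value));   // True: last valid index
        Console.WriteLine(TryRead(booleanArray, 10, out value));  // False: one past the end
    }
}
```

In ordinary code it is usually better to compare the index against Length up front than to rely on catching the exception; the try/catch here only serves to show that the check happens.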

When working with arrays, you may need to change the size of an array. To do so, create a new array instance of the desired length and copy the contents of the old array into the new one. This process is called redimensioning. The following code shows how:

using System;
using System.Collections;

public class MyClass
{
   public static void Main()
   {
      // Create an int array containing three elements
      int[] fib = new int[3];
      fib[0] = 1;
      fib[1] = 1;
      fib[2] = 2;

      // Allocate a new array with a length of 10
      int[] temp = new int[10];

      // Copy the contents of the fib array to the temporary array
      fib.CopyTo(temp, 0);

      // Assign the temporary array to fib
      fib = temp;
   }
}

After the last line of code, fib refers to an Int32 array of 10 elements. Elements 3 through 9 of the fib array (subscripts start at 0) hold the default Int32 value, 0.
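The same redimensioning steps can be wrapped in a reusable helper. This is a sketch under assumed names (ResizeDemo, Resize), not code from the original article; later versions of the .NET Framework also offer a built-in Array.Resize method for the same purpose.

```csharp
using System;

public static class ResizeDemo
{
    // Returns a new array of newLength holding as many of the old elements as fit.
    public static int[] Resize(int[] source, int newLength)
    {
        int[] result = new int[newLength];
        Array.Copy(source, result, Math.Min(source.Length, newLength));
        return result;
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2 };
        fib = Resize(fib, 10);
        Console.WriteLine(fib.Length);  // 10
        Console.WriteLine(fib[2]);      // 2: copied from the old array
        Console.WriteLine(fib[9]);      // 0: default Int32 value
    }
}
```

Every resize copies the existing elements, so a single redimensioning of an array with n elements costs O(n).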

When we want to store data of the same type and only need to access it directly, the array is a good data structure. Searching an unsorted array takes linear time. This is acceptable when we work with small arrays or search only occasionally. However, when your application needs to store a large amount of data and search it frequently, many other data structures are better suited to the job. We will look at some of them in the rest of this series. (If you want to search an array by some property and the array is sorted by that property, you can use binary search, whose time complexity is O(log n), the same as searching a binary tree. In fact, the Array class contains a static BinarySearch() method. For more information about this method, see my earlier article "Efficiently Searching a Sorted Array".)
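As a brief illustration of the Array.BinarySearch() method mentioned above, the sketch below searches a sorted int array. BinarySearchDemo and ContainsSorted are hypothetical names; the method returns the index of a match, or a negative number when the value is absent.

```csharp
using System;

public static class BinarySearchDemo
{
    // True if value occurs in the array, which must already be sorted in ascending order.
    public static bool ContainsSorted(int[] sorted, int value)
    {
        return Array.BinarySearch(sorted, value) >= 0;
    }

    public static void Main()
    {
        int[] fib = { 1, 2, 3, 5, 8, 13, 21 };
        Console.WriteLine(Array.BinarySearch(fib, 8));  // 4: index of the match
        Console.WriteLine(ContainsSorted(fib, 8));      // True
        Console.WriteLine(ContainsSorted(fib, 4));      // False
    }
}
```

Binary search only gives correct answers on sorted input, so an unsorted array must be sorted first or scanned linearly.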

Note: The .NET Framework also supports multi-dimensional arrays. As with one-dimensional arrays, accessing an element of a multi-dimensional array takes constant time. Recall that searching a one-dimensional array of n elements has a time complexity of O(n). For an n x n two-dimensional array, the time complexity is O(n^2), because a search may need to check n^2 elements. More generally, searching a k-dimensional array has a time complexity of O(n^k).
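The O(n^2) figure for a two-dimensional search can be verified by counting element checks. GridSearchDemo and CountChecks are names made up for this sketch.

```csharp
using System;

public static class GridSearchDemo
{
    // Scans the grid row by row and returns how many elements were checked.
    public static int CountChecks(int[,] grid, int target)
    {
        int checks = 0;
        for (int row = 0; row < grid.GetLength(0); row++)
            for (int col = 0; col < grid.GetLength(1); col++)
            {
                checks++;
                if (grid[row, col] == target)
                    return checks;
            }
        return checks;
    }

    public static void Main()
    {
        int[,] grid = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
        Console.WriteLine(CountChecks(grid, 1));   // 1: found immediately
        Console.WriteLine(CountChecks(grid, 9));   // 9: last element, n * n checks
        Console.WriteLine(CountChecks(grid, 10));  // 9: not present, n * n checks
    }
}
```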

ArrayList: An Array That Stores Different Types of Data and Grows Automatically

Arrays are limited by design: a one-dimensional array can only store data of a single type, and a specific length must be given when the array is allocated. Developers often need something more flexible, a structure that can store data of different types without their having to manage the allocation of space. The .NET Framework Base Class Library provides a data structure that meets these requirements: System.Collections.ArrayList.

The following code shows an ArrayList in use. Note that data of any type can be added to the ArrayList, and that there is no need to allocate space explicitly. The ArrayList handles all of this for you.

ArrayList countDown = new ArrayList();
countDown.Add(5);
countDown.Add(4);
countDown.Add(3);
countDown.Add(2);
countDown.Add(1);
countDown.Add("Blast off!");
countDown.Add(new ArrayList());

Under the covers, the ArrayList uses a System.Array of type object. Since every type derives directly or indirectly from object, an array of objects can hold elements of any type. By default, an ArrayList creates an array of 16 object elements, although we can choose the size ourselves through a constructor parameter or by setting the Capacity property. When a new element is added through the Add() method, the ArrayList checks the capacity of its internal array. If adding the element would exceed the capacity, the capacity is doubled.
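The growth behavior can be watched through the Capacity property. In this sketch, CapacityDemo and CountCapacityChanges are hypothetical names. The exact default capacity differs between .NET versions, so the first result is deliberately left without an expected value.

```csharp
using System;
using System.Collections;

public static class CapacityDemo
{
    // Adds 'count' integers and returns how many times Capacity changed along the way.
    public static int CountCapacityChanges(int count)
    {
        ArrayList list = new ArrayList();
        int changes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                changes++;
                lastCapacity = list.Capacity;
            }
        }
        return changes;
    }

    public static void Main()
    {
        // Far fewer capacity changes than insertions; the exact count is version-dependent.
        Console.WriteLine(CountCapacityChanges(1000));

        // Specifying the capacity up front avoids growth for the first 1000 elements.
        ArrayList sized = new ArrayList(1000);
        Console.WriteLine(sized.Capacity);  // 1000
    }
}
```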

Like an array, an ArrayList can be accessed directly by index:

// Read access
int x = (int) countDown[0];
string y = (string) countDown[5];

// Write access
countDown[1] = 5;

// Throws an ArgumentOutOfRangeException
countDown[7] = 5;

Since the ArrayList stores elements of type object, an explicit cast is required when reading an element from the ArrayList. Also note that if you access an element beyond the length of the ArrayList, a System.ArgumentOutOfRangeException is thrown.
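Because elements come back as object, a cast to the wrong type fails only at run time. One defensive pattern is to check the index and the element type before casting. The sketch below assumes the hypothetical names ArrayListAccessDemo and TryGetInt.

```csharp
using System;
using System.Collections;

public static class ArrayListAccessDemo
{
    // Returns true, and the element as an int, only if the slot exists and holds a boxed int.
    public static bool TryGetInt(ArrayList list, int index, out int value)
    {
        value = 0;
        if (index < 0 || index >= list.Count || !(list[index] is int))
            return false;
        value = (int)list[index];
        return true;
    }

    public static void Main()
    {
        ArrayList countDown = new ArrayList();
        countDown.Add(1);
        countDown.Add("Blast off!");

        int n;
        Console.WriteLine(TryGetInt(countDown, 0, out n));  // True
        Console.WriteLine(TryGetInt(countDown, 1, out n));  // False: element is a string
        Console.WriteLine(TryGetInt(countDown, 7, out n));  // False: index past the end

        try
        {
            int bad = (int)countDown[1];  // casting a string to int fails at run time
            Console.WriteLine(bad);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```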

The ArrayList provides automatic growth, a flexibility that standard arrays lack, but this flexibility comes at a cost in performance, especially when storing value types such as System.Int32, System.Double, and System.Boolean. In an array, values of these types are stored contiguously in the managed heap in unboxed form. The ArrayList, however, is internally an array of object references. So even if an ArrayList holds nothing but value types, each element is still converted to a reference type through boxing, as Figure 3 shows.
 

Figure 3: The ArrayList stores a contiguous block of object references

Using value types in an ArrayList incurs extra boxing and unboxing operations. When your application works with a large ArrayList and reads and writes it frequently, this overhead can significantly affect the program's performance. As Figure 3 shows, for reference types the memory layout of an ArrayList is the same as that of an array.
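What boxing does can be shown without an ArrayList at all: assigning a value type to an object variable copies the value into a new heap object, and casting it back copies the value out again. BoxingDemo and its methods are illustrative names only.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes the value, changes the original variable, then unboxes the copy.
    public static int RoundTrip(int value)
    {
        object boxed = value;  // boxing: the value is copied into a new heap object
        value = value + 1;     // changing the variable does not touch the boxed copy
        return (int)boxed;     // unboxing: the value is copied back out
    }

    // Boxing the same value twice produces two distinct heap objects.
    public static bool SameBox(int value)
    {
        object first = value;
        object second = value;
        return Object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(42));  // 42
        Console.WriteLine(SameBox(42));    // False
    }
}
```

Each Add() of an int to an ArrayList performs one such boxing allocation, and each cast on the way out performs an unboxing copy.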

Compared with an array, the ArrayList's automatic growth should not cause any performance degradation. If you know exactly how many elements the ArrayList will hold, you can effectively turn off automatic growth by specifying the initial capacity in the ArrayList constructor. And if you do not know the exact size, then even with an array you would have to resize it by hand whenever the number of inserted elements exceeds the array's length.

A classic computer science problem is deciding how much new space to allocate when a program runs out of space in a buffer. One option is to grow the buffer by one element at a time. For example, if an array is initially allocated with five elements, its length is increased to 6 before the sixth element is inserted. This approach conserves the most memory, but it is too costly, because every subsequent insertion requires a reallocation.

Another option is the opposite: grow the buffer to 100 times its current size on each reallocation. If the array is initially allocated with five elements, its space grows to 500 elements before the sixth element is inserted. This approach greatly reduces the number of reallocations. However, if only a few more elements are ever inserted, hundreds of elements go unused, which wastes space.
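The trade-off between these strategies can be estimated with a small simulation that counts how many element copies the reallocations cause. This is a sketch under stated assumptions, not a measurement of the real ArrayList: GrowthStrategyDemo and CountCopies are hypothetical names, and each reallocation is assumed to copy every existing element once.

```csharp
using System;

public static class GrowthStrategyDemo
{
    // Simulates 'inserts' appends into a buffer that starts with 'initialCapacity' slots
    // and returns the total number of element copies caused by reallocations.
    public static long CountCopies(int inserts, int initialCapacity, Func<int, int> grow)
    {
        int capacity = initialCapacity;
        int count = 0;
        long copies = 0;
        for (int i = 0; i < inserts; i++)
        {
            if (count == capacity)
            {
                capacity = grow(capacity);  // reallocate a larger buffer
                copies += count;            // every existing element is copied over
            }
            count++;
        }
        return copies;
    }

    public static void Main()
    {
        Console.WriteLine(CountCopies(1000, 5, c => c + 1));    // grow by one element
        Console.WriteLine(CountCopies(1000, 5, c => c * 2));    // double the capacity
        Console.WriteLine(CountCopies(1000, 5, c => c * 100));  // grow 100-fold
    }
}
```

Under these assumptions, growing by one element costs hundreds of thousands of copies for 1000 insertions, while doubling costs on the order of a thousand and never leaves more than about half of the buffer unused.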

The asymptotic running time of ArrayList operations is the same as that of a standard array. Even though ArrayList operations carry more overhead, especially when storing value types, the relationship between the number of elements and the cost of each operation is the same as for a standard array.
