Examining Data Structures, Part 2: Queue, Stack, and Hashtable

Original article: Part 2: The Queue, Stack, and Hashtable

This article is the second part of the "Data Structures" series. It examines three of the most commonly studied data structures: the Queue, the Stack, and the Hashtable. As we will see, the Queue and the Stack are specialized ArrayLists: they store a variable number of objects of any type, but they restrict the order in which those elements can be accessed. The Hashtable provides an array-like abstraction with more flexible indexing. An array must be indexed by ordinal number, whereas a Hashtable lets you index data items by any object.

Contents:

Introduction

Working Process of "Queuing order"

"Reverse queuing order"-stack data structure

Limitations of Ordinal Indexing

The System.Collections.Hashtable Class

Conclusion

Introduction

In Part 1, we looked at what data structures are, how to evaluate their performance, and how the choice of data structure affects particular algorithms. We also reviewed the basics of data structures and introduced the most common one: the array.

An array stores data of a single type and is indexed by ordinal number. Its values are stored in contiguous memory, so reading or writing a specific element is very fast.

Because arrays are homogeneous and fixed in length, the .NET Framework Base Class Library provides the ArrayList data structure, which can store data of different types and does not require you to specify a length. As discussed in Part 1, an ArrayList is essentially an array of objects. Each time you call the Add() method, the internal object array checks its bounds; if its capacity would be exceeded, the array automatically doubles in length.

In this second part, we continue with two array-like structures: Queue and Stack. Like ArrayList, they store elements of different types in a contiguous block of memory. Unlike ArrayList, they restrict how the data can be accessed.

After that, we take an in-depth look at the Hashtable data structure. A Hashtable can be thought of as an associative array. It is also a collection that stores elements of different types, but it can be indexed by any object (such as a string) rather than by a fixed ordinal number.

Working Process of "Queuing order"

Suppose you want to build a service, that is, a program that uses several resources to respond to many requests. Deciding the order in which those requests are handled is one of the main challenges in building such a service. There are two common approaches:

"Queuing order" Principle

"Priority-based" processing principles

When you shop in a store or withdraw money at a bank, you wait in line for service. Under the "queuing order" (first come, first served) principle, those who arrive earlier are served before those who arrive later. Under the priority-based principle, the order of service is determined by priority. For example, in a hospital emergency room, patients in critical condition are seen by a doctor first, regardless of who arrived first.

Imagine that you need to build a service that processes requests received by a computer. Because requests arrive faster than the computer can process them, you need to place them in a buffer in the order in which they were submitted.

One solution is to use an ArrayList together with an integer variable, nextJobPos, that records the array position of the next job to execute. When a new job request arrives, we simply call the ArrayList's Add() method to append it to the end of the list. When we are ready to process a buffered job, we use nextJobPos to fetch the job at that position in the ArrayList and then increment nextJobPos by 1. The following program implements this algorithm:

using System;
using System.Collections;

public class JobProcessing
{
    private static ArrayList jobs = new ArrayList();
    private static int nextJobPos = 0;   // position of the next job to process

    // Append a new job to the end of the buffer
    public static void AddJob(string jobName)
    {
        jobs.Add(jobName);
    }

    // Return the next unprocessed job, or a message if none is waiting
    public static string GetNextJob()
    {
        if (nextJobPos > jobs.Count - 1)
            return "NO JOBS IN BUFFER";
        else
        {
            string jobName = (string) jobs[nextJobPos];
            nextJobPos++;
            return jobName;
        }
    }

    public static void Main()
    {
        AddJob("1");
        AddJob("2");
        Console.WriteLine(GetNextJob());
        AddJob("3");
        Console.WriteLine(GetNextJob());
        Console.WriteLine(GetNextJob());
        Console.WriteLine(GetNextJob());
        Console.WriteLine(GetNextJob());
        AddJob("4");
        AddJob("5");
        Console.WriteLine(GetNextJob());
    }
}

The output is as follows:

1
2
3
NO JOBS IN BUFFER
NO JOBS IN BUFFER
4

This approach is easy to understand, but it is unacceptably inefficient. Even if every job is processed immediately after it is added to the buffer, the ArrayList keeps growing with each job added. Suppose a job is added to and removed from the buffer once per second. Then AddJob() is called every second, which in turn calls the ArrayList's Add() method. As Add() is called repeatedly, the ArrayList's internal array keeps doubling in length as needed. After five minutes (300 jobs), the internal array has grown to 512 elements, even though the buffer never holds more than one job at a time. As long as the program keeps running and jobs keep arriving, the array continues to grow.

This absurd result arises because the space used by jobs that have already been processed is never reclaimed. That is, once the first job has been added to the buffer and processed, the first element of the ArrayList should be available for reuse. Consider the workflow of the code above. After two jobs are inserted, with AddJob("1") and AddJob("2"), the ArrayList looks like Figure 1:

Figure 1: The ArrayList after the first two lines of code execute

Note that the ArrayList has 16 elements, because the default capacity of an ArrayList at initialization is 16. Next, GetNextJob() is called, which removes the first job. The result is shown in Figure 2:

Figure 2: The ArrayList after GetNextJob() is called

When AddJob("3") executes, we need to add a new job to the buffer. Clearly, the first element of the ArrayList (index 0) is free to be reused, so at first it seems sensible to put the third job at index 0. But consider what happens if, after AddJob("3"), we call AddJob("4") and then call GetNextJob() twice. If the third job is placed at index 0 and the fourth job at index 2, we run into the problem shown in Figure 3:

Figure 3: The problem caused by placing a job at index 0

Now, when GetNextJob() is called, the second job is removed from the buffer and nextJobPos is advanced to index 2. When GetNextJob() is called again, the fourth job is therefore removed before the third job, which violates the "queuing order" principle.

The crux of the problem is that the ArrayList represents the job list as a linear sequence. To keep the processing order correct, each new job must be added to the right of the existing jobs. Whenever we reach the end of the ArrayList, its capacity is doubled, even if calls to GetNextJob() have left unused elements at the front.

The solution is to make our ArrayList circular. A circular array has no fixed start or end point. Instead, we use variables to keep track of where the array's contents begin and end. A circular array is shown in Figure 4:

Figure 4: A circular array

In a circular array, the AddJob() method adds the new job at index endPos (note: endPos is commonly called the tail pointer) and then "increments" endPos. The GetNextJob() method fetches the job at the head pointer startPos, sets the element at startPos to null, and then "increments" startPos. I put "increment" in quotation marks because it is not as simple as adding 1 to the variable. Why can't we simply add 1? Consider what happens when endPos equals 15: if we add 1, endPos becomes 16, and the next call to AddJob() tries to access the element at index 16, which throws an IndexOutOfRangeException.

In fact, when endPos equals 15, incrementing it should reset it to 0. An increment function could check whether the incremented value equals the array length and, if so, reset it to 0. A simpler solution is to take the incremented value modulo the array length. The code for the increment() method is as follows:

int increment(int variable)
{
    return (variable + 1) % theArray.Length;
}

Note: The modulo operator, as in x % y, returns the remainder of x divided by y. For a non-negative x and a positive y, the remainder is always between 0 and y-1.

The advantage of this approach is that the buffer never grows beyond 16 elements. But what if we need to hold more than 16 jobs at once? As with the ArrayList's Add() method, the circular array needs to grow automatically, increasing its length by a multiple when it fills up.
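The circular buffer described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the source of any library class: the class name CircularJobBuffer and the count field are hypothetical, the capacity is fixed at 16, and a full buffer simply throws rather than growing.

```csharp
using System;

// Hypothetical sketch of a fixed-size circular job buffer.
public class CircularJobBuffer
{
    private readonly string[] theArray = new string[16];
    private int startPos = 0;   // head: index of the next job to process
    private int endPos = 0;     // tail: index where the next job will be stored
    private int count = 0;      // assumption: track the job count to tell "full" from "empty"

    // Advance an index by one, wrapping back to 0 at the end of the array.
    private int Increment(int variable)
    {
        return (variable + 1) % theArray.Length;
    }

    public void AddJob(string jobName)
    {
        if (count == theArray.Length)
            throw new InvalidOperationException("buffer is full");
        theArray[endPos] = jobName;
        endPos = Increment(endPos);
        count++;
    }

    public string GetNextJob()
    {
        if (count == 0)
            return "NO JOBS IN BUFFER";
        string jobName = theArray[startPos];
        theArray[startPos] = null;          // release the slot for reuse
        startPos = Increment(startPos);
        count--;
        return jobName;
    }
}
```

Because both indexes wrap around, adding and removing one job at a time can continue indefinitely without the array ever growing.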

The System.Collections.Queue Class

As described above, we need a data structure that inserts and removes items according to the "queuing order" principle while making the best use of memory. That data structure is the Queue. The .NET Framework Base Class Library includes a built-in implementation, the System.Collections.Queue class. Where our code used AddJob() and GetNextJob(), the Queue class provides the Enqueue() and Dequeue() methods for the same purposes.

Internally, the Queue class maintains a circular array of objects and uses the variables head and tail to mark the head and tail of the queue. By default, the initial capacity of a Queue is 32, and you can specify a different capacity through the constructor. Because the Queue is built on an object array, you can put elements of any type into it.

The Enqueue() method first checks whether the queue has enough capacity for the new element. If it does, the element is added directly and the tail index is "incremented", using the modulo operation so that tail never exceeds the length of the array. If there is not enough space, the Queue expands the array by a growth factor. The default growth factor is 2.0, so the length of the internal array is doubled. You can also specify the growth factor in the constructor.

The Dequeue() method returns the element at the head index, then sets that element to null and "increments" head. If you only want to look at the element at the head of the queue without dequeuing it, the Queue class provides the Peek() method.

It is important to note that, unlike ArrayList, a Queue does not allow random access. That is, we cannot access the third element until the first two have been dequeued. (The Queue class does provide the Contains() method, which lets you determine whether a specific value exists in the queue.) If you need random access, you have to use an ArrayList. A Queue is best suited to situations in which you only need to process elements in exactly the order in which they were received.

Note: Queues are often called FIFO data structures. FIFO stands for First In, First Out, which is equivalent to "first come, first served".
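As a brief illustration of the methods just described, the following sketch enqueues three jobs, peeks at the head, and then dequeues everything in arrival order. The class name QueueDemo and its Run() method are hypothetical; Enqueue(), Dequeue(), Peek(), and Count are members of System.Collections.Queue.

```csharp
using System;
using System.Collections;

public static class QueueDemo
{
    // Returns a trace of the peeked head followed by the dequeued jobs.
    public static string Run()
    {
        Queue jobs = new Queue();
        jobs.Enqueue("1");
        jobs.Enqueue("2");
        jobs.Enqueue("3");

        // Peek() returns the head without removing it
        string result = "peek=" + (string) jobs.Peek();

        // Dequeue() removes elements in first come, first served order
        while (jobs.Count > 0)
            result += " " + (string) jobs.Dequeue();

        return result;   // "peek=1 1 2 3"
    }
}
```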

Translator's note: In data structures, a queue is usually called a first-in, first-out structure and a stack a last-in, first-out structure. The original article, however, uses "first come, first served" rather than "first in, first out". A literal rendering did not read well, so, following the article's own example of waiting in line, this translation simply renders it as "queuing order"; anyone who has stood in a queue should understand the meaning. For the stack, the translation uses "reverse queuing order" to stand for "first come, last served". I hope readers can suggest a better rendering. Why not simply translate these terms as "first in, first out" and "last in, first out"? Mainly because "served" has a broader meaning in English: it covers processing the data, not just outputting it. So I chose to sidestep the literal wording.

"Reverse queuing order"-stack data structure

The Queue data structure implements the "queuing order" mechanism using an internal circular array of objects, and it provides the Enqueue() and Dequeue() methods for accessing the data. "Queuing order" is often used to model real-world problems, especially in services such as web servers, print queues, and other programs that handle many requests.

Another ordering commonly used in programming is "first come, last served". The Stack is the data structure that provides it. The .NET Framework Base Class Library includes the System.Collections.Stack class. Like Queue, Stack stores its data in an internal circular array of objects. Stack exposes two methods for accessing data: Push(item), which pushes an item onto the stack, and Pop(), which removes the item at the top of the stack and returns its value.

A Stack can be pictured as a vertical pile of data elements. When an element is pushed onto the stack, it is placed on top of all the other elements. When the stack is popped, the element at the top is removed. The following two figures illustrate pushing and popping. First, the values 1, 2, and 3 are pushed onto the stack in that order, and then they are popped:

Figure 5: Push three elements into the stack

Figure 6: The stack after all elements have been popped

Note that the default capacity of the Stack class is 10 elements, not 32 as for the Queue. As with Queue and ArrayList, the initial capacity of a Stack can be specified through the constructor. Like ArrayList, a Stack automatically doubles its capacity when it runs out of room. (Recall that Queue lets you set the growth factor through a constructor option.)

Note: A stack is usually called a LIFO (Last In, First Out) data structure.
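The push and pop order shown in the figures can be reproduced with a short sketch. The class name StackDemo and its Run() method are hypothetical; Push(), Pop(), and Count are members of System.Collections.Stack.

```csharp
using System;
using System.Collections;

public static class StackDemo
{
    // Pushes 1, 2, 3 and returns the values in the order they are popped.
    public static string Run()
    {
        Stack stack = new Stack();
        stack.Push(1);
        stack.Push(2);
        stack.Push(3);

        string result = "";
        while (stack.Count > 0)
            result += (int) stack.Pop();   // the most recently pushed value comes off first

        return result;   // "321"
    }
}
```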
Stacks: A Common Metaphor in Computer Science
Real life offers many examples of queues, such as the line at the DMV (Department of Motor Vehicles) or the processing of print jobs. Real-life examples of stacks are harder to find, yet the stack is a very important data structure in many kinds of applications.

Think about a programming language such as C#. When a C# program executes, the CLR (Common Language Runtime) uses a call stack to keep track of function calls. (Translator's note: as I understand it, the author means more than just functions here; many compilers in fact use the stack to determine addresses.) When a function is called, information about the call is pushed onto the stack. When the call returns, that information is popped. The data at the top of the stack describes the function that is currently executing. (To see the call stack in action, create a project in Visual Studio .NET, set a breakpoint, and start debugging. When the breakpoint is hit, the call stack is shown in the Debug/Windows/Call Stack window.)

Limitations of Ordinal Indexing

In Part 1, we described the array as a collection of data of the same type that is indexed by ordinal number. That is, accessing the i-th element takes constant time. (Recall that constant time is written as O(1).)

You may not have realized it, but the data we care about is usually identified by something other than its ordinal position. Take an employee database as an example. Each employee is uniquely identified by a social security number, which has the format DDD-DD-DDDD (where each D is a digit from 0 to 9). If all employee records are stored in an array in arbitrary order, finding the employee with social security number 111-22-3333 may require traversing every element of the array, which is an O(n) operation. A better approach is to sort the array by social security number, which reduces the search time to O(log n).
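To make the O(log n) lookup concrete, here is a minimal sketch that searches a sorted array of social security numbers with Array.BinarySearch. The class name SortedLookup and the sample numbers are hypothetical, and the sketch assumes the array was sorted with the same ordinal string comparison that is used for the search.

```csharp
using System;

public static class SortedLookup
{
    // Returns the index of ssn in the sorted array, or a negative number if it is absent.
    // Assumption: sortedSsns is already sorted using StringComparer.Ordinal.
    public static int Find(string[] sortedSsns, string ssn)
    {
        return Array.BinarySearch(sortedSsns, ssn, StringComparer.Ordinal);
    }
}
```

Each comparison halves the remaining range, so even a million records need only about twenty comparisons.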

Ideally, we would like to find an employee's information in O(1) time. One solution is to create a giant array indexed by the actual social security number. The array would then run from 000-00-0000 to 999-99-9999, as shown in Figure 7:

Figure 7: A giant array that stores all 9-digit numbers

As the figure shows, each employee record (name, phone number, salary, and so on) is indexed by the employee's social security number, so any record can be accessed in constant time. The disadvantage of this scheme is its extreme waste of space: there are 10^9, that is, 1 billion, different social security numbers. If the company has only 1,000 employees, the array uses just 0.0001% of its space. (Put another way, to make full use of the array, your company would have to hire about one sixth of the world's population.)

Compressing Ordinal Indexes with a Hash Function

Clearly, creating an array of 1 billion elements to store the records of 1,000 employees is unacceptable. Yet we still want constant-time access to the data. One option is to shrink the range of indexes by using only the last four digits of each employee's social security number. The array then only needs to span 0000 to 9999. Figure 8 shows the compressed array.

Figure 8: Compressed Array

This scheme keeps access time constant while using storage far more efficiently. The choice of the last four digits is arbitrary. We could just as well use the middle four digits, or the 1st, 3rd, 8th, and 9th digits.

Converting a nine-digit number into a four-digit number in this way is called hashing. Hashing maps an index space onto a hash table.

A hash function performs the hashing. In the social security number example, the hash function H() is defined as:
H(x) = the last four digits of x

The input to the hash function can be any nine-digit social security number, and the result is the last four digits of that number. In mathematical terms, this conversion of nine digits into four is called a mapping, as shown in Figure 9:

Figure 9: Hash function Diagram

Figure 9 illustrates a behavior inherent in hash functions: collisions. When we map the elements of a larger set onto a smaller set, different inputs may produce the same value. For example, every social security number ending in 0000 is mapped to 0000. So 000-99-0000, 113-14-0000, 933-66-0000, and many others all hash to 0000.
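The hash function H(x) and the collisions it produces can be sketched directly. The class name SsnHash is hypothetical, and the sketch assumes the social security number is passed as a string in DDD-DD-DDDD form.

```csharp
using System;

public static class SsnHash
{
    // H(x) = the last four digits of x.
    // Assumption: ssn is formatted as DDD-DD-DDDD, so its last four characters are digits.
    public static int H(string ssn)
    {
        return int.Parse(ssn.Substring(ssn.Length - 4));
    }
}
```

With this definition, H("000-99-0000"), H("113-14-0000"), and H("933-66-0000") all return 0, which is exactly the collision described above.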

Return to the earlier example. What happens if we want to add a new employee whose social security number is 123-00-0191? Attempting to add this employee causes a collision, because an employee is already stored at location 0191.

Mathematical note: In mathematical terms, a hash function is described as f: A -> B. Because |A| > |B|, the function f cannot be a one-to-one mapping, so collisions are unavoidable.

Clearly, collisions cause problems. In the next section, we look at the relationship between the hash function and the frequency of collisions, and briefly describe several mechanisms for handling them. After that, we turn to the System.Collections.Hashtable class, which provides an implementation of a hash table. We examine the hash function and the collision resolution mechanism used by the Hashtable class, along with some examples of using it.

Collision Avoidance and Resolution

When we add data to a hash table, a collision disrupts the operation. Without a collision, the insertion succeeds immediately. With a collision, extra work is needed to resolve it. Because collisions add cost, our goal is to keep them to a minimum.

How often a hash function produces collisions depends on the distribution of the data passed to it. In our example, if social security numbers are assigned randomly, using the last four digits is a good choice. If, however, the numbers are assigned based on the employee's year or place of birth, which are clearly not evenly distributed, then the last four digits will contain many duplicates and cause more collisions.

Note: Analyzing the quality of a hash function requires some knowledge of statistics, which is beyond the scope of this article. In essence, for a hash table with k slots, we want the probability that a random value from the hash function's domain maps to any particular slot to be 1/k. (If this leaves you more confused, don't worry!)

Choosing an appropriate hash function is called collision avoidance. Much research has gone into this area, because the choice of hash function directly affects the overall performance of the hash table. In the next section, we describe the hash function used by the .NET Framework's Hashtable class.

There are many ways to handle collisions. Collision resolution means placing the object being inserted in some other location of the hash table, because its intended location is already occupied. One of the simplest methods is called linear probing. It works as follows:
1. When inserting a new element, use the hash function to find its location in the hash table.
2. Check whether an element already exists at that location. If the location is empty, insert the element and return. Otherwise, go to step 3.
3. If the location is i, check whether i + 1 is empty. If it is occupied, check i + 2, and so on, until an empty location is found.

For example, suppose we insert the records of five employees into the hash table: Alice (333-33-1234), Bob (444-44-1234), Cal (555-55-1237), Danny (000-00-1235), and Edward (111-00-1235). After they have been added, the hash table looks like Figure 10:

Figure 10: five employees with similar social security numbers

Alice's social security number hashes to 1234, so she is stored at location 1234. Bob's social security number also hashes to 1234, but because Alice's record is already at location 1234, Bob's record is placed in the next location, 1235. Next, Cal is added. His hash value is 1237, and location 1237 is empty, so Cal is placed at 1237. Danny comes next, with a hash value of 1235. Location 1235 is in use, so location 1236 is checked. It is empty, so Danny is placed there. Finally, Edward is added. His hash value is also 1235. Location 1235 is occupied, 1236 is occupied, and 1237 is occupied, but 1238 is empty, so Edward is placed at location 1238.

Collisions also matter when searching the hash table. Suppose that, given the hash table above, we want to look up Edward's record. We hash Edward's social security number, 111-00-1235, to 1235 and start searching there. At location 1235 we find Bob, not Edward. So we check 1236 and find Danny. The linear search continues until we either find Edward or reach an empty location. In the latter case, we can conclude that no employee with social security number 111-00-1235 exists.
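The insertion and lookup steps of linear probing can be sketched as follows. This is a minimal illustration, not a production hash table: the class name LinearProbingTable is hypothetical, deletion is not supported, and wrapping from the last slot back to slot 0 is an assumption that the text does not spell out.

```csharp
using System;

// Hypothetical sketch of a hash table that resolves collisions by linear probing.
public class LinearProbingTable
{
    private readonly string[] keys = new string[10000];    // social security numbers
    private readonly string[] names = new string[10000];   // employee names

    // H(x) = the last four digits of x (assumes the DDD-DD-DDDD format)
    private static int Hash(string ssn)
    {
        return int.Parse(ssn.Substring(ssn.Length - 4));
    }

    // Inserts the employee and returns the slot that was actually used.
    public int Add(string ssn, string name)
    {
        int start = Hash(ssn);
        for (int i = 0; i < keys.Length; i++)
        {
            int slot = (start + i) % keys.Length;   // assumption: wrap around at the end
            if (keys[slot] == null)
            {
                keys[slot] = ssn;
                names[slot] = name;
                return slot;
            }
        }
        throw new InvalidOperationException("table is full");
    }

    // Returns the employee's name, or null if no such employee exists.
    public string Find(string ssn)
    {
        int start = Hash(ssn);
        for (int i = 0; i < keys.Length; i++)
        {
            int slot = (start + i) % keys.Length;
            if (keys[slot] == null) return null;          // empty slot: the key is absent
            if (keys[slot] == ssn) return names[slot];
        }
        return null;
    }
}
```

Adding Alice, Bob, Cal, Danny, and Edward in that order places them in slots 1234, 1235, 1237, 1236, and 1238, matching the walkthrough above.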

Although linear probing is simple, it is not a good collision resolution strategy, because it leads to clustering. Suppose we add 10 employees whose social security numbers all end in 3344. They will occupy 10 consecutive locations, from 3344 to 3353. Looking up any of these 10 employees requires searching through the cluster. Moreover, adding any employee whose hash value falls between 3344 and 3353 makes the cluster longer. For fast lookups, the data should be evenly distributed rather than clustered in a few places.

A better technique is quadratic probing, in which the step size grows quadratically. That is, if location s is occupied, we first check s + 1^2, then s - 1^2, s + 2^2, s - 2^2, s + 3^2, and so on, rather than growing linearly through s + 1, s + 2, and so on. Quadratic probing can still lead to clustering, however.
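The quadratic probe order can be sketched as a function that lists the slots checked after s. The class name QuadraticProbe, the Sequence() method, and the wrap-around into the range 0 to tableSize - 1 are assumptions made for illustration.

```csharp
using System;

public static class QuadraticProbe
{
    // Returns the first `count` slots checked after slot s:
    // s + 1^2, s - 1^2, s + 2^2, s - 2^2, s + 3^2, ...
    public static int[] Sequence(int s, int count, int tableSize)
    {
        int[] slots = new int[count];
        for (int i = 0; i < count; i++)
        {
            int step = i / 2 + 1;                 // 1, 1, 2, 2, 3, 3, ...
            int offset = step * step;
            if (i % 2 == 1) offset = -offset;     // alternate between + and -

            // Assumption: wrap the result into [0, tableSize); C#'s % can return a negative value
            slots[i] = ((s + offset) % tableSize + tableSize) % tableSize;
        }
        return slots;
    }
}
```

For s = 1235 in a table of 10,000 slots, the first five probes are 1236, 1234, 1239, 1231, and 1244.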

In the next section, we introduce a third collision resolution mechanism, rehashing (also called double hashing), which is the one used by the .NET Framework's Hashtable class.

The System.Collections.Hashtable Class

The .NET Framework Base Class Library includes an implementation of a hash table in the Hashtable class. To add an element to a hash table, we must supply not only the element (the item) but also a key for that element. Both the key and the item can be of any type. In the employee example, the key is the employee's social security number. Items are added to the hash table with the Add() method.

To retrieve an item from the hash table, you index it by key, just as you would index an array by ordinal number. The following short C# program demonstrates this. It adds some elements to the hash table using string values as keys and then accesses a specific element by its key.

using System;
using System.Collections;

public class HashtableDemo
{
    private static Hashtable ages = new Hashtable();

    public static void Main()
    {
        // Add some values to the Hashtable, indexed by a string key
        ages.Add("Scott", 25);
        ages.Add("Sam", 6);
        ages.Add("Jisun", 25);

        // Access a particular key
        if (ages.ContainsKey("Scott"))
        {
            int scottsAge = (int) ages["Scott"];
            Console.WriteLine("Scott is " + scottsAge.ToString());
        }
        else
            Console.WriteLine("Scott is not in the hash table...");
    }
}
The ContainsKey() method used in the program takes a key and returns a Boolean value indicating whether an element with that key exists. The Hashtable class also has a Keys property, which returns a collection of all the keys used in the hash table. You can enumerate this property as shown below:

// Step through all items in the Hashtable
foreach (string key in ages.Keys)
    Console.WriteLine("Value at ages[\"" + key + "\"] = " + ages[key].ToString());

Be aware that the order in which elements were inserted is not necessarily the order of the keys in the Keys collection. The order of the Keys collection depends on the slots in which the corresponding elements are stored. The code above produces the following output:

Value at ages["Jisun"] = 25
Value at ages["Scott"] = 25
Value at ages["Sam"] = 6

This is the case even though the elements were inserted in the order Scott, Sam, Jisun.

The Hash Function of the Hashtable Class

The hash function in the Hashtable class is more complex than the social security number hash introduced earlier. First, remember that a hash function must return an ordinal number. That was easy in the social security number example, because a social security number is already a number: we only had to extract its last four digits to obtain a suitable hash value. The Hashtable class, however, accepts a key of any type. In the preceding example the keys were strings, such as "Scott" or "Sam". So it is natural to ask how the hash function converts a string into a number.

This conversion is performed by the GetHashCode() method, which is defined in the System.Object class. The default implementation of GetHashCode() in the Object class returns a unique integer that is guaranteed not to change during the lifetime of the object. Because every type derives directly or indirectly from Object, every object has access to this method. As a result, a string, or a value of any other type, can be represented by a number.

The hash function in the Hashtable class is defined as follows:

H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize

Here, GetHash(key) is, by default, the value returned by calling GetHashCode() on the key (although you can supply your own GetHash() function when using the Hashtable). GetHash(key) >> 5 takes the hash value of the key and shifts it right by 5 bits, which is equivalent to dividing the hash value by 32 and discarding the remainder. The % operator is the modulo operator described earlier. hashsize is the number of slots in the hash table. Because the final step is a modulo by hashsize, the result H(key) is always between 0 and hashsize - 1, and since hashsize is the length of the hash table, the result is always within the valid range.

Collision Resolution in the Hashtable Class

Collisions can occur whenever we add an element to or retrieve an element from a hash table. When inserting an element, we must find an empty location. When retrieving an element, we must find it even if it is not at the expected location. Earlier we briefly introduced two collision resolution mechanisms, linear probing and quadratic probing. The Hashtable class uses a quite different technique called rehashing (also known as double hashing).

Rehashing works as follows. There is a set of hash functions H1 ... Hn. When we add or retrieve an element, we first use hash function H1. If that results in a collision, we try H2, and so on up to Hn. The hash functions are all very similar and differ only in a multiplicative factor. In general, the hash function Hk is defined as follows:
Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize

Note: With rehashing, it is important that after hashsize probes every location in the hash table has been visited exactly once. That is, for a given key, Hi and Hj should never resolve to the same location. With the rehashing formula used by the Hashtable class, this property holds as long as (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))) and hashsize are relatively prime. (Two numbers are relatively prime if they share no common prime factor.) If hashsize is a prime number, the two numbers are guaranteed to be relatively prime.

Rehashing avoids collisions better than either of the two probing mechanisms described earlier.
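To see the rehashing formula in action, the following sketch computes Hk for a given hash value. It is an illustration only, not the Hashtable source code: the class name Rehash, the Slot() method, and the masking of the sign bit (so that negative hash codes still yield a valid slot) are assumptions.

```csharp
using System;

public static class Rehash
{
    // Hk(key) = [hash + k * (1 + (((hash >> 5) + 1) % (hashsize - 1)))] % hashsize
    public static int Slot(int hash, int k, int hashsize)
    {
        // Assumption: clear the sign bit so the result is never negative
        long h = hash & 0x7FFFFFFF;

        // The increment is between 1 and hashsize - 1, so it is relatively
        // prime to hashsize whenever hashsize is prime
        long increment = 1 + (((h >> 5) + 1) % (hashsize - 1));

        return (int) ((h + k * increment) % hashsize);
    }
}
```

For example, with a hash value of 1000 and a prime hashsize of 11, the increment is 3, and k = 1 through 11 yields the slots 2, 5, 8, 0, 3, 6, 9, 1, 4, 7, 10: every slot is visited exactly once.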

Load Factors and Expanding the Hashtable

The Hashtable class has a private member variable, loadFactor, that specifies the maximum ratio of the number of elements in the hash table to the total number of slots. For example, a loadFactor of 0.5 means that at most half of the slots in the hash table hold elements, while the other half remain empty.

An overloaded form of the Hashtable constructor lets you specify the loadFactor as a value between 0.1 and 1.0. Note, however, that whatever value you provide is scaled down to 72% of itself. Even if you pass 1.0, the actual loadFactor of the Hashtable is 0.72. Microsoft found 0.72 to be the optimal load factor, so it is recommended that you use the default value of 1.0 (which is automatically scaled to 0.72; a bit confusing, isn't it?).

Note: I spent several days asking Microsoft developers why this automatic scaling is used. I could not understand why they did not simply require a value between 0.072 and 0.72. In the end, I wrote to the team that developed the Hashtable class, and they explained the reason. Their testing had shown that a loadFactor above 0.72 seriously degrades the performance of the hash table. They wanted developers to get the best use out of the hash table, but felt that developers might not remember an odd number like 0.72, whereas if 1.0 stands for the best value, it is much easier to remember. So although a little functionality is sacrificed, the data structure becomes easier to use.

Whenever a new element is added to the Hashtable, a check is made to ensure that the ratio of elements to slots does not exceed the maximum. If it does, the hash table is expanded as follows:
1. The number of slots in the hash table is roughly doubled. More precisely, the number of slots is increased from the current prime number to a larger prime number. (Recall from the discussion of rehashing above that the number of slots in the hash table must be a prime number.)
2. Because rehashing makes the location of every element depend on the number of slots, and that number changed in step 1, all the values in the table must be rehashed.

Fortunately, the Add() method of the Hashtable class hides these steps, so you do not need to concern yourself with the implementation details.
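The first expansion step, moving from one prime size to a larger prime of roughly twice the size, can be sketched as follows. The growth policy shown here (the smallest prime above twice the current size) is an assumption chosen for illustration; the class name PrimeGrowth is hypothetical, and this is not the policy taken from the Hashtable source.

```csharp
using System;

public static class PrimeGrowth
{
    // Trial-division primality test; adequate for a small illustration.
    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Assumed policy: the smallest prime greater than twice the current size.
    public static int NextSize(int currentSize)
    {
        int candidate = 2 * currentSize + 1;
        while (!IsPrime(candidate))
            candidate++;
        return candidate;
    }
}
```

Starting from 11 slots, this policy grows the table to 23, then 47, then 97 slots. After each growth step, every stored element would have to be rehashed against the new size.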

The load factor affects both the overall size of the hash table and the number of probes needed when a collision occurs. The larger the load factor, the denser the hash table: it wastes less space, but it requires more probes than a sparser table. Without going into a rigorous analysis, the expected number of probes when a collision occurs is about 1/(1 - lf), where lf is the load factor.

As mentioned above, Microsoft scales the default load factor of the hash table to 0.72. For each collision, the expected number of probes is therefore about 3.5 (1/(1 - 0.72) is roughly 3.57). Because this number does not depend on how many elements the hash table actually holds, the asymptotic access time of a hash table is O(1), which is far better than the O(n) of searching an unsorted array.
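The estimate 1/(1 - lf) is easy to evaluate for other load factors. The following sketch simply computes the formula; the class name ProbeEstimate and its method are hypothetical.

```csharp
using System;

public static class ProbeEstimate
{
    // Expected number of probes on a collision, using the approximation 1 / (1 - lf).
    // Assumption: 0 <= loadFactor < 1.
    public static double ExpectedProbes(double loadFactor)
    {
        return 1.0 / (1.0 - loadFactor);
    }
}
```

A load factor of 0.5 gives 2 probes, 0.72 gives about 3.57, and 0.9 gives about 10, which shows how quickly the cost rises as the table fills up.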

Finally, keep in mind that expanding a hash table comes at a cost in performance. You should therefore estimate in advance how many elements your hash table will eventually hold and construct it with a suitable initial capacity, so that unnecessary expansions are avoided.
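As a small illustration of this advice, the sketch below passes the expected number of elements to the Hashtable(int capacity) constructor so that the table is sized up front. The class name PresizedTable, the Build() method, and the sample values are hypothetical.

```csharp
using System;
using System.Collections;

public static class PresizedTable
{
    // Builds a Hashtable sized in advance for the expected number of elements.
    public static Hashtable Build(int expectedCount)
    {
        // Hashtable(int capacity) sets the approximate initial capacity
        Hashtable table = new Hashtable(expectedCount);

        for (int i = 0; i < expectedCount; i++)
            table.Add(i, "employee " + i);

        return table;
    }
}
```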
