Search is one of the most fundamental operations on data structures, and Python has a very simple built-in way to do it:
15 in [3, 5, 4, 1, 76]
False
But this is only one form of search. Below are several classic search algorithms:
1. Sequential search
Walk along the list element by element: return True when the item is found, or False if the end is reached without finding it.
def sequential_search(a_list, item):
    pos = 0
    found = False
    while pos < len(a_list) and not found:
        if a_list[pos] == item:
            found = True
        else:
            pos = pos + 1
    return found

test_list = [1, 2, 32, 8, 17, 19, 42, 13, 0]
print(sequential_search(test_list, 3))
print(sequential_search(test_list, 13))

False
True
The best- and worst-case time complexities of sequential search are O(1) and O(n), corresponding to the item being the first or the last element of the list. For the average case, the expected number of comparisons is (1 + 2 + 3 + ... + n)/n = (n + 1)/2, so the average time complexity is O(n/2), which is still O(n).
Note that the average complexity differs depending on whether the item actually exists in the list. If it does not exist, the search must traverse all the elements every time, so the average time complexity is O(n).
For a list whose elements are ordered (an ordered list), however, the situation changes. Because the elements are sorted, as soon as we meet an element larger than the item (with the previous element smaller), we can conclude immediately that the item is absent and return False, so even an unsuccessful search averages about n/2 comparisons, i.e. O(n/2).
The implementation for an ordered list:
def ordered_sequential_search(a_list, item):
    pos = 0
    found = False
    stop = False
    while pos < len(a_list) and not found and not stop:
        if a_list[pos] == item:
            found = True
        else:
            if a_list[pos] > item:
                stop = True
            else:
                pos = pos + 1
    return found

test_list = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(ordered_sequential_search(test_list, 3))
print(ordered_sequential_search(test_list, 13))

False
True
2. Binary search
First, be clear that binary search also works only on an ordered list.
Sequential search scans from the beginning; binary search, on an ordered list, instead inspects the middle element first and compares it with the item we are looking for. That comparison tells us whether the item must lie in the left or the right half, and we repeat the same step (a simple recursion) on that half, until the item is found or the remaining sublist is empty.
First, solve the problem with iteration:
def binary_search(a_list, item):
    first = 0
    last = len(a_list) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if a_list[midpoint] == item:
            found = True
        else:
            if item < a_list[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return found
Then solve it again with recursion:
def binary_search(a_list, item):
    if len(a_list) == 0:
        # for recursion, always state the base case first
        return False
    else:
        midpoint = len(a_list) // 2
        if a_list[midpoint] == item:
            return True
        elif item < a_list[midpoint]:
            # the slicing on this line is where extra time is spent
            return binary_search(a_list[:midpoint], item)
        else:
            return binary_search(a_list[midpoint + 1:], item)

test_list = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binary_search(test_list, 3))
print(binary_search(test_list, 13))

False
True
Time complexity of binary search: the cost lies in the element comparisons. Each comparison discards half of the remaining list, so for a list of length n about log(n) comparisons suffice. To explain this, the divide-and-conquer figure from Introduction to Algorithms is the most straightforward: each comparison moves down one layer of a tree, so the number of comparisons equals the number of layers of the tree.
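The halving argument can be checked directly: the worst case for a list of n items is about ⌊log₂ n⌋ + 1 comparisons. A small sketch (the helper name is my own):

```python
import math

def max_comparisons(n):
    """Worst-case comparisons for binary search on n sorted items:
    each comparison halves the remaining range, so floor(log2(n)) + 1 suffice."""
    return math.floor(math.log2(n)) + 1 if n > 0 else 0

for n in (8, 1_000, 1_000_000):
    print(n, max_comparisons(n))
```

So even a million-element sorted list needs at most about 20 comparisons.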
All of this, of course, ignores the time taken by Python's list slicing. A slice of length k actually costs O(k), so in theory this version of binary search is not strictly logarithmic.
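The slice cost can be avoided by passing index bounds instead of slicing, which keeps the recursion strictly O(log n). A sketch (the function name and default arguments are my own):

```python
def binary_search_bounds(a_list, item, first=0, last=None):
    """Recursive binary search using index bounds instead of O(k) slices."""
    if last is None:
        last = len(a_list) - 1
    if first > last:            # empty range: the item is not present
        return False
    midpoint = (first + last) // 2
    if a_list[midpoint] == item:
        return True
    elif item < a_list[midpoint]:
        return binary_search_bounds(a_list, item, first, midpoint - 1)
    else:
        return binary_search_bounds(a_list, item, midpoint + 1, last)
```

The list itself is never copied; only two integers move with each recursive call.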
PS: binary search requires the list to be sorted first. For a short list, the time spent on sorting may not pay off at all; in that case, a plain sequential search can be more efficient.
3. Hash search
Basic idea: if we knew in advance where an element is supposed to be, we could simply go to that position and find it.
A hash table works exactly this way. Suppose the table has m = 11 slots, named 0, 1, 2, ..., 10; a slot can be empty or full, and an empty slot is filled with None. The mapping between slots and elements is determined by a hash function. For instance, if we have the integers 54, 26, 93, 17, 77, 31 to put into the table, we can use the hash function h(item) = item % 11, where 11 is the size of the hash table; the result is the name of the slot the item goes into.
Once the elements are in the hash table, we can compute its load factor: λ = number_of_items / table_size. For the table above, the load factor is 6/11.
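As a quick sketch of this bookkeeping (the variable names are my own), here are the slot names and load factor for the six integers above:

```python
items = [54, 26, 93, 17, 77, 31]
table_size = 11

# h(item) = item % 11 gives each item's slot name
slots = {item: item % table_size for item in items}
print(slots)  # 54 -> 10, 26 -> 4, 93 -> 5, 17 -> 6, 77 -> 0, 31 -> 9

# load factor lambda = number of items / table size
load_factor = len(items) / table_size
print(load_factor)  # 6/11, about 0.545
```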
Key point: to search for an item, we just apply the hash function to it, compute its slot name, and compare the item with the element stored in that slot. The time complexity of this search process is O(1).
Collisions: if the integer set also contained 44 and 77, both numbers leave remainder 0 when divided by 11, so they map to the same slot and a collision occurs.
3.1 Hash Function selection
The main purpose of choosing a hash function is to reduce the occurrence of conflicts.
- The remainder (division) method:
h(k) = k mod m
- The multiplication method:
h(k) = ⌊m (kA mod 1)⌋
where 0 < A < 1, "kA mod 1" denotes the fractional part of k·A, and the final result is rounded down.
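The multiplication method can be sketched as follows. The constant A = (√5 − 1)/2 ≈ 0.618 is Knuth's common suggestion, an assumption on my part since the text leaves A unspecified:

```python
import math

def mult_hash(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1)).
    (k * A) % 1 keeps only the fractional part of k*A."""
    return math.floor(m * ((k * A) % 1))

print(mult_hash(54, 11))  # -> 4
```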
There are also the following two types:
The folding method for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size). These pieces are then added together to give the resulting hash value. For example, if our item is the phone number 436-555-4601, we take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). Adding them, 43 + 65 + 55 + 46 + 01, gives 210. If we assume our hash table has 11 slots, we need the extra step of dividing by 11 and keeping the remainder: 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1. Some folding methods go one step further and reverse every other piece before the addition. For the above example, that gives 43 + 56 + 55 + 64 + 01 = 219, and 219 % 11 = 10.
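The folding scheme just described can be sketched like this (the helper name is my own):

```python
def folding_hash(number_string, table_size=11, group=2):
    """Folding method (sketch): split the digits into fixed-size groups,
    sum the groups, then take the remainder by the table size."""
    digits = [c for c in number_string if c.isdigit()]
    pieces = [int("".join(digits[i:i + group]))
              for i in range(0, len(digits), group)]
    return sum(pieces) % table_size

# 436-555-4601 -> 43 + 65 + 55 + 46 + 01 = 210, and 210 % 11 = 1
print(folding_hash("436-555-4601"))
```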
Another numerical technique for constructing a hash function is called the mid-square method. We first square the item, and then extract some portion of the resulting digits. For example, if the item were 44, we would first compute 44² = 1,936. By extracting the middle two digits, 93, and performing the remainder step, we get 5 (93 % 11). Table 5.5 of the original text shows items hashed under both the remainder method and the mid-square method; you should verify that you understand how these values were computed.
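A sketch of the mid-square method (how many middle digits to keep is an assumption; this version keeps two):

```python
def mid_square_hash(item, table_size=11):
    """Mid-square method (sketch): square the item, extract the middle
    digits of the square, then take the remainder by the table size."""
    squared = str(item ** 2)                    # e.g. 44**2 -> "1936"
    mid = len(squared) // 2
    middle_digits = squared[mid - 1:mid + 1]    # two middle digits
    return int(middle_digits) % table_size

print(mid_square_hash(44))  # 44**2 = 1936 -> "93" -> 93 % 11 = 5
```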
PS: despite its O(1) lookups, the hash table has not become the universal choice for storage and search, because if the hash function is too complex, computing the slot name itself takes so long that the whole process becomes inefficient.
3.2 Collision resolution
- Open addressing with linear probing
Suppose we now need to put the numbers 54, 26, 93, 17, 77, 31, 44, 55, 20 into the hash table, again with the hash function h(item) = item % 11. The numbers 77, 44 and 55 all hash to slot 0. When we place 44, slot 0 is already occupied by 77, so by linear probing we move to the next slot (slot by slot) and keep going until an empty slot is found. By this logic, 44 is inserted into slot 1 and 55 into slot 2. The rehash formula of this method is rehash(pos) = (pos + 1) % size_of_table.
The last number, 20, should go into slot 9, but slots 9, 10, 0, 1 and 2 are all full, so 20 ends up in slot 3.
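The whole insertion sequence can be replayed with a short sketch (the function name is mine; it assumes the table never fills up completely):

```python
def linear_probe_insert(items, table_size=11):
    """Build a hash table with open addressing + linear probing (a sketch)."""
    slots = [None] * table_size
    for item in items:
        pos = item % table_size            # h(item) = item % table_size
        while slots[pos] is not None:      # slot taken: probe the next one
            pos = (pos + 1) % table_size   # rehash(pos) = (pos + 1) % size
        slots[pos] = item
    return slots

print(linear_probe_insert([54, 26, 93, 17, 77, 31, 44, 55, 20]))
# -> [77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
```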
When we later search such a hash table, we compute the item's slot name with the given hash function; if the item is not in that slot, we continue to the next slot, returning True once the item is found, or False once an empty slot is encountered.
This, however, leads to a problem: clustering. As in our example, elements pile up around slot 0, so many newly inserted elements collide there and are shifted backwards by linear probing. Although technically workable, this severely reduces efficiency.
One way to mitigate clustering is to lengthen the probing step. For example, when placing 44 we would not use slot 1 right after slot 0 but skip ahead to slot 3; we can call this rule a "plus 3" probe, and the rehash formula becomes rehash(pos) = (pos + 3) % size_of_table. Going further, the skip can grow as successive squares (h+1, h+4, h+9, h+16, ...), which is known as quadratic probing.
The second method is to hang a linked list (chain) off each conflicting slot, as shown in the figure:
Advantage: this method is more efficient, because each slot stores exactly the elements that hash to it.
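A minimal chaining sketch (plain Python lists stand in for linked lists; the names are my own):

```python
def chaining_insert(items, table_size=11):
    """Separate chaining: each slot holds a list of the items that hash there."""
    table = [[] for _ in range(table_size)]
    for item in items:
        table[item % table_size].append(item)
    return table

def chaining_search(table, item):
    """Search only the chain in the item's own slot."""
    return item in table[item % len(table)]

table = chaining_insert([54, 26, 93, 17, 77, 31, 44, 55, 20])
print(table[0])  # -> [77, 44, 55]: all three collide in slot 0
```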
3.3 Code Building
The following Python code uses two parallel lists to build a hash table: one list, slots, stores the key of each item (because the items here are strings, they must first be converted to integer keys), and the other, data, stores the corresponding values. It is important that the size of the hash table be a prime number so that the collision-resolution algorithm can be as efficient as possible.
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size
        self.data = [None] * self.size
The put(key, data) method places an element into its slot; its main part is the collision-resolution loop:
    def put(self, key, data):
        hash_value = self.hash_function(key, len(self.slots))
        if self.slots[hash_value] is None:
            self.slots[hash_value] = key
            self.data[hash_value] = data
        else:
            if self.slots[hash_value] == key:
                self.data[hash_value] = data  # replace
            else:
                next_slot = self.rehash(hash_value, len(self.slots))
                while self.slots[next_slot] is not None and \
                        self.slots[next_slot] != key:
                    next_slot = self.rehash(next_slot, len(self.slots))
                if self.slots[next_slot] is None:
                    self.slots[next_slot] = key
                    self.data[next_slot] = data
                else:  # self.slots[next_slot] == key
                    self.data[next_slot] = data  # replace

    def hash_function(self, key, size):
        return key % size

    def rehash(self, old_hash, size):
        return (old_hash + 1) % size
The get(key) method looks an element up and returns its data:
    def get(self, key):
        start_slot = self.hash_function(key, len(self.slots))
        data = None
        stop = False
        found = False
        position = start_slot
        while self.slots[position] is not None and \
                not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]
            else:
                position = self.rehash(position, len(self.slots))
                if position == start_slot:  # all slots traversed
                    stop = True
        return data

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, data):
        self.put(key, data)
The __getitem__() method was a bit hard for me to understand as a beginner; the following is the relevant content from Liao Xuefeng's tutorial:
Properties and methods of the form __xxx__ serve special purposes in Python. For example, __len__ returns an object's length: when you call the len() function to get the length of an object, len() internally calls that object's __len__() method. Likewise, for an object to behave like a list and return elements by subscript, it needs to implement the __getitem__() method.
The corresponding method __setitem__() lets the object be assigned to like a list or dict.
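A minimal illustration of these special methods, separate from the hash table above (the class name is my own):

```python
class DemoTable:
    """Tiny container showing how len(t), t[k] and t[k] = v are routed
    to __len__, __getitem__ and __setitem__."""
    def __init__(self):
        self._data = {}

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

t = DemoTable()
t['bird'] = 417     # calls __setitem__
print(t['bird'])    # calls __getitem__ -> 417
print(len(t))       # calls __len__ -> 1
```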
One more problem remains for this to work: how to convert a string into an int. Python provides the ord() function for exactly this: it takes a character (a string of length 1) as its argument and returns the corresponding ASCII value.
def hash(a_string):  # note: this shadows the built-in hash()
    key = 0
    for pos in range(len(a_string)):
        key = key + ord(a_string[pos])
    return key
Example:
h = HashTable()
data = ['bird', 'dog', 'goat', 'chicken', 'lion', 'tiger', 'pig', 'cat']
keys = []
for animal in data:
    keys.append(hash(animal))
print(keys)
[417, 314, 427, 725, 434, 539, 320, 312]
for i, key in enumerate(keys):
    h[key] = data[i]
print(h.slots)
print(h.data)
[725, 539, 320, None, 312, 434, 314, None, None, 427, 417]
['chicken', 'tiger', 'pig', None, 'cat', 'lion', 'dog', None, None, 'goat', 'bird']
The time complexity obviously differs between successful and unsuccessful searches.
For a successful search in a hash table with linear probing, the average number of comparisons is:
$\frac{1}{2}\left(1+\frac{1}{1-\lambda}\right)$
For an unsuccessful search, it is:
$\frac{1}{2}\left(1+\frac{1}{(1-\lambda)^{2}}\right)$
If the hash table instead uses chaining, the average number of comparisons for a successful search is:
$1+\frac{\lambda}{2}$
and for an unsuccessful search it is simply:
$\lambda$
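Plugging the load factor from the earlier example, λ = 6/11, into the standard linear-probing formulas (and, for comparison, the usual chaining results of 1 + λ/2 for a hit and λ for a miss) gives concrete expected probe counts:

```python
lam = 6 / 11  # load factor from the example table

# open addressing with linear probing
success_open = 0.5 * (1 + 1 / (1 - lam))       # approx. 1.6
failure_open = 0.5 * (1 + 1 / (1 - lam) ** 2)  # approx. 2.92

# separate chaining (standard textbook results)
success_chain = 1 + lam / 2
failure_chain = lam

print(success_open, failure_open)
print(success_chain, failure_chain)
```

Even at a bit over half full, an unsuccessful linear-probing search already costs nearly three probes on average, while chaining stays below one comparison per miss.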
Python data structure Application 4--Search