This time there are two programming problems: one is to count the target values that can be written as the sum of two numbers from the input (2-SUM), and the other is to maintain the running median of a stream.
2-SUM Problem
Problem description:
The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications). The file contains 1 million integers, both positive and negative (there might be some repetitions!). This is your array of integers, with the ith row of the file specifying the ith entry of the array. Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x, y in the input file that satisfy x+y=t. (NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)
Solution:
The input has 1,000,000 numbers. Each number x is visited once, and for each x we look for the values y with x+y in [-10000,10000]. The key step is finding those y quickly. If the values are hashed by size into buckets of width 2^15 (key = (value - min) >> 15), then the candidates for a given x lie in the interval [-10000-x, 10000-x], which is only 20001 wide and therefore falls into at most two buckets. Because the data is sparse, each bucket usually holds only one or two values, so the overall complexity is O(n).
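The bucketing idea can be illustrated on a small scale. The following is only a minimal sketch, not the script used for the timings in this post: it assumes a dict of buckets instead of a preallocated list, uses the window size hi - lo + 1 as the bucket width instead of 2^15, and the name count_targets and its parameters are hypothetical.

```python
from collections import defaultdict


def count_targets(nums, lo=-10000, hi=10000):
    """Count t in [lo, hi] for which distinct x, y in nums satisfy x + y == t."""
    values = set(nums)              # repetitions in the input do not matter
    width = hi - lo + 1             # assumption: bucket width = window size
    buckets = defaultdict(list)
    for v in values:
        # Floor division also gives consistent keys for negative values.
        buckets[v // width].append(v)

    found = set()
    for x in values:
        # Candidates y lie in [lo - x, hi - x], which spans at most two buckets.
        for key in {(lo - x) // width, (hi - x) // width}:
            for y in buckets.get(key, ()):
                if y != x and lo <= x + y <= hi:
                    found.add(x + y)
    return len(found)
```

For example, count_targets([1, 2, 3, -5, 100000], -3, 4) returns 4, for the targets -3, -2, 3 and 4. A dict keeps memory proportional to the number of non-empty buckets, at the cost of slower lookups than a plain list.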
The specific implementation is as follows:
# Python 2 script (time.clock was removed in Python 3.8).
from time import clock
start=clock()

def myhash(val):
    # Bucket key: each bucket covers 2**15 consecutive values.
    return val>>15

f=open('algo1-programming_prob-2sum.txt','r')
# One slot per bucket; True marks an empty bucket.
valnew=[True for x in range(6103503)]
# tlist[t+10000] is set to 1 once target t has been found.
tlist=[0 for x in range(-10000,10000+1)]
tmp=f.read()
f.close()
print('read complete')
vallist=[int(val) for val in tmp.split()]
vallist=set(vallist)
print('convert to set@int complete')
minval=min(vallist)
# Shift by minval so that every stored bucket key is non-negative.
for val in vallist:
    val_key=myhash(val-minval)
    if valnew[val_key]==True:
        valnew[val_key]=[val]
    else:
        valnew[val_key].append(val)
print('hash complete',len(valnew),len(vallist))
for val in vallist:
    # Any y with -10000 <= val+y <= 10000 lies in one of these two buckets.
    firkey=myhash(-10000-val-minval)
    seckey=myhash(10000-val-minval)
    for key in set((firkey,seckey)):
        # Skip keys outside the table; a negative key would silently
        # index from the end of the list.
        if 0<=key<len(valnew) and valnew[key]!=True:
            for y in valnew[key]:
                # y!=val enforces distinctness; the chained comparison avoids
                # building range(-10000,10001) as a list on every test.
                if y!=val and -10000<=y+val<=10000:
                    tlist[y+val+10000]=1
print('output: ',sum(tlist))
finish=clock()
print(finish-start)

##read complete
##convert to set@int complete
##('hash complete', 6103503, 999752)
##('output: ', ***)
##480.193410146

##user@hn:~/pyscripts$ python 2sum_hash.py
##read complete
##convert to set@int complete
##('hash complete', 6103503, 999752)
##('output: ', ***)
##183.92
The run takes 480 s on Win32 but only about 180 s on Debian. Some forum users report a running time of 0.53 s, so I still have a lot of room for improvement.
Median
Problem description:
The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x_i denote the ith number of the file, the kth median m_k is defined as the median of the numbers x_1,…,x_k. (So, if k is odd, then m_k is the ((k+1)/2)th smallest number among x_1,…,x_k; if k is even, then m_k is the (k/2)th smallest number among x_1,…,x_k.) In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute (m_1+m_2+m_3+⋯+m_10000) mod 10000.
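A small worked example may make the definition concrete. This is only a sketch for checking the definition on tiny inputs: the name medians_by_sorting and the four-number stream are made up for illustration. It relies on the observation that both cases of the definition reduce to the 0-based index (k+1)//2 - 1 in the sorted prefix.

```python
def medians_by_sorting(stream):
    """Return the list of kth medians by re-sorting every prefix (small inputs only)."""
    medians = []
    for k in range(1, len(stream) + 1):
        prefix = sorted(stream[:k])
        # k odd:  ((k+1)/2)th smallest -> index (k+1)//2 - 1
        # k even: (k/2)th smallest     -> index k//2 - 1, which is the same value
        medians.append(prefix[(k + 1) // 2 - 1])
    return medians


example = [5, 1, 3, 2]              # hypothetical stream
ms = medians_by_sorting(example)    # [5, 1, 3, 2]
total = sum(ms) % 10000             # 11
```

Here the prefixes are [5], [1, 5], [1, 3, 5] and [1, 2, 3, 5], so the medians are 5, 1, 3 and 2.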
Besides the naive method of re-sorting the numbers each time a new one arrives and reading off the median, two heaps can be used to do this quickly. As the data keeps arriving, we maintain two heaps whose sizes never differ by more than 1. One is a min-heap and the other is a max-heap; they hold the larger half and the smaller half of the data seen so far, respectively, so the median is always at the top of the max-heap.
In Python, heapq only provides a min-heap, but a max-heap can be obtained by storing the negated values.
This time I implemented both approaches, and the speed gap between them is significant. The implementation is as follows:
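The negation trick can be seen in isolation. In this minimal sketch the helper names push_max, peek_max and pop_max are hypothetical wrappers around heapq, not part of the library:

```python
import heapq


def push_max(heap, value):
    """Push onto a max-heap that is stored as negated values in a min-heap."""
    heapq.heappush(heap, -value)


def peek_max(heap):
    """Return the largest value without removing it."""
    return -heap[0]


def pop_max(heap):
    """Remove and return the largest value."""
    return -heapq.heappop(heap)


heap = []
for v in [3, 9, 4, 7]:
    push_max(heap, v)

largest = peek_max(heap)                              # 9
drained = [pop_max(heap) for _ in range(len(heap))]   # [9, 7, 4, 3]
```

The smallest negated value sits at the root of the min-heap, which is exactly the largest original value.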
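# Python 2 script (time.clock was removed in Python 3.8).
from time import clock
from heapq import heappush,heappop
start=clock()
f=open('Median.txt','r')
tmp=f.read()
f.close()
data=[int(val) for val in tmp.split()]
out=[0 for x in range(len(data))]

#rudeway with high complexity: re-sort every prefix
#17s running time
def rudeway(data,out):
    for ind in range(len(data)):
        b=data[0:ind+1]
        b.sort()
        # Store in place (appending would leave stale entries in out);
        # // keeps the index an integer.
        out[ind]=b[(len(b)+1)//2-1]
    return sum(out)%10000
#print(rudeway(data,out))

#use heapq, minus(min heap)=max heap
#0.231407100855s
def heapway(data,out):
    lheap=[]  # max-heap of the smaller half, stored as negated values
    rheap=[]  # min-heap of the larger half
    # Seed the heaps with the first two numbers (assumes len(data)>=2).
    out[0]=data[0]
    tmp=sorted(data[0:2])
    out[1]=tmp[0]
    heappush(lheap,-tmp[0])
    heappush(rheap,tmp[1])
    for ind in range(2,len(data)):
        if data[ind]>rheap[0]:
            heappush(rheap,data[ind])
        else:
            heappush(lheap,-data[ind])
        # Rebalance so that len(rheap) <= len(lheap) <= len(rheap)+1.
        if len(rheap)>len(lheap):
            heappush(lheap,-heappop(rheap))
        if len(lheap)>len(rheap)+1:
            heappush(rheap,-heappop(lheap))
        # The median is the largest element of the smaller half.
        out[ind]=-lheap[0]
    return sum(out)%10000

print(heapway(data,out))
finish=clock()
print(finish-start)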