Algorithms Part 1, Question 6: 2-SUM and Median Maintenance

Source: Internet
Author: User

This week has two programming problems. The first asks how many target values can be written as the sum of two distinct numbers from an input file (a 2-SUM variant); the second asks for the running median of a stream of numbers.

2-SUM Problem

Problem description

The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications). The file contains 1 million integers, both positive and negative (there might be some repetitions!). This is your array of integers, with the $i$-th row of the file specifying the $i$-th entry of the array. Your task is to compute the number of target values $t$ in the interval [-10000, 10000] (inclusive) such that there are distinct numbers $x, y$ in the input file that satisfy $x + y = t$. (NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)
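To make the counting rule concrete, here is a brute-force reference on a tiny made-up input. The function name `count_targets_naive` and the sample list are illustrative assumptions, not part of the assignment. It is quadratic in the number of distinct values, so it is only useful for checking a faster solution on small data.

```python
from itertools import combinations

def count_targets_naive(nums, lo=-10000, hi=10000):
    """Count targets t in [lo, hi] such that t = x + y for distinct values x != y."""
    values = set(nums)                     # repeated entries do not create new pairs
    targets = set()
    for x, y in combinations(values, 2):   # each unordered pair of distinct values once
        if lo <= x + y <= hi:
            targets.add(x + y)
    return len(targets)

# 5 + 5 = 10 does not count, because x and y must be distinct numbers
print(count_targets_naive([5, 5, -3, 8, 20000], lo=0, hi=20))  # 3 (targets 2, 5, 13)
```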

Solution:

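The input has 1,000,000 numbers, and we loop over each number $x$ once, looking for every $y$ such that $x + y$ falls in [-10000, 10000], i.e. every $y$ in $[-10000 - x, 10000 - x]$. Hashing is the key step. After shifting every value by the minimum, the hash $(v - \min) \gg 15$ splits the value range into buckets of width $2^{15} = 32768$. The window of admissible $y$ values is only 20001 wide, so it overlaps at most two consecutive buckets, and only those two buckets need to be scanned for each $x$. The data are also sparse (999,752 distinct values spread over 6,103,503 buckets), so each bucket holds at most a value or two, and the overall running time is roughly $O(n)$.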
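The claim that each $x$ needs at most two buckets can be checked directly. This is a small sketch assuming the bucket width $2^{15}$ and the shift-by-minimum hash described above; `bucket_of` and `buckets_for` are hypothetical helper names, and `minval = -99999` is a made-up minimum for illustration.

```python
WIDTH_BITS = 15  # bucket width 2**15 = 32768

def bucket_of(v, minval):
    """Bucket index of v after shifting the value range so that minval maps to 0."""
    return (v - minval) >> WIDTH_BITS

def buckets_for(x, minval, lo=-10000, hi=10000):
    """Indices of the buckets that can contain a partner y with lo <= x + y <= hi."""
    first = bucket_of(lo - x, minval)
    last = bucket_of(hi - x, minval)
    return list(range(first, last + 1))

# The window [lo - x, hi - x] holds 20001 values, fewer than 32768,
# so it never spans more than two consecutive buckets.
minval = -99999  # illustrative minimum, not taken from the course file
for x in (-50000, 0, 12345, 99999):
    assert 1 <= len(buckets_for(x, minval)) <= 2
```

Note that a window lying partly below the smallest value yields a negative bucket index; such buckets hold no data and must be skipped rather than used as list indices.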

The specific implementation is as follows:

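```python
# 2sum_hash.py: count targets t in [-10000, 10000] with t = x + y for distinct x, y in the file
from time import perf_counter  # time.clock() was removed in Python 3.8

LO, HI = -10000, 10000
SHIFT = 15  # bucket width 2**15 = 32768, wider than the 20001-value target window

start = perf_counter()

def myhash(val):
    # val has already been shifted by -minval, so it is non-negative for input values
    return val >> SHIFT

with open('algo1-programming_prob-2sum.txt') as f:
    tmp = f.read()
print('read complete')
vallist = set(int(val) for val in tmp.split())  # drop repeated entries
print('convert to set@int complete')

minval = min(vallist)
nbuckets = myhash(max(vallist) - minval) + 1  # 6103503 for the course file
valnew = [None] * nbuckets
for val in vallist:
    val_key = myhash(val - minval)
    if valnew[val_key] is None:
        valnew[val_key] = [val]
    else:
        valnew[val_key].append(val)
print('hash complete', len(valnew), len(vallist))

tlist = [0] * (HI - LO + 1)  # tlist[t - LO] becomes 1 once t is reachable
for val in vallist:
    # partners y satisfy LO - val <= y <= HI - val, which spans at most two buckets
    firkey = myhash(LO - val - minval)
    seckey = myhash(HI - val - minval)
    for key in {firkey, seckey}:
        # keys outside [0, nbuckets) hold no data; a negative index would wrap around
        if 0 <= key < nbuckets and valnew[key] is not None:
            for y in valnew[key]:
                s = val + y
                # y != val enforces distinctness; the chained comparison is O(1)
                # (in Python 2, "s in range(...)" builds and scans a list)
                if y != val and LO <= s <= HI:
                    tlist[s - LO] = 1
print('output: ', sum(tlist))
finish = perf_counter()
print(finish - start)

# Output of the original Python 2 version, on Win32 and then on Debian:
## read complete
## convert to set@int complete
## ('hash complete', 6103503, 999752)
## ('output: ', ***)
## 480.193410146
## user@hn:~/pyscripts$ python 2sum_hash.py
## read complete
## convert to set@int complete
## ('hash complete', 6103503, 999752)
## ('output: ', ***)
## 183.92
```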

The original version took 480 s on Win32 but only about 180 s on Debian. Some users on the course forum reported times as low as 0.53 s, so there is still a lot of room for improvement.
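One direction worth trying is a different technique from the hashing above: sort the distinct values once, then for each $x$ use binary search to find the slice of partners $y$ in $[lo - x, hi - x]$. This is a sketch under that assumption, not the forum's 0.53 s solution (which is not shown here); `count_targets_sorted` is a hypothetical name. Because the data are sparse, each slice is usually tiny, so the cost is dominated by the $O(n \log n)$ sort and the binary searches.

```python
from bisect import bisect_left, bisect_right

def count_targets_sorted(nums, lo=-10000, hi=10000):
    """Count t in [lo, hi] with t = x + y for distinct values x != y (sort + binary search)."""
    values = sorted(set(nums))
    targets = set()
    for x in values:
        # values[i:j] are exactly the y with lo - x <= y <= hi - x
        i = bisect_left(values, lo - x)
        j = bisect_right(values, hi - x)
        for y in values[i:j]:
            if y != x:  # distinctness
                targets.add(x + y)
    return len(targets)

print(count_targets_sorted([5, 5, -3, 8, 20000], lo=0, hi=20))  # 3
```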
Median

Problem description:

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting $x_i$ denote the $i$-th number of the file, the $k$-th median $m_k$ is defined as the median of the numbers $x_1, \ldots, x_k$. (So, if $k$ is odd, then $m_k$ is the $((k+1)/2)$-th smallest number among $x_1, \ldots, x_k$; if $k$ is even, then $m_k$ is the $(k/2)$-th smallest number among $x_1, \ldots, x_k$.) In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute $(m_1 + m_2 + m_3 + \cdots + m_{10000}) \bmod 10000$.
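As a small worked example of the definition, the sketch below computes every $m_k$ directly by sorting each prefix. The stream `[10, 2, 7, 5, 1]` is made up and `medians_naive` is a hypothetical name. The single rank $(k+1)/\!/2$ (integer division) covers both cases: it equals $(k+1)/2$ for odd $k$ and $k/2$ for even $k$.

```python
def medians_naive(stream):
    """Return [m_1, ..., m_k]; m_k is the ((k+1)//2)-th smallest of the first k numbers."""
    out = []
    for k in range(1, len(stream) + 1):
        prefix = sorted(stream[:k])
        out.append(prefix[(k + 1) // 2 - 1])  # convert the 1-based rank to a 0-based index
    return out

ms = medians_naive([10, 2, 7, 5, 1])
print(ms)               # [10, 2, 7, 5, 5]
print(sum(ms) % 10000)  # 29
```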

Besides re-sorting the numbers seen so far every time a new one arrives, the median can be maintained much faster with two heaps. As the data arrive, we keep the sizes of the two heaps within 1 of each other: a max-heap stores the smaller half of the numbers seen so far, and a min-heap stores the larger half. If the max-heap is always the one allowed to hold the extra element, the median $m_k$ is simply its top.
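The two-heap scheme can be packaged as a small reusable class. This is a sketch of one common variant, not the author's code: the names `RunningMedian`, `add` and `median` are illustrative, and each new number is routed through the lower half with `heapq.heappushpop` before rebalancing. The lower half is stored negated, because `heapq` only implements a min-heap.

```python
import heapq

class RunningMedian:
    """Two-heap median maintenance: the median is the maximum of the lower half."""

    def __init__(self):
        self.low = []   # smaller half as a max-heap (values stored negated)
        self.high = []  # larger half as a min-heap

    def add(self, x):
        # Route x through the lower half: the largest of low + {x} moves to high.
        heapq.heappush(self.high, -heapq.heappushpop(self.low, -x))
        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        # After k additions, low holds the (k+1)//2 smallest numbers, so its top is m_k.
        return -self.low[0]

rm = RunningMedian()
total = 0
for x in [10, 2, 7, 5, 1]:
    rm.add(x)
    total += rm.median()
print(total % 10000)  # 29
```

Each `add` does a constant number of heap operations, so processing $n$ numbers costs $O(n \log n)$ instead of re-sorting every prefix.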

In Python, the heapq module only provides a min-heap, but a max-heap can be obtained by storing negated values.
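A minimal illustration of the negation trick with the standard `heapq` functions (the sample values are arbitrary): push `-v`, and negate again when peeking or popping.

```python
import heapq

values = [3, 9, 1, 7]
max_heap = []
for v in values:
    heapq.heappush(max_heap, -v)  # store -v so the largest v has the smallest key

print(-max_heap[0])              # peek at the maximum: 9
print(-heapq.heappop(max_heap))  # pop the maximum: 9
print(-max_heap[0])              # next maximum: 7
```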

This time I implemented both approaches, and their speeds differ greatly. The implementation:

```python
from time import perf_counter  # time.clock() was removed in Python 3.8
from heapq import heappush, heappop

start = perf_counter()
with open('Median.txt') as f:
    data = [int(val) for val in f.read().split()]

# Rude way with high complexity: re-sort the whole prefix for every k
# 17 s running time
def rudeway(data):
    out = []
    for ind in range(len(data)):
        b = sorted(data[:ind + 1])
        out.append(b[(len(b) + 1) // 2 - 1])  # ((k+1)//2)-th smallest, k = len(b)
    return sum(out) % 10000

# print(rudeway(data))

# Use heapq; negated values in a min-heap give a max-heap
# 0.231407100855 s
def heapway(data):
    lheap = []  # max-heap (negated) holding the smaller half
    rheap = []  # min-heap holding the larger half
    total = 0
    for x in data:
        if rheap and x > rheap[0]:
            heappush(rheap, x)
        else:
            heappush(lheap, -x)
        # rebalance so that len(lheap) is len(rheap) or len(rheap) + 1
        if len(rheap) > len(lheap):
            heappush(lheap, -heappop(rheap))
        elif len(lheap) > len(rheap) + 1:
            heappush(rheap, -heappop(lheap))
        total += -lheap[0]  # the top of the smaller half is m_k
    return total % 10000

print(heapway(data))
finish = perf_counter()
print(finish - start)
```
