Algorithms Part 1-Question 6-2sum median-number and Median

Last Update:2018-12-04 Source: Internet

Author: User

This time there are two programming problems: the first is to find pairs of numbers whose sum equals a given value, and the second is to find the running median.

2sum Problems

Problem description

The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications). The file contains 1 million integers, both positive and negative (there might be some repetitions!). This is your array of integers, with the ith row of the file specifying the ith entry of the array. Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t. (NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)

Solution:

The input contains 1,000,000 numbers. Each number x is visited once, and for each x we look for the matching values y with -10000 <= x+y <= 10000. The key step is the hash: shifting every value right by 15 bits groups the numbers into buckets that each cover a range of 2^15 = 32768 consecutive values. For a given x, the matching y values lie in an interval of width 20001, which is narrower than one bucket, so at most two buckets need to be scanned for each x. The data is sparse and each bucket may hold only one or two values, so the overall complexity is O(n).
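
The bucket idea can be tried on a small input first. The following is a minimal sketch, not the implementation used for the assignment: the function name `count_targets` and its parameters `lo`, `hi`, and `shift` are hypothetical, and it stores buckets in a dictionary rather than a preallocated list.

```python
from collections import defaultdict

def count_targets(values, lo=-10000, hi=10000, shift=15):
    """Count targets t in [lo, hi] such that distinct x, y in values give x + y == t."""
    # The scan below visits at most two buckets only if the target
    # interval is narrower than one bucket.
    assert hi - lo < (1 << shift)
    unique = set(values)  # repeated numbers count once
    buckets = defaultdict(list)
    for v in unique:
        # Bucket k holds the values in [k * 2**shift, (k + 1) * 2**shift).
        buckets[v >> shift].append(v)
    found = set()
    for x in unique:
        # Every y with lo <= x + y <= hi lies in buckets first..last.
        first = (lo - x) >> shift
        last = (hi - x) >> shift
        for key in range(first, last + 1):
            for y in buckets.get(key, ()):
                if y != x and lo <= x + y <= hi:
                    found.add(x + y)
    return len(found)

# Targets reachable in [3, 10]: 3 = 1 + 2, 6 = -3 + 9, 8 = -3 + 11, 10 = -1 + 11
print(count_targets([-3, -1, 1, 2, 9, 11, 15], lo=3, hi=10))  # 4
```

Python's `>>` rounds toward negative infinity, so negative values land in negative-numbered buckets and no offset by the minimum value is needed when the buckets live in a dictionary.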

The specific implementation is as follows:

# Ported to Python 3: time.clock() no longer exists, so perf_counter() is used.
from time import perf_counter

start = perf_counter()

def myhash(val):
    # Bucket index: each bucket covers 2**15 consecutive values.
    return val >> 15

with open('algo1-programming_prob-2sum.txt', 'r') as f:
    tmp = f.read()
print('read complete')

# Converting to a set removes repeated numbers.
vallist = set(int(val) for val in tmp.split())
print('convert to set@int complete')

# True marks an empty bucket; 6103503 buckets cover this input's value range.
valnew = [True for x in range(6103503)]
tlist = [0 for x in range(-10000, 10000 + 1)]

minval = min(vallist)
for val in vallist:
    val_key = myhash(val - minval)
    if valnew[val_key] is True:
        valnew[val_key] = [val]
    else:
        valnew[val_key].append(val)
print('hash complete', len(valnew), len(vallist))

for val in vallist:
    # Every y with -10000 <= val + y <= 10000 lies in buckets firkey..seckey.
    firkey = myhash(-10000 - val - minval)
    seckey = myhash(10000 - val - minval)
    for key in range(firkey, seckey + 1):
        # Skip keys outside the table (the original never bounds-checked seckey).
        if 0 <= key < len(valnew) and valnew[key] is not True:
            for y in valnew[key]:
                # y != val keeps x and y distinct.
                if y != val and -10000 <= y + val <= 10000:
                    tlist[y + val + 10000] = 1
print('output: ', sum(tlist))

finish = perf_counter()
print(finish - start)

# Output recorded from the original Python 2 runs:
## read complete
## convert to set@int complete
## ('hash complete', 6103503, 999752)
## ('output: ', ***)
## 480.193410146
## user@hn:~/pyscripts$ python 2sum_hash.py
## read complete
## convert to set@int complete
## ('hash complete', 6103503, 999752)
## ('output: ', ***)
## 183.92

The run took 480 s on Win32 but only about 180 s on Debian. Some users on the course forum report running times as low as 0.53 s, so I still have a lot of room for improvement.
Median

Problem description:

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting xi denote the ith number of the file, the kth median mk is defined as the median of the numbers x1,…,xk. (So, if k is odd, then mk is the ((k+1)/2)th smallest number among x1,…,xk; if k is even, then mk is the (k/2)th smallest number among x1,…,xk.) In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute (m1+m2+m3+⋯+m10000) mod 10000.
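
To make the definition concrete, here is a small brute-force sketch on a made-up five-number stream. The helper name `kth_median` and the sample data are illustrative assumptions and are not part of the assignment.

```python
def kth_median(prefix):
    """Median as defined above: the ((k+1)/2)th smallest for odd k, the (k/2)th for even k."""
    ordered = sorted(prefix)
    k = len(ordered)
    # (k + 1) // 2 equals (k+1)/2 for odd k and k/2 for even k; subtract 1 for 0-based indexing.
    return ordered[(k + 1) // 2 - 1]

stream = [5, 1, 3, 4, 2]  # hypothetical sample stream
medians = [kth_median(stream[:k]) for k in range(1, len(stream) + 1)]
print(medians)               # [5, 1, 3, 3, 3]
print(sum(medians) % 10000)  # 15
```

Re-sorting every prefix costs far more than maintaining the median incrementally, but it is a handy reference for checking a faster implementation on small inputs.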

Besides the naive method of re-sorting the array every time a new number arrives, two heaps can be used to compute the medians quickly. As the data keeps arriving, we maintain two heaps whose sizes never differ by more than 1. One is a max-heap that stores the smaller half of the numbers seen so far, and the other is a min-heap that stores the larger half.

In Python, heapq provides only a min-heap, but a max-heap can be obtained by storing the negated values.
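
The negation trick can be checked with a few numbers. This is a minimal sketch using only `heappush` and `heappop` from the standard library; the variable names and sample values are arbitrary.

```python
from heapq import heappush, heappop

max_heap = []
for value in [3, 10, 7, 1]:
    # Store the negated value so that the largest original number
    # becomes the smallest entry, which heapq keeps at index 0.
    heappush(max_heap, -value)

largest = -max_heap[0]          # peek at the maximum without removing it
popped = -heappop(max_heap)     # remove and return the maximum
next_largest = -max_heap[0]     # the new maximum after the pop
print(largest, popped, next_largest)  # 10 10 7
```

Every value must be negated on the way in and negated again on the way out; forgetting one of the two is the usual source of bugs with this approach.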

This time I implemented two algorithms, and the speed gap between them is significant. The implementation is as follows:

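# Ported to Python 3: time.clock() no longer exists, so perf_counter() is used.
from time import perf_counter
from heapq import heappush, heappop

start = perf_counter()
with open('Median.txt', 'r') as f:
    tmp = f.read()
data = [int(val) for val in tmp.split()]
out = [0 for x in range(len(data))]

# Naive way with high complexity: re-sort the prefix for every k.
# 17 s running time
def rudeway(data, out):
    for ind in range(len(data)):
        b = data[0:ind + 1]
        b.sort()
        # Integer division is required for indexing in Python 3.
        out[ind] = b[(len(b) + 1) // 2 - 1]
    return sum(out) % 10000

# print(rudeway(data, out))

# Use heapq; a min-heap of negated values acts as a max-heap.
# 0.231407100855 s running time
def heapway(data, out):
    lheap = []  # max-heap (negated values) holding the smaller half
    rheap = []  # min-heap holding the larger half
    out[0] = data[0]
    tmp = sorted(data[0:2])
    out[1] = tmp[0]
    heappush(lheap, -tmp[0])
    heappush(rheap, tmp[1])
    for ind in range(2, len(data)):
        if data[ind] > rheap[0]:
            heappush(rheap, data[ind])
        else:
            heappush(lheap, -data[ind])
        # Rebalance so that len(rheap) <= len(lheap) <= len(rheap) + 1.
        if len(rheap) > len(lheap):
            heappush(lheap, -heappop(rheap))
        if len(lheap) > len(rheap) + 1:
            heappush(rheap, -heappop(lheap))
        # The median is the largest value of the smaller half.
        out[ind] = -lheap[0]
    return sum(out) % 10000

print(heapway(data, out))
finish = perf_counter()
print(finish - start)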

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
