Algorithms: Design and Analysis, Part 1
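This is the hardest assignment in the algorithms course. Beyond the algorithm itself, I think the difficulty lies in the implementation, because many programming languages cannot handle extremely deep recursion.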
Problem Statement
Download the text file (a zipped version is also provided).
The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge: the vertex label in the first column is the tail and the vertex label in the second column is the head (recall the graph is directed, and the edges are directed from the first-column vertex to the second-column vertex). So, for example, the 11th row looks like "2 47646". This just means that the vertex with label 2 has an outgoing edge to the vertex with label 47646.
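As a rough sketch (not the code used later in this post), the edge list can be read into two adjacency lists in a single pass, one for the graph and one for its reverse. It assumes labels run from 1 to n and uses a list indexed by label instead of a dict; the name read_edges is illustrative.

```python
import io


def read_edges(lines, n):
    """Build out- and in-adjacency lists from 'tail head' lines.

    Vertex labels are assumed to be 1..n; index 0 of each list is unused.
    """
    graph = [[] for _ in range(n + 1)]
    reverse = [[] for _ in range(n + 1)]
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # tolerate blank or trailing lines
        tail, head = int(fields[0]), int(fields[1])
        graph[tail].append(head)    # edge tail -> head
        reverse[head].append(tail)  # the same edge, reversed
    return graph, reverse


# Tiny in-memory example: 1 -> 2, 2 -> 3, 3 -> 1
graph, reverse = read_edges(io.StringIO("1 2\n2 3\n3 1\n"), 3)
print(graph[1], reverse[1])  # [2] [3]
```

For the real input, passing an open file object for SCC.txt together with n = 875714 would stream the file line by line instead of loading it all at once.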
Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.
Output format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of size, separated by commas (avoid any spaces). So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be "500,400,300,200,100". If your algorithm finds fewer than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be "400,300,100,0,0".
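The required answer string can be produced with a small helper like the following; format_answer is a hypothetical name, and it assumes the SCC sizes are already available as a list of integers in any order.

```python
def format_answer(sizes, k=5):
    """Largest k SCC sizes, descending, comma-separated, padded with 0."""
    top = sorted(sizes, reverse=True)[:k]
    top += [0] * (k - len(top))  # fewer than k SCCs: fill the rest with 0
    return ",".join(str(s) for s in top)


print(format_answer([100, 500, 300, 200, 400]))  # 500,400,300,200,100
print(format_answer([400, 300, 100]))            # 400,300,100,0,0
```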
WARNING: This is the most challenging programming assignment of the course. Because of the size of the graph you may have to manage memory carefully. The best way to do this depends on your programming language and environment, and we strongly suggest that you exchange tips for doing this on the discussion forums.
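One way to cut memory, offered only as an assumption-laden sketch and not what this post's code does, is a compressed sparse row (CSR) layout: all edge heads stored in one flat integer array, plus an offsets array marking where each vertex's neighbors begin. It assumes the edges have already been loaded into two parallel integer arrays, tails and heads; build_csr is a hypothetical helper. Python's array module stores raw machine integers ('i' is 4 bytes on mainstream platforms) rather than one Python list object per vertex.

```python
from array import array


def build_csr(n, tails, heads):
    """CSR adjacency for vertices labeled 1..n.

    Out-neighbors of v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets = array("i", [0]) * (n + 2)
    for t in tails:
        offsets[t + 1] += 1            # out-degree of t, shifted by one slot
    for v in range(1, n + 2):
        offsets[v] += offsets[v - 1]   # prefix sums give each vertex's start
    targets = array("i", [0]) * len(tails)
    fill = array("i", offsets)         # next free slot for each vertex
    for t, h in zip(tails, heads):
        targets[fill[t]] = h
        fill[t] += 1
    return offsets, targets


# Edges 1->2, 1->3, 2->3 stored as two parallel integer arrays
tails = array("i", [1, 1, 2])
heads = array("i", [2, 3, 3])
offsets, targets = build_csr(3, tails, heads)
print(list(targets[offsets[1]:offsets[2]]))  # [2, 3]
```

Calling build_csr(n, heads, tails) with the two arrays swapped gives the reversed graph in the same layout.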
Implementation
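The algorithm itself (Kosaraju's two-pass algorithm) is fairly simple to implement and has three steps. First, compute the transpose of the graph (reverse every edge). Second, run DFS over the transposed graph and record the vertices in order of finishing time. Third, run DFS over the original graph, taking start vertices in decreasing order of finishing time; the vertices reached by each DFS started in this step form one SCC (strongly connected component).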
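The three steps can also be written without any recursion, which sidesteps the recursion-limit and stack-size problems described further on. The following is a minimal sketch of Kosaraju's algorithm with explicit stacks, not the author's code; kosaraju_scc_sizes is an illustrative name, and it assumes vertex labels 1..n with edges given as (tail, head) pairs. The first pass keeps an iterator in each stack entry so a vertex is recorded only after all of its neighbors are done, reproducing the finishing order of a recursive DFS. The second pass only needs reachability, so a plain stack traversal is enough.

```python
def kosaraju_scc_sizes(n, edges):
    """Sizes of all SCCs (largest first) of a graph on vertices 1..n."""
    graph = [[] for _ in range(n + 1)]    # out-neighbors; index 0 unused
    reverse = [[] for _ in range(n + 1)]  # step 1: the transposed graph
    for tail, head in edges:
        graph[tail].append(head)
        reverse[head].append(tail)

    # Step 2: DFS on the transposed graph, recording vertices as they finish.
    visited = [False] * (n + 1)
    finished = []
    for start in range(1, n + 1):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(reverse[start]))]
        while stack:
            v, neighbors = stack[-1]
            for w in neighbors:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(reverse[w])))
                    break
            else:  # no unexplored neighbor left: v finishes now
                stack.pop()
                finished.append(v)

    # Step 3: search the original graph in decreasing finishing time;
    # each search started from an unvisited vertex collects exactly one SCC.
    visited = [False] * (n + 1)
    sizes = []
    for leader in reversed(finished):
        if visited[leader]:
            continue
        visited[leader] = True
        stack, size = [leader], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in graph[v]:
                if not visited[w]:
                    visited[w] = True
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# The 8-vertex test graph from this post
SAMPLE_EDGES = [(1, 2), (2, 6), (2, 3), (2, 4), (3, 1), (3, 4), (4, 5),
                (5, 4), (6, 5), (6, 7), (7, 6), (7, 8), (8, 5), (8, 7)]
print(kosaraju_scc_sizes(8, SAMPLE_EDGES))  # [3, 3, 2]
```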
A first implementation in Python:
# First pass: DFS on the transposed graph, recording finishing order.
def firstdfs(vertexind):
    global fs, isexplored, visitordered, mapDictT
    for ind in mapDictT[vertexind]:
        if not isexplored[ind-1]:
            isexplored[ind-1] = True
            firstdfs(ind)
    # vertexind finishes now; fill from the back, so visitordered[0] finishes last
    visitordered[fs-1] = vertexind
    fs = fs - 1

# Second pass: DFS on the original graph, counting vertices for leader s.
def seconddfs(vertexind):
    global s, secisexplored, header, mapDict
    for ind in mapDict[vertexind]:
        if not secisexplored[ind-1]:
            secisexplored[ind-1] = True
            seconddfs(ind)
    header[s-1] += 1   # every reached vertex counts, even one with no out-edges

maplength = 875714
#maplength = 8   # for the 8-vertex test graph
mapDict = {x: [] for x in range(1, maplength+1)}    # original graph
mapDictT = {x: [] for x in range(1, maplength+1)}   # transposed graph
with open('SCC.txt', 'r') as f:
    for line in f:   # stream the file instead of readlines()
        tmp = [int(x) for x in line.split()]
        if not tmp:
            continue   # skip blank lines
        mapDict[tmp[0]].append(tmp[1])
        mapDictT[tmp[1]].append(tmp[0])

fs = maplength
isexplored = [False] * maplength
secisexplored = [False] * maplength
visitordered = [0] * maplength
header = [0] * maplength   # header[v-1]: size of the SCC whose leader is v

for ind in range(1, maplength+1):
    if not isexplored[ind-1]:
        isexplored[ind-1] = True
        firstdfs(ind)

print('Second DFS')
for ind in visitordered:   # decreasing finishing time
    if not secisexplored[ind-1]:
        s = ind
        secisexplored[ind-1] = True
        seconddfs(ind)

header.sort(reverse=True)
print(header[0:20])
The test graph is stored in a text file with the following contents:
1 2
2 6
2 3
2 4
3 1
3 4
4 5
5 4
6 5
6 7
7 6
7 8
8 5
8 7
Note: set maplength to 8 for this test. The first five values of the output should be
3,3,2,0,0
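To double-check the expected answer on a graph this small, a brute-force sketch (an illustration, not part of the assignment) uses the definition directly: two vertices are in the same SCC exactly when each can reach the other. brute_force_answer is a hypothetical name, and because it runs one search per vertex it is only practical for tiny test graphs.

```python
def brute_force_answer(n, edges, k=5):
    """Top-k SCC sizes via mutual reachability, formatted as the assignment asks."""
    graph = {v: [] for v in range(1, n + 1)}
    for tail, head in edges:
        graph[tail].append(head)

    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            for w in graph[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    reach = {v: reachable(v) for v in graph}
    assigned, sizes = set(), []
    for v in graph:
        if v not in assigned:
            component = {u for u in reach[v] if v in reach[u]}
            assigned |= component
            sizes.append(len(component))
    top = sorted(sizes, reverse=True)[:k] + [0] * k
    return ",".join(map(str, top[:k]))


edges = [(1, 2), (2, 6), (2, 3), (2, 4), (3, 1), (3, 4), (4, 5),
         (5, 4), (6, 5), (6, 7), (7, 6), (7, 8), (8, 5), (8, 7)]
print(brute_force_answer(8, edges))  # 3,3,2,0,0
```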
Python's Recursion Limit
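Python limits the depth of recursive calls. The default limit is usually 1000; exceeding it raises a "maximum recursion depth exceeded" error, which guards against overflowing the stack. The following code raises the limit and prints the new value: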
import sys
sys.setrecursionlimit(80000000)
print(sys.getrecursionlimit())
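The error itself can be observed safely by catching it. This sketch assumes Python 3.5 or later, where the error is RecursionError (the Python 3.2/3.3 versions used in this post raise RuntimeError instead); probe is an illustrative name. It records how deep recursion gets before the error, first with the default limit and then with a modestly raised one, restoring the default afterwards.

```python
import sys


def probe(depth=1):
    """Recurse until Python raises RecursionError; return the depth reached."""
    try:
        return probe(depth + 1)
    except RecursionError:  # Python 3.5+; older versions raised RuntimeError
        return depth


default_limit = sys.getrecursionlimit()
shallow = probe()                      # a little below the default limit
sys.setrecursionlimit(3000)            # raise the limit modestly...
deep = probe()
sys.setrecursionlimit(default_limit)   # ...and restore it
print(default_limit, shallow, deep)
```

The depths reported sit slightly below each limit, because the frames already on the stack when probe is first called count toward the limit too.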
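The following code measures how deep recursion can actually go; it prints the depth every 500 calls until the program stops: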
def f(d):
    if d % 500 == 0:
        print(d)
    f(d + 1)

f(1)
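Running this test after raising the limit, the maximum depth reached was about 4000 on Windows 8 with Python 3.3 and 8 GB of RAM, and about 26000 on Debian 6 with Python 3.2 and 16 GB of RAM; in neither case was memory exhausted. So even with the limit relaxed, recursion depth is still capped by something else. In this situation no recursion-limit error is reported, but the program crashes all the same.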
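What was the problem? After reading other people's discussion on the course forum, I realized that the stack size might also be too small: in the CPython versions used here, every Python-level call also consumes space on the native stack of the thread running the interpreter, and raising the recursion limit does not make that stack any larger.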
Improved Version
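After setting the stack size to 64 MB, the program ran fine. Because threading.stack_size only applies to threads created after it is called, the computation has to run in a new thread started from the main program.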
The complete code:
import sys, threading

sys.setrecursionlimit(3000000)
# Threads created after this call get a 64 MB (67108864-byte) stack;
# the main thread keeps the stack size the OS gave it.
threading.stack_size(67108864)

# First pass: DFS on the transposed graph, recording finishing order.
def firstdfs(vertexind):
    global fs, isexplored, visitordered, mapDictT
    for ind in mapDictT[vertexind]:
        if not isexplored[ind-1]:
            isexplored[ind-1] = True
            firstdfs(ind)
    visitordered[fs-1] = vertexind   # fill from the back: visitordered[0] finishes last
    fs = fs - 1

# Second pass: DFS on the original graph, counting vertices for leader s.
def seconddfs(vertexind):
    global s, secisexplored, header, mapDict
    for ind in mapDict[vertexind]:
        if not secisexplored[ind-1]:
            secisexplored[ind-1] = True
            seconddfs(ind)
    header[s-1] += 1   # every reached vertex counts, even one with no out-edges

def sccmain():
    global mapDict, mapDictT, fs, isexplored, visitordered, s, secisexplored, header
    maplength = 875714
    #maplength = 8   # for the 8-vertex test graph
    mapDict = {x: [] for x in range(1, maplength+1)}    # original graph
    mapDictT = {x: [] for x in range(1, maplength+1)}   # transposed graph
    with open('SCC.txt', 'r') as f:
        for line in f:   # stream the file instead of readlines()
            tmp = [int(x) for x in line.split()]
            if not tmp:
                continue   # skip blank lines
            mapDict[tmp[0]].append(tmp[1])
            mapDictT[tmp[1]].append(tmp[0])
    fs = maplength
    isexplored = [False] * maplength
    secisexplored = [False] * maplength
    visitordered = [0] * maplength
    header = [0] * maplength   # header[v-1]: size of the SCC whose leader is v
    for ind in range(1, maplength+1):
        if not isexplored[ind-1]:
            isexplored[ind-1] = True
            firstdfs(ind)
    print('Second DFS')
    for ind in visitordered:   # decreasing finishing time
        if not secisexplored[ind-1]:
            s = ind
            secisexplored[ind-1] = True
            seconddfs(ind)
    header.sort(reverse=True)
    print(header[0:20])

if __name__ == '__main__':
    # Run the work in a new thread so the enlarged stack is actually used.
    thread = threading.Thread(target=sccmain)
    thread.start()
    thread.join()
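The essential trick in the program above, enlarging the stack for a new thread and raising the recursion limit, can be packaged as a small reusable helper. This is a sketch under assumptions: run_with_big_stack and depth are illustrative names, the 64 MB default mirrors the value used above, and the helper restores the previous stack size and recursion limit afterwards because both settings apply to every thread created later rather than to a single call.

```python
import sys
import threading


def run_with_big_stack(func, *args, stack_bytes=64 * 1024 * 1024,
                       recursion_limit=1_000_000):
    """Call func(*args) in a fresh thread with a larger stack; return its result."""
    old_limit = sys.getrecursionlimit()
    old_size = threading.stack_size(stack_bytes)  # applies to threads created from now on
    sys.setrecursionlimit(recursion_limit)
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:  # handed back to the caller below
            outcome["error"] = exc

    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        threading.stack_size(old_size)    # restore defaults for later threads
        sys.setrecursionlimit(old_limit)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def depth(n):
    """Plain recursion n levels deep; returns n."""
    return 0 if n == 0 else 1 + depth(n - 1)


print(run_with_big_stack(depth, 30_000))  # 30000
```

An exception raised inside a thread is otherwise only printed, not propagated, so the helper captures it and re-raises it in the calling thread.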