Bioinformatics principles of Operation IV-bomb: DNA sequence assembly (greedy algorithm)
Principle: Bioinformatics (Sun Yu)
General idea:
1. Find the most weighted edge;
2. Remove the edge with the starting vertex as the maximum weight edge;
3. Remove the edge at the end of the maximum weighted edge;
4. Repeat the above steps to obtain all eligible edges;
5. Stitching the resulting edges;
6. Join the outlier (if any).
Attached Python code, if there is a problem I will promptly correct (really not very familiar algorithm)
Simple DNA sequence assembly (greedy algorithm)
Reprint please keep the source!
1 #-*-coding:utf-8-*-2 """3 Created on Mon Dec 4 15:04:394 @author: Zxzhu5 python3.66 """7 fromFunctoolsImportReduce8 defGet_weight (S1,S2):#calculate weights by overlap of two sequences9L =min (len (S1), Len (S2))Ten whileL>0: One ifS2[:L] = = s1[-L:]: A returnL - Else: -L-=1 the return0 - - defPrint_result (S1,S2):#remove two sequences after overlap Merge -Weight =get_weight (S1,S2) +s = S1 +S2[weight:] - #print (s) + returns A at defDir_graph (l,t=3):#a forward graph (with weights greater than or equal to T) that satisfies the condition -Graph = {} - forIinchL: - forJinchL: - ifi!=J: -Weight =get_weight (i,j) in ifWeight >=T: -Key =(I,J) toGraph[key] =Weight + returnGraph - the defRm_path (Graph,path):#The greedy algorithm joins an edge and should remove the same edge as the vertex *Key =Graph.keys () $Rm_key = []Panax Notoginseng forIinchKey: - ifI[1] = = Path[1]orI[0] = =Path[0]: the rm_key.append (i) + forIinchRm_key: A Graph.pop (i) the returnGraph + - defGet_path (Graph,path = []):#get all sides that meet the conditions $ whileGraph: $Max_weight =0 - forIinchGraph.keys (): - ifGraph[i] >Max_weight: theMax_weight =Graph[i] -Cur_path =IWuyi path.append (Cur_path) theGraph =Rm_path (Graph,cur_path) - Get_path (Graph,path) Wu returnPath - About defOut_num (PATH,V):#calculate the degree of a vertex $Count =0 - forIinchPath: - ifI[0] = =V: -Count+=1 A returnCount + the defGet_last_v (path,last_v = None):#get the last side of the line -index =0 $ ifLast_v:#non-random search for vertices with a degree of 0 the forIinchPath: the ifI[1] = =Last_v: the returnI,index the Else: -Index+=1 in returnNone#no vertices found pointing to last_v the Else:#randomly finding vertices with a degree of 0 the forIinchPath: About ifOut_num (path,i[1]) = =0: the returnI,index the Else: theIndex+=1 + - defAssemble (Cur_v,path,new_path = []):#sort edges that meet conditions the whilePath:BayiPath.pop (cur_v[1]) the New_path.insert (0,cur_v[0]) theCur_v = get_last_v (Path,last_v =cur_v[0][0]) - ifCur_v: - Assemble (Cur_v,path,new_path) the Else: theCur_v =get_last_v (path) the Assemble (Cur_v,path,new_path) the returnNew_path - the defAlign_isolated (path,sequence):#Join orphaned vertices theNew_path = reduce (Lambdax,y:x+Y,path) the forIinchsequence:94 ifI not inchNew_path: the new_path.append (i) the returnNew_path the 98 Aboutx ='Ccttttgg' -y ='ttggcaatcact'101W ='AGTATTGGCAATC'102U ='Atgcaaacct'103z ='AATCGATG'104v ='TCACTCCTTTT' theA ='Cataa'106b ='ATCA'107c ='Tgcat'108sequence =[X,y,w,u,z]109Sequence1 =[A,b,c] theGraph = Dir_graph (sequence1,t=2)111Path =Get_path (graph) thePath = [List (i) forIinchPath#change a tuple element in path to a list113 #print (path) theStart = get_last_v (path)#the edge on which the vertex with the starting out is 0 theNew_path = Assemble (Start,path)#the sorted edge the #print (New_path)117New_path = align_isolated (New_path,sequence1)#Join orphaned vertices118 #print (New_path)119result = reduce (Print_result,new_path)#Assembly - Print(Result)
Simple DNA sequence assembly (greedy algorithm)