Simple DNA sequence assembly (greedy algorithm)

Source: Internet
Author: User

Bioinformatics principles of Operation IV-bomb: DNA sequence assembly (greedy algorithm)

Principle: Bioinformatics (Sun Yu)

General idea:

1. Find the most weighted edge;

2. Remove the edge with the starting vertex as the maximum weight edge;

3. Remove the edge at the end of the maximum weighted edge;

4. Repeat the above steps to obtain all eligible edges;

5. Stitching the resulting edges;

6. Join the outlier (if any).

Attached Python code, if there is a problem I will promptly correct (really not very familiar algorithm)

Simple DNA sequence assembly (greedy algorithm)

Reprint please keep the source!

1 #-*-coding:utf-8-*-2 """3 Created on Mon Dec 4 15:04:394 @author: Zxzhu5 python3.66 """7  fromFunctoolsImportReduce8 defGet_weight (S1,S2):#calculate weights by overlap of two sequences9L =min (len (S1), Len (S2))Ten      whileL>0: One         ifS2[:L] = = s1[-L:]: A             returnL -         Else: -L-=1 the     return0 -  - defPrint_result (S1,S2):#remove two sequences after overlap Merge -Weight =get_weight (S1,S2) +s = S1 +S2[weight:] -     #print (s) +     returns A  at defDir_graph (l,t=3):#a forward graph (with weights greater than or equal to T) that satisfies the condition -Graph = {} -      forIinchL: -          forJinchL: -             ifi!=J: -Weight =get_weight (i,j) in                 ifWeight >=T: -Key =(I,J) toGraph[key] =Weight +     returnGraph -  the defRm_path (Graph,path):#The greedy algorithm joins an edge and should remove the same edge as the vertex *Key =Graph.keys () $Rm_key = []Panax Notoginseng      forIinchKey: -         ifI[1] = = Path[1]orI[0] = =Path[0]: the rm_key.append (i) +      forIinchRm_key: A Graph.pop (i) the     returnGraph +  - defGet_path (Graph,path = []):#get all sides that meet the conditions $      whileGraph: $Max_weight =0 -          forIinchGraph.keys (): -             ifGraph[i] >Max_weight: theMax_weight =Graph[i] -Cur_path =IWuyi path.append (Cur_path) theGraph =Rm_path (Graph,cur_path) - Get_path (Graph,path) Wu     returnPath -  About defOut_num (PATH,V):#calculate the degree of a vertex $Count =0 -      forIinchPath: -         ifI[0] = =V: -Count+=1 A     returnCount +  the defGet_last_v (path,last_v = None):#get the last side of the line -index =0 $     ifLast_v:#non-random search for vertices with a degree of 0 the          forIinchPath: the             ifI[1] = =Last_v: the                 returnI,index the             Else: -Index+=1 in         returnNone#no vertices found pointing to last_v the     Else:#randomly finding vertices with a degree of 0 the          forIinchPath: About             ifOut_num (path,i[1]) = =0: the                 returnI,index the             Else: theIndex+=1 +             - defAssemble (Cur_v,path,new_path = []):#sort edges that meet conditions the      whilePath:BayiPath.pop (cur_v[1]) the New_path.insert (0,cur_v[0]) theCur_v = get_last_v (Path,last_v =cur_v[0][0]) -         ifCur_v: - Assemble (Cur_v,path,new_path) the         Else: theCur_v =get_last_v (path) the Assemble (Cur_v,path,new_path) the     returnNew_path -  the defAlign_isolated (path,sequence):#Join orphaned vertices theNew_path = reduce (Lambdax,y:x+Y,path) the      forIinchsequence:94         ifI not inchNew_path: the new_path.append (i) the     returnNew_path the             98  Aboutx ='Ccttttgg' -y ='ttggcaatcact'101W ='AGTATTGGCAATC'102U ='Atgcaaacct'103z ='AATCGATG'104v ='TCACTCCTTTT' theA ='Cataa'106b ='ATCA'107c ='Tgcat'108sequence =[X,y,w,u,z]109Sequence1 =[A,b,c] theGraph = Dir_graph (sequence1,t=2)111Path =Get_path (graph) thePath = [List (i) forIinchPath#change a tuple element in path to a list113 #print (path) theStart = get_last_v (path)#the edge on which the vertex with the starting out is 0 theNew_path = Assemble (Start,path)#the sorted edge the #print (New_path)117New_path = align_isolated (New_path,sequence1)#Join orphaned vertices118 #print (New_path)119result = reduce (Print_result,new_path)#Assembly - Print(Result)

Simple DNA sequence assembly (greedy algorithm)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.