Smith-Waterman algorithm and its Java implementation

Last Update: 2017-06-22 Source: Internet

Author: User


The Smith-Waterman algorithm is a dynamic programming algorithm proposed by Smith and Waterman in 1981 for finding and comparing regions of local similarity; many later algorithms were developed on the basis of it. It aligns two sequences locally: letters are matched, and deletions and insertions (gaps) are introduced so that the aligned segments reach the same length, while identical letters are kept in corresponding positions as far as possible. Instead of aligning the two sequences end to end, it finds the optimally aligned pair of sub-segments. This way it can reveal well-matched segments that would otherwise be submerged among completely unrelated residues.

The algorithm can be summarized as follows:
1) Assign a score to every pair of letters (bases or residues): positive for identical or similar letters, negative for different letters or for a letter aligned with a gap;
2) Initialize the first row and the first column of the score matrix with 0;
3) Fill in the matrix by adding scores along the possible paths; any value below 0 is replaced by 0;
4) Using dynamic programming, start the traceback from the cell with the largest score in the matrix;
5) Continue the traceback until a cell with score 0 is reached; the cells on this traceback path give the optimal local alignment.
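Steps 1 to 3 can be written compactly as a recurrence. This is a sketch that assumes a fixed (linear) score g < 0 for every letter aligned with a gap, as in the Java program; the original paper allows a more general penalty W_k for a gap of length k. The symbols are introduced here for illustration: a_i and b_j are the i-th and j-th letters of the two sequences, and s(a_i, b_j) is positive for identical or similar letters and negative otherwise. Steps 4 and 5 then start at the cell (i*, j*) holding the largest value and follow the term that produced each value until a 0 is reached.

```latex
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max\bigl\{\, 0,\; H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} + g,\; H_{i,j-1} + g \,\bigr\},
\qquad (i^*, j^*) = \arg\max_{i,j} H_{i,j}
```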
As the steps above show, the Smith-Waterman algorithm has two main phases: computing the score matrix, and finding the most similar pair of segments. Once the score matrix is filled, the pair of segments with maximal local similarity is found by dynamic programming traceback: first locate the largest element of the score matrix, then follow the path that produced it backward, step by step, until a cell with score 0 is reached.
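The first phase can be sketched in Java as follows, assuming integer scores of 3 for a match, -1 for a mismatch and -4 for each letter aligned with a gap (the values the article's program uses). The class ScoreMatrixSketch and the method fillAndFindMax are hypothetical names introduced for illustration. The sketch fills H row by row, clamps negative values to 0, and records the largest cell during the same pass, which avoids a separate scan of the matrix.

```java
// Phase 1 sketch: fill the score matrix H and remember where its maximum is.
public class ScoreMatrixSketch {
    // Assumed scores (the same values the article's Java program uses).
    static final int MATCH = 3, MISMATCH = -1, GAP = -4;

    // Fills h (sized (a.length()+1) x (b.length()+1)); row 0 and column 0 stay 0.
    // Returns {bestScore, bestI, bestJ} for the first maximal cell in row-major order.
    static int[] fillAndFindMax(String a, String b, int[][] h) {
        int best = 0, bi = 0, bj = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                // Diagonal, from above, from the left; anything below 0 becomes 0.
                int v = Math.max(0, Math.max(h[i - 1][j - 1] + s,
                        Math.max(h[i - 1][j] + GAP, h[i][j - 1] + GAP)));
                h[i][j] = v;
                if (v > best) {
                    best = v;
                    bi = i;
                    bj = j;
                }
            }
        }
        return new int[] {best, bi, bj};
    }

    public static void main(String[] args) {
        String a = "GCCA", b = "TGCC";
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int[] r = fillAndFindMax(a, b, h);
        System.out.println("best " + r[0] + " at H(" + r[1] + "," + r[2] + ")"); // prints: best 9 at H(3,4)
    }
}
```

On GCCA and TGCC the shared segment GCC scores 3 + 3 + 3 = 9, and H(3,4) is where the traceback of the second phase would start.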

Here is the example from the original Smith-Waterman paper.
1) The two sequences to be aligned are s1 = AAUGCCAUUGACGG and s2 = CAGCCUCGCUUAG. The paper scores a match as +1, a mismatch as -1/3, and a single-letter gap as -4/3.
2) First, compute the score matrix H. The cell with the highest score is H(10,8), with a score of 3.3; the traceback starts there.
3) The idea of the traceback is simple: for the current cell, check the cell above it, the cell to its left, and the cell diagonally up and to the left, and see whether the current score equals the cell above minus 4/3, the cell to the left minus 4/3, or the diagonal cell plus 1 (for a match) or minus 1/3 (for a mismatch). In short, determine which neighboring cell the current score came from.
4) The traceback terminates at a cell with score 0, which means that no similar segment extends past this point.
5) When the traceback is complete, the following pair of aligned substrings is found:

GCCAUUG
GCC-UCG
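As a hand check of the score 3.3 at H(10,8): in the paper, that cell corresponds to aligning GCCAUUG (positions 4 to 10 of AAUGCCAUUGACGG) with GCC-UCG (positions 3 to 8 of CAGCCUCGCUUAG). With the paper's scores (+1 for a match, -1/3 for a mismatch, -4/3 for a one-letter gap), that alignment has five matches (G, C, C, U, G), one mismatch (U against C) and one single-letter gap:

```latex
5 \cdot (+1) + 1 \cdot \left(-\tfrac{1}{3}\right) + 1 \cdot \left(-\tfrac{4}{3}\right)
  = 5 - \tfrac{5}{3} = \tfrac{10}{3} \approx 3.33
```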

Here is the Java source code:

public class SWSQ {
    private static final int SPACE = -4;    // score for a letter aligned with a gap
    private static final int MATCH = 3;     // score for two identical letters
    private static final int MISMATCH = -1; // score for two different letters

    private int[][] H;                      // score matrix
    private int maxIndexM, maxIndexN;       // position of the highest score in H
    public String subSq1, subSq2;           // the two most similar substrings

    // Largest of a, b and c; negative values are replaced by 0.
    private int max(int a, int b, int c) {
        return Math.max(0, Math.max(a, Math.max(b, c)));
    }

    // Compute the score matrix row by row; row 0 and column 0 stay 0.
    private void calculateMatrix(String s1, String s2) {
        H = new int[s1.length() + 1][s2.length() + 1];
        for (int m = 1; m <= s1.length(); m++) {
            for (int n = 1; n <= s2.length(); n++) {
                int diag = s1.charAt(m - 1) == s2.charAt(n - 1) ? MATCH : MISMATCH;
                H[m][n] = max(H[m - 1][n - 1] + diag, H[m][n - 1] + SPACE, H[m - 1][n] + SPACE);
            }
        }
    }

    // Find the highest-scoring cell (the first one in row-major order on ties).
    private void findMaxIndex() {
        maxIndexM = 0;
        maxIndexN = 0;
        for (int i = 0; i < H.length; i++) {
            for (int j = 0; j < H[i].length; j++) {
                if (H[i][j] > H[maxIndexM][maxIndexN]) {
                    maxIndexM = i;
                    maxIndexN = j;
                }
            }
        }
    }

    // Trace back from (m, n) to the first cell with score 0, collecting letters in reverse order.
    private void traceBack(String s1, String s2, int m, int n, StringBuilder sb1, StringBuilder sb2) {
        while (H[m][n] != 0) {                              // a nonzero cell always has m, n >= 1
            if (H[m][n] == H[m - 1][n] + SPACE) {           // from above: s1 letter against a gap
                sb1.append(s1.charAt(m - 1));
                sb2.append('-');
                m--;
            } else if (H[m][n] == H[m][n - 1] + SPACE) {    // from the left: gap against an s2 letter
                sb1.append('-');
                sb2.append(s2.charAt(n - 1));
                n--;
            } else {                                        // from the diagonal: match or mismatch
                sb1.append(s1.charAt(m - 1));
                sb2.append(s2.charAt(n - 1));
                m--;
                n--;
            }
        }
    }

    public void find(String s1, String s2) {
        calculateMatrix(s1, s2);
        findMaxIndex();
        StringBuilder sb1 = new StringBuilder();
        StringBuilder sb2 = new StringBuilder();
        traceBack(s1, s2, maxIndexM, maxIndexN, sb1, sb2);
        subSq1 = sb1.reverse().toString();                  // restore left-to-right order
        subSq2 = sb2.reverse().toString();
    }

    public static void main(String[] args) {
        SWSQ x = new SWSQ();
        String s1 = "AAUGCCAUUGACGG";
        String s2 = "ACAGCCUCGCUUAG"; // the article's input: one more leading 'A' than the paper's s2
        x.find(s1, s2);
        System.out.println("----------------------------");
        System.out.println(s1);
        System.out.println(s2);
        System.out.println("----------------------------");
        System.out.println(x.subSq1);
        System.out.println(x.subSq2);
    }
}
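The program's integer scores (3 for a match, -1 for a mismatch, -4 for a letter aligned with a gap) are exactly three times the paper's scores (+1, -1/3, -4/3), so on the paper's own sequences the same recurrence should find the paper's alignment with every score multiplied by 3. The following self-contained sketch checks this. The class PaperExampleSketch, its Result holder and the parameterized align method are hypothetical names introduced for illustration, and unlike the program above this sketch prefers the diagonal move when several predecessors tie during the traceback.

```java
// Sketch: linear-gap Smith-Waterman with the scores passed in as parameters.
public class PaperExampleSketch {
    // Hypothetical result holder: the aligned substrings, the best score and its cell (1-based).
    static final class Result {
        final String alignedA, alignedB;
        final int score, endI, endJ;

        Result(String alignedA, String alignedB, int score, int endI, int endJ) {
            this.alignedA = alignedA;
            this.alignedB = alignedB;
            this.score = score;
            this.endI = endI;
            this.endJ = endJ;
        }
    }

    static Result align(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0, bi = 0, bj = 0;
        // Phase 1: fill H (row 0 and column 0 stay 0) and remember the first maximal cell.
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                h[i][j] = Math.max(0, Math.max(h[i - 1][j - 1] + s,
                        Math.max(h[i - 1][j] + gap, h[i][j - 1] + gap)));
                if (h[i][j] > best) {
                    best = h[i][j];
                    bi = i;
                    bj = j;
                }
            }
        }
        // Phase 2: trace back from the best cell until a 0 cell, trying the diagonal first.
        StringBuilder x = new StringBuilder(), y = new StringBuilder();
        int i = bi, j = bj;
        while (h[i][j] > 0) {
            int s = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
            if (h[i][j] == h[i - 1][j - 1] + s) {           // match or mismatch
                x.append(a.charAt(i - 1));
                y.append(b.charAt(j - 1));
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + gap) {      // letter of a against a gap
                x.append(a.charAt(i - 1));
                y.append('-');
                i--;
            } else {                                        // letter of b against a gap
                x.append('-');
                y.append(b.charAt(j - 1));
                j--;
            }
        }
        return new Result(x.reverse().toString(), y.reverse().toString(), best, bi, bj);
    }

    public static void main(String[] args) {
        // The paper's scores +1, -1/3, -4/3, multiplied by 3 to stay in integers.
        Result r = align("AAUGCCAUUGACGG", "CAGCCUCGCUUAG", 3, -1, -4);
        System.out.println(r.alignedA);
        System.out.println(r.alignedB);
        System.out.println("score " + r.score + " (" + r.score + "/3 in the paper's units) at H("
                + r.endI + "," + r.endJ + ")");
    }
}
```

With these inputs the best cell is H(10,8) with score 10, that is 10/3 ≈ 3.33 in the paper's units, and the alignment is GCCAUUG against GCC-UCG.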

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.