Suffix Tree Concepts

Source: Internet
Author: User
Basic concepts: suffix, suffix tree, generalised suffix tree, and suffix link. Clear definitions of each can be found on the following page.
Two examples follow. Suffix links are omitted because they are hard to draw in character graphics.

apple:

|-- apple$ -- [0]
|-- e$ -- [4]
|-- le$ -- [3]
|-- p --|-- le$ -- [2]
        |-- ple$ -- [1]

banana:

|-- a --|-- $ -- [5]
        |-- na --|-- $ -- [3]
                 |-- na$ -- [1]
|-- banana$ -- [0]
|-- na --|-- $ -- [4]
         |-- na$ -- [2]

What is a suffix tree used for? The page linked above also covers this in detail, in its Functionality section. Because those entries serve as the roadmap for this series, they are recorded here.

A suffix tree for a string S of length n can be built in Θ(n) time if the alphabet is constant or integer. Otherwise, the construction time depends on the implementation. The costs below are given under the assumption that the alphabet is constant. If it is not, the cost depends on the implementation.

Assume that a suffix tree has been built for the string S of length n, or that a generalised suffix tree has been built for the set of strings D = {S1, S2, ..., SK} of total length n = n1 + n2 + ... + nK. You can:

  • Search for strings:

    • Check if a string P of length m is a substring in O(m) time.
    • Find all z occurrences of the patterns P1, ..., Pq of total length m as substrings in O(m + z) time.
    • Search for a regular expression P in time expected to be sublinear in n.
    • Find, for each suffix of a pattern P, the length of the longest match between a prefix of P[i..m] and a substring in D, in Θ(m) time. This is termed the matching statistics for P.
  • Find properties of the strings:
    • Find the longest common substrings of the strings Si and Sj in Θ(ni + nj) time.
    • Find all maximal pairs, maximal repeats or supermaximal repeats in Θ(n + z) time.
    • Find the Lempel-Ziv decomposition in Θ(n) time.
    • Find the longest repeated substrings in Θ(n) time.
    • Find the most frequently occurring substrings of a minimum length in Θ(n) time.
    • Find the shortest strings from Σ that do not occur in D, in O(n + z) time, if there are z such strings.
    • Find the shortest substrings occurring only once in Θ(n) time.
    • Find, for each i, the shortest substrings of Si not occurring elsewhere in D in Θ(n) time.
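
The search operations in the list above can be illustrated with a small sketch. It assumes a naive, uncompressed suffix trie that is built in O(n^2) time and space, not a linear-time suffix tree; the names Node, build_suffix_trie, find_occurrences and is_substring are hypothetical and chosen only for this illustration. What carries over to a real suffix tree is the query: walking the pattern down from the root takes O(m) steps.

```python
class Node:
    """One node of a naive (uncompressed) suffix trie."""

    def __init__(self):
        self.children = {}  # char -> Node
        self.starts = []    # start indices of the suffixes passing through here


def build_suffix_trie(text):
    """Insert every suffix of text + '$'. O(n^2) time and space (assumption:
    a teaching sketch, not Ukkonen's linear-time construction)."""
    text += "$"
    root = Node()
    root.starts = list(range(len(text)))
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(i)
    return root


def find_occurrences(root, pattern):
    """Walk the pattern from the root in O(m) steps and return the sorted
    start positions of all its occurrences ([] if there are none)."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return sorted(node.starts)


def is_substring(root, pattern):
    """True if the pattern occurs somewhere in the indexed text."""
    return bool(find_occurrences(root, pattern))


root = build_suffix_trie("banana")
print(find_occurrences(root, "ana"))  # [1, 3]
print(is_substring(root, "nab"))      # False
```

The positions printed for "ana" are the leaf labels [1] and [3] found under the a -- na branch of the banana tree shown earlier.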

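The longest common substring entry from the list above can be sketched in the same spirit. This is a minimal illustration under stated assumptions: it uses a naive generalised suffix trie built from nested dicts in quadratic time, so it does not reach the Θ(ni + nj) bound, and the function names are hypothetical. The idea it shares with the real structure is that each node records which input strings pass through it, and the deepest node reached by all of them spells a longest common substring.

```python
def build_generalised_trie(strings):
    """Naive generalised suffix trie. Each node is a dict holding its
    children and the set of string ids whose suffixes pass through it.
    Quadratic time and space (assumption: sketch only, not linear-time)."""
    root = {"children": {}, "ids": set()}
    for sid, s in enumerate(strings):
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "ids": set()}
                )
                node["ids"].add(sid)
    return root


def longest_common_substring(strings):
    """Return one longest substring shared by all the given strings:
    the label of the deepest node whose id set contains every string."""
    root = build_generalised_trie(strings)
    wanted = set(range(len(strings)))
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for ch, child in node["children"].items():
            # Only descend while every input string still shares this path.
            if child["ids"] == wanted:
                stack.append((child, label + ch))
    return best


print(longest_common_substring(["banana", "ananas"]))  # anana
print(longest_common_substring(["apple", "maple"]))    # ple
```

When several common substrings tie for the longest length, this sketch returns only one of them.
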
The suffix tree can be prepared for constant-time lowest common ancestor retrieval between nodes in Θ(n) time. You can then also:

  • Find the longest common prefix between the suffixes Si[p..ni] and Sj[q..nj] in Θ(1) time.
  • Search for a pattern P of length m with at most k mismatches in O(kn + z) time, where z is the number of hits.
  • Find all z maximal palindromes in Θ(n) time, or Θ(gn) time if gaps of length g are allowed, or Θ(kn) time if k mismatches are allowed.
  • Find all z tandem repeats in O(n log n + z) time, and k-mismatch tandem repeats in O(kn log(n/k) + z) time.
  • Find the longest substrings common to at least k strings in D for k = 2..K in Θ(n) time.
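
To see why a fast longest-common-prefix query leads to the O(kn + z) bound for k-mismatch search, consider the following sketch. It assumes the common "jump over matches" scheme: at each start position, at most k + 1 longest-common-prefix queries are made. The lcp function here is a naive character-by-character stand-in, so this code does not reach the stated bound; with a suffix tree prepared for constant-time lowest common ancestor queries, each lookup would take Θ(1). The names lcp and k_mismatch_search are hypothetical.

```python
def lcp(a, i, b, j):
    """Length of the longest common prefix of a[i:] and b[j:].
    Naive stand-in (assumption): an LCA-prepared suffix tree would
    answer this query in constant time."""
    k = 0
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1
    return k


def k_mismatch_search(text, pattern, k):
    """Return the start positions where pattern matches text with at
    most k mismatching characters."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        matched = 0     # pattern characters consumed so far
        mismatches = 0
        while matched < m:
            # Jump over the longest run of matching characters.
            matched += lcp(text, start + matched, pattern, matched)
            if matched >= m:
                break
            mismatches += 1  # the character at offset `matched` differs
            if mismatches > k:
                break
            matched += 1     # step past the mismatching character
        if mismatches <= k:
            hits.append(start)
    return hits


print(k_mismatch_search("banana", "bab", 1))  # [0]
print(k_mismatch_search("banana", "bab", 2))  # [0, 2]
```

Each start position costs at most k + 1 lcp calls, which is where the factor kn comes from once every call is constant time.
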
The next step is to implement Ukkonen's suffix tree construction algorithm in C++ or Python.
