MEME (motif-based sequence analysis tools) instructions for use

Source: Internet
Author: User

2011-05-27 ~ ADMIN

MEME is a tool for discovering motifs (short, conserved sequence patterns that may mark functional sites) in a set of sequences. For example, when you have ChIP-chip or ChIP-seq data, analyzing the positions of the peaks gives you the sequences under those peaks, which are the DNA fragments bound and protected by the protein. Running MEME on these sequences to find highly similar subsequences may reveal a functional motif.

The input to MEME must therefore include at least one sequence file in Pearson/FASTA format.

Command: meme <dataset> [optional arguments]

Here <dataset> is the sequence file, which must be in Pearson/FASTA format. Example file:

          >ICYA_MANSE INSECTICYANIN A FORM (BLUE BILIPROTEIN)
          GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
          LPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDA
          >LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG)
          MKCLLLALALTCGAQALIVTQTMKGLDI
          QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW

A FASTA file uses ">" to start a header line. The first word after ">" is the sequence name, and any text after it is a description. The sequence itself follows on the next lines and continues until the next ">" line.

MEME can read sequence weights from the FASTA file. Weights are given on a single header line that starts with >WEIGHTS (note that WEIGHTS must be all uppercase), followed by numbers between 0 and 1. The numbers are assigned to the sequences in order.

          >WEIGHTS 0.5 .5 1.0
          >seq1
          GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
          >seq2
          GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
          >seq3
          QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
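
To make the file layout concrete, here is a minimal Python sketch of how such a file could be read. It is not MEME's own parser. The function name parse_fasta_with_weights is made up for illustration, and giving a weight of 1.0 to sequences that have no explicit weight is an assumption of this sketch.

```python
def parse_fasta_with_weights(text):
    """Return (names, sequences, weights) from FASTA text with optional >WEIGHTS lines."""
    names, seqs, weights = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and fields[0] == "WEIGHTS":
                # A weights line: collect the numbers in file order.
                weights.extend(float(x) for x in fields[1:])
            else:
                # A normal header: the first word is the sequence name.
                names.append(fields[0] if fields else "")
                seqs.append("")
        elif seqs:
            # Sequence data may span several lines; join them.
            seqs[-1] += line
    # Assumption: sequences without an explicit weight get 1.0.
    weights += [1.0] * (len(seqs) - len(weights))
    return names, seqs, weights[:len(seqs)]


example = """>WEIGHTS 0.5 .5 1.0
>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>seq3
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
"""
names, seqs, weights = parse_fasta_with_weights(example)
```

Here names holds the three sequence names and weights holds one number per sequence, in the same order as the sequences appear in the file.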

Here is a description of the relevant parameters of meme:

Output location:

By default, output goes to the meme_out/ directory, which is created if it does not exist. The output files are meme.html, meme.xml, meme.txt, meme.xsl and some logo images. You can also set the output location yourself.

    • -o <output dir> Output directory name; if the directory already exists, it is not overwritten;
    • -oc <output dir> Output directory name; if the directory already exists, it is overwritten;
    • -text Output in text format only (the content of meme.txt), written to standard output.

DNA or protein:

MEME can process DNA sequence files and protein sequence files, but it cannot process both types at the same time, so you have to specify whether the input is DNA or protein. The default is protein.

DNA sequences may contain ACGT, as well as BDHKMNRSUVWY*-

Protein sequences may contain ACDEFGHIKLMNPQRSTVWY, as well as BUXZ*-

MEME converts all other characters to X (unknown).

    • -dna Treat the sequences as DNA
    • -protein Treat the sequences as protein
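
The character rule above can be sketched in a few lines of Python. This only illustrates the rule as described here and is not MEME's code. The names DNA_LETTERS, PROTEIN_LETTERS and normalize are hypothetical, and converting the input to uppercase before checking is an assumption of this sketch.

```python
# Letters accepted for each sequence type, as listed above.
DNA_LETTERS = set("ACGT" + "BDHKMNRSUVWY*-")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY" + "BUXZ*-")


def normalize(seq, alphabet="protein"):
    """Replace every character outside the chosen alphabet with X (unknown)."""
    allowed = DNA_LETTERS if alphabet == "dna" else PROTEIN_LETTERS
    # Assumption: input is treated case-insensitively by uppercasing first.
    return "".join(c if c in allowed else "X" for c in seq.upper())


cleaned_dna = normalize("acgtJ?", alphabet="dna")
cleaned_protein = normalize("MKOL", alphabet="protein")
```

In this example the characters J and ? are not valid DNA letters and O is not a valid protein letter, so each of them is replaced by X.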

Motif distribution:

In general, you should have some idea of how the motifs are likely to be distributed. By default, each motif is assumed to occur at most once in each sequence (that is, zero or one time).

    • -mod <string> Distribution type
      • oops Each motif occurs exactly once in every sequence. This is the fastest and most sensitive mode. However, if not every sequence contains the motif, the results may be incorrect.
      • zoops Each motif occurs at most once in each sequence, and may not occur at all. This mode is also fairly fast, but slightly less sensitive.
      • anr Each motif may occur any number of times in each sequence. This is the slowest mode and may take more than 10 times as long. However, it can help when the distribution of the motifs is completely unknown.

A related parameter is -maxsites, which is discussed below.

Search options:

The log likelihood ratio (LLR) and the E-value are the two main criteria MEME uses to rank motifs. LLR = log(Pr(sites | motif) / Pr(sites | back)). Here Pr(sites | motif) is the probability of the sites under the motif model, which is built from the probability of each residue at each position of the motif and is called the position-specific probability matrix (PSPM). Pr(sites | back) is the probability of the same sites under the background model. If you provide a background file, it is passed with the parameter -bfile.
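
The LLR formula can be illustrated with a small Python sketch. It is a simplified version written for this page, not MEME's implementation. It assumes that the sites are independent, that positions within a site are independent, that the background is a single table of letter frequencies, and that natural logarithms are used. The function name and the toy numbers are made up for illustration.

```python
import math


def log_likelihood_ratio(sites, pspm, background):
    """LLR = log(Pr(sites | motif) / Pr(sites | back)) for sites of equal width.

    pspm is a list with one dict per motif position (letter -> probability).
    background is a dict (letter -> probability).
    """
    llr = 0.0
    for site in sites:
        for pos, letter in enumerate(site):
            # The log of a product of ratios is the sum of the log ratios.
            llr += math.log(pspm[pos][letter] / background[letter])
    return llr


# Toy motif of width 2 over DNA (hypothetical values).
pspm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
llr = log_likelihood_ratio(["AC", "AC", "AG"], pspm, background)
```

A positive value means the sites are more probable under the motif model than under the background. Note that a probability of zero in either table would make the logarithm fail, so real matrices would need pseudocounts.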

The search options are:

    1. Number of motifs
      • -nmotifs <n> Stop after finding this many motifs. The default is 1.
      • -evt <p> Stop the search if the E-value of a motif is greater than <p>. The default is infinity.
    2. Number of occurrences of a motif
      • -nsites <n>
        -minsites <n>
        -maxsites <n>
        If -nsites is set, MEME looks for that many occurrences of a motif and then moves on to the next motif. Otherwise the minimum and maximum numbers of occurrences are set by -minsites and -maxsites. The default for -minsites is 2. The default for -maxsites is the total number of sequences for zoops, and 5 times the number of sequences or 50, whichever is smaller, for anr. For oops, these two parameters have no effect. For anr, if -maxsites is not set, MEME looks for at most 50 occurrences of a motif.
      • -wnsites <n> Weight on the expected number of occurrences of a motif, in the range [0..1]. The default is 0.8.
    3. Motif width
      • -w <n>
        -minw <n>
        -maxw <n>
        The motif width. If -w is specified, only motifs of that width are tried. Otherwise the width is searched between the minimum and the maximum. The defaults are -minw 8 and -maxw 50.
      • -nomatrim
        -wg <a>
        -ws <a>
        -noendgaps
        Parameters related to the sequence alignment used to adjust the motif width: -wg is the gap opening cost, -ws is the gap extension cost, and -noendgaps means that end gaps are not penalized. -nomatrim turns this width adjustment off.

    4. Background model
      • -bfile <bfile>
    5. Prior model (position-specific priors)
      • -psp <pspfile>
        Background and prior model files can be generated with MEME Suite tools such as psp-gen.
    6. DNA strands and palindromes
      • -revcomp Also search the reverse complement strand. By default it is not searched.
      • -pal Look for palindromes. By default this is off.
    7. Expectation maximization (EM) algorithm
    8. Expectation maximization (EM) initialization
    9. Expectation maximization (EM) branch search
      The last three groups are more complex and are not covered here.
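
As a rough summary of the options above, the following Python sketch assembles a meme command line and computes the default -maxsites value described in this article. It only builds the argument list and does not run MEME. The function names, the file name peaks.fasta, and the choice of which options to expose are all assumptions made for illustration; only flags described in this article are used.

```python
import shlex


def default_maxsites(mod, n_sequences):
    """Default -maxsites as described in this article (None where it has no effect)."""
    if mod == "zoops":
        return n_sequences
    if mod == "anr":
        return min(5 * n_sequences, 50)
    return None  # oops: the parameter is not used


def build_meme_command(dataset, outdir="meme_out", dna=False, mod="zoops",
                       nmotifs=1, minw=8, maxw=50, revcomp=False):
    """Return the argument list for a meme run (the command is not executed)."""
    if mod not in ("oops", "zoops", "anr"):
        raise ValueError("mod must be oops, zoops or anr")
    if revcomp and not dna:
        raise ValueError("-revcomp only makes sense for DNA sequences")
    args = ["meme", dataset, "-oc", outdir]
    args.append("-dna" if dna else "-protein")
    args += ["-mod", mod, "-nmotifs", str(nmotifs),
             "-minw", str(minw), "-maxw", str(maxw)]
    if revcomp:
        args.append("-revcomp")
    return args


# Hypothetical input file holding ChIP-seq peak sequences.
cmd = build_meme_command("peaks.fasta", dna=True, nmotifs=3, revcomp=True)
command_line = shlex.join(cmd)
```

The resulting list could be passed to subprocess.run on a machine where MEME is installed. Building the arguments as a list rather than as one string avoids quoting problems with unusual file names.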
