Biopython Module
1. Assigning values to a sequence
Bio.Alphabet.IUPAC provides the basic alphabet definitions for protein, DNA, and RNA sequences, and lets you extend and customize them. (Bio.Alphabet was removed in Biopython 1.78, so the examples below apply to earlier releases.)
1.1 DNA Alphabet
Basic letters: IUPACUnambiguousDNA
Ambiguity letters for every possible combination: IUPACAmbiguousDNA
Modified bases: ExtendedIUPACDNA
1.2 Protein alphabet
The basic IUPACProtein class contains the 20 most common amino acids.
The ExtendedIUPACProtein class contains additional amino acid symbols on top of the 20 common ones.
For example:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("AGTACACTGGT", IUPAC.unambiguous_dna)  # basic DNA alphabet
>>> my_seq
Seq('AGTACACTGGT', IUPACUnambiguousDNA())
>>> my_seq.alphabet
IUPACUnambiguousDNA()
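As a rough illustration of what an alphabet constrains, the plain-Python sketch below checks a sequence against a set of allowed letters. The letter strings follow the IUPAC unambiguous and ambiguous DNA codes. The names `ALPHABETS` and `uses_only` are hypothetical helpers for this example and are not part of Biopython.

```python
# Allowed letters for two IUPAC DNA alphabets (plain strings, not Biopython objects).
ALPHABETS = {
    "unambiguous_dna": "GATC",
    "ambiguous_dna": "GATCRYWSMKHBVDN",
}


def uses_only(sequence, letters):
    """Return True if every character of `sequence` appears in `letters`."""
    return set(sequence.upper()) <= set(letters)


print(uses_only("AGTACACTGGT", ALPHABETS["unambiguous_dna"]))  # True
print(uses_only("AGTNNACTGGT", ALPHABETS["unambiguous_dna"]))  # False: N is an ambiguity code
print(uses_only("AGTNNACTGGT", ALPHABETS["ambiguous_dna"]))    # True
```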
2. Seq object methods
1) String-like methods
len(my_seq)
Iteration, for example:
>>> for index, letter in enumerate(my_seq):
...     print(index, letter)
lower() converts the sequence to lowercase
upper() converts the sequence to uppercase
count() counts non-overlapping occurrences, for example Seq("AAAA").count("AA") returns 2
Slicing, for example my_seq[0::3] takes the first base of each codon
str() converts a Seq object into a string, for example:
>>> str(my_seq)
'GATCGATGGGCCTATATAGGATCGAAAATCGC'
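Since these methods behave like their str counterparts, they can be tried on an ordinary Python string first. The sketch below uses only built-in str operations, with the sequence from the str() example; the variable names are illustrative.

```python
my_string = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

# count() skips overlapping matches, so "AA" is found twice in "AAAA", not three times.
print("AAAA".count("AA"))  # 2

# A step of 3 picks the same position out of every codon.
first_positions = my_string[0::3]
second_positions = my_string[1::3]
third_positions = my_string[2::3]
print(first_positions, second_positions, third_positions)

# enumerate() yields (index, letter) pairs; only the first four are shown here.
for index, letter in enumerate(my_string[:4]):
    print(index, letter)
```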
2) Complement and reverse complement
complement()  # complement
reverse_complement()  # reverse complement
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC", IUPAC.unambiguous_dna)
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
>>> my_seq.complement()  # complement
Seq('CTAGCTACCCGGATATATCCTAGCTTTTAGCG', IUPACUnambiguousDNA())
>>> my_seq.reverse_complement()  # reverse complement
Seq('GCGATTTTCGATCCTATATAGGCCCATCGATC', IUPACUnambiguousDNA())
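To make the two operations concrete, here is a minimal plain-Python sketch of what they compute for unambiguous, uppercase DNA. It is not Biopython's implementation, which also handles ambiguity codes and RNA. `COMPLEMENT_TABLE` and the two functions are names chosen for this example.

```python
# Map each unambiguous DNA base to its Watson-Crick partner.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Complement every base while keeping the original order."""
    return dna.translate(COMPLEMENT_TABLE)


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return complement(dna)[::-1]


my_string = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(complement(my_string))
print(reverse_complement(my_string))
```

Applying reverse_complement() twice gives back the original string, which is a quick sanity check for this kind of helper.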
3) Transcription
transcribe() replaces every T in the sequence with U and switches the alphabet to RNA, so it should be called on the coding strand.
A template strand must be reverse-complemented first and then transcribed, for example: template_dna.reverse_complement().transcribe()
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
>>> messenger_rna = coding_dna.transcribe()
>>> messenger_rna
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
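The difference between starting from the coding strand and starting from the template strand can be sketched with plain strings. This assumes uppercase, unambiguous DNA; the helper names are illustrative and are not the Biopython functions themselves.

```python
def reverse_complement(dna):
    """Reverse complement of an unambiguous, uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def transcribe(coding_dna):
    """Transcribe a coding strand by replacing T with U."""
    return coding_dna.replace("T", "U")


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
template_dna = reverse_complement(coding_dna)

# Coding strand: transcribe directly.
mrna_from_coding = transcribe(coding_dna)
# Template strand: reverse-complement first, then transcribe.
mrna_from_template = transcribe(reverse_complement(template_dna))

print(mrna_from_coding)
print(mrna_from_coding == mrna_from_template)  # True
```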
4) Back-transcription: replaces every U with T and switches the alphabet to DNA
back_transcribe()
5) Translation: translate() translates DNA or RNA into a protein sequence and switches to a protein alphabet
The standard genetic code (table ID 1) is used by default
The vertebrate mitochondrial genetic code is table ID 2
Note: stop codons are translated as *
To translate only up to the first in-frame stop codon and then stop (which is more natural), pass to_stop=True.
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(to_stop=True)
Seq('MAIVMGR', IUPACProtein())
>>> coding_dna.translate(table=2)
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('MAIVMGRWKGAR', IUPACProtein())
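The effect of table and to_stop can be reproduced with a small codon lookup in plain Python. This is a sketch under stated assumptions, not Biopython's code: it handles only uppercase A/C/G/T DNA, ignores any trailing partial codon, and hard-codes the amino acid strings of genetic code tables 1 and 2 in T, C, A, G codon order. The names `CODON_TABLES` and `translate` are chosen for the example.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
TABLE_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

CODONS = ["".join(codon) for codon in product(BASES, repeat=3)]
CODON_TABLES = {
    1: dict(zip(CODONS, TABLE_1)),  # standard code
    2: dict(zip(CODONS, TABLE_2)),  # vertebrate mitochondrial code
}


def translate(dna, table=1, to_stop=False):
    """Translate DNA codon by codon; with to_stop, end before the first stop codon."""
    codon_map = CODON_TABLES[table]
    protein = []
    usable_length = len(dna) - len(dna) % 3  # drop a trailing partial codon
    for start in range(0, usable_length, 3):
        amino_acid = codon_map[dna[start:start + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(coding_dna))
print(translate(coding_dna, to_stop=True))
print(translate(coding_dna, table=2))
print(translate(coding_dna, table=2, to_stop=True))
```

The only codon in this sequence that the two tables read differently is TGA, which is a stop in table 1 and tryptophan (W) in table 2.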
3. Seq objects are immutable: their contents cannot be modified or deleted in place. To edit a sequence, first convert it to a str or to a MutableSeq object.
For example:
>>> mutable_seq = my_seq.tomutable()
>>> mutable_seq
MutableSeq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
>>> mutable_seq[5] = "C"  # edit in place
>>> mutable_seq.remove("T")
>>> mutable_seq.reverse()
>>> new_seq = mutable_seq.toseq()
>>> new_seq
Seq('AGCCCGTGGGAAAGTCGCCGGGTAATGCACCG', IUPACUnambiguousDNA())
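The str route works much the same way, because Python strings are immutable too. The sketch below uses only built-ins: item assignment on a str fails, so the letters are copied into a list, edited, and joined back. The particular edits are illustrative assumptions; they turn the first sequence printed above into the second.

```python
my_string = "GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"

# Like Seq, str does not allow item assignment.
try:
    my_string[5] = "C"
except TypeError as error:
    print("str is immutable:", error)

# Work on a mutable copy instead, then convert back.
letters = list(my_string)
letters[5] = "C"       # replace one base
letters.remove("T")    # delete the first remaining T
letters.reverse()      # reverse in place
new_string = "".join(letters)
print(new_string)
```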
4. The same operations are also available as functions in Bio.Seq that work directly on plain strings, although Seq objects are the more standard data format.
>>> from Bio.Seq import reverse_complement, transcribe, back_transcribe, translate
>>> my_string = "GCTGTTATGGGTCGTTGGAAGGGTGGTCGTGCTGCTGGTTAG"
>>> reverse_complement(my_string)
'CTAACCAGCAGCACGACCACCCTTCCAACGACCCATAACAGC'
>>> transcribe(my_string)
'GCUGUUAUGGGUCGUUGGAAGGGUGGUCGUGCUGCUGGUUAG'
>>> back_transcribe(my_string)
'GCTGTTATGGGTCGTTGGAAGGGTGGTCGTGCTGCTGGTTAG'
>>> translate(my_string)
'AVMGRWKGGRAAG*'
Reference: Biopython documentation, http://biopython-cn.readthedocs.io/zh_cn/latest/cn/chr03.html