The CDS sequence is extracted from the GFF file and converted to the amino acid sequence.

Source: Internet
Author: User
Tags faa

The CDS sequence is extracted from the GFF file and converted to the amino acid sequence.

Recently on the principles of bioinformatics, I plan to record assignments in some classes. First assignment: as a question.

Basic Ideas:

1. Read the initial termination position and positive and negative chain information of CDS from GFF. GFF format see http://blog.sina.com.cn/s/blog_8a4f556e0102yd3l.html.

2. Extract the CDS sequence from the FNA file using information such as the start/end position. FNA format see http://boyun.sh.cn/bio? P = 1192.

3. Use the CDS sequence and codon table to obtain and output the FAA file.

Note: When the CDS in GFF is in the negative chain, base complementary pairing is required, that is, reverse complementing (5 'to 3' with 3 'to 5 ').

The python code is provided below. Python3.6

The following problems need to be solved: the comments of the obtained FNA file and the FAA file are incomplete, and the command line parameters are not provided, and the job is handed in and supplemented.

1 # bioinformatics homework 2 import re 3 class CDS2AA (): 4 Pa = re. compile (r's \ s + ') 5 Pa = re. compile (R' [TCAG] TG ') # bacterial start code NTG 6 def _ init _ (self, fna, gff): 7 self. fna = fna 8 self. gff = gff 9 def N2M (self, sequence): 10 hash = {'A': 'T', 't': 'A', 'C': 'G ', 'G': 'C'} 11 sequence = ''. join ([hash [I] for I in sequence]) # convert Positive and Negative chains to 12 return sequence [:-1] 13 def Get_CDS_index (self, line): 14 line = self. pa. spli T (line) 15 CDS = (line [0], line [3], line [4], line [6]) 16 return CDS17 def Seq2AA (self, sequence, hash): 18 AA = hash [sequence [: 3] 19 if self. pa. match (sequence [: 3]): 20 AA = 'M' # Start code 21 for I in range (3, len (sequence)-3, 3 ): 22 AA + = hash [sequence [I: I + 3] 23 return AA24 def CDS2AA (self): 25 fn = open (self. fna, 'R') 26 gf = open (self. gff, 'R') 27 r = open('AA_sequence.txt ', 'w') 28 w = open('CDS.txt', 'w') 2 9 hash_AA ={}# AA hash, sequence2AA30 with open('AA.txt ', 'R') as f: 31 for line in f: 32 line = line. strip () 33 if line: 34 line = self. pa. split (line) 35 hash_AA [line [0] = line [1] # AA hash36 hash_CDS ={}# CDS hash, CDS2sequence37 for line in fn: 38 line = line. strip () 39 if line. startswith ('>'): 40 A = self. pa. split (line) [0]. replace ('>', '') 41 hash_CDS [A] ='' 42 else: 43 hash_CDS [A] + = line44 fn. close () 45 for line in gf: 46 line = line. strip () 47 if 'cds' in line: 48 I = self. get_CDS_index (line) 49 sequence = hash_CDS [I [0] [int (I [1])-1: int (I [2])] 50 if I [3] = '-': 51 sequence = self. n2M (sequence) 52 # w. write (I [0] + '\ n' + sequence +' \ n') 53 s = self. pa. split (line) 54 p = re. compile (r'id = (. *?);. *? Dbxref = (.*?);. *? Name = (.*?);. *? Gbkey = (.*?);. *? Product = (.*?);. *? Protein_id = (.*?); ') 55 m = re. findall (p, line) 56 s = s [0] + '_' + m [0] [0] + m [0] [2] + '\ tdbxref =' + m [0] [1] + '\ tprotein =' + m [0] [4] + '\ tprotein_id =' + m [0] [5] + '\ tgbkey =' + m [0] [3] 57 w. write (s + '\ n' + sequence +' \ n') 58 AA = self. seq2AA (sequence, hash_AA) 59 r. write (I [0] + '\ n' + AA +' \ n') 60 w. close () 61 r. close () 62 63 if _ name __= = '_ main __': 64 fna = 'gca _ 000160075.2 _ ASM16007v2_genomic.fna '65 gff = 'gca _ 000160075.2 _ ASM16007v2_genomic.gff '66 m = CDS2AA (fna, gff) 67 m. CDS2AA ()

I will solve some other problems before submitting my homework. Interesting assignment questions will be uploaded one after another.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.