Keywords: cloud computing; Hadoop; MapReduce; Gene Ontology; gene annotation; sequence alignment

Wu Haoyu of Nanjing Agricultural University

The main research work of this thesis is as follows:
(1) The theoretical basis of the Gene Ontology (GO) and its application in bioinformatics, especially in gene function annotation, are studied. Existing methods for annotating genetic data are analyzed, together with the theoretical basis of functional annotation based on the sequence similarity of homologous sequences.
(2) The process of gene function annotation based on sequence similarity comparison is studied, including the role of the scoring matrix and of the sequence alignment algorithm in discovering homologous sequences. The dot matrix, Needleman-Wunsch, and Smith-Waterman sequence alignment algorithms are studied and compared, and their performance is measured.
(3) An architecture for a gene function annotation platform based on Hadoop is proposed. By integrating the GO database with other biological databases, a local data center for gene annotation is designed, along with a conceptual model for functional annotation that establishes the association path between the ontology and the annotation information.
(4) The theory of BLASTP, a protein database search algorithm, is analyzed, and the running time of each phase of the algorithm is compared. Combining Hadoop's MapReduce parallel processing framework with the requirements that gene annotation places on the algorithm, a parallel protein comparison algorithm, CGABLASTP, is designed. Experiments show that it substantially improves the speed of gene annotation and can cope with the exponential growth of biological sequence data.
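As an illustration of the difference between the global and local alignment algorithms compared in item (2), the following is a minimal sketch, not the implementation used in the thesis. It assumes a simple match/mismatch score and a linear gap penalty in place of a real scoring matrix such as BLOSUM or PAM, and the function names and default parameter values are hypothetical.

```python
def _substitution(x, y, match, mismatch):
    """Score for aligning two residues (stand-in for a scoring matrix)."""
    return match if x == y else mismatch


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score: both sequences are aligned end to end."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: a[:i] against an empty prefix of `b`.
        for j in range(1, len(b) + 1):
            s = _substitution(a[i - 1], b[j - 1], match, mismatch)
            curr.append(max(prev[j - 1] + s,    # align a[i-1] with b[j-1]
                            prev[j] + gap,      # gap in b
                            curr[j - 1] + gap)) # gap in a
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = _substitution(a[i - 1], b[j - 1], match, mismatch)
            # Clamping at 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
    print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

Both functions fill the same dynamic-programming table in O(mn) time; the only differences are the boundary conditions and the clamp at zero, which is what lets Smith-Waterman find a conserved region inside otherwise dissimilar sequences.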

