Amplicon analysis walkthrough 7: taxonomic summaries, filtering OTUs for the phylogenetic tree, and other preparations
Preparations before analysis
# Go to the working directory
cd example_PE250
Recap of the previous section: we built a phylogeny from the OTU sequences and computed the Alpha and Beta diversity values. This is the last section. Here we summarize the OTUs by taxonomy, filter the high-abundance OTUs for display on a tree, and generate the other files needed for statistical analysis in R.

19. The most important annotation in the OTU table is the taxonomy. Taxonomic annotation is generally divided into seven levels: kingdom, phylum, class, order, family, genus, and species. Species is the finest level; it is similar to an OTU but not the same thing. Besides comparing samples and groups at the OTU level, we can also study the differences at each taxonomic level and check whether they share common patterns of variation. Sorting and summarizing by annotation level is tedious to do by hand, whether in Excel or in R. Here we use summarize_taxa.py, a script that ships with QIIME.
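To make the per-level summary less of a black box, here is a minimal awk sketch of what a phylum-level summary computes: counts of OTUs that share a phylum are added up and divided by the sample total. The toy table, the file names, and the single-sample layout are all assumptions made for illustration. This is not how summarize_taxa.py is implemented, and real tables should go through the QIIME script.

```shell
# Toy OTU table (hypothetical data): OTU ID, count in one sample, taxonomy string
tmpdir=$(mktemp -d)
printf 'OTU_1\t60\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria\n' >  "$tmpdir/toy_otu.txt"
printf 'OTU_2\t30\tk__Bacteria; p__Actinobacteria; c__Actinobacteria\n'      >> "$tmpdir/toy_otu.txt"
printf 'OTU_3\t10\tk__Bacteria; p__Proteobacteria; c__Betaproteobacteria\n'  >> "$tmpdir/toy_otu.txt"

# Sum the counts per phylum (the second taxonomy rank) and express each as a fraction of the total
awk 'BEGIN {FS = "\t"; OFS = "\t"}
     {split($3, rank, "; "); sum[rank[2]] += $2; total += $2}
     END {for (p in sum) print p, sum[p] / total}' "$tmpdir/toy_otu.txt" | sort > "$tmpdir/toy_L2.txt"

cat "$tmpdir/toy_L2.txt"
```

The two Proteobacteria OTUs collapse into one row, which is the same kind of table the L2 output holds, only with one column per sample.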
# Summarize at five levels (phylum, class, order, family, genus), which correspond to L2-L6 in the output
summarize_taxa.py -i result/otu_table4.biom -o result/sum_taxa # summary each level percentage
# Adjust the header so that R can read the tables
sed -i '/# Const/d; s/#OTU ID.//g' result/sum_taxa/* # format for R read
# View the result, using phylum as an example
less -S result/sum_taxa/otu_table4_L2.tx
Taking phylum as an example, the output shows that the OTUs in these samples fall into 19 phyla, along with the relative abundance of each phylum in each sample. You can inspect the other levels yourself. The results of this step are used for the later statistics and plotting.

20. Papers often show attractive phylogenetic trees, but a dataset usually contains hundreds of OTUs, and a tree of all of them is unreadable. Below are some common ways to filter the data and use it to produce a clean tree.
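One common filter is a minimum share of the total counts, which is the idea behind a 0.1% abundance cutoff. The sketch below applies that idea to a toy table with awk. The data, file names, and the `min_frac` variable are hypothetical, and the script only approximates the concept; it is not QIIME's implementation.

```shell
tmpdir=$(mktemp -d)
# Hypothetical counts: OTU ID, then counts in two samples (grand total 10000)
printf 'OTU_1\t6000\t3800\nOTU_2\t100\t95\nOTU_3\t3\t2\n' > "$tmpdir/toy_counts.txt"

# Pass 1 totals the whole table; pass 2 keeps OTUs whose share of all counts is >= min_frac
awk -v min_frac=0.001 'BEGIN {FS = "\t"}
     NR == FNR {for (i = 2; i <= NF; i++) total += $i; next}
     {s = 0; for (i = 2; i <= NF; i++) s += $i; if (s / total >= min_frac) print $1}' \
    "$tmpdir/toy_counts.txt" "$tmpdir/toy_counts.txt" > "$tmpdir/kept_ids.txt"

cat "$tmpdir/kept_ids.txt"
```

Here OTU_3 holds 5 of 10000 counts (0.05%) and is dropped, while the other two pass the cutoff.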
# Keep the OTUs whose abundance is at least 0.1% of the table
filter_otus_from_otu_table.py --min_count_fraction 0.001 -i result/otu_table4.biom -o temp/otu_table_k1.biom
# Get the corresponding fasta sequences
filter_fasta.py -f result/rep_seqs.fa -o temp/tax_rep_seqs.fa -b temp/otu_table_k1.biom
# Count the sequences: 104 here. Around 100 is usually a good size: enough to look like a substantial dataset, yet few enough to read the patterns and details
grep -c '>' temp/tax_rep_seqs.fa # 104
# Multiple sequence alignment
clustalo -i temp/tax_rep_seqs.fa -o temp/tax_rep_seqs_clus.fa --seqtype=DNA --full --force --threads=30
# Build the tree
make_phylogeny.py -i temp/tax_rep_seqs_clus.fa -o temp/tax_rep_seqs.tree
# Convert the tree to a usable format
sed "s/'//g" temp/tax_rep_seqs.tree > result/tax_rep_seqs.tree # remove '
# Get the sequence IDs
grep '>' temp/tax_rep_seqs_clus.fa | sed 's/>//g' > temp/tax_rep_seqs_clus.id
# Get the taxonomy of these sequences, used to color the tree by classification
awk 'BEGIN {OFS = "\t"; FS = "\t"} NR == FNR {a[$1] = $0} NR > FNR {print a[$1]}' result/rep_seqs_tax_assignments.txt temp/tax_rep_seqs_clus.id | sed 's/; /\t/g' | cut -f 1-5 | sed 's/p__//g; s/c__//g; s/o__//g' > result/tax_rep_seqs.tax
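The last command in this step is a two-file lookup in awk: while the first file is being read (NR equals FNR), each line is stored under its first column; for every ID in the second file, the stored line is printed. A minimal sketch on toy data follows. The file contents and temporary paths are made up, and the `\t` in the sed replacement assumes GNU sed.

```shell
tmpdir=$(mktemp -d)
# Hypothetical annotation table: OTU ID, taxonomy, confidence
printf 'OTU_1\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales\t0.99\n' >  "$tmpdir/tax.txt"
printf 'OTU_2\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales\t0.95\n'                      >> "$tmpdir/tax.txt"
printf 'OTU_3\tk__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales\t0.90\n'      >> "$tmpdir/tax.txt"
# IDs that survived the filtering, in the order they should be reported
printf 'OTU_3\nOTU_1\n' > "$tmpdir/ids.txt"

# Look up each kept ID, split the taxonomy into columns, keep ID plus four ranks, strip rank prefixes
awk 'BEGIN {OFS = "\t"; FS = "\t"} NR == FNR {a[$1] = $0} NR > FNR {print a[$1]}' "$tmpdir/tax.txt" "$tmpdir/ids.txt" \
  | sed 's/; /\t/g' | cut -f 1-5 | sed 's/p__//g; s/c__//g; s/o__//g' > "$tmpdir/kept.tax"

cat "$tmpdir/kept.tax"
```

The output follows the order of the ID file, not the annotation file, and OTU_2 is absent because its ID was never requested. An ID missing from the annotation table would print an empty line, so it can be worth checking the line counts of both files.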
21. The remaining steps are simple format conversions that prepare files for the subsequent statistical analysis.
# Convert the mapping file to an experimental design table that R can read
sed 's/#//' mappingfile.txt > result/design.txt
# Convert the text-format otu_table so that R can read it
sed '/# Const/d; s/#OTU //g; s/ID.//g' result/otu_table4.txt > result/otu_table.txt
# Convert the taxonomy annotation to tab-separated fields so that R can read it
sed 's/;/\t/g; s/ //g' result/rep_seqs_tax_assignments.txt > result/rep_seqs_tax.txt
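To check what the OTU table header cleanup does, the same sed expression can be run on a toy file. The table contents and sample names (KO1, WT1) below are hypothetical.

```shell
tmpdir=$(mktemp -d)
# Hypothetical text OTU table with the two comment-style header lines
printf '# Constructed from biom file\n#OTU ID\tKO1\tWT1\nOTU_1\t5\t7\n' > "$tmpdir/otu_table4.txt"

# Drop the "# Constructed..." line and remove "#OTU ID" plus the tab that follows it
sed '/# Const/d; s/#OTU //g; s/ID.//g' "$tmpdir/otu_table4.txt" > "$tmpdir/otu_table.txt"

cat "$tmpdir/otu_table.txt"
```

The header ends up with one field fewer than the data rows, which R's read.table interprets as "use the first column as row names". Note that `s/ID.//g` removes every occurrence of "ID" followed by any character, so a sample name containing "ID" would be altered too.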