When TopHat and Cufflinks are used to calculate expression levels from RNA-Seq data, a gene sometimes ends up with more than one expression value (more than one row) in a sample. These values then need to be combined into a single value per gene.
Collapsefpkm is code that solves this by collapsing the duplicate FPKM values reported for a gene.
Problem/issue:
In the Cufflinks output files *_genes.expr (which report the gene-level coordinates and expression values), I sometimes get more than one row for the same gene. It seems that in some cases the FPKM values of the transcripts belonging to a gene are not summed, even though those transcripts are assigned to the same gene.
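Before collapsing anything, it can help to check which genes are affected. The Python sketch below counts how many rows each gene ID occupies and reports the ones with more than one. It assumes tab-separated input with a header line containing a gene ID column; the column name `gene_id` and the function name `find_duplicate_genes` are hypothetical, so check the header of your own Cufflinks file and adjust the parameter.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_genes(rows: Iterable[str], gene_col: str = "gene_id") -> Dict[str, int]:
    """Return {gene_id: row_count} for every gene reported on more than one row.

    Assumption: tab-separated text whose header contains `gene_col`
    (hypothetical default; verify against your file's actual header).
    """
    # DictReader consumes the header line and yields one dict per data row.
    counts = Counter(record[gene_col] for record in csv.DictReader(rows, delimiter="\t"))
    # Keep only the genes that appear on more than one row.
    return {gene: n for gene, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Toy data (made up for illustration): GENE_A is split across two rows.
    example = io.StringIO(
        "gene_id\tFPKM\n"
        "GENE_A\t3.5\n"
        "GENE_A\t1.5\n"
        "GENE_B\t10.0\n"
    )
    print(find_duplicate_genes(example))  # {'GENE_A': 2}
```

For a real file, pass an open file handle instead of the `StringIO`, e.g. `with open(path, newline="") as fh: find_duplicate_genes(fh)`.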
Reason and solution:
The multiple-FPKM problem occurs when a gene has transcripts that do not overlap any of the gene's other transcripts. For example, this happens with the ENSG00000125388 gene in the Ensembl/hg19 annotation. We are aware of this issue and will eventually change the behavior, but for now a simple solution is to sum the FPKMs, since the gene FPKM is just the sum of the transcript FPKMs anyway.
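The summing workaround described above can be sketched in a few lines of Python. This is not the Collapsefpkm code from the SourceForge project, only a minimal illustration under stated assumptions: the input is tab-separated with a header containing a gene ID column and an FPKM column, and the default names `gene_id` and `FPKM` as well as the function names `collapse_fpkm` and `write_collapsed` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def collapse_fpkm(rows: Iterable[str], gene_col: str = "gene_id",
                  fpkm_col: str = "FPKM") -> Dict[str, float]:
    """Sum the FPKM values of all rows that share a gene ID.

    Assumption: tab-separated input whose header contains `gene_col` and
    `fpkm_col` (hypothetical defaults). Genes keep first-seen order.
    """
    totals: Dict[str, float] = {}
    for record in csv.DictReader(rows, delimiter="\t"):
        gene = record[gene_col]
        # Gene FPKM = sum of its transcript-level FPKMs, so just add them up.
        totals[gene] = totals.get(gene, 0.0) + float(record[fpkm_col])
    return totals


def write_collapsed(totals: Dict[str, float], out: TextIO,
                    gene_col: str = "gene_id", fpkm_col: str = "FPKM") -> None:
    """Write one tab-separated row per gene: the gene ID and its summed FPKM."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([gene_col, fpkm_col])
    for gene, fpkm in totals.items():
        writer.writerow([gene, fpkm])


if __name__ == "__main__":
    # Toy data (made up for illustration): GENE_A appears on two rows.
    example = io.StringIO(
        "gene_id\tFPKM\n"
        "GENE_A\t3.5\n"
        "GENE_A\t1.5\n"
        "GENE_B\t10.0\n"
    )
    out = io.StringIO()
    write_collapsed(collapse_fpkm(example), out)
    print(out.getvalue(), end="")
```

Running it prints one row per gene, with GENE_A collapsed to 5.0. The sketch keeps only the gene ID and the summed FPKM; other columns (coordinates, confidence bounds, status) are dropped rather than combined, because there is no single obvious way to merge them.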
URL:
https://sourceforge.net/projects/collapsefpkm/files/?source=navbar