When using TopHat and Cufflinks to estimate expression levels from RNA-Seq data, a gene is sometimes reported with more than one expression value in the same sample. In that case the values need to be combined.

In the Cufflinks output files *_genes.expr (which report gene-level coordinates and expression values), why do I sometimes get more than one row for the same gene? It seems that in some cases the FPKM values of the transcripts belonging to the same gene are not summed, even though the transcripts are assigned to that gene.

The multiple-FPKM problem occurs when a gene has transcripts that do not overlap any of the gene's other transcripts. For example, this happens for the gene ENSG00000125388 in the Ensembl/hg19 annotation. We are aware of this issue and will eventually change the behavior, but for now a simple solution is to sum the FPKMs, since the gene FPKM is just the sum of its transcript FPKMs anyway.
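
The summing workaround described above could be sketched in Python roughly as follows. This is only an illustration, not part of Cufflinks: the function name sum_gene_fpkm is made up, and the column names gene_id and FPKM are assumptions about the header of the tab-delimited *_genes.expr file, so check them against your own output and adjust the parameters if they differ. The example rows are invented values, not real data.

```python
import csv
from collections import defaultdict


def sum_gene_fpkm(lines, gene_col="gene_id", fpkm_col="FPKM"):
    """Sum the FPKM values of all rows that share the same gene ID.

    `lines` is any iterable of tab-delimited text lines whose first line
    is a header. The default column names are assumptions; pass other
    names if your file uses a different header.
    """
    totals = defaultdict(float)
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        totals[row[gene_col]] += float(row[fpkm_col])
    return dict(totals)


# Invented rows for illustration: the first gene appears twice.
example = [
    "gene_id\tFPKM",
    "ENSG00000125388\t2.5",
    "ENSG00000125388\t1.5",
    "GENE_B\t4.0",
]
totals = sum_gene_fpkm(example)
```

To run it on a real file, an open file handle can be passed in place of the list, for example the result of open("sample_genes.expr", newline=""), where the file name is a placeholder. Rows with a single entry per gene pass through unchanged, so the function can be applied to the whole file.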



