a. Raw data QC and cleanup
Raw data is analyzed with FastQC to check that the reads pass quality filtering. For each read position, a box-and-whisker plot is drawn. The elements of the plot are as follows: the central red line is the median value; the yellow box represents the inter-quartile range (25-75%); the lower and upper whiskers represent the 10% and 90% points; the blue line represents the mean quality. The y-axis shows the quality scores; the higher the score, the better the base call. The background of the graph divides the y-axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms degrades as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
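The per-position statistics behind such a plot can be sketched in a few lines of Python. This is only an illustration of the quantities described above, not how FastQC itself computes them: it assumes Phred+33 encoded quality strings and a simple linear-interpolation percentile, and the function names are made up for this example.

```python
from statistics import mean


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def percentile(sorted_values, fraction):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    rank = fraction * (len(sorted_values) - 1)
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    weight = rank - low
    return sorted_values[low] * (1 - weight) + sorted_values[high] * weight


def per_position_summary(quality_strings):
    """Return one dict of box-and-whisker statistics per read position."""
    decoded = [phred_scores(q) for q in quality_strings]
    longest = max(len(scores) for scores in decoded)
    summary = []
    for pos in range(longest):
        # Reads shorter than the current position simply do not contribute.
        column = sorted(s[pos] for s in decoded if pos < len(s))
        summary.append({
            "position": pos + 1,
            "mean": mean(column),                # blue line
            "median": percentile(column, 0.50),  # red line
            "q25": percentile(column, 0.25),     # bottom of the yellow box
            "q75": percentile(column, 0.75),     # top of the yellow box
            "p10": percentile(column, 0.10),     # lower whisker
            "p90": percentile(column, 0.90),     # upper whisker
        })
    return summary


if __name__ == "__main__":
    # Three toy reads whose quality drops towards the 3' end.
    for row in per_position_summary(["IIII", "II5#", "I5##"]):
        print(row)
```

In the toy input, the median stays at 40 for the first two positions and falls to 2 at the last one, which mirrors the end-of-read degradation described above.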
b. Alignment to a reference with mapping statistics
Raw data is aligned to the reference genome, and the mapping statistics are summarized in a table showing total reads, mapped reads, % mapped reads, and the reads mapped to ribosomal, UTR, intronic, intergenic and mRNA sequence.
The plot above shows a good-quality RNA-seq library with even gene body coverage, which is very important for gene fusion and splicing analysis.
This plot shows how two different library construction methods affect the distribution of reads across rRNA, coding, intronic and intergenic regions. The Kapa method includes polyA enrichment; the Nugen method does not.
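As a rough illustration of the mapping statistics table described in this section, the percentages can be derived from raw read counts as follows. The function name, the category labels and the numbers are hypothetical, and expressing each category as a fraction of mapped reads (rather than total reads) is an assumption of this sketch.

```python
def mapping_summary(total_reads, mapped_reads, category_counts):
    """Build one row of a mapping statistics table from raw read counts.

    category_counts maps a region name (e.g. "ribosomal", "intronic")
    to the number of mapped reads assigned to that region.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")

    row = {
        "total_reads": total_reads,
        "mapped_reads": mapped_reads,
        "pct_mapped": 100.0 * mapped_reads / total_reads,
    }
    for name, count in category_counts.items():
        # Assumption: categories are reported relative to mapped reads.
        row["pct_" + name] = 100.0 * count / mapped_reads if mapped_reads else 0.0
    return row


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = mapping_summary(
        total_reads=1000,
        mapped_reads=900,
        category_counts={"mRNA": 450, "intronic": 180, "intergenic": 90,
                         "UTR": 90, "ribosomal": 90},
    )
    for key, value in example.items():
        print(key, value)
```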
c. Gene- and transcript-based quantitation, TPM/RPKM/FPKM-based quantitation, raw hit count-based quantitation
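For reference, the standard definitions of these units can be sketched as below. RPKM divides a gene's read count by its length in kilobases and by the library size in millions of reads; FPKM uses the same formula with fragment counts for paired-end data; TPM first computes a per-kilobase rate and then rescales so that all values sum to one million. The function names are illustrative, and using the sum of the listed counts as the library size is an assumption of this sketch.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    For paired-end data, passing fragment counts gives FPKM instead.
    Assumption: the library size is the sum of the counts passed in.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {gene: counts[gene] * 1e9 / (lengths_bp[gene] * total)
            for gene in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    scale = sum(rate.values())
    if scale == 0:
        raise ValueError("no reads to normalize")
    return {gene: rate[gene] * 1e6 / scale for gene in counts}


if __name__ == "__main__":
    # Made-up raw hit counts and gene lengths, for illustration only.
    raw_counts = {"geneA": 100, "geneB": 300}
    lengths = {"geneA": 1000, "geneB": 3000}
    print(rpkm(raw_counts, lengths))
    print(tpm(raw_counts, lengths))
```

In this toy input, geneB has three times the reads of geneA but is also three times as long, so both genes end up with the same RPKM and the same TPM.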
d. Differentially Expressed Genes
Here is an example table showing differentially expressed genes after analysis. Based on total gene read counts, the data for each sample is normalized, and expression levels are quantified as FPKM/RPKM or TPM. All significantly differentially expressed genes are summarized in Excel format, with results ranked by fold change, mean counts or p-value.
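The ranking step can be illustrated with a small sketch. It assumes that normalized expression values and p-values have already been produced by an upstream statistical tool; the p-values are not computed here. The pseudocount of 1, the field names and the function names are assumptions made for this example.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated over control; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def rank_table(rows, by="fold_change"):
    """Return a copy of rows ranked by fold change, mean expression or p-value.

    Each row is a dict with the keys "gene", "control", "treated" and
    "p_value" (hypothetical field names for this sketch).
    """
    table = []
    for row in rows:
        entry = dict(row)
        entry["log2fc"] = log2_fold_change(row["treated"], row["control"])
        entry["mean"] = (row["treated"] + row["control"]) / 2.0
        table.append(entry)

    if by == "fold_change":
        # Largest absolute change first, regardless of direction.
        return sorted(table, key=lambda r: abs(r["log2fc"]), reverse=True)
    if by == "mean":
        return sorted(table, key=lambda r: r["mean"], reverse=True)
    if by == "p_value":
        return sorted(table, key=lambda r: r["p_value"])
    raise ValueError("by must be 'fold_change', 'mean' or 'p_value'")


if __name__ == "__main__":
    # Made-up values, for illustration only.
    genes = [
        {"gene": "g1", "control": 1, "treated": 7, "p_value": 0.01},
        {"gene": "g2", "control": 50, "treated": 50, "p_value": 0.9},
        {"gene": "g3", "control": 31, "treated": 3, "p_value": 0.001},
    ]
    for entry in rank_table(genes, by="fold_change"):
        print(entry["gene"], entry["log2fc"], entry["p_value"])
```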
e. Clustering and PCA analysis
The heatmap shows how differentially expressed genes cluster together across the different groups. The PCA plot shows that samples in the same group cluster together.
Based on the differentially expressed genes, IPA pathway analysis can help you explore how the endpoints of canonical pathways may be increased or decreased by the activation or inhibition of molecules within each pathway.
h. Upstream regulator and Gene interaction network
The network was algorithmically constructed by Ingenuity Pathway Analysis (IPA) software on the basis of the functional and biological connectivity of genes. The network is graphically represented as nodes (genes) and edges (the biological relationship between genes).
i. Alternative pre-mRNA splicing
m. Fusion gene/transcript detection
n. Final project report with analysis methods, publication-ready graphics, and references