Unlocking the transcriptome with RNA-seq: a computational framework

Date(s) - 09/08/2014
4:00 pm

Jinze Liu, Ph.D., Associate Professor, Department of Computer Science, University of Kentucky


The transcriptome is the set of RNA molecules present in a cell, and it may vary in response to disease or environment. Recent studies in cancer have suggested the importance of transcriptionally altered loci as biomarkers for improved diagnosis or therapy.

High-throughput sequencing provides an unprecedented view of the transcriptome, making it possible to move beyond gene expression analysis to the study of alternative splicing. Using Illumina’s RNA-seq protocol, for example, up to 150 nucleotides can be read from one or both ends of several hundred million random RNA fragments, sampling both the diversity and abundance of RNA species (isoforms). However, identification and quantification of transcript isoforms is fraught with ambiguity, as different isoforms often share nucleotide sequences longer than the read length. Furthermore, approaches based on a catalog of known isoforms can be misleading, as the presence of novel and incomplete transcript isoforms confounds solutions.
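The ambiguity described above can be illustrated with a toy example. The sketch below is not part of the speaker's software; the exon sequences, isoform names, and the helper `compatible_isoforms` are all hypothetical and chosen only to show that a read falling inside a shared region cannot be assigned to a single isoform, while a read spanning a distinguishing junction can.

```python
def compatible_isoforms(read, isoforms):
    """Return the sorted names of isoforms whose sequence contains the read."""
    return sorted(name for name, seq in isoforms.items() if read in seq)


# Hypothetical toy exons: both isoforms use exons A and C, only iso1 uses exon B.
EXON_A = "ACGTACGTAC"
EXON_B = "TTTGGGCCCA"
EXON_C = "GATTACAGAT"

isoforms = {
    "iso1": EXON_A + EXON_B + EXON_C,  # A-B-C
    "iso2": EXON_A + EXON_C,           # A-C (exon B skipped)
}

# A read lying entirely within shared exon A is compatible with both isoforms.
shared_read = EXON_A[2:8]

# A read spanning the A-C junction only exists in the exon-skipping isoform.
junction_read = EXON_A[-3:] + EXON_C[:3]

print(compatible_isoforms(shared_read, isoforms))
print(compatible_isoforms(junction_read, isoforms))
```

Real reads are far longer and are aligned with tolerance for mismatches, so exact substring matching here stands in for alignment only for the sake of brevity.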

To sidestep these issues, we have developed an ab initio framework for the detection and visualization of differential transcription, without the need for transcript or gene annotations. Our approach is built upon the expression-weighted splice graph (ESG), a highly compact representation of the transcriptome derived from RNA-seq. The ESG succinctly encapsulates observed diversity and abundance, and identifies regions within a gene contributing to these features. We introduce algorithms for detecting differential splicing between samples and quantifying abundance differences at the level of alternative splicing modules in the ESG. Software components implementing the approach include MapSplice, for accurate alignment of RNA-seq data, and DiffSplice, for the detection of differential transcription. The suite of methods is agnostic about transcript structure and, in contrast to many other methods, robust to novel splicing, unexpected transcription, or structural variation of the subject genome. While this talk will focus on the approach, some applications of the tools will be shown.
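As a rough illustration of the idea of a splice graph weighted by expression, the sketch below builds a toy graph from junction read counts and compares branch usage between two samples. It is a simplified assumption for illustration only and does not reproduce the ESG data structure or the DiffSplice algorithm; the function names, exon labels, and counts are all hypothetical.

```python
from collections import defaultdict


def build_graph(junction_counts):
    """Build a toy weighted splice graph.

    junction_counts maps (donor_exon, acceptor_exon) to a supporting read count.
    Returns {donor: {acceptor: weight}}.
    """
    graph = defaultdict(dict)
    for (donor, acceptor), count in junction_counts.items():
        graph[donor][acceptor] = count
    return dict(graph)


def branch_proportions(graph):
    """For each exon with more than one outgoing edge, return usage fractions."""
    props = {}
    for donor, edges in graph.items():
        total = sum(edges.values())
        if len(edges) > 1 and total > 0:
            props[donor] = {acc: w / total for acc, w in edges.items()}
    return props


def max_proportion_shift(graph_a, graph_b):
    """Largest absolute change in branch usage at each shared branching exon."""
    pa, pb = branch_proportions(graph_a), branch_proportions(graph_b)
    shifts = {}
    for donor in pa.keys() & pb.keys():
        targets = pa[donor].keys() | pb[donor].keys()
        shifts[donor] = max(
            abs(pa[donor].get(t, 0.0) - pb[donor].get(t, 0.0)) for t in targets
        )
    return shifts


# Hypothetical junction counts: exon E2 is mostly included in sample A
# and mostly skipped in sample B.
sample_a = {("E1", "E2"): 80, ("E1", "E3"): 20, ("E2", "E3"): 75}
sample_b = {("E1", "E2"): 30, ("E1", "E3"): 70, ("E2", "E3"): 28}

shifts = max_proportion_shift(build_graph(sample_a), build_graph(sample_b))
print(shifts)  # the branch at E1 shifts by roughly 0.5
```

Working with usage fractions at branch points, rather than with full-length transcripts, is what lets a comparison like this proceed without an annotation of known isoforms; a real method would also need a statistical test across replicates, which this sketch omits.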

Short Bio: 

Dr. Liu is currently an associate professor in the Department of Computer Science at the University of Kentucky. She received her Ph.D. from the University of North Carolina at Chapel Hill in 2006 and was a recipient of a National Science Foundation early career award in 2011. Her areas of research include data mining and bioinformatics.