Date/Time
Date(s) - 03/17/2014
4:00 pm
Abstract:
Biological data naturally lend themselves to string- and graph-based representations. As a result, combinatorial algorithms that use string and graph models have been at the forefront of driving discovery in life sciences and biomedical informatics. However, the continued deployment of such analytical capabilities is currently being challenged by several Big Data attributes that have come to dominate the data-driven branch of life sciences. Not only are biological sequence and network databases reporting sustained exponential growth rates, but they have also grown in the complexity of the information they encode. Consequently, novel algorithmic techniques are required that can tackle this complexity and lend themselves to scalable implementation. In this talk, I will focus on the ongoing development of novel parallel approaches for clustering high-throughput metagenomics data derived from environmental microbial communities. Clustering is a powerful operation that can be used to reveal tightly knit communities within data that share common characteristics such as homology or function. In the context of metagenomics, clustering results can be used for identifying metabolic pathways and functionally annotating microbial communities. The talk will present graph-theoretic models for clustering homology-based graphs, the design of novel and efficient algorithmic heuristics, and their parallelization on shared- and distributed-memory machines. Experimental results on metagenomics collections comprising millions of sequences demonstrate significant qualitative improvements in the reported clustering, alongside orders-of-magnitude reductions in time-to-solution and near-linear scaling observed on thousands of cores of a distributed-memory machine and tens of threads of a multicore machine. Time permitting, I will also discuss how our approach can be extended to other big data problems within computational biology, and the associated challenges.
Short Bio:
Ananth Kalyanaraman is an Associate Professor at the School of Electrical Engineering and Computer Science at Washington State University. He received his Bachelor of Engineering from Visvesvaraya National Institute of Technology in Nagpur (India) in 1998, and his MS and PhD from Iowa State University in 2002 and 2006, respectively. His main research interest is high-performance computational biology, with a focus on developing algorithms that use high-performance computing for data-intensive problems originating in computational genomics and metagenomics. Ananth is a recipient of a DOE Early Career Award, an Early Career Impact Award from Iowa State University, and two best paper awards. He has organized workshops and mini-symposia on high-performance computational biology at IEEE, ACM, and SIAM conferences, and regularly serves on a number of program committees and proposal panels. His research is currently funded by DOE, NSF, and USDA. Ananth is a member of AAAS, ACM, IEEE-CS, and ISCB.